Usability Testing

What is Usability Testing?

Usability testing is watching real users attempt to complete tasks with your product. It reveals where users struggle, get confused, or fail—insights you can't get from analytics alone.

The Power of Watching Users

Jakob Nielsen's Rule: Testing with just 5 users typically uncovers about 85% of usability problems

Why It Works: Most issues are encountered by multiple users, so each additional participant increasingly re-discovers problems you have already seen. After about 5 tests, the returns diminish sharply.
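
Nielsen and Landauer modeled this with the formula found(n) = N(1 − (1 − L)^n), where L is the probability that a single participant surfaces a given problem (roughly 0.31 averaged across their studies; your product's rate will differ). A quick sketch of how the rule falls out:

```python
# Share of usability problems found by n participants, per Nielsen & Landauer.
# L = 0.31 is their reported average per-user detection rate; treat it as
# an assumption, since real products vary.
L = 0.31

for n in range(1, 9):
    found = 1 - (1 - L) ** n
    print(f"{n} users -> {found:.0%} of problems found")

# 5 users -> 84%, the basis of the "5 users, ~85%" rule
```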

Types of Usability Testing

Moderated vs Unmoderated

Moderated Testing

What: Facilitator guides user through tasks

Pros: Can ask follow-up questions, dig deeper, observe body language

Cons: Time-intensive, requires scheduling, facilitator bias

Example: In-person session where you watch user navigate your app while asking "What are you thinking?"

Unmoderated Testing

What: Users complete tasks independently, recorded

Pros: Fast, scalable, natural behavior, cheaper

Cons: Can't ask follow-ups, technical issues, less context

Example: UserTesting.com, where users record themselves completing tasks and you review the videos later

Remote vs In-Person

Example: Zoom's Remote Testing Advantage

Pre-COVID: Most usability testing was in-person

2020 Shift: Forced remote testing adoption

Discovery: Remote testing had unexpected benefits:

  • Users in natural environment (home/office)
  • Easier to recruit diverse participants
  • Lower cost (no lab rental)
  • Easier to record and share

Result: Many companies now prefer remote testing

Planning a Usability Test

Test Plan Components

  1. Goals: What do you want to learn?
  2. Participants: Who represents your users?
  3. Tasks: What should users try to do?
  4. Scenarios: Context for each task
  5. Success Metrics: How do you measure usability?
  6. Questions: What to ask after tasks
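
Before recruiting anyone, it helps to write the plan down in one reviewable artifact. As a sketch, the six components above map naturally onto simple structured data; every value here is hypothetical:

```python
# A hypothetical test plan mirroring the six components above.
test_plan = {
    "goals": ["Learn where users stall while sending money to a friend"],
    "participants": "6 people who used a payments app in the last month",
    "tasks": [
        {
            "scenario": "You owe a friend $40 for last night's dinner.",
            "task": "Pay them back using the app.",
            "success": "User reaches the payment-confirmation screen.",
        }
    ],
    "metrics": ["task success rate", "time on task", "error count"],
    "post_task_questions": [
        "How difficult was that, from 1 (very easy) to 5 (very hard)?",
    ],
}
```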

Writing Good Tasks

  • Realistic: Based on actual user goals
  • Specific: Clear end state
  • No UI Hints: Don't say "click the button"
  • Scenario-Based: Give context and motivation

Bad: "Click on the search icon and search for shoes"

Good: "You need running shoes for a marathon. Find a pair that fits your budget of $100."

Example: Airbnb's Booking Flow Test

Goal: Identify friction in booking process

Participants: 8 users who booked Airbnb in last 6 months

Task: "You're planning a weekend trip to San Francisco for 2 people. Find and book a place to stay."

Metrics: Time to complete, errors made, completion rate

Discovery: Users confused by cleaning fee appearing late in process

Fix: Showed total price upfront, completion rate increased 12%

Recruiting Participants

Recruitment Methods

  • User Research Panels: Pre-recruited pool of users
  • Recruitment Services: UserTesting, Respondent.io
  • Social Media: Post in relevant communities
  • Customer Lists: Email existing users
  • Intercepts: Recruit from your website/app
  • Guerrilla Testing: Coffee shops, public spaces

Screener Questions

Filter for the right participants:

  • Demographics (age, location, occupation)
  • Behavior (frequency of use, experience level)
  • Attitudes (preferences, pain points)
  • Technology (devices, platforms)

Example: "How often do you order food delivery?" (Daily/Weekly/Monthly/Rarely/Never) → Filter for Weekly+ users

Recruitment Mistakes

  • Testing Coworkers: They know too much
  • Friends and Family: Too polite, biased
  • Professional Testers: Not representative
  • Wrong Segment: Testing power users when designing for beginners

Facilitating Sessions

Facilitator's Role

  • Set Expectations: Explain you're testing the product, not them
  • Think Aloud: Ask users to verbalize their thoughts
  • Don't Help: Let them struggle (that's the data)
  • Stay Neutral: Don't react to successes or failures
  • Probe Gently: "What are you thinking?" not "Why did you do that?"

Session Structure (60 minutes)

  1. Introduction (5 min): Build rapport, explain process
  2. Background (5 min): Learn about participant
  3. Tasks (40 min): Observe task completion
  4. Debrief (10 min): Overall impressions, questions

Example: Slack's Onboarding Test

Setup: New users, never used Slack

Task: "Your team wants to use Slack. Set up a workspace and invite a teammate."

Observation: 6 out of 8 users confused by "workspace" terminology

Facilitator Note: Didn't explain "workspace," let users struggle to understand

Fix: Changed language to "team" and added explainer, completion rate went from 40% to 85%

What NOT to Say

  • "That's not how you're supposed to do it" (judgmental)
  • "Try clicking there" (leading)
  • "Most people find this easy" (pressure)
  • "Let me show you" (defeats purpose)

Analyzing Results

What to Look For

  • Task Success Rate: Did they complete it?
  • Time on Task: How long did it take?
  • Error Rate: How many mistakes?
  • Paths Taken: Expected vs actual route
  • Verbal Feedback: Confusion, frustration, delight
  • Body Language: Hesitation, confidence

Severity Rating

  • Critical: Prevents task completion (fix immediately)
  • Serious: Causes significant delay or frustration (fix soon)
  • Minor: Small annoyance (fix if time permits)
  • Cosmetic: Doesn't affect usability (backlog)

Example: Instagram's Photo Upload

Test Finding: Users tapped "Next" multiple times thinking it wasn't working

Root Cause: No loading indicator while processing photo

Severity: Serious (caused frustration, duplicate uploads)

Fix: Added progress spinner and "Processing..." text

Result: Support tickets about upload issues dropped 60%

Affinity Mapping Findings

Process:

  1. Write each observation on sticky note
  2. Group similar issues together
  3. Identify patterns across users
  4. Prioritize by frequency and severity

Example: 5 users struggled with search → High priority. 1 user wanted dark mode → Lower priority.
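
One way to make step 4 concrete is to score each group of sticky notes by how many users hit the issue times a severity weight. The weights below are an illustrative choice, not a standard:

```python
# Hypothetical grouped findings; priority = users affected * severity weight.
SEVERITY_WEIGHT = {"critical": 4, "serious": 3, "minor": 2, "cosmetic": 1}

findings = [
    {"issue": "Search hard to find", "users_affected": 5, "severity": "serious"},
    {"issue": "Wants dark mode",     "users_affected": 1, "severity": "cosmetic"},
    {"issue": "Checkout crash",      "users_affected": 2, "severity": "critical"},
]

def priority(f):
    return f["users_affected"] * SEVERITY_WEIGHT[f["severity"]]

for f in sorted(findings, key=priority, reverse=True):
    print(f"{priority(f):>2}  {f['issue']}")
# 15  Search hard to find
#  8  Checkout crash
#  1  Wants dark mode
```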

Metrics & Benchmarking

Quantitative Usability Metrics

  • Task Success Rate: % who completed task
  • Time on Task: Average seconds to complete
  • Error Rate: Mistakes per task
  • Clicks to Complete: Number of interactions
  • SUS Score: System Usability Scale (0-100)
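
The first four metrics fall straight out of per-session records; a minimal sketch over hypothetical sessions for one task:

```python
# Hypothetical results for one task across three sessions.
sessions = [
    {"completed": True,  "seconds": 72,  "errors": 1, "clicks": 9},
    {"completed": True,  "seconds": 95,  "errors": 0, "clicks": 7},
    {"completed": False, "seconds": 180, "errors": 4, "clicks": 15},
]

n = len(sessions)
print(f"Task success rate: {sum(s['completed'] for s in sessions) / n:.0%}")
print(f"Avg time on task:  {sum(s['seconds'] for s in sessions) / n:.0f}s")
print(f"Avg errors:        {sum(s['errors'] for s in sessions) / n:.1f}")
print(f"Avg clicks:        {sum(s['clicks'] for s in sessions) / n:.1f}")
```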

Example: Amazon's One-Click Patent

Baseline Test: Standard checkout flow

  • Average time: 90 seconds
  • Completion rate: 60%
  • Average clicks: 8

One-Click Test:

  • Average time: 3 seconds
  • Completion rate: 95%
  • Average clicks: 1

Impact: 30x faster, a 58% relative lift in completion (60% to 95%), and worth patenting

System Usability Scale (SUS)

What: 10-question survey, scored 0-100

Benchmark: 68 is average, 80+ is excellent

Use: Compare versions, track over time, benchmark against competitors
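
Scoring is standardized: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5 to land on 0-100:

```python
# Standard SUS scoring for ten 1-5 Likert responses.
def sus_score(responses):
    assert len(responses) == 10
    total = sum(
        (r - 1) if i % 2 == 1 else (5 - r)
        for i, r in enumerate(responses, start=1)
    )
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # 85.0
```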

Example: Redesign increased SUS from 62 to 78, validating improvements

A/B Testing vs Usability Testing

When to Use Each

Usability Testing

  • Best For: Understanding WHY users struggle
  • Sample Size: 5-8 users
  • Data Type: Qualitative insights
  • Speed: Days to weeks

A/B Testing

  • Best For: Measuring WHICH version performs better
  • Sample Size: Thousands of users
  • Data Type: Quantitative metrics
  • Speed: Weeks to months
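
The sample-size gap is not arbitrary: detecting a small lift in a conversion rate takes many users. A rough two-proportion power calculation (standard z-values for a two-sided 5% significance level and 80% power; the 10% baseline and one-point lift are hypothetical):

```python
# Approximate users needed per variant to detect a lift from p1 to p2.
Z_ALPHA = 1.96  # two-sided significance level of 0.05
Z_BETA = 0.84   # statistical power of 0.80

def n_per_variant(p1, p2):
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((Z_ALPHA + Z_BETA) ** 2) * variance / (p1 - p2) ** 2

# A 10% -> 11% lift in click-through needs ~14,700 users per variant.
print(round(n_per_variant(0.10, 0.11)))  # 14731
```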

Example: Netflix's Artwork Testing

Usability Testing: Showed users different title artwork, asked which caught their attention

Insight: Users preferred images with faces, emotional expressions

A/B Testing: Tested face-focused vs scene-focused artwork with millions of users

Result: Face-focused artwork increased click-through 30%

Lesson: Usability testing generated hypothesis, A/B testing validated at scale

Guerrilla Testing

What is Guerrilla Testing?

Quick, informal testing with people in public places. It is a fast, cheap way to get early feedback.

Where: Coffee shops, libraries, parks, malls

Time: 5-10 minutes per person

Incentive: Coffee, gift card, or just goodwill

Example: Starbucks Mobile Order Testing

Method: Approached customers in Starbucks, asked to try mobile ordering prototype

Setup: Laptop with clickable prototype

Task: "Order your usual drink"

Time: 5 minutes per person, tested 20 people in 2 hours

Discovery: Customization options were buried, users couldn't find them

Cost: $100 in gift cards vs $5,000 for formal lab testing

Guerrilla Testing Tips

  • Be Respectful: Ask permission, accept rejection gracefully
  • Keep It Short: 5-10 minutes max
  • Have Clear Task: One specific thing to test
  • Take Notes: Record or write down observations
  • Test in Context: Coffee shop for coffee app, gym for fitness app

Accessibility Testing

Testing with Assistive Technologies

  • Screen Readers: VoiceOver (iOS), TalkBack (Android), NVDA (Windows)
  • Keyboard Only: Navigate without mouse
  • Voice Control: Voice commands only
  • Screen Magnification: Zoom to 200%+
  • Color Blindness: Simulate different types
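
Some of these checks can be partly automated. Low-vision and color-blindness audits, for instance, usually include the WCAG contrast-ratio check, whose formula comes straight from the WCAG 2.x spec (the sample colors below are arbitrary):

```python
# WCAG 2.x contrast ratio between two sRGB colors given as 0-255 channels.
def relative_luminance(rgb):
    def linearize(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

ratio = contrast_ratio((119, 119, 119), (255, 255, 255))  # gray text on white
print(f"{ratio:.2f}:1, passes AA for body text: {ratio >= 4.5}")  # 4.48:1, False
```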

Example: Apple's Accessibility Testing

Practice: Every iOS feature tested with VoiceOver before shipping

Discovery: Many "obvious" interactions don't work for blind users

Example: Swipe gestures need audio feedback, buttons need descriptive labels

Impact: iOS became most accessible mobile OS, expanded market to millions of users with disabilities

Continuous Testing

Building a Testing Cadence

  • Weekly: Quick guerrilla tests on new features
  • Bi-weekly: Moderated sessions on in-progress work
  • Monthly: Comprehensive testing of major features
  • Quarterly: Benchmark testing, SUS scores

Example: GOV.UK's Testing Culture

Mandate: Every team must test with users every 2 weeks

Infrastructure:

  • Dedicated user research lab
  • Panel of 10,000 citizens for recruitment
  • Research ops team handles logistics
  • Designers conduct their own tests

Result: Usability improved dramatically, became model for government digital services worldwide

Usability Testing at Scale (Staff/Director Level)

Building Research Infrastructure

  • Research Ops: Team dedicated to recruiting, scheduling, logistics
  • Testing Labs: Dedicated spaces with recording equipment
  • User Panels: Pre-recruited participants for quick studies
  • Tools & Platforms: UserTesting, Lookback, Maze for scale
  • Democratization: Train all designers to conduct tests

Example: Microsoft's Usability Testing Program

Scale: 1,000+ usability tests per year across all products

Infrastructure:

  • 15 dedicated usability labs worldwide
  • Research ops team of 30 people
  • 100,000-person participant panel
  • Automated recruitment and scheduling
  • Centralized repository of all findings

Impact: Every product team can test weekly, usability issues caught early, customer satisfaction increased 40%

📅 Evolution of Usability Testing

Pre-2000: Lab Testing Only

Example: Nielsen Norman Group usability labs

  • Expensive lab setups ($100K+)
  • Think-aloud protocol established
  • 5-10 participants per study
  • Weeks to schedule and conduct
  • VHS recordings and manual analysis

2000–2023: Remote & Scalable

Example: UserTesting.com, Maze, Lookback

  • Remote unmoderated testing
  • Recruit globally in hours
  • Automated metrics and heatmaps
  • Continuous testing in agile sprints
  • Screen recordings with analytics

2023+: AI-Powered Analysis

Example: AI identifies usability issues automatically

  • AI watches sessions, flags problems
  • Synthetic users for rapid testing
  • Real-time accessibility scanning
  • Predictive usability scores
  • Automated test script generation

Fun Fact

The "think-aloud" protocol used in usability testing was actually borrowed from psychology research in the 1980s! Researchers studying problem-solving asked people to verbalize their thoughts. Jakob Nielsen adapted it for software testing. Interestingly, studies show that thinking aloud can actually CHANGE how people interact with interfaces—they're more careful and analytical than normal users. This is called the "observer effect"!

⚠️ When Theory Meets Reality: The Contradiction

Theory Says: Always test with real users before launching

Reality: Gmail launched as an invite-only beta and stayed in "beta" for 5 years; millions used it without formal usability testing.

Example: Gmail's Perpetual Beta

  • Google launched Gmail in 2004 with minimal testing
  • Kept it in beta until 2009
  • Users found bugs and usability issues in production
  • Used real usage data instead of lab testing
  • Became the world's most popular email service

Lesson: Sometimes launching to real users IS the usability test. Beta programs and gradual rollouts can replace traditional testing. The key is having good analytics and fast iteration cycles.

📚 Resources & Further Reading

Books

  • Krug, Steve. Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. New Riders, 2009.
  • Rubin, Jeffrey, and Dana Chisnell. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. 2nd ed., Wiley, 2008.
  • Dumas, Joseph S., and Janice C. Redish. A Practical Guide to Usability Testing. Revised ed., Intellect Books, 1999.

Articles & Papers

Tools

  • UserTesting - Remote usability testing platform
  • Maze - Rapid testing and research
  • Lookback - Live user interviews
  • Hotjar - Session recordings and heatmaps