Usability Testing
What is Usability Testing?
Usability testing is watching real users attempt to complete tasks with your product. It reveals where users struggle, get confused, or fail—insights you can't get from analytics alone.
The Power of Watching Users
Jakob Nielsen's Rule: Testing with 5 users reveals roughly 85% of usability problems
Why It Works: Most issues are encountered by multiple users, so later participants mostly re-discover problems earlier ones already exposed. After about 5 tests, you hit diminishing returns.
Types of Usability Testing
Moderated vs Unmoderated
Moderated Testing
What: Facilitator guides user through tasks
Pros: Can ask follow-up questions, dig deeper, observe body language
Cons: Time-intensive, requires scheduling, facilitator bias
Example: In-person session where you watch a user navigate your app while asking "What are you thinking?"
Unmoderated Testing
What: Users complete tasks independently, recorded
Pros: Fast, scalable, natural behavior, cheaper
Cons: Can't ask follow-ups, technical issues, less context
Example: UserTesting.com, where users record themselves completing tasks and you review the videos later
Remote vs In-Person
Example: Zoom's Remote Testing Advantage
Pre-COVID: Most usability testing was in-person
2020 Shift: Forced remote testing adoption
Discovery: Remote testing had unexpected benefits:
- Users in natural environment (home/office)
- Easier to recruit diverse participants
- Lower cost (no lab rental)
- Easier to record and share
Result: Many companies now prefer remote testing
Planning a Usability Test
Test Plan Components
- Goals: What do you want to learn?
- Participants: Who represents your users?
- Tasks: What should users try to do?
- Scenarios: Context for each task
- Success Metrics: How do you measure usability?
- Questions: What to ask after tasks
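To keep plans consistent across studies, it can help to capture them as structured data. Below is a minimal sketch in Python; the field names and example values are illustrative, not a standard format.

```python
# A minimal sketch of a test plan as structured data (field names are
# illustrative, not a standard). Structured plans keep studies comparable
# across rounds of testing.
from dataclasses import dataclass, field

@dataclass
class Task:
    scenario: str         # context and motivation given to the participant
    goal: str             # clear end state, with no UI hints
    success_criteria: str

@dataclass
class TestPlan:
    goals: list[str]              # what you want to learn
    participant_profile: str      # who represents your users
    tasks: list[Task]
    metrics: list[str] = field(
        default_factory=lambda: ["task success rate", "time on task", "error rate"]
    )
    post_task_questions: list[str] = field(default_factory=list)

plan = TestPlan(
    goals=["Identify friction in the checkout flow"],
    participant_profile="Purchased online in the last 3 months",
    tasks=[Task(
        scenario="You need running shoes for a marathon.",
        goal="Find a pair within a $100 budget.",
        success_criteria="Reaches a product page for shoes under $100",
    )],
)
```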
Writing Good Tasks
- Realistic: Based on actual user goals
- Specific: Clear end state
- No UI Hints: Don't say "click the button"
- Scenario-Based: Give context and motivation
Bad: "Click on the search icon and search for shoes"
Good: "You need running shoes for a marathon. Find a pair that fits your budget of $100"
Example: Airbnb's Booking Flow Test
Goal: Identify friction in booking process
Participants: 8 users who had booked an Airbnb in the last 6 months
Task: "You're planning a weekend trip to San Francisco for 2 people. Find and book a place to stay."
Metrics: Time to complete, errors made, completion rate
Discovery: Users were confused by the cleaning fee appearing late in the process
Fix: Showed total price upfront, completion rate increased 12%
Recruiting Participants
Recruitment Methods
- User Research Panels: Pre-recruited pool of users
- Recruitment Services: UserTesting, Respondent.io
- Social Media: Post in relevant communities
- Customer Lists: Email existing users
- Intercepts: Recruit from your website/app
- Guerrilla Testing: Coffee shops, public spaces
Screener Questions
Filter for right participants:
- Demographics (age, location, occupation)
- Behavior (frequency of use, experience level)
- Attitudes (preferences, pain points)
- Technology (devices, platforms)
Example: "How often do you order food delivery?" (Daily/Weekly/Monthly/Rarely/Never) → Filter for Weekly+ users
Recruitment Mistakes
- Testing Coworkers: They know too much
- Friends and Family: Too polite, biased
- Professional Testers: Not representative
- Wrong Segment: Testing power users when designing for beginners
Facilitating Sessions
Facilitator's Role
- Set Expectations: Explain you're testing the product, not them
- Think Aloud: Ask users to verbalize their thoughts
- Don't Help: Let them struggle (that's the data)
- Stay Neutral: Don't react to successes or failures
- Probe Gently: "What are you thinking?" not "Why did you do that?"
Session Structure (60 minutes)
- Introduction (5 min): Build rapport, explain process
- Background (5 min): Learn about participant
- Tasks (40 min): Observe task completion
- Debrief (10 min): Overall impressions, questions
Example: Slack's Onboarding Test
Setup: New users, never used Slack
Task: "Your team wants to use Slack. Set up a workspace and invite a teammate."
Observation: 6 out of 8 users were confused by "workspace" terminology
Facilitator Note: Didn't explain "workspace," let users struggle to understand
Fix: Changed language to "team" and added explainer, completion rate went from 40% to 85%
What NOT to Say
- "That's not how you're supposed to do it" (judgmental)
- "Try clicking there" (leading)
- "Most people find this easy" (pressure)
- "Let me show you" (defeats purpose)
Analyzing Results
What to Look For
- Task Success Rate: Did they complete it?
- Time on Task: How long did it take?
- Error Rate: How many mistakes?
- Paths Taken: Expected vs actual route
- Verbal Feedback: Confusion, frustration, delight
- Body Language: Hesitation, confidence
Severity Rating
- Critical: Prevents task completion (fix immediately)
- Serious: Causes significant delay or frustration (fix soon)
- Minor: Small annoyance (fix if time permits)
- Cosmetic: Doesn't affect usability (backlog)
Example: Instagram's Photo Upload
Test Finding: Users tapped "Next" multiple times thinking it wasn't working
Root Cause: No loading indicator while processing photo
Severity: Serious (caused frustration, duplicate uploads)
Fix: Added progress spinner and "Processing..." text
Result: Support tickets about upload issues dropped 60%
Affinity Mapping Findings
Process:
- Write each observation on a sticky note
- Group similar issues together
- Identify patterns across users
- Prioritize by frequency and severity
Example: 5 users struggled with search → High priority. 1 user wanted dark mode → Lower priority.
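Once notes are tagged, the prioritization step can be scripted. A minimal sketch, assuming illustrative severity weights and observation data:

```python
# Sketch: turning tagged observations into a prioritized issue list.
# Severity weights are an assumption; tune them to your team's rubric.
from collections import Counter

SEVERITY_WEIGHT = {"critical": 4, "serious": 3, "minor": 2, "cosmetic": 1}

# One (issue tag, severity) pair per sticky note / observation
observations = [
    ("search hard to find", "serious"),
    ("search hard to find", "serious"),
    ("search hard to find", "serious"),
    ("search hard to find", "serious"),
    ("search hard to find", "serious"),
    ("wants dark mode", "cosmetic"),
]

counts = Counter(tag for tag, _ in observations)
severity = {tag: sev for tag, sev in observations}  # severity per issue tag
priority = sorted(
    counts,
    key=lambda tag: counts[tag] * SEVERITY_WEIGHT[severity[tag]],
    reverse=True,
)
for tag in priority:
    print(f"{tag}: seen by {counts[tag]} users, severity={severity[tag]}")
```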
Metrics & Benchmarking
Quantitative Usability Metrics
- Task Success Rate: % who completed task
- Time on Task: Average seconds to complete
- Error Rate: Mistakes per task
- Clicks to Complete: Number of interactions
- SUS Score: System Usability Scale (0-100)
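These metrics fall out directly from per-session logs. A minimal sketch, assuming an illustrative log format (the field names are made up); note that time on task is often reported over successful attempts only:

```python
# Sketch: computing the basic metrics from per-session logs. The session
# records and field names are illustrative.
sessions = [
    {"completed": True,  "seconds": 95,  "errors": 1, "clicks": 9},
    {"completed": True,  "seconds": 80,  "errors": 0, "clicks": 7},
    {"completed": False, "seconds": 180, "errors": 4, "clicks": 15},
]

n = len(sessions)
success_rate = sum(s["completed"] for s in sessions) / n
# Time on task is commonly averaged over successful attempts only.
successful = [s for s in sessions if s["completed"]]
avg_time = sum(s["seconds"] for s in successful) / len(successful)
avg_errors = sum(s["errors"] for s in sessions) / n
avg_clicks = sum(s["clicks"] for s in sessions) / n

print(f"success {success_rate:.0%}, time {avg_time:.0f}s, "
      f"errors {avg_errors:.1f}, clicks {avg_clicks:.1f}")
```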
Example: Amazon's One-Click Patent
Baseline Test: Standard checkout flow
- Average time: 90 seconds
- Completion rate: 60%
- Average clicks: 8
One-Click Test:
- Average time: 3 seconds
- Completion rate: 95%
- Average clicks: 1
Impact: 30x faster, 58% higher completion—worth patenting
System Usability Scale (SUS)
What: 10-question survey, scored 0-100
Benchmark: 68 is average, 80+ is excellent
Use: Compare versions, track over time, benchmark against competitors
Example: Redesign increased SUS from 62 to 78, validating improvements
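SUS scoring follows a fixed rule: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is multiplied by 2.5 to yield a 0-100 score. A short implementation:

```python
# Standard SUS scoring: each of the 10 items is answered on a 1-5 scale.
def sus_score(responses: list[int]) -> float:
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```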
A/B Testing vs Usability Testing
When to Use Each
Usability Testing
- Best For: Understanding WHY users struggle
- Sample Size: 5-8 users
- Data Type: Qualitative insights
- Speed: Days to weeks
A/B Testing
- Best For: Measuring WHICH version performs better
- Sample Size: Thousands of users
- Data Type: Quantitative metrics
- Speed: Weeks to months
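The sample-size gap comes from statistics: detecting a modest difference between two conversion rates requires enough users for a significance test to distinguish signal from noise. A sketch using a standard two-proportion z-test, with illustrative numbers:

```python
# Sketch: why A/B tests need thousands of users. With 5 users per arm,
# a 60% vs 80% difference is statistically indistinguishable; with 5,000
# per arm, even 60% vs 70% is decisive.
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

print(two_proportion_z(3, 5, 4, 5))              # tiny sample: p ~ 0.49
print(two_proportion_z(3000, 5000, 3500, 5000))  # large sample: p ~ 0
```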
Example: Netflix's Artwork Testing
Usability Testing: Showed users different title artwork, asked which caught their attention
Insight: Users preferred images with faces, emotional expressions
A/B Testing: Tested face-focused vs scene-focused artwork with millions of users
Result: Face-focused artwork increased click-through 30%
Lesson: Usability testing generated hypothesis, A/B testing validated at scale
Guerrilla Testing
What is Guerrilla Testing?
Quick, informal testing with people in public places. It's a fast, cheap way to get feedback.
Where: Coffee shops, libraries, parks, malls
Time: 5-10 minutes per person
Incentive: Coffee, gift card, or just goodwill
Example: Starbucks Mobile Order Testing
Method: Approached customers in a Starbucks and asked them to try a mobile-ordering prototype
Setup: Laptop with clickable prototype
Task: "Order your usual drink"
Time: 5 minutes per person, tested 20 people in 2 hours
Discovery: Customization options were buried, users couldn't find them
Cost: $100 in gift cards vs $5,000 for formal lab testing
Guerrilla Testing Tips
- Be Respectful: Ask permission, accept rejection gracefully
- Keep It Short: 5-10 minutes max
- Have Clear Task: One specific thing to test
- Take Notes: Record or write down observations
- Test in Context: Coffee shop for coffee app, gym for fitness app
Accessibility Testing
Testing with Assistive Technologies
- Screen Readers: VoiceOver (iOS), TalkBack (Android), NVDA (Windows)
- Keyboard Only: Navigate without mouse
- Voice Control: Voice commands only
- Screen Magnification: Zoom to 200%+
- Color Blindness: Simulate different types
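Some of these checks can be automated alongside hands-on testing. For instance, WCAG 2.x defines an exact contrast-ratio formula; the sketch below implements it (the example colors are illustrative):

```python
# WCAG 2.x contrast ratio: linearize each sRGB channel, compute relative
# luminance, then ratio = (L_lighter + 0.05) / (L_darker + 0.05).
def _linearize(channel: int) -> float:
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# WCAG AA requires at least 4.5:1 for normal body text.
print(round(contrast_ratio((119, 119, 119), (255, 255, 255)), 2))
# ≈ 4.48: mid-grey on white narrowly fails the AA threshold
```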
Example: Apple's Accessibility Testing
Practice: Every iOS feature tested with VoiceOver before shipping
Discovery: Many "obvious" interactions don't work for blind users
Examples: Swipe gestures need audio feedback; buttons need descriptive labels
Impact: iOS became the most accessible mobile OS, expanding the market to millions of users with disabilities
Continuous Testing
Building a Testing Cadence
- Weekly: Quick guerrilla tests on new features
- Bi-weekly: Moderated sessions on in-progress work
- Monthly: Comprehensive testing of major features
- Quarterly: Benchmark testing, SUS scores
Example: GOV.UK's Testing Culture
Mandate: Every team must test with users every 2 weeks
Infrastructure:
- Dedicated user research lab
- Panel of 10,000 citizens for recruitment
- Research ops team handles logistics
- Designers conduct their own tests
Result: Usability improved dramatically, and GOV.UK became a model for government digital services worldwide
Usability Testing at Scale (Staff/Director Level)
Building Research Infrastructure
- Research Ops: Team dedicated to recruiting, scheduling, logistics
- Testing Labs: Dedicated spaces with recording equipment
- User Panels: Pre-recruited participants for quick studies
- Tools & Platforms: UserTesting, Lookback, Maze for scale
- Democratization: Train all designers to conduct tests
Example: Microsoft's Usability Testing Program
Scale: 1,000+ usability tests per year across all products
Infrastructure:
- 15 dedicated usability labs worldwide
- Research ops team of 30 people
- 100,000-person participant panel
- Automated recruitment and scheduling
- Centralized repository of all findings
Impact: Every product team can test weekly, usability issues caught early, customer satisfaction increased 40%
📅 Evolution of Usability Testing
Pre-2000: Lab Testing Only
Example: Nielsen Norman Group usability labs
- Expensive lab setups ($100K+)
- Think-aloud protocol established
- 5-10 participants per study
- Weeks to schedule and conduct
- VHS recordings and manual analysis
2000–2022: Remote & Scalable
Example: UserTesting.com, Maze, Lookback
- Remote unmoderated testing
- Recruit globally in hours
- Automated metrics and heatmaps
- Continuous testing in agile sprints
- Screen recordings with analytics
2023+: AI-Powered Analysis
Example: AI identifies usability issues automatically
- AI watches sessions, flags problems
- Synthetic users for rapid testing
- Real-time accessibility scanning
- Predictive usability scores
- Automated test script generation
Fun Fact
The "think-aloud" protocol used in usability testing was actually borrowed from psychology research in the 1980s! Researchers studying problem-solving asked people to verbalize their thoughts. Jakob Nielsen adapted it for software testing. Interestingly, studies show that thinking aloud can actually CHANGE how people interact with interfaces—they're more careful and analytical than normal users. This is called the "observer effect"!
⚠️ When Theory Meets Reality: The Contradiction
Theory Says: Always test with real users before launching
Reality: Gmail launched as invite-only beta and stayed in "beta" for 5 years—millions used it without formal usability testing.
Example: Gmail's Perpetual Beta
- Google launched Gmail in 2004 with minimal testing
- Kept it in beta until 2009
- Users found bugs and usability issues in production
- Used real usage data instead of lab testing
- Became the world's most popular email service
Lesson: Sometimes launching to real users IS the usability test. Beta programs and gradual rollouts can replace traditional testing. The key is having good analytics and fast iteration cycles.
📚 Resources & Further Reading
Books
- Krug, Steve. Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. New Riders, 2009.
- Rubin, Jeffrey, and Dana Chisnell. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. 2nd ed., Wiley, 2008.
- Dumas, Joseph S., and Janice C. Redish. A Practical Guide to Usability Testing. Revised ed., Intellect Books, 1999.
Articles & Papers
- Nielsen, Jakob. "Usability Testing 101." Nielsen Norman Group. https://www.nngroup.com/articles/usability-testing-101/
- Krug, Steve. "Don't Make Me Think" principles. https://sensible.com/
Tools
- UserTesting - Remote usability testing platform
- Maze - Rapid testing and research
- Lookback - Live user interviews
- Hotjar - Session recordings and heatmaps