Doing User Research: Interviews, The Mom Test & Observation

By Pritesh Yadav · 10 min read

You cannot design well for a user you have only imagined. Good product sense is not a gift you are born with — it is built from evidence about how real people actually behave. User research is simply the practice of gathering that evidence on purpose, instead of guessing. This chapter teaches you the cheap, beginner-friendly methods that build real empathy: how to interview people, how to watch them work, and how to read the signals you already have.

Key idea: Research replaces guessing with evidence about real behavior. The two mistakes it fights are (1) building for yourself, and (2) trusting what people say over what they actually do.

Analogy: Picture a developer who decides print-shop owners "probably want bulk discounts," builds it, and ships. Now picture sitting beside an owner at 9pm while she fumbles to quote a 500-business-card job from memory. The second person knows what to build. The first only has a hunch wearing a costume.

The Mom Test: how to talk to people who want to be nice

The single best book for beginners here is The Mom Test by Rob Fitzpatrick (2013). The title is the lesson. If you ask your mom "Do you like my business idea?", she will say yes — not to deceive you, but because she loves you and wants to be encouraging. People lie to be nice all the time. So the burden is on you to ask questions that even a loving, biased person cannot answer dishonestly. That means questions about facts, not feelings about your idea.

The three rules

  1. Talk about their life, not your idea. The moment you mention your solution, you contaminate the conversation. As Fitzpatrick puts it: they own the problem, you own the solution.
  2. Ask about specifics in the past, not generics or opinions about the future. What someone did matters far more than what they say they will do. The past is concrete; the future is a wish.
  3. Talk less, listen more.

The three kinds of bad data to throw away

Compliments
"Great idea!" / "I'd totally buy that." A compliment costs nothing, so it is worth nothing. It feels like validation; it is noise.
Fluff
Generic claims ("I always…", "I never…"), future promises ("I would…", "I will…"), and hypothetical maybes ("I might…"). All speculation.
Ideas
Users tossing you feature requests. Note the problem or emotion behind the idea — do not just add it to a backlog. You are already drowning in ideas.

Bad questions (biased) vs. good questions (Mom-Test)

  • Bad: "Would you use an app that automated your quotes?"
    Good: "Walk me through the last time you quoted a big custom job. What did you do? How long did it take? What went wrong?"
  • Bad: "Do you think pricing is a problem for print shops?"
    Good: "When was the last time you lost a customer over a price quote? Tell me what happened."

Common mistake: Treating enthusiasm as a result. A real signal is not a smile but a commitment: the person gives up something they value, such as time (a real follow-up meeting), reputation (an intro to their boss or peers), or money (a deposit, a pre-order). Compliments are not commitment.

Generative vs evaluative: the master mental model

All research falls into two buckets, and beginners constantly confuse them.

Generative research (discovery — done EARLY)
Before any solution exists. Goal: discover problems, needs, and context. Answers "What should we even build?" Methods: interviews, field observation, diary studies. The Mom Test is generative-interview hygiene.
Evaluative research (assessment — done LATER)
On a design, prototype, or live product. Goal: test how well a specific solution works. Answers "Does this thing we built actually work — and where do they get stuck?" Methods: usability testing, A/B tests, feature analytics.

In one line: generative research reveals WHAT to test; evaluative research measures HOW WELL the solution works. They are partners, not rivals — use generative to find the right problem, evaluative to refine the solution.

Usability testing and the "5 users" rule

Usability testing means watching real people try to complete a real task with your product, and noting where they struggle. In 2000, Jakob Nielsen (Nielsen Norman Group) published the finding that you only need five users to find most problems. The math, built on earlier work with Tom Landauer, is:

  problems found = N x (1 - (1 - L)^n)

  N = total problems in the design
  L = share one average user finds = 0.31 (31%)
  n = number of users you test

  1 user   -> ~31% of problems
  5 users  -> ~85% of problems
  15 users -> ~100%  (sharp diminishing returns)

After five users you mostly keep seeing the same problems. So Nielsen's real advice is not "run one study of 15" — it is to run three small studies of 5, iterating each time (test → fix → re-test). Iteration beats sample size.
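
The arithmetic behind those percentages is easy to check for yourself. Below is a minimal sketch in Python, assuming the same average discovery rate L = 0.31 quoted above. The function name share_found is ours, purely for illustration, and the real rate will vary from product to product.

```python
def share_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n_users testers.

    Implements 1 - (1 - L)^n from the formula above. Multiply the result
    by N (total problems in the design) to get a count. The 0.31 default
    is the average rate quoted in the text, an assumption rather than a
    constant that holds for every product.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= discovery_rate <= 1.0:
        raise ValueError("discovery_rate must be between 0 and 1")
    return 1.0 - (1.0 - discovery_rate) ** n_users


if __name__ == "__main__":
    # Print the curve for a few study sizes to see the diminishing returns.
    for n in (1, 3, 5, 10, 15):
        print(f"{n:>2} users -> {share_found(n):.0%} of problems")
```

For five users the function returns roughly 0.84, which is the figure usually rounded to "about 85%". Note that it models a single study only: it does not capture the extra benefit of fixing problems between rounds, which is the reason to prefer three studies of 5.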

Common mistake: Over-applying the rule. The "5 users" figure is for qualitative testing (watching people do tasks). For quantitative studies that produce statistics, NN/G recommends about 40 users. And if you have distinct user groups who behave very differently — a print-shop owner versus their walk-in customer — test ~5 from each group.

How to actually run one

  • Give a realistic task, not a tour: "Set up a product and get a price for 500 business cards."
  • Stay silent and watch. The hardest skill is not helping. Do not answer their questions, do not rescue them — their getting stuck is the data.
  • Use the think-aloud protocol (NN/G calls it the #1 usability tool): ask them to narrate their thoughts out loud as they work.
  • When they go quiet, wait ~10–20 seconds, then prompt neutrally: "What are you thinking?" Silence usually means reading or deciding — that is valuable.
  • Ask only behavioral questions: "What did you expect to happen there?" Avoid "Did you find that confusing?" (leading) and "What would you change?" (speculative, and it asks the user to design for you).

Observation & contextual inquiry: go to their world

The richest method is to watch people do their real work in their real environment. Contextual inquiry — formalized by Hugh Beyer & Karen Holtzblatt in Contextual Design (1998) — blends observation with a gentle interview, on site. It reveals the workarounds and habits people never think to mention.

It runs on a master–apprentice model: the user is the master of their craft; you are the apprentice who learns by watching and asking "why did you do that?" Its four principles are Context (go to the real workplace), Partnership (collaborate, don't interrogate), Interpretation (check your read of what you saw with them), and Focus (steer toward what matters).

Example: The famous milkshake study (work associated with Clayton Christensen and Bob Moesta) found that about half of a chain's milkshakes were bought before ~8:30am by solo drivers who took them to go. The "job" the shake was hired for was making a long, boring commute less dull and holding off hunger — a thick shake lasts ~20 minutes through a straw, beating a banana or bagel. Only observation revealed that. Asking "how do we make our milkshake better?" never would. (Sales reportedly rose after marketing to that job; treat the exact percentage as folklore.)
Best practice: For PF360, skip the video call once in a while and sit in a print shop for a morning. You will see the owner juggle the counter, the phone, a half-finished proof, and a paper price book — re-quoting repeat jobs from memory and losing orders to slow turnaround. That tells you the real opportunity is fast quoting and reorder, not a fancy online designer. No interview would surface it.

Surveys, analytics, tickets, and recordings: support, never lead

These tell you how many and where — but never why. Lead with talking to people; use numbers to confirm and size what you found.

  • Surveys scale well for confirming a pattern you already discovered. They are poor starting points: you can only ask about what you already thought of, there is no follow-up, and self-reports suffer from the same gap between what people say and what they actually do.
  • Analytics show behavior — a funnel where 70% abandon checkout proves a problem exists, not its cause. Pair it with watching five users to learn the cause.
  • Support tickets are a free, standing stream of real problems in the user's own words. Cluster them by theme; the most frequent and most emotional clusters point to the biggest pains. Steal that exact wording for your UI copy.
  • Session recordings (rage-clicks, repeated dead-end clicks) show where people get stuck without scheduling anything — but still need an interview for why.
Key idea: Qualitative methods (interviews, observation) discover and explain. Quantitative methods (surveys, analytics) confirm and size. Don't lead with numbers.
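
To make the analytics point concrete, here is a small sketch of how a funnel drop-off report might be computed. The step names and user counts are invented for illustration (they are not data from any real product), and funnel_dropoff is a hypothetical helper, not part of any analytics tool.

```python
from typing import List, Tuple


def funnel_dropoff(steps: List[Tuple[str, int]]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, abandonment_rate) for each adjacent pair.

    `steps` is an ordered list of (step name, number of users who reached it).
    """
    report = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        if count_a <= 0:
            raise ValueError(f"step {name_a!r} has no users")
        report.append((name_a, name_b, 1.0 - count_b / count_a))
    return report


# Hypothetical counts, chosen so that checkout loses 70% of its users.
example_funnel = [
    ("viewed product", 1000),
    ("requested quote", 400),
    ("started checkout", 200),
    ("paid", 60),
]

if __name__ == "__main__":
    for start, end, rate in funnel_dropoff(example_funnel):
        print(f"{start} -> {end}: {rate:.0%} abandoned")
```

A report like this tells you where the biggest leak is and how large it is. It says nothing about why people leave at that step, which is why it should be paired with watching a handful of users attempt the same task.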

A simple 5-question interview script

Generative and Mom-Test-compliant — framed for a print-shop owner:

  1. "Walk me through the last time you had to quote and produce a big custom order. What happened, step by step?" (past + specific)
  2. "What was the hardest or most annoying part of that? Why?" (surfaces real pain)
  3. "What did you do to work around it?" (reveals current solution and its cost)
  4. "How much does that problem cost you — in time, money, or lost customers?" (tests if the pain is serious)
  5. "The last time it happened, what did you try to find or buy to fix it? What happened?" (tests real demand without pitching)

Closing move: ask for an introduction to one other shop owner — a small commitment that signals the conversation was real.

Common interview mistakes

  • Pitching your idea instead of asking about their life — it instantly triggers compliments.
  • Leading questions: "Wouldn't it be great if…?", "Don't you hate when…?", "Would you use X?"
  • Asking about the future ("Would you pay $X?") instead of the past ("What did you pay last time?").
  • Accepting compliments and fluff as validation; talking more than you listen; filling silences.
  • Only interviewing friendly, easy users; skipping distinct groups (owner vs. their customer).
  • Hearing what you want to hear (confirmation bias); not recording verbatim quotes.
  • In usability tests: helping, explaining the UI, or asking "did that confuse you?" instead of staying silent.
Best practice: Remember Don Norman's rule from The Design of Everyday Things: when a user struggles, the design is at fault, not the user. A "Norman door" that invites a push when you should pull has a misleading signifier. When an owner can't find your "save price" button, that is a design bug to fix, not a user to blame.

Key takeaways

  • You cannot design for an imagined user — research swaps guessing for evidence about real behavior.
  • The Mom Test: ask about their past and specifics, never your idea or the future; discard compliments, fluff, and raw feature requests.
  • Generative research (early) finds the right problem; evaluative research (late) tests the solution. Use both.
  • Watch 5 users do a real task and stay silent — that finds ~85% of usability problems; iterate 3×5, and test each distinct user group.
  • Observe people in their real environment (master–apprentice); it reveals workarounds no interview surfaces — like the milkshake "job."
  • Surveys, analytics, tickets, and recordings confirm and size what you find — they support research, they never lead it.
