Blog · 8 min read

User research interview transcription: a Dovetail and Maze alternative for solo PMs

Dovetail and Maze charge per seat plus per interview. For a solo PM running 5 interviews a month, general transcription plus a Notion doc covers 80% of the value at 10% of the cost.

Dovetail and Maze are priced for research teams. A solo PM running 5 interviews a month doesn't need them.

If you run user research as one person inside a product team (5 to 8 interviews a month, no dedicated UXR org behind you), a general transcription tool plus a tagged Notion database covers about 80% of what Dovetail and Maze ship, at roughly 10% of the cost. The seat fee and per-interview pricing on dedicated research platforms make sense when 4+ people are tagging, theming, and synthesizing in parallel. As a solo PM, you're paying for collaboration features you'll never trigger.

We're a transcription engine, not a research repository. So this article is honest about where general transcription stops being enough — and where it's plenty. If you're spending $30-$375/month on Dovetail for occasional interview work, read on before the next renewal.

When dedicated research tools earn their seat price

Dovetail and Maze do specific things well that a transcript-plus-Notion stack does not:

  • Multi-researcher tagging with conflict resolution. When two researchers tag the same clip differently — one writes "pricing," the other writes "cost" — Dovetail surfaces the conflict. Critical for a 5-person UXR team. Irrelevant for one person.
  • Unmoderated study orchestration (Maze specifically). Recruiting, task flows, success metrics, heatmaps on prototypes — Maze runs the whole unmoderated leg. If you do prototype testing at scale, no transcription tool replaces it.
  • Insight repositories with cross-study search. Dovetail lets a team query "what have we learned about onboarding across the last 18 studies?" — compounding value at team scale.
  • Stakeholder-facing reports with embedded clips. Both tools render slick reports with video timestamps. You can rebuild this in Notion or Loom — it takes more clicks.

Where the seam shows for a solo PM: Dovetail's 2026 pricing puts Starter at $30/month (10 transcript hours, single user) and Team at $375/month for a small workspace — both step up sharply if you need cross-team search across many studies. Maze's Starter tier is roughly $99/seat/month billed annually with usage caps. Both prices assume collaboration features that only matter past 3 researchers.

There's a quieter cost too. Stakeholders don't log into research platforms. Your engineers live in Linear, your designers in Figma, your specs in Notion. A quote that requires a separate login to read is a quote that gets ignored. The most effective research repository is the doc your team already has open.

What a solo PM actually does with research transcripts

Strip the workflow down. A solo PM doing discovery or evaluative research does four things with each interview:

  1. Re-reads or skims the transcript within a day or two, while context is fresh.
  2. Pulls 3-8 standout quotes per interview — the lines that captured a pain point, a workaround, or a strong reaction.
  3. Tags those quotes by theme, feature area, or segment so they can be retrieved later.
  4. Writes a short readout — a Notion page, a Slack post, 5 slides — with quotes embedded as evidence.

That's it. Almost everything Dovetail's interface does sits on top of those four actions. The work is in deciding what's a real insight, not in formatting it.

DIY repository: transcripts + Notion + tags

Here's the stack we've seen solo PMs build. It survives org reshuffles better than Dovetail does — nobody loses access when a seat gets reassigned, and a CSV export is one click away the day you do hire a researcher.

The pieces

  • Audio capture: Zoom, Google Meet, or a local recorder. If the call is on Meet/Zoom/Teams, our meeting bot joins under a configurable name and captures the recording. Two-party consent disclosure is posted in chat on join.
  • Transcription: upload audio or video, get a speaker-labeled transcript back. We run AssemblyAI Universal-3 in production — WER ~7.88% on clean 16 kHz audio, solid for a 1:1 over a good headset.
  • Repository: a Notion database with one entry per interview. Properties: participant role, segment, study, date, themes (multi-select), product area (multi-select), recording link, transcript link.
  • Quote layer: a second Notion database — "Quotes" — linked via relation. Each row is one quote with its own tags. This is the part Dovetail charges for. Twenty minutes to set up.
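If it helps to see the two databases as data, here is a minimal Python sketch of the shape described above. The class names, the interview_id field standing in for the Notion relation, and the sample values are illustrative assumptions, not a Notion API or an export format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Interview:
    """One row in the interviews database (properties from the list above)."""
    interview_id: str                 # hypothetical ID format, e.g. "INT-001"
    participant_role: str
    segment: str
    study: str
    held_on: date
    themes: list[str] = field(default_factory=list)         # multi-select
    product_areas: list[str] = field(default_factory=list)  # multi-select
    recording_link: str = ""
    transcript_link: str = ""


@dataclass
class Quote:
    """One row in the Quotes database, linked back to its interview."""
    text: str
    interview_id: str                 # stands in for the Notion relation
    tags: list[str] = field(default_factory=list)


# Made-up sample rows, only to show how the two tables connect.
interview = Interview(
    interview_id="INT-001",
    participant_role="Ops manager",
    segment="SMB",
    study="Onboarding discovery",
    held_on=date(2026, 1, 15),
    themes=["onboarding"],
    product_areas=["setup"],
)
quote = Quote(
    text="I stared at the empty dashboard and had no idea what to do next.",
    interview_id=interview.interview_id,
    tags=["onboarding", "frustration"],
)
```

The point of the second class is that tags live on the quote, not only on the interview, so a quote can be found without reopening its transcript.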

Why two databases, not one

Tagging at the quote level is what makes a repository searchable 6 months later. If you only tag the parent interview as "onboarding pain," you can't find the specific line a user said about the empty-state without reopening the transcript.

A separate quotes database lets you filter "every quote tagged onboarding + frustration across all studies." That's the cross-study insight pattern Dovetail sells, rebuilt for free.
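The same filter, sketched in Python over an exported list of quotes. The Quote fields and the sample rows are assumptions for illustration. In Notion itself this is a database view with two tag filters combined with "and".

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    text: str
    study: str
    tags: set[str] = field(default_factory=set)


def quotes_with_all_tags(quotes: list[Quote], required: set[str]) -> list[Quote]:
    """Return quotes that carry every required tag, regardless of study."""
    return [q for q in quotes if required <= q.tags]


# Made-up rows from three different studies.
quotes = [
    Quote("The empty state told me nothing.", "Onboarding Q1", {"onboarding", "frustration"}),
    Quote("Pricing page was clear enough.", "Pricing Q2", {"pricing"}),
    Quote("I gave up on setup twice.", "Churn Q3", {"onboarding", "frustration", "churn"}),
]

for q in quotes_with_all_tags(quotes, {"onboarding", "frustration"}):
    print(f"[{q.study}] {q.text}")
```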

Where Notion struggles

Be honest about it. Notion's relation filtering gets slow past ~2,000 rows. There's no inline video clip player that plays at a specific timestamp. Past ~50 interviews, manual tagging starts to drag. If you accumulate 200+ interviews and want clip-level playback in the doc, you've outgrown the DIY setup — and Dovetail starts earning its price.

Speaker labels for moderator + participant

Most research interviews are two people on a video call. Diarization is usually clean, but the conditions matter.

Stereo vs mono

Zoom and Meet can record separate-track audio ("Record a separate audio file for each participant" in Zoom settings). That gives stereo: each speaker on their own channel. We channel-split — moderator left, participant right, no swapping, effectively perfect.
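For readers who want to see what a channel split amounts to, here is a minimal sketch using only Python's standard wave module. It assumes an uncompressed 2-channel PCM WAV with one speaker per channel. The function name and file paths are hypothetical, and this is an illustration of the idea, not a description of our production pipeline.

```python
import wave


def split_stereo_wav(src_path: str, left_path: str, right_path: str) -> None:
    """Write the left and right channels of a stereo WAV to two mono WAVs."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel WAV file")
        width = src.getsampwidth()      # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # PCM frames are interleaved: left sample, right sample, left, right, ...
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Because each output file holds exactly one speaker, labeling needs no diarization model, which is why the stereo path avoids speaker swaps.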

If your recording is mono — single mixed track, the default on most cloud recordings — we fall back to pyannote-3.1. On a clean 2-person call, that's solid. Where it slips:

  • Both speakers on the same physical microphone (in-person interview recorded on a phone).
  • Heavy interrupt patterns — when moderator and participant talk over each other for several seconds, the diarizer can merge them.
  • One speaker much quieter than the other — pyannote sometimes misses short utterances ("mm-hm", "yeah") from the quieter speaker.

For solo PM workflows — short questions, long participant answers — this is almost always fine. Label "Speaker A" → "Moderator" and "Speaker B" → "Participant" once at the top, find-and-replace, move on.
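If you would rather script the find-and-replace, a small sketch follows. It assumes each turn starts a line with a label like "Speaker A:", which may not match your export format, so adjust the pattern to what your transcript actually looks like.

```python
import re


def relabel_speakers(transcript: str, names: dict[str, str]) -> str:
    """Rename speaker labels at the start of a line, e.g. 'Speaker A:'."""
    def swap(match: re.Match) -> str:
        label = match.group(1)
        # Leave labels that are not in the mapping untouched.
        return names.get(label, label) + ":"

    return re.sub(r"^(Speaker [A-Z]+):", swap, transcript, flags=re.MULTILINE)


raw = (
    "Speaker A: How do you set this up today?\n"
    "Speaker B: Mostly by hand, in a spreadsheet.\n"
)
print(relabel_speakers(raw, {"Speaker A": "Moderator", "Speaker B": "Participant"}))
```

Anchoring the pattern to the start of a line keeps a participant's spoken words from being rewritten if they happen to say "Speaker A" mid-sentence.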

Quote extraction — and what AI can't do here

This is where solo PMs actually spend their time, and where AI helps or hurts depending on how you use it.

What works

Read the transcript while the audio plays at 2x speed. Mark quotes inline. A 45-minute interview takes 20-25 minutes this way, which is faster than re-listening at 1x without a transcript.

For each marked quote, grab the line plus 2-3 surrounding sentences (context matters to stakeholders), drop it in the Quotes database, tag it. Tag while it's fresh — themes you can name 10 minutes after a call are themes you actually heard, not themes you invented a week later.
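Grabbing the surrounding sentences can be scripted. This is a rough sketch that assumes sentences end in ".", "!" or "?" followed by whitespace. Real transcripts with abbreviations or missing punctuation will need a better splitter, and the function name is made up.

```python
import re
from typing import Optional


def quote_with_context(text: str, marked: str, before: int = 1, after: int = 1) -> Optional[str]:
    """Return the sentence containing `marked` plus its neighbouring sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        if marked in sentence:
            start = max(0, i - before)
            return " ".join(sentences[start:i + after + 1])
    return None  # the marked text was not found


answer = (
    "We tried the import. It failed twice. "
    "I gave up and used a spreadsheet. Then my boss asked why."
)
print(quote_with_context(answer, "gave up"))
```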

Keep the contradicting quotes too. If four users want automated setup and one power user wants manual control, that exception may be the enterprise requirement hiding inside the theme. A DIY repository makes those exceptions easier to preserve than a dashboard that smooths them into a percentage.

What we don't ship

We don't ship automatic "insight extraction" or AI theming across multiple transcripts. That's a Dovetail feature. We give you the text — the synthesis is yours.

If you want AI-assisted theming, pasting the transcript into Claude or ChatGPT with "extract the 5 strongest quotes about [topic] with timestamps" works — but it hallucinates timestamps and occasionally invents quotes. Always verify against the source. We don't bake this in because the hallucination risk on research artifacts is too high for us to put our name on.
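Part of that verification can be automated. The sketch below flags any model-returned quote that does not appear verbatim in the transcript after normalizing case, whitespace, and curly quotes. The function names are hypothetical, and it does not check timestamps, which still need a manual look.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    text = text.replace("\u2019", "'").replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(transcript: str, candidates: list[str]) -> list[str]:
    """Return the candidate quotes that are NOT found in the transcript."""
    haystack = normalize(transcript)
    return [q for q in candidates if normalize(q) not in haystack]


transcript = (
    "Honestly, the setup took me   two full days.\n"
    "I almost gave up before the first import worked."
)
candidates = [
    "the setup took me two full days",     # really in the transcript
    "Setup was a nightmare from day one",  # invented by the model
]
print(unverified_quotes(transcript, candidates))
```

A quote that fails this check is not necessarily fabricated (the model may have trimmed filler words), but it should not go into a readout until you have found the original line.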

A broader limit: AI transcription doesn't replace research judgment. It can't tell you what to build, can't infer willingness to pay, can't catch when a participant is being polite. It also won't fix a leading interview guide — it'll just preserve the bias faster. For academic or IRB-bound work, manual coding remains the standard. AI summarization is a draft, not the record.

Consent and participant data

Treat interviews as confidential by default. Participants discuss internal tools, revenue, hiring, security, sometimes health.

If our meeting bot joins, consent disclosure posts in chat on join and the bot is visible in the participant list under our configurable name. An opt-out endpoint exists at /opt-out/{token}. We use HIPAA-grade data handling at rest, but we are not a HIPAA BAA-covered product yet — if your research touches PHI and your process requires a BAA, don't treat us as your covered vendor today. Email us if you're piloting in that space.

For most product research, the safer operating model: store participant names separately from transcript IDs, redact sensitive details before sharing readouts, attribute quotes by role and segment rather than name, and delete recordings you no longer need.
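A minimal sketch of that operating model: names live in a separate lookup keyed by transcript ID, quotes are attributed by role and segment, and sensitive terms are replaced before sharing. The ID format, the sample participant, and the term list are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    name: str
    role: str
    segment: str


# Kept in a restricted file, apart from the transcripts themselves.
IDENTITIES = {"T-014": Participant("Dana Example", "Ops manager", "SMB")}


def attribution(transcript_id: str) -> str:
    """Attribute a quote by role and segment, never by name."""
    person = IDENTITIES[transcript_id]
    return f"{person.role}, {person.segment}"


def redact(text: str, sensitive_terms: list[str]) -> str:
    """Replace each sensitive term with a placeholder, ignoring case."""
    for term in sensitive_terms:
        text = re.sub(re.escape(term), "[redacted]", text, flags=re.IGNORECASE)
    return text


line = "Dana said Acme Corp loses two days a month to manual exports."
print(redact(line, ["Dana", "Acme Corp"]), "-", attribution("T-014"))
```

Term-list redaction only catches what you list, so a read-through before sharing is still needed.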

Cost math: minutes beat seats when volume is low

Dedicated suites price around seats, study volume, or respondent count. That fits when the product runs the full research lifecycle. It feels heavy when one PM runs a handful of moderated calls.

A transcription-first stack prices closer to the raw work — minutes of audio.

  • Dovetail: Starter $30/month (single user, 10 transcript hours), Team $375/month workspace, per dovetail.com/pricing (2026).
  • Maze Pro (Starter): roughly $99/seat/month billed annually = ~$1,188/year, per maze.co/pricing. Stronger for unmoderated than moderated research, so a different use case.
  • Rev.com (human transcription): $1.99/min per rev.com/pricing. A 45-min interview is nearly $90, which breaks the budget for continuous discovery.
  • Otter.ai: cheap, fast, excellent live collaborative notes, but data stays in Otter's workspace, which fights an export-to-Notion habit.

DIY stack with us:

  • Notion: free for personal use, or $10/month on a paid workspace.
  • Transcription: 5 interviews × ~45 min = 225 min/month. Comfortably inside our Pro tier (600 audio-minutes/month or 1,200 fungible credits, 2 GB file ceiling, 10 h/file max). See current pricing for exact numbers. Meeting bots use 2 credits/min, file uploads 1 credit/min.
  • Total: ~$20-30/month, $240-360/year.
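The credit arithmetic above, written out so you can plug in your own volume. The interview count, the per-minute credit rates, and the 1,200-credit allowance come from the figures quoted in this section. The $20-30 monthly range is the total quoted above, not an official price list.

```python
INTERVIEWS_PER_MONTH = 5
MINUTES_PER_INTERVIEW = 45
CREDITS_PER_MINUTE = {"upload": 1, "bot": 2}  # rates quoted above
PRO_CREDITS = 1_200                           # Pro allowance quoted above


def monthly_credits(source: str) -> int:
    """Credits used per month if every interview comes in via `source`."""
    minutes = INTERVIEWS_PER_MONTH * MINUTES_PER_INTERVIEW
    return minutes * CREDITS_PER_MINUTE[source]


def annual_cost_range(monthly_low: int, monthly_high: int) -> tuple[int, int]:
    """Turn a monthly dollar range into a yearly one."""
    return monthly_low * 12, monthly_high * 12


upload_credits = monthly_credits("upload")
bot_credits = monthly_credits("bot")
diy_annual = annual_cost_range(20, 30)

print(f"uploads: {upload_credits} of {PRO_CREDITS} credits")
print(f"bot:     {bot_credits} of {PRO_CREDITS} credits")
print(f"DIY stack: ${diy_annual[0]}-{diy_annual[1]} per year")
```

Even if every call goes through the meeting bot at 2 credits per minute, 450 credits a month stays well under the 1,200-credit allowance.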

Against a Maze seat at ~$1,188/year, you save roughly $830-950/year and give up: insight dashboards, multi-researcher conflict resolution, integrated video clip players. If you're solo, none of those move the needle.

A side benefit: all 99 supported languages are priced the same. If you interview users in Spanish or Japanese next quarter, the credit math doesn't change.

A 3-interview pilot before committing

Before signing a Dovetail or Maze contract, run this. Use real interviews, not sample audio — the hard parts are your microphone setup, your participants, your terminology.

  1. Pick 3 recordings across audio quality. One clean headset 1:1, one normal laptop-mic call with some background noise, one hard case (accents, jargon, fast talker, or in-person on a phone).
  2. Upload to our 60-minute free tier — enough for one full interview to test diarization and accuracy on your actual audio.
  3. Build the two-database Notion setup. 30 minutes. Tag one interview end-to-end.
  4. Write the readout you'd actually send your team.
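One way to put a number on accuracy during the pilot: hand-correct a short stretch of the transcript and compute word error rate (WER) against it. The sketch below uses word-level edit distance. It lowercases and splits on whitespace, which is a simplifying assumption, so results will not be directly comparable to published benchmark figures that use stricter text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


corrected = "we export the report every friday"  # your hand-corrected text
machine = "we export a report every friday"      # what the tool produced
print(f"WER: {word_error_rate(corrected, machine):.1%}")
```

Run it on the hard recording as well as the clean one. The gap between the two tells you more about your real workflow than any single headline number.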

Measure four numbers: minutes from recording to usable transcript, minutes spent fixing speaker labels, minutes spent verifying quotes, and total time to readout. Compare to your current process.

If after three interviews the friction is real — you wanted multi-clip reels, automatic cross-study theming, three teammates tagging in parallel, governed retention — that's your signal to buy. Coordination cost is the trigger, not interview count.

What to do next

  • Upload one real interview to the free tier and check diarization on your actual audio.
  • If your calls happen in Zoom, Meet, or Teams, look at our meeting bot setup — joins the call, ships the transcript when the recording ends.
  • If your work is closer to academic or longitudinal research, the research interview workflow covers 60-90 min sessions and multi-speaker focus groups.
  • Build the two-database Notion setup before transcribing anything. The repository structure is what makes transcripts useful 6 months later — not the transcription tool you pick.