Interview transcription — transcribe an interview with speaker labels and timestamps

Interview transcription. Different recording, same result.

Phone memo, Zoom call, lavalier rig, or handheld field recorder — drop the interview recording and get speaker-labeled, timestamped text you can quote from.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

No card required · ~90s per 60-min file · SRT · VTT · DOCX · TXT · Files auto-deleted in 24h

Two voices in. Two voices out, labeled.

Most interviews are two people on one device — a phone on the table, a recorder between you. We separate the interview audio into reporter and source even from a single mono channel, then timestamp every turn for citation.
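To picture what "timestamp every turn" can look like in an export, here is a minimal Python sketch that renders speaker-labeled turns as SRT cues. The `Turn` structure, the speaker names, and the timestamps are illustrative assumptions, not the service's actual data model or output.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One speaker turn; the field names are hypothetical."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(turns: list[Turn]) -> str:
    """Render turns as numbered SRT cues with the speaker label in the cue text."""
    cues = []
    for index, turn in enumerate(turns, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(turn.start)} --> {srt_timestamp(turn.end)}\n"
            f"{turn.speaker}: {turn.text}\n"
        )
    return "\n".join(cues)


# Timestamps below are made up for illustration.
turns = [
    Turn("Reporter", 0.0, 4.2, "Can you walk me through what you saw?"),
    Turn("Source", 4.2, 9.75, "I got there around six."),
]
print(to_srt(turns))
```

Keeping the speaker label inside the cue text means any SRT-aware player or editor shows who said what without a custom format.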

Field recorder · WAV · REC · 2 speakers · 38:42

auto-detected en-US · 48 kHz mono · 768 kbps

~90s

Transcript · streaming · 94% accuracy

Can you walk me through what you saw the morning of the eighteenth?

I got there around six. The loading bay door was already open, which it shouldn't have been.

And you'd reported the door issue before — to whom?

To Diane Okafor in facilities, twice in March. I have the emails.

94% on field WAV · DOCX · TXT · SRT · JSON

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

app.transcription.solutions / interview-202.mp3 · Export

Summary 5 · Transcript 1,420 · Speakers 2 · Exports

interview-202.mp3 · 47:08 · 128 kbps CBR · 2 speakers · en-US auto-detected

Founders need post-call content, not just transcripts. Tools force them to stitch 5 apps together.

Sample preview from a founder interview about post-call workflow. Real transcripts look exactly like this — same tabs, same summary block, same key-points / action-items split, same auto-tag chips.

Key points

Gap exists between raw recordings and shippable content — tools stop at transcript.

Show notes, social clips, blog drafts all expected by call's end, not next-day.

Current tooling fragmented across 5 apps — no single pipeline.

Conversion-rate signal flipped a buyer-segment assumption at week 3.

40% of original hypothesis survived — the shape held, mechanics rebuilt.

Action items

Speaker 1: Investigate single-pipeline approach to replace 5-app stitch.

Speaker 2: Mock how show-notes draft could flow from the transcript.

Speaker 2: Pull conversion-rate by segment, Monday EOD.

Speaker 1: Map the 5-app stitch & list which steps actually need a human.

Auto-tagged: founder interview · post-call content · tooling fragmentation · single pipeline

Try it on your own file — it's free

Rev human. Otter or Trint. Or us.

Option 01

Rev human transcription

Real people typing your interview. Best on hostile audio, but you wait and you pay.

Turnaround: 12–24 hours typical

Accuracy on clean audio: 99% (claimed)

Speaker labels: Manual, included

Languages: EN human · 30+ AI

Cost per min: $1.50 human · $0.25 AI

Privacy: Audio sent to contractors

Best for: Court-bound or publication-critical interviews on bad audio where you need a human ear and have a day to wait.

Option 02

Transcription.Solutions

AI transcript, speaker-split, ready in minutes. Same engine for phone memo, Zoom, or field recorder.

Turnaround: ~3 min per hour of audio

Accuracy on clean audio: 94–96%

Speaker labels: Auto · rename in editor

Languages: 99, auto-detected

Cost per min: $0.03

Privacy: Audio deleted in 24h · no training

Best for: Journalists, researchers, and producers doing multiple interviews a week who need fast, citable text without uploading to a contractor.

Option 03

Otter / Trint

AI transcription with a research-oriented editor. English-strong, locked to monthly plans.

Turnaround: Real-time to ~5 min

Accuracy on clean audio: ~90–93%

Speaker labels: Yes · EN-tuned

Languages: Otter EN-only · Trint 30+

Cost: $17–80/user/mo (subscription)

Privacy: Stored in account by default

Best for: Teams who want a hosted library of every interview ever recorded and don't mind a monthly seat fee per user.

Pricing and feature flags accurate as of 2026. Human Rev turnaround varies by queue depth and audio length.
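
As a rough way to compare the per-minute options above, the sketch below works out the cost of a single 60-minute interview in Python. It assumes billing is strictly linear per minute with no minimums, rounding rules, or taxes, which the page does not state. Otter / Trint are left out because they are priced per seat per month, not per minute.

```python
# Per-minute rates in cents, taken from the comparison above.
RATES_CENTS_PER_MIN = {
    "Rev human": 150,
    "Rev AI": 25,
    "Transcription.Solutions": 3,
}


def cost_dollars(minutes: int, cents_per_min: int) -> float:
    """Cost in dollars, assuming simple linear per-minute billing."""
    return minutes * cents_per_min / 100


for name, rate in RATES_CENTS_PER_MIN.items():
    print(f"{name}: ${cost_dollars(60, rate):.2f} per 60-minute interview")
```

Under those assumptions the 60-minute figures come to $90.00, $15.00, and $1.80 respectively.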

8 things people ask about interview transcription.

01. Can I use these transcripts in a published article without verifying against the audio?

For direct quotes — no, always verify against the audio. AI transcripts at 94% accuracy still misread one word in 17 on average, and the wrong word in a quote is a correction. The transcript is for navigation and drafting; the audio is the source of truth.
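
The "one word in 17" figure follows from simple arithmetic, sketched here in Python. The sketch treats accuracy as a per-word rate and assumes errors are independent of each other, which is a simplification: real misreads tend to cluster around names, crosstalk, and noise.

```python
def words_per_error(accuracy: float) -> float:
    """Average number of words between misreads at a given word accuracy."""
    return 1 / (1 - accuracy)


def chance_of_any_error(quote_words: int, accuracy: float) -> float:
    """Probability that a quote contains at least one misread word,
    assuming each word is right or wrong independently."""
    return 1 - accuracy ** quote_words


print(f"One misread every {words_per_error(0.94):.1f} words on average")
print(f"Chance a 25-word quote has an error: {chance_of_any_error(25, 0.94):.0%}")
```

Even at 94%, a quote of a couple of sentences is more likely than not to contain a wrong word, which is why the audio stays the source of truth.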

02. My recorder saved a stereo WAV with one mic per speaker. What do I do?

Upload that file directly — don't convert to mono first. We detect the two channels and route each to its own diarization track, which is the highest-accuracy path we have. Expect 96%+ on a quiet room.
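
If you want to confirm that a file really is two-channel before uploading, the following Python sketch uses only the standard-library `wave` module. It also shows what "one channel per speaker" means at the byte level by splitting interleaved stereo PCM into two mono streams. This is an illustration, not the service's pipeline; the function names are made up for the example.

```python
import wave


def describe_wav(path: str) -> str:
    """Report sample rate and channel layout of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    layout = {1: "mono", 2: "stereo"}.get(channels, f"{channels} channels")
    return f"{rate} Hz {layout}"


def split_stereo(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """Split interleaved stereo PCM frames into left and right mono streams.

    Each stereo frame is one left sample followed by one right sample,
    each `sample_width` bytes long.
    """
    frame_size = 2 * sample_width
    left = bytearray()
    right = bytearray()
    for offset in range(0, len(frames) - frame_size + 1, frame_size):
        left += frames[offset:offset + sample_width]
        right += frames[offset + sample_width:offset + frame_size]
    return bytes(left), bytes(right)
```

Calling `describe_wav` on your recording should report "stereo" if the recorder kept the two mics on separate channels. Downmixing to mono would merge the two streams and discard that separation.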

03. What about interviews recorded over a phone call?

Standard phone (PSTN) audio is narrow-band, sampled at 8 kHz, which caps accuracy around 88% even on a clean line. We still split the two parties using channel separation if your recorder app captured them separately (most do). VoIP calls over WhatsApp or Signal use wider-band codecs, so they sound a bit better than PSTN.

04. Can I redact off-the-record sections before sharing the transcript?

Yes. In the editor, select the timestamp range and mark it `[REDACTED]`. The export replaces the text with a redaction marker but keeps the timestamps so the document still tracks the audio.
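
The effect of a redaction on the exported data can be sketched in a few lines of Python. The `Segment` structure and the whole-segment granularity are assumptions made for illustration; the real editor may store transcripts differently or redact partial segments.

```python
from dataclasses import dataclass, replace

REDACTION_MARKER = "[REDACTED]"


@dataclass(frozen=True)
class Segment:
    """One timestamped stretch of transcript; field names are hypothetical."""
    start: float  # seconds
    end: float
    speaker: str
    text: str


def redact(segments: list[Segment], start: float, end: float) -> list[Segment]:
    """Replace the text of every segment overlapping [start, end) with the
    redaction marker, leaving timestamps and speaker labels untouched."""
    return [
        replace(seg, text=REDACTION_MARKER)
        if seg.start < end and seg.end > start
        else seg
        for seg in segments
    ]


segments = [
    Segment(0.0, 5.0, "Speaker 1", "On the record question."),
    Segment(5.0, 12.0, "Speaker 2", "Off the record answer."),
    Segment(12.0, 20.0, "Speaker 1", "Back on the record."),
]
for seg in redact(segments, 5.0, 12.0):
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

Because only the text field changes, every remaining line still maps to the same position in the audio.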

05. Do you train models on my interview recordings?

No. Source audio is deleted from our infrastructure within 24 hours of completion, and we don't use customer recordings for model training under any plan. The transcript text stays in your account until you delete it.

06. Three or four people on a panel interview: does diarization still work?

Up to about six distinct voices, yes, but accuracy on speaker assignment drops with each added person and gets worse when two speakers sound similar. Plan a 2–3 minute rename pass on the speaker chips after the transcript lands.

07. Can you transcribe interviews in languages other than English?

99 languages, auto-detected. Code-switching (English source slipping into Spanish mid-sentence) is handled in 12 language pairs. Accuracy varies by language — European languages match English; low-resource African and Central Asian languages run 5–10 points lower.

08. I record on a Zoom call. Should I use your Zoom page instead?

Same engine, same result. The Zoom page covers cloud-recording specifics (per-participant audio, dial-in degradation). If you're conducting one interview at a time over Zoom, either path works — drop the MP4 here and the speaker labels come out the same.
