Rev human transcription
Real people typing your interview. Best on hostile audio, but you wait and you pay.
Phone memo, Zoom call, lavalier rig, or handheld field recorder — drop the interview recording and get speaker-labeled, timestamped text you can quote from.
MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously
YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more
↓ Watch what comes out
Most interviews are two people on one device — a phone on the table, a recorder between you. We separate the interview audio into reporter and source even from a single mono channel, then timestamp every turn for citation.
Reporter: Can you walk me through what you saw the morning of the eighteenth?
Source: I got there around six. The loading bay door was already open, which it shouldn't have been.
Reporter: And you'd reported the door issue before, and to whom?
Source: To Diane Okafor in facilities, twice in March. I have the emails.
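For readers who want to see what "speaker-labeled, timestamped text you can quote from" could look like as data, here is a minimal sketch. It is not our export format: the `Turn` class, the `cite` helper, and the timestamp values are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # label such as "Reporter" or "Source"
    start: float   # turn start in seconds (hypothetical values below)
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a second count as HH:MM:SS, dropping fractions."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def cite(turn: Turn) -> str:
    """Build a quotable line: [timestamp] Speaker: "text"."""
    return f'[{format_timestamp(turn.start)}] {turn.speaker}: "{turn.text}"'


# Illustrative turns; the start times are invented, not real output.
turns = [
    Turn("Reporter", 0.0, "Can you walk me through what you saw the morning of the eighteenth?"),
    Turn("Source", 6.4, "I got there around six. The loading bay door was already open, which it shouldn't have been."),
]

for turn in turns:
    print(cite(turn))
```

Keeping the start time as a number rather than a preformatted string makes it easy to jump back to the audio and verify a quote before publishing.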
↓ This is the dashboard
Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.
Sample preview from a founder interview about post-call workflow. Real transcripts look exactly like this — same tabs, same summary block, same key-points / action-items split, same auto-tag chips.
Three real options · honest comparison
Rev sends your audio to human transcribers — slow and pricey but high fidelity on hard audio. Otter and Trint are AI-first like us, tuned for journalists and researchers. Here's where each fits.
Real people typing your interview. Best on hostile audio, but you wait and you pay.
AI transcript, speaker-split, ready in minutes. Same engine for phone memo, Zoom, or field recorder.
AI transcription with a research-oriented editor. English-strong, locked to monthly plans.
Pricing and feature flags accurate as of 2026. Human Rev turnaround varies by queue depth and audio length.
Specific to interviews
Interview audio is rarely clean. Flip these settings and the transcript holds up under quoting.
Drop an interview file and these flip on by default. Override per-job from the form.
Accuracy · real-world numbers
Interview accuracy is bounded by what the mic actually heard. Close-mic stereo on each speaker is the ceiling; a phone sitting on a noisy table is the floor. Numbers below come from production interview files, not synthetic benchmarks.
One mic per speaker, separate channels (Zoom H5/H6, Tascam DR-40). Diarization is trivial, so the only remaining errors are in the text itself.
Single condenser between two speakers, quiet room. Acoustic diarization separates voices reliably under 4 ft.
iPhone or Pixel voice memo on the table. Names and numbers are occasionally missed; cadence is fine for quoting.
Espresso machines, traffic, third voices nearby. Worst case in our data: usable for navigation, but verify quotes against the audio.
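Why is the one-mic-per-speaker case called trivial? When each speaker has a dedicated channel, "who is talking" reduces to "which channel is louder right now." The sketch below illustrates that idea only; it is not our production pipeline, and the function names, frame size, and silence threshold are assumptions chosen for the example.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def label_by_channel(left, right, frame_size, silence=0.01):
    """Label each frame by whichever channel is louder.

    left/right: lists of float samples in [-1, 1], one speaker per channel.
    Returns one label per frame: "left", "right", or "silence".
    """
    labels = []
    for i in range(0, min(len(left), len(right)), frame_size):
        left_level = rms(left[i:i + frame_size])
        right_level = rms(right[i:i + frame_size])
        if max(left_level, right_level) < silence:
            labels.append("silence")
        elif left_level >= right_level:
            labels.append("left")
        else:
            labels.append("right")
    return labels


# Toy signal: left speaker talks, a pause, then right speaker talks.
left = [0.5, -0.5, 0.5, -0.5] + [0.0] * 4 + [0.02, -0.02, 0.02, -0.02]
right = [0.01, -0.01, 0.01, -0.01] + [0.0] * 4 + [0.6, -0.6, 0.6, -0.6]
print(label_by_channel(left, right, frame_size=4))
```

A single mono recording offers no such shortcut, which is why the lower tiers depend on acoustic diarization and degrade as the room gets noisier.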
Common questions
30 free minutes every month. No card. Speaker labels, 99 languages, all exports included.
Start free