YouTube transcription. Better than auto-captions. Cheaper than human.

Paste a YouTube video URL. Get a 95%+ accurate transcript with speaker labels, chapter timestamps, and SRT/VTT captions you can re-upload — no Premium, no Chrome extension.

Drop a file, or pick one

MP3 · WAV · M4A · MP4 · MOV · MKV · OGG · OPUS · FLAC · WEBM — up to 100 MB anonymously

Paste a link, we’ll fetch the audio

YouTube · TikTok · Vimeo · Twitter · SoundCloud · Spotify · 50+ more

Record straight from your browser

Sign-up takes 30 seconds. Recording opens right after, in the dashboard.

No card required
~90s per 60-min file
SRT · VTT · DOCX · TXT
Files auto-deleted in 24h

↓ Watch what comes out

URL in. Captions and clean transcript out.

Paste a youtu.be or youtube.com link. We resolve it, pull the highest-bitrate audio track server-side, run diarization, and hand back a timestamped transcript plus SRT/VTT ready to upload as captions in YouTube Studio.
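To make the "resolve it" step concrete, here is a minimal sketch of pulling the video ID out of a pasted link. It only covers the two URL shapes named above, and the function name `extract_video_id` is a hypothetical helper for illustration, not part of our API.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtu.be or youtube.com/watch link, else None.

    Hypothetical helper: handles only the two link shapes mentioned on this page.
    """
    if "://" not in url:
        url = "https://" + url  # allow bare pastes such as "youtu.be/<id>"
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

Both `extract_video_id("youtu.be/dQw4w9WgXcQ")` and `extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=10s")` return the same ID, so the two link styles can share one job key.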

youtu.be/dQw4w9WgXcQ
REC Interview · 2 speakers · 28:14
auto-detected en-US · opus 160 kbps · 48 kHz
~90s
Transcript · streaming · 96% accuracy
S1

So the channel hit 100k subs in eight months — what actually moved the needle?

S2

Honestly, posting Shorts daily for six weeks. The long-form watch time followed.

S1

And the thumbnail rework — was that A/B tested in YouTube Studio?

S2

Yeah, the new Test & Compare tool. Two of three winners had no face on them.

96% on talking-head audio
SRT · VTT · DOCX · TXT · JSON

↓ This is the dashboard

This is what loads when the job finishes.

Same layout as the real dashboard — Summary, full Transcript, Speakers tab, Exports. Key points and action items extracted automatically. Auto-tags on every job.

Try it on your own file — it's free

Three real options · honest comparison

YouTube auto-captions. Rev human. Or us.

YouTube ships free auto-captions on every video, but they land around 80% accuracy on clean speech and carry no speaker labels. Rev sells human-typed transcripts at $1.50/min. We sit in the middle: AI at 95%+, speaker labels, three-minute turnaround.

Option 01

YouTube auto-captions

Free, baked into every public video. No punctuation pass, no speaker labels.

Cost: Free
Accuracy: ~80% on clean speech
Speaker labels: None
Punctuation: Sparse, no paragraphs
Export: Copy-paste from transcript panel
Works on: Public videos only
Best for: Quickly scanning a video you don't own when accuracy doesn't matter.

Option 02

Transcription.Solutions

Paste the URL. Three minutes later: clean transcript, SRT/VTT, AI summary with chapter links.

Cost (per min): $0.03 on Pro
Accuracy: 95%+ on talking-head
Speaker labels: Yes (Pro and Business)
Punctuation: Full, with paragraphs
Export: SRT · VTT · DOCX · TXT · JSON
Works on: Public + unlisted URLs
Best for: Creators re-uploading captions, podcasters repurposing video to blog, researchers pulling quotes from interviews.

Option 03

Rev human transcription

A human types it. Highest accuracy, slowest turnaround, priced per minute.

Cost (per min): $1.50
Accuracy: 99%+ guaranteed
Speaker labels: Yes
Punctuation: Full, editorial-grade
Turnaround: 12-24 hours typical
Works on: Any uploaded file
Best for: Court-admissible content, broadcast subtitles, or interviews where one missed word kills the quote.

Pricing accurate as of 2026. Rev rates reflect their standard service tier; AI-only tiers from competitors not compared here.
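To see what the per-minute gap means for a real video, here is a small worked calculation using only the two rates listed above. The `cost` helper is illustrative, and it ignores free minutes, plan fees, and any volume discounts.

```python
from decimal import Decimal

# Per-minute rates quoted in the comparison above.
PRO_RATE = Decimal("0.03")   # Transcription.Solutions on Pro
REV_RATE = Decimal("1.50")   # Rev human transcription


def cost(minutes: int, rate_per_min: Decimal) -> Decimal:
    """Price of a job of the given length, rounded to cents."""
    return (minutes * rate_per_min).quantize(Decimal("0.01"))


four_hours = 4 * 60
print(cost(four_hours, PRO_RATE))  # 7.20
print(cost(four_hours, REV_RATE))  # 360.00
```

Decimal is used instead of float so that cent amounts stay exact.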

Specific to YouTube

Three things that bite people on generic transcription tools.

YouTube audio has quirks that off-the-shelf transcribers don't handle. Flip the right settings and the transcript comes back ready to re-upload as captions.

What goes wrong

  1. Music beds confuse the recognizer. Intro stings and background music get transcribed as garbled words. Generic AI doesn't know to ignore them.
  2. SRT line lengths don't match YouTube's caption rules. Subtitles overflow the safe area on mobile, or cut mid-word because the chunker wasn't tuned for video.
  3. Channel-specific names (sponsor brands, game titles, guest handles like @MKBHD) get spelled phonetically. One typo and the quote is unsearchable.

What to flip here

  1. Turn on Music-aware segmentation on the job form. We tag music regions with `[music]` instead of hallucinating lyrics, and resume transcription clean when the voice returns.
  2. Pick YouTube-safe SRT as the export. Lines cap at 42 characters, max two lines per cue, and breaks land on phrase boundaries, so you can drop the file straight into YouTube Studio.
  3. Paste channel vocabulary (sponsor names, recurring guests, game titles) into Custom vocabulary. We feed it to the recognizer as a hint so brand spellings stay correct.
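The line-length rules in step 2 can be sketched in a few lines of Python. This is a simplified illustration, not our exporter: it breaks on word boundaries only (phrase-boundary detection is left out), and the names `wrap_into_cues` and `srt_timestamp` are hypothetical.

```python
import textwrap
from typing import List

MAX_CHARS = 42  # per-line cap from step 2
MAX_LINES = 2   # lines per cue from step 2


def wrap_into_cues(text: str) -> List[str]:
    """Split text into cue bodies of at most MAX_LINES lines of MAX_CHARS each.

    Breaks fall on word boundaries. A single word longer than MAX_CHARS is
    kept whole and would exceed the cap.
    """
    lines = textwrap.wrap(text, width=MAX_CHARS, break_long_words=False)
    return ["\n".join(lines[i:i + MAX_LINES]) for i in range(0, len(lines), MAX_LINES)]


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```

Splitting one long utterance into several cues also means splitting its time range, which needs word-level timestamps. That part is omitted here.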

Recommended job settings for YouTube

Paste a YouTube URL and these flip on by default. Override per-job from the form.

Source: URL paste · auto-resolve youtu.be
Diarization: Acoustic · 1-4 speakers
Music handling: Tag [music], skip lyrics
Filler words: Removed by default
Summary: Chapter timestamps + key moments
Export: YouTube-safe SRT · VTT · DOCX

Accuracy · real-world numbers

95%+ on talking-head videos. Music and game audio cap lower.

YouTube content varies wildly — a studio podcast and a Fortnite stream are not the same problem. Lapel-mic talking-head is the best case; background music and overlapping game audio drag accuracy fastest. Numbers below are from real customer YouTube URLs in production.

97%
Studio podcast · per-guest mic

Joe Rogan-style setup: each guest on a separate boom mic, light room treatment, no music bed. Diarization is trivial when voices don't bleed.

95%
Single talking-head · lapel/USB mic

Standard tutorial or video essay. One speaker, indoor audio, intro music ducked under voice. Most YouTube uploads land here.

89%
Vlog with B-roll · outdoor audio

Wind, traffic, ambient music under voiceover. Words still usable; expect occasional misses on proper nouns and brand names.

84%
Gaming stream · voice over game audio

Game SFX, music, and chat-reading at variable volume. Streamer's voice usually clear; teammates on Discord drop fastest. Worst case in our data.
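If you want to check numbers like these on your own videos, one common convention is accuracy = 1 minus word error rate (WER), measured against a hand-corrected reference transcript. The sketch below assumes that convention. It is not a statement of how the figures above were computed, and its normalization (lowercasing and splitting on whitespace) is deliberately crude.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                 # word missing from the hypothesis
                curr[j - 1] + 1,             # extra word in the hypothesis
                prev[j - 1] + substitution,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as 1 - WER, floored at zero."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

Punctuation stays attached to words here, so "weeks." and "weeks" count as different. Strip punctuation first if you only care about the words.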

Common questions

8 things people ask about YouTube transcription.

01. Do I just paste the URL, or do I download the video first?
Just paste the URL. We accept youtube.com/watch, youtu.be short links, and unlisted video URLs. We resolve it server-side, pull the audio track only (not the video), and start transcribing — usually within 10 seconds of paste.
02. Does it work on private or unlisted videos?
Unlisted yes, private no. Unlisted URLs are publicly resolvable if you have the link, so we can fetch them. Private videos require being signed into your Google account — we can't impersonate you. Download the MP4 from YouTube Studio first, then upload the file.
03. Why is your transcript so much better than YouTube's auto-captions?
YouTube's auto-captions run a streaming model tuned for cost-at-scale across billions of videos. We run a larger model with full-context decoding, custom vocabulary, and a separate diarization pass. Result: ~95% vs ~80%, plus speaker labels and proper punctuation.
04. Can I upload the SRT back to YouTube as captions on my own video?
Yes. Export as YouTube-safe SRT, open YouTube Studio → Subtitles → Add → Upload file. Our line lengths and timing match YouTube's display rules, so cues won't overflow on mobile or break mid-word.
05. What about copyright: is it legal to transcribe someone else's video?
Transcribing for personal use, research, journalism, or commentary is generally fair use in the US. Re-publishing the full transcript commercially is murkier. We don't host the audio or video, we hand you the text — what you do with it is your call. Not legal advice.
06. Can you handle long videos like 4-hour podcast episodes?
Yes. Our hard cap is 8 hours per file. A 4-hour Lex Fridman episode transcribes in roughly 8-12 minutes wall-clock and lands around $7.20 on Pro pricing. Speaker diarization holds up across the full length.
07. Do you handle non-English YouTube videos?
Yes — 99 languages auto-detected. Spanish, Hindi, Portuguese, and Japanese all land within 2-3 points of English accuracy on clean audio. Code-switching (English + Spanish in the same sentence) works but degrades by ~5 points.
08. Can I get chapter timestamps like YouTube's auto-chapters?
Yes. The AI summary includes chapter-style timestamps at topic transitions plus key-moment links. Paste them into your video description, one per line and starting at 00:00, as `00:00 Intro / 03:42 Setup / …` and YouTube renders them as clickable chapters automatically.
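For the chapter list in question 08, here is a minimal sketch of turning (start second, title) pairs into description-ready lines. The checks reflect YouTube's published chapter requirements as we understand them (first chapter at 00:00, at least three chapters, each at least 10 seconds long). `format_chapters` is a hypothetical helper, not an export we ship under that name.

```python
from typing import List, Tuple


def format_chapters(chapters: List[Tuple[int, str]]) -> str:
    """Render (start_seconds, title) pairs as one 'MM:SS Title' line each."""
    if len(chapters) < 3:
        raise ValueError("need at least three chapters")
    starts = [start for start, _ in chapters]
    if starts[0] != 0:
        raise ValueError("first chapter must start at 00:00")
    if any(later - earlier < 10 for earlier, later in zip(starts, starts[1:])):
        raise ValueError("chapters must be in ascending order, at least 10 seconds apart")

    lines = []
    for start, title in chapters:
        hours, rem = divmod(start, 3600)
        minutes, seconds = divmod(rem, 60)
        # Switch to H:MM:SS once a chapter starts past the one-hour mark.
        stamp = f"{hours}:{minutes:02d}:{seconds:02d}" if hours else f"{minutes:02d}:{seconds:02d}"
        lines.append(f"{stamp} {title}")
    return "\n".join(lines)


print(format_chapters([(0, "Intro"), (222, "Setup"), (600, "Results")]))
```

The printed block can be pasted into the video description as is.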

Paste a YouTube URL. See what comes out.

30 free minutes every month. No card. Speaker labels, YouTube-safe SRT, AI summary with chapter timestamps — all included.

Start free