Online course module transcription for Teachable, Kajabi, and Thinkific

Course creators sell modules — but transcripts and captions drive SEO, accessibility, and refund-reducing student support. How to add them without re-recording.

Captions serve paying students who watch on mute (commute, open-plan office, sleeping kid in the room). Without captions, those students fall behind and stop opening the next module; drop-off after module 2 can be another refund indicator.

There's also a procurement angle if you sell into corporate L&D. Many enterprise buyers require captions that support ADA/WCAG or Section 508 accessibility requirements before they'll expense a seat, and auto-generated browser or device captions usually don't pass that bar on technical vocabulary. ESL students — often a sizable slice of paid course audiences — may read English more easily than they parse a spoken accent, so captions widen the comprehension funnel as well.

The takeaway: transcripts aren't an accessibility checkbox — they're the cheapest retention and reach tool you can ship to an existing course.

Where Teachable, Kajabi, and Thinkific accept caption files

All three accept standard caption formats on native video uploads, though the exact file type and UI vary by platform. Built-in or auto-generated captions, where available, may not hit the quality bar you'd want on a paid course, so bringing your own reviewed file is the safer move.

Teachable

Teachable accepts WebVTT (.vtt) on videos hosted in their player. In the lesson editor, upload the video, then attach the VTT under the captions/subtitles setting on that video. Multiple language-track support can vary by account and player version — when multiple tracks are available, the student picks from the CC menu.

Teachable may not accept SRT directly in every school UI. Convert SRT → VTT before upload; don't just rename the file, because a proper conversion also handles the WebVTT header and timestamp syntax. For the prose transcript, stack a text block under the video block in the lesson editor — don't paste raw SRT, because the timecodes will render as visible page text.
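To make "convert, don't rename" concrete, here is a minimal sketch of an SRT to WebVTT conversion in Python. The function name `srt_to_vtt` is illustrative, and the sketch assumes well-formed SRT with `HH:MM:SS,mmm` timestamps; it ignores styling tags and positioning cues, so treat it as a starting point rather than a complete converter.

```python
import re

# SRT timestamps use a comma before milliseconds (00:01:02,500);
# WebVTT uses a period (00:01:02.500).
_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT text (hypothetical helper)."""
    # Drop a UTF-8 byte-order mark if present and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        # SRT puts a numeric cue index on its own line; WebVTT does not need it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = _TIMING.search(lines[0])
        if not match:
            continue  # skip blocks without a recognizable timing line
        start = f"{match.group(1)}.{match.group(2)}"
        end = f"{match.group(3)}.{match.group(4)}"
        cues.append("\n".join([f"{start} --> {end}", *lines[1:]]))
    # A WebVTT file must begin with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Read the `.srt` as UTF-8, pass the text through the function, and write the result to a `.vtt` file. Open the output in a text editor once before uploading to confirm the header and the period-separated milliseconds are there.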

Kajabi

Kajabi supports native caption uploads, and its player is Wistia under the hood. The player accepts SRT and VTT. In the lesson editor, open the video's settings or captions area, then upload one file per language. If auto-generated captions are available in your Kajabi/Wistia setup, accuracy on technical or accented audio is still inconsistent — uploading your own reviewed SRT is the safer move on paid content.

For the prose transcript, an accordion block below the player ("Read the transcript") keeps a 9,000-word lesson from drowning the page layout while still making the text available on the page.
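If your only transcript is the caption file, you can strip the cue numbers and timecodes before pasting it into a text or accordion block. The sketch below is one way to do that in Python; `captions_to_prose` is an illustrative name, and the output is a single run of text that still needs paragraph breaks and a read-through.

```python
import re

# Matches SRT (comma) and WebVTT (period) timing lines.
_TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")


def captions_to_prose(caption_text: str) -> str:
    """Return caption text with cue indexes, timecodes, and the VTT header removed."""
    lines = [ln.strip() for ln in caption_text.lstrip("\ufeff").splitlines()]
    kept = []
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not line or line == "WEBVTT":
            continue
        # Only treat a bare number as a cue index when a timing line follows,
        # so spoken numbers on their own caption line are not dropped.
        if line.isdigit() and _TIMING_LINE.match(nxt):
            continue
        if _TIMING_LINE.match(line):
            continue
        kept.append(line)
    return " ".join(kept)
```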

Thinkific

Thinkific's hosted player supports WebVTT (.vtt) caption upload for hosted videos; if you start with SRT, convert to VTT unless your account UI explicitly accepts SRT. If you host video on Wistia or Vimeo and embed into Thinkific, manage captions in the host's dashboard — they render through that player's CC menu.

To combine video and prose in one lesson, use the text area in a Video Lesson if your builder shows one; if you're embedding an outside player, use Thinkific's Multimedia Lesson type or a custom page layout.

For all three platforms: keep the SRT/VTT alongside the video in your source repo. When you re-edit a module, regenerate captions from the new cut — don't patch the old file.
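One way to catch captions that were not regenerated after a re-edit is to compare file timestamps. The sketch below assumes a layout where each caption file sits next to its video with the same base name (for example `module-03.mp4` and `module-03.vtt`); that layout and the function name are assumptions, not something the platforms require. Modification times are a rough signal only, since a fresh git checkout resets them.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mov"}
CAPTION_EXTS = (".vtt", ".srt")


def stale_or_missing_captions(root: str) -> list[Path]:
    """List videos whose sibling caption file is missing or older than the video."""
    flagged = []
    for video in sorted(Path(root).rglob("*")):
        if video.suffix.lower() not in VIDEO_EXTS:
            continue
        captions = [
            video.with_suffix(ext)
            for ext in CAPTION_EXTS
            if video.with_suffix(ext).exists()
        ]
        video_mtime = video.stat().st_mtime
        # Flag the video if it has no captions, or any caption predates the cut.
        if not captions or any(c.stat().st_mtime < video_mtime for c in captions):
            flagged.append(video)
    return flagged
```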

The SEO math: a 60-minute lesson is a 9,000-word page

English instructional video runs around 130–150 wpm. A 60-minute lesson transcribes to roughly 8,000–9,000 words; a 90-minute deep-dive lands closer to 12,000–13,500. Google can't reliably index the spoken content of your course video unless you publish text it can read: the title, URL, structured data, and whatever transcript or summary you put next to the player.
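To estimate the word count for your own lesson lengths, multiply minutes by speaking rate. The helper below is a small sketch; the 130 and 150 wpm defaults come from the range above, and your own delivery may be faster or slower.

```python
def transcript_word_range(
    minutes: float, wpm_low: int = 130, wpm_high: int = 150
) -> tuple[int, int]:
    """Estimate the low and high transcript word counts for a lesson."""
    return round(minutes * wpm_low), round(minutes * wpm_high)


for length in (60, 90):
    low, high = transcript_word_range(length)
    print(f"{length}-minute lesson: about {low:,} to {high:,} words")
```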

That's a lot of indexable text per page. The public marketing page or the free preview lesson benefits most. Drop a 500-word lesson summary plus the full transcript on the sales page, and Google has something to rank on long-tail queries that match the actual content of the course.

Gated lesson pages inside the member area won't get indexed if they're properly behind login (good). The public preview should always carry its full transcript below the video. This is how creators on Teachable and Thinkific can pick up organic traffic for queries like "how to deglaze a pan technique" that they'd otherwise lose to YouTube.

One caveat: don't dump raw ASR output. A WER of 7.88% on clean studio audio is roughly one error every 13 words — visible enough that a careful reader notices. Run a 15-minute cleanup pass and break the text up with H2/H3 headers and bullet lists before publishing. The SRT/VTT under the video can stay closer to raw because viewers tolerate caption imperfection more than body-text typos.
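The "one error every 13 words" figure is just the reciprocal of the error rate, and the same arithmetic tells you how many fixes to expect in a full transcript. A quick sketch, treating WER as a flat per-word rate (real errors cluster around jargon and names, so this is only an average):

```python
def words_per_error(wer_percent: float) -> float:
    """Average number of words between errors at a given word error rate."""
    return 100.0 / wer_percent


def expected_errors(word_count: int, wer_percent: float) -> int:
    """Rough count of errors to expect in a transcript of word_count words."""
    return round(word_count * wer_percent / 100.0)


print(f"{words_per_error(7.88):.1f} words per error")
print(f"{expected_errors(9000, 7.88)} expected errors in a 9,000-word transcript")
```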

Multi-language captions: cost vs reach

We transcribe in 99 languages at one rate, so the source-language transcript costs the same whether the lesson is in English or Japanese.

We don't auto-translate English audio into a Spanish transcript. The working pattern is: transcribe in the source language, export the SRT, run that SRT through an SRT-aware translator or a GPT-4-class workflow that preserves timecodes, then have a human pass over terminology. Upload both tracks; the player handles language selection where multiple tracks are supported.
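"Preserves timecodes" in practice means the translator only ever sees the cue text, never the index or timing lines. The sketch below shows that separation in Python. The `translate` argument is a placeholder for whatever you use (an API call, an LLM prompt, a human-edited lookup); it is hypothetical and not part of any specific service. Note that the sketch joins multi-line cues into one line before translating, so you may need to re-wrap long lines afterward.

```python
import re
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; cue numbers and timing lines pass through untouched."""
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    out_blocks = []
    for block in re.split(r"\n{2,}", text):
        lines = block.split("\n")
        # Everything up to and including the timing line is kept verbatim.
        timing_idx = next((i for i, ln in enumerate(lines) if "-->" in ln), None)
        if timing_idx is None:
            out_blocks.append(block)  # not a cue; leave as-is
            continue
        header = lines[: timing_idx + 1]
        cue_text = " ".join(ln.strip() for ln in lines[timing_idx + 1:])
        translated = translate(cue_text) if cue_text else ""
        out_blocks.append("\n".join(header + ([translated] if translated else [])))
    return "\n\n".join(out_blocks) + "\n"
```

Because the timing lines are copied rather than regenerated, the translated track stays in sync with the source track, and a human reviewer only has to check terminology.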

Whether to translate at all depends on the course topic:

  • Technical / English-dominant fields (SaaS marketing, US tax law, AWS certification): English captions often cover most of the paying audience. Adding Spanish may not move revenue.
  • Universally relevant topics (fitness, language learning, design fundamentals, parenting): a Spanish or Portuguese track can lift conversions from LATAM audiences meaningfully. Worth piloting on top modules.
  • Locale-specific content (US bar exam prep, UK GCSE): don't bother translating.

Look at your analytics before picking a second language: countries sending paid or organic traffic, refund notes mentioning comprehension, YouTube comments asking for captions. Ship English captions on every module first, translate your two most-watched modules into one second language, and measure for 60 days. Translating a 20-module course into 5 languages up-front is how creators can burn $3,000+ for marginal return.
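The gap between the pilot and the full rollout is easy to put in numbers. The sketch below backs out a per-module, per-language cost from the $3,000 figure above (20 modules times 5 languages); that implied $30 is a back-of-envelope assumption covering translation plus human review, not a quoted price, so substitute your own rate.

```python
def translation_budget(
    modules: int, languages: int, cost_per_module_language: float
) -> float:
    """Total cost of translating `modules` lessons into `languages` languages."""
    return modules * languages * cost_per_module_language


# Assumption: rate implied by $3,000 spread over 20 modules x 5 languages.
implied_rate = 3000 / (20 * 5)

full_rollout = translation_budget(20, 5, implied_rate)
pilot = translation_budget(2, 1, implied_rate)
print(f"Full rollout: ${full_rollout:,.0f}  Pilot: ${pilot:,.0f}")
```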

The workflow: record once, transcribe, export twice

You record the lesson once. The transcript splits into two artifacts with different cleanup requirements.

  1. Upload the video file (MP4, MOV) or pull audio out with ffmpeg. We accept files up to 2 GB on Pro and 5 GB on Business (as of May 2026), 10 hours max per file. A 60-minute 1080p lesson often runs roughly 500 MB–2 GB depending on bitrate.
  2. Run transcription. AssemblyAI Universal-