Cross-referencing becomes a search, not a listen. When witness 17 says "I told Marcus about the Q3 numbers in October," you can grep the other 30 transcripts for "Marcus" and "Q3" in seconds. Manual review of a 60-minute interview to find a single name takes 20–40 minutes even with timestamped notes.
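
A minimal sketch of that kind of search, assuming each transcript is already loaded as plain text keyed by a witness label. The `transcripts` dict, the sample sentences, and the `find_cooccurrences` helper are illustrative names, not part of any product API. It returns the sentences in which every search term appears together:

```python
import re

def find_cooccurrences(transcripts, terms):
    """Return (witness, sentence) pairs where every term appears in the same sentence."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    hits = []
    for witness, text in transcripts.items():
        # Naive sentence split: break after ., ? or ! followed by whitespace
        for sentence in re.split(r"(?<=[.?!])\s+", text):
            if all(p.search(sentence) for p in patterns):
                hits.append((witness, sentence.strip()))
    return hits

# Hypothetical sample data standing in for real transcript files
transcripts = {
    "witness_17": "I told Marcus about the Q3 numbers in October. He did not reply.",
    "witness_04": "Marcus never mentioned anything to me. The Q3 close was late.",
    "witness_22": "Marcus showed me the Q3 figures before the board call.",
}

for witness, sentence in find_cooccurrences(transcripts, ["Marcus", "Q3"]):
    print(f"{witness}: {sentence}")
```

Matching at sentence level rather than file level keeps the hit list short: witness_04 mentions both terms, but never in the same sentence, so it is not returned.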

Drafting the interview memo speeds up roughly 3–4x. Investigators paste relevant transcript blocks directly into the work-product memo and annotate, instead of typing from memory or rewinding audio. The memo itself remains counsel's privileged/work-product document when prepared for legal advice or litigation; the transcript is a tool used to create it.

Discrepancy hunting works. When two witnesses describe the same meeting differently, you put the two transcript excerpts side by side. Without transcripts, you're triangulating between two sets of handwritten notes.

The realistic gain on a 30-interview matter: 80–120 hours of review time, freeing the lead investigator to do analysis. Teams that run an eDiscovery stack (Relativity, Everlaw, Logikcull) typically ingest the reviewed transcript downstream as searchable text; we sit upstream of that.

Accuracy: what AI gets right, and what it doesn't

We run AssemblyAI Universal-3 in production. On clean interview audio — quiet conference room, good lapel or USB mic, 16 kHz or higher — word error rate has benchmarked around 7.88%, which is roughly 92% word-level accuracy. On telephony audio at 8 kHz (witness called in from a cell phone, conference bridge over a bad line), WER can rise to about 17.7%. Whisper Large-v3 sits behind that as a fallback for transient errors only.
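
For readers who want to check a transcript against a hand-corrected reference, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. The sketch below implements that definition only. It is not the scoring script behind the benchmark figures above, which would typically also normalize punctuation, casing, and numerals; the function name and sample sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Two-row dynamic programming for word-level Levenshtein distance
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical example: one misspelled proper noun in a nine-word sentence
reference = "i told marcus about the q3 numbers in october"
hypothesis = "i told markus about the q3 numbers in october"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```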

Practically: about 1 in 13 words is wrong on good audio, about 1 in 6 on bad audio. The errors cluster around:

Proper nouns — names of people, internal product codenames, locations
Acronyms and ticker symbols
Numbers and dates spoken quickly
Overlapping speech (two people interrupting each other)
Mumbling, throat-clearing, off-mic asides

The fix is a human pass. For an investigation, plan on 0.5–1.5 hours of cleanup per hour of audio if the transcript will be quoted in a memo or produced. That's still far faster than manual transcription, which runs 4–6 hours per audio hour: about a 5x speedup at the midpoints of both ranges, and roughly 3x to 12x at the extremes.
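
To turn those per-hour rates into a staffing estimate, multiply them by the total audio on the matter. The sketch below assumes 30 interviews of roughly one hour each; that total is an illustrative assumption, and `effort_range` is a hypothetical helper.

```python
def effort_range(audio_hours, low_rate, high_rate):
    """Return (best_case, worst_case) effort in hours for a per-audio-hour rate range."""
    return audio_hours * low_rate, audio_hours * high_rate

audio_hours = 30  # assumption: 30 interviews of roughly one hour each

cleanup = effort_range(audio_hours, 0.5, 1.5)  # human pass over an AI draft
manual = effort_range(audio_hours, 4.0, 6.0)   # typing the transcript from scratch

print(f"AI draft plus cleanup: {cleanup[0]:.0f} to {cleanup[1]:.0f} hours")
print(f"Manual transcription:  {manual[0]:.0f} to {manual[1]:.0f} hours")
```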

What AI transcription is not: a certified record. If the matter goes to litigation and you need a certified transcript, commission a court reporter or a qualified transcription or electronic reporting service (for example, an AAERT-certified CER/CET workflow) to produce it. The AI transcript is the working draft your team uses to navigate the audio and write privileged memos.

Speaker labels for multi-party interviews

Most investigation interviews have two to four people in the room: the investigator, sometimes second-chair counsel, the witness, and occasionally the witness's personal attorney or a union rep. Speaker diarization — labeling who said what — is critical for review.

We handle two modes:

Stereo recordings (channel-split diarization): if the audio has the investigator on one channel and the witness on the other (common with two-mic setups or separate-track Zoom recordings), labeling is usually near-perfect. Each channel is a known speaker.
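
The idea behind channel-split labeling can be sketched with Python's standard `wave` module: de-interleave a two-channel recording into two mono files, one per speaker. This is an illustration under stated assumptions, not our pipeline's code. It assumes uncompressed PCM WAV input (an MP3 would need decoding first), and which channel carries the investigator depends on how the recording was set up. `split_stereo` and the file names are hypothetical.

```python
import wave

def split_stereo(path, left_path, right_path):
    """Write each channel of a stereo PCM WAV file to its own mono WAV file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel recording")
        width = src.getsampwidth()   # bytes per sample, e.g. 2 for 16-bit audio
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo PCM is interleaved: [left sample][right sample][left sample]...
    frame_size = 2 * width
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), frame_size))
    right = b"".join(frames[i + width:i + frame_size] for i in range(0, len(frames), frame_size))

    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)

# Example call (paths are placeholders):
# split_stereo("interview.wav", "investigator.wav", "witness.wav")
```

Each mono file can then be transcribed separately and tagged with a known speaker, which is why attribution on stereo recordings does not depend on voice clustering.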

Mono recordings (pyannote.audio 3.1): a single mic captures everyone. The model clusters speakers by voice. Realistic performance on investigation audio:

2 speakers, clean room: 95%+ correct attribution
3–4 speakers, clean room: 85–92%
4+ speakers with overlap: drops fast — expect to relabel manually

Labels come out as "Speaker A," "Speaker B." You assign names after the fact. We do not auto-identify speakers from a voiceprint database — there's no enrollment step, and we wouldn't want one for a privileged interview anyway.

For sensitive HR matters where the complainant's identity needs to be protected in early review, leave the speaker labels generic and substitute names only in the final memo.
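
One way to keep that separation is to store the transcript with generic labels and apply a name mapping only when rendering the final memo. The sketch below assumes a transcript is a list of segments with `speaker` and `text` keys; that structure, the `relabel` helper, and the sample lines are illustrative, not a documented export format.

```python
def relabel(segments, names=None):
    """Return a copy of the segments with speaker labels mapped to names.

    Labels missing from the mapping are left generic, so a partial mapping is safe.
    """
    names = names or {}
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]

# Hypothetical transcript segments with generic diarization labels
segments = [
    {"speaker": "Speaker A", "text": "Please describe what happened on the call."},
    {"speaker": "Speaker B", "text": "I raised the issue with my manager that week."},
]

early_review = relabel(segments)  # no mapping: labels stay generic
final_memo = relabel(segments, {"Speaker A": "Investigator", "Speaker B": "Complainant"})

for seg in final_memo:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Because `relabel` returns new dictionaries, the stored transcript keeps its generic labels and the mapping can live in a separate, more tightly controlled file.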

Cross-border investigations and language

FCPA matters, global HR probes, and multinational compliance reviews routinely cross five or six languages in a single investigation. We support 99 languages at the same price (the marketing line is "one price, every language"), so the Portuguese interview from the São Paulo office costs the same per minute as the English one from New York.

Two cautions. First, transcription is not legal translation. If a foreign-language transcript will be quoted in a regulator submission, board memo, or disciplinary action, commission a qualified human translator for those passages and keep the original-language and translated versions paired.

Second, build a glossary before you transcribe. Company acronyms, product codenames, regional job titles, and entity names create false mismatches across transcripts when the model guesses spelling. Spell critical terms on the record during the interview, then sweep the transcripts with find-and-replace.
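
The find-and-replace sweep can be scripted so the same glossary is applied to every transcript and each substitution is counted for the reviewer. A minimal sketch, assuming the glossary maps each canonical spelling to the misspellings observed in drafts; every term below is hypothetical, and `apply_glossary` is an illustrative helper.

```python
import re

def apply_glossary(text, glossary):
    """Replace known variant spellings with the canonical term.

    Returns the corrected text and a per-term count of replacements made.
    """
    counts = {}
    for canonical, variants in glossary.items():
        # Try longer variants first so a short variant cannot shadow a longer one
        alternatives = "|".join(
            re.escape(v) for v in sorted(variants, key=len, reverse=True)
        )
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        # A function replacement inserts the canonical term literally
        text, n = pattern.subn(lambda _m, c=canonical: c, text)
        counts[canonical] = n
    return text, counts

# Hypothetical glossary: canonical spelling -> variants seen in draft transcripts
glossary = {
    "Marcus": ["Markus", "Marcos"],
    "Project Halcyon": ["project halcion", "project hal sion"],
}

draft = "Markus approved project halcion. marcos signed off."
clean, counts = apply_glossary(draft, glossary)
print(clean)
print(counts)
```

Keeping the counts gives the reviewer a record of what was changed, which matters if a corrected passage is later quoted in a memo.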

Cross-border investigations also pull in works-council notification rules, GDPR employee-data obligations, including Member State rules under Article 88, and local recording-consent law — none of which we adjudicate. Confirm with local counsel before the recorder turns on.

Production to regulators: when the transcript leaves your hands

DOJ, SEC, EEOC, OFAC, FINRA — any of these regulators or enforcement bodies might request interview materials in a parallel proceeding. What you produce, and in what form, depends on the request.

Many regulator requests target documents, recordings, or notes, not necessarily transcripts. If audio is requested and nonprivileged, you may need to produce the audio. The transcript may be optional and may be protected as work product, depending on who prepared it and why.

Export is not production. Generating a transcript export from our pipeline does not mean it's ready to hand to a regulator. A raw AI transcript may contain attorney mental impressions in the questioning, off-the-record asides, and recognition errors on key terms. Counsel decides what gets produced, in what form, with what redactions and privilege log.

If you produce a transcript voluntarily — for example, in a presentation to the DOJ Criminal Division to support a declination — it should be either a certified transcript or a transcript clearly labeled as a working draft prepared by counsel. Producing an AI-generated transcript without that caveat invites disputes about accuracy.

Preserve original recordings. Once an investigation or enforcement matter is reasonably anticipated, legal-hold and spoliation rules generally require preservation of relevant originals. Don't delete the recording after transcribing. Store both. Original WAV/MP3 plus the transcript file (JSON, SRT, DOCX) is the
