:
- What phrases do my clients use right before they avoid a hard conversation?
- Which commitments are too vague to survive the week?
- What language shows up in the three sessions before a client is ready for a group program?
- What objections recur before a client invests in a longer engagement?
- Where do clients consistently confuse strategy with permission?
What comes back is the structure of your next group program. The chapter titles of your book. The five modules of a $2,000 cohort course.
Anonymize before you aggregate
Strip names, employers, identifying project details, and specific financial figures before any cross-client analysis. A find-and-replace pass before the data leaves your laptop is the minimum. Many coaches keep two copies — the raw transcript locked per-client, an anonymized copy in the program-development corpus.
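Here is a minimal sketch of that find-and-replace pass. It assumes you keep a hand-maintained list of each client's identifying terms. The term list, placeholder labels, and the `anonymize` function are hypothetical illustrations, not a specific tool.

```python
import re

# Hypothetical per-client term list. Maintain one by hand for each client:
# names, employers, and project details you want removed.
CLIENT_TERMS = {
    "Dana Whitfield": "[CLIENT]",
    "Acme Logistics": "[EMPLOYER]",
    "Project Falcon": "[PROJECT]",
}

# Dollar figures such as $40, $2,000, $1.5M, or $120k.
MONEY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:[kKmM]\b)?")


def anonymize(text, terms):
    """Replace listed identifying terms, then mask dollar amounts."""
    # Replace longer terms first so a full name wins over a partial one.
    for term in sorted(terms, key=len, reverse=True):
        placeholder = terms[term]
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        # A function replacement avoids backslash escapes in the placeholder.
        text = pattern.sub(lambda _match: placeholder, text)
    return MONEY_PATTERN.sub("[AMOUNT]", text)


if __name__ == "__main__":
    raw = "Dana Whitfield said Acme Logistics offered $120k, and Project Falcon costs $2,000."
    print(anonymize(raw, CLIENT_TERMS))
```

Running it prints `[CLIENT] said [EMPLOYER] offered [AMOUNT], and [PROJECT] costs [AMOUNT].` A pass like this only catches the terms you list and the patterns you anticipate. Nicknames, misspelled names from the transcript, and indirect identifiers still need a human read-through.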
Client insight is not automatically coach IP just because it happened in a paid session. Anonymization and permission are what make it ethically usable in a public artifact.
Where AI still misses the call
A transcript is the words. A coaching session is the words plus everything around them. The transcript will not catch:
- The seven-second pause before the client said "I'm fine with that"
- The shift in pacing when they got to the real topic 40 minutes in
- The sigh, the throat-clear, the sudden brightness when their kid's name came up
- Sarcasm, which AI still reads unreliably
Standard speech-to-text strips paralinguistics by design. GPT-4o, released in May 2024, introduced native real-time audio that can, in principle, process tone and pacing. However, enterprise compliance coverage for that specific API is still rolling out, and few coaching tools have adopted it. Treat the transcript as the lyrics, not the song. Your session notes should still capture what you noticed that the words alone won't carry.
Diarization is good, not perfect
Two-person coaching audio is the easiest case for "who said what." State-of-the-art diarization sits at around a 5-8% Diarization Error Rate (DER) on clean two-person conversational audio, per recent SpeechBrain/Hugging Face benchmarks. DER is the share of speech time that is missed, falsely detected as speech, or attributed to the wrong speaker. Cross-talk degrades it sharply.
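To make the DER figure concrete, here is a simplified frame-level sketch. It assumes one label per fixed-length frame, no overlapping speech, and no forgiveness collar around speaker changes. Production scorers such as pyannote.metrics work on timed segments, handle overlap, and use an optimal assignment algorithm instead of the brute-force search below. The `frame_der` function is a hypothetical illustration.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level Diarization Error Rate.

    Each list holds one speaker label per frame, or None for silence.
    Assumes at most one speaker per frame and no collar.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")
    ref_speech = sum(1 for r in reference if r is not None)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)
    both_speech = [(r, h) for r, h in pairs if r is not None and h is not None]

    # System labels are arbitrary (e.g. "SPEAKER_00"), so try every one-to-one
    # mapping onto reference labels and keep the one matching the most frames.
    # Brute force is only practical for a handful of speakers.
    ref_labels = sorted({r for r, _ in both_speech})
    hyp_labels = sorted({h for _, h in both_speech})
    padded = ref_labels + [None] * len(hyp_labels)  # None = left unmatched
    best_correct = 0
    for assignment in permutations(padded, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, assignment))
        correct = sum(1 for r, h in both_speech if mapping[h] == r)
        best_correct = max(best_correct, correct)

    confusion = len(both_speech) - best_correct
    return (missed + false_alarm + confusion) / ref_speech
```

For example, `frame_der(["A", "A", "B", "B", None], ["s1", "s1", "s2", "s1", None])` returns `0.25`: one of four speech frames went to the wrong speaker. At the 5-8% range cited above, that works out to roughly 3 to 5 minutes of every hour of speech being missed, falsely detected, or attributed to the wrong person.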
If you record with separate audio tracks per participant (for example, Zoom's setting to record a separate audio file for each participant), our channel-split diarization is much more reliable. The model doesn't have to guess who is speaking, because each track already belongs to one person. For mono recordings we use pyannote/speaker-diarization-3.1, which is generally strongest with fewer speakers and degrades as speaker count and cross-talk increase.
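The sketch below shows why separate tracks make speaker attribution easier. This is not our production pipeline. It assumes each participant's file has already been decoded to time-aligned PCM samples of equal length, for example with Python's `wave` module. The function name, 20 ms frame size, and energy threshold are hypothetical choices. Each frame goes to whichever track is loudest above the threshold, and consecutive frames from the same track are merged into segments.

```python
import math


def rms(samples):
    """Root-mean-square level of a slice of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def channel_split_segments(tracks, sample_rate, frame_ms=20, threshold=500.0):
    """Assign each frame to the loudest track above `threshold`, then merge runs.

    tracks: dict mapping a speaker name to that person's samples, already
    time-aligned and equal length. Returns (start_s, end_s, speaker) tuples.
    """
    lengths = {len(samples) for samples in tracks.values()}
    if len(lengths) != 1:
        raise ValueError("tracks must be time-aligned and equal length")
    total = lengths.pop()
    frame_len = max(1, sample_rate * frame_ms // 1000)

    runs = []  # each run is [start_sample, end_sample, speaker]
    for start in range(0, total, frame_len):
        end = min(start + frame_len, total)
        levels = {name: rms(samples[start:end]) for name, samples in tracks.items()}
        speaker, level = max(levels.items(), key=lambda item: item[1])
        if level < threshold:
            continue  # silence on every track
        # During cross-talk this simply keeps the louder track.
        if runs and runs[-1][2] == speaker and runs[-1][1] == start:
            runs[-1][1] = end
        else:
            runs.append([start, end, speaker])
    return [(s / sample_rate, e / sample_rate, name) for s, e, name in runs]
```

Because each track is already labeled by who was wearing the microphone, the only real decision is "is anyone speaking, and which track is loudest?" A mono recording has to infer speaker identity from voice characteristics alone, which is the harder problem that pyannote-style models solve.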
Privacy, consent, and the three-way problem
Recording changes the contract. Even with verbal consent at the start of every call, you've now created a durable artifact that didn't exist before — and the client's relationship to that artifact is different from their relationship to your memory.
Your written consent should answer five questions:
- Are sessions recorded?
- Are recordings transcribed by an AI vendor, and which one?
- Who can access the recording, transcript, and summary?
- How long are they retained, and how does the client request deletion?
- Can anonymized excerpts or themes be used for program development, supervision, writing, or research?
Treat these as separate permissions. Consent to be coached is not consent to be recorded. Consent to receive a recap is not consent to have an anonymized transcript analyzed for a future course.
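If you track consent in software, one way to keep the permissions separate is to store one explicit flag per question and check each intended use against every flag it depends on. This is a minimal sketch. The `ConsentRecord` fields, use names, and helper functions are hypothetical, not a schema from any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical per-client consent record: one explicit flag per permission."""
    coaching: bool
    recording: bool
    ai_transcription: bool
    transcription_vendor: Optional[str]  # which AI vendor, if any
    recap_delivery: bool
    anonymized_program_use: bool
    retention_days: int


# Each use lists every permission it depends on; no flag implies another.
REQUIRED_PERMISSIONS = {
    "record_session": ("coaching", "recording"),
    "transcribe": ("recording", "ai_transcription"),
    "send_recap": ("ai_transcription", "recap_delivery"),
    "program_development": ("ai_transcription", "anonymized_program_use"),
}


def is_allowed(record: ConsentRecord, use: str) -> bool:
    """True only if every permission the use depends on was granted."""
    if use not in REQUIRED_PERMISSIONS:
        raise ValueError(f"unknown use: {use}")
    return all(getattr(record, flag) for flag in REQUIRED_PERMISSIONS[use])


def deletion_due(record: ConsentRecord, recorded_on: date) -> date:
    """Date by which the recording, transcript, and summary should be deleted."""
    return recorded_on + timedelta(days=record.retention_days)
```

A client who agreed to recaps but not to program development gets `is_allowed(record, "send_recap") == True` and `is_allowed(record, "program_development") == False`. One yes never quietly unlocks another.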
If a company sponsors the coaching, the three-way agreement matters more than the technology. An executive coaching engagement typically has a coach, a client, and a sponsor who pays. A transcript-driven recap can accidentally cross that boundary if it contains details the client never agreed to share upward. Default: client-facing notes go to the client, not the sponsor, unless the contract explicitly says otherwise.
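If recaps are sent automatically, that default can be encoded as opt-in routing, where the sponsor is added only when a contract flag says so. This is a minimal sketch with a hypothetical `Engagement` record and field names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Engagement:
    """Hypothetical engagement record for a sponsored (three-way) contract."""
    client_email: str
    sponsor_email: Optional[str] = None
    # Set to True only when the signed three-way agreement explicitly says so.
    sponsor_may_receive_recaps: bool = False


def recap_recipients(engagement: Engagement) -> List[str]:
    """Client-facing notes go to the client; the sponsor is opt-in by contract."""
    recipients = [engagement.client_email]
    if engagement.sponsor_email and engagement.sponsor_may_receive_recaps:
        recipients.append(engagement.sponsor_email)
    return recipients
```

Defaulting the flag to `False` means a missing or incomplete contract setting fails closed. In that case the recap reaches only the client.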
"Deleting the transcript" doesn't always delete the data
Hitting delete in a vendor's UI does not always remove derived data. Unless the vendor offers contracted Zero Data Retention — common in enterprise APIs, rare in consumer apps — audio and text may be retained for model training, abuse review, or backup. Read the data processing addendum before you onboard a tool. Some free or consumer tiers of transcription products reserve training rights.
HIPAA is a separate question
If you're a licensed clinical practitioner doing therapy as a HIPAA-covered entity (not coaching), HIPAA applies and you need a signed Business Associate Agreement (BAA) with your transcription vendor. "Enterprise-grade" does not mean HIPAA-covered. Otter.ai signs a BAA only on its Enterprise plan. Fireflies.ai signs one only on its Enterprise tier, at $39/user/month billed annually (as of May 2026). Standard ChatGPT is not HIPAA compliant, and OpenAI will not sign a BAA for its consumer tiers.
We apply security controls to data at rest, but we are not yet BAA-covered. If you're operating as a therapist, use a vendor that is. Upheal ($69-$99/month) and Freed.ai ($99/month for unlimited SOAP notes) are purpose-built for clinical workflows, and Nabla Copilot costs $119/month after 30 free encounters (pricing as of May 2026).
Pure coaching — non-clinical, ICF-credentialed work — doesn't itself require HIPAA. But the confidentiality bar from the ICF Code of Ethics is still high.
What ICF ethics actually permit
The ICF Code of Ethics doesn't ban recording or AI-assisted note generation. It requires informed consent for any recording or third-party processing, clear ownership and access rules, confidentiality "to the highest level appropriate," and disclosure when data leaves the coach-client dyad — including to a transcription vendor.
The American Psychological Association's February 2024 guidance is stricter and worth reading even if you're not licensed: clinicians retain full responsibility for AI-generated notes and must verify summaries against their own judgment. Translate that to coaching: the AI draft is a draft. You sign off, not the model.
If you're preparing an MCC or PCC credential submission, the recording requirement is separate from any AI processing. The AI transcript is not a substitute for the audio file itself.
What we ship — and what we don't
For coaching workflows specifically:
- Audio and video upload through our audio-to-text pipeline; primary ASR is AssemblyAI Universal-3 with Whisper Large-v3 as a fallback for transient errors.
- 99 languages at one price — no language tiers.
- Meeting capture for Google Meet, Zoom, and Microsoft Teams via Recall.ai. The bot appears in the participant list under a configurable name, posts a recording disclosure in chat on join, and respects an opt-out endpoint at /opt-out/{token}. See the meeting-notes workflow for