Chapter 27: Live Transcription — Words on the Screen as You Speak
Think of a court reporter typing as the session unfolds—every word captured the moment it's spoken, no waiting until after the meeting ends. That's exactly what SeaMeet's live transcription does for your recordings. While you're talking, the transcript panel fills up in real time: speaker labels, timestamps, and the actual words, all appearing as the conversation happens.
No waiting. No upload step. Just words on the screen.
Chapter Objectives
After reading this chapter, you will be able to:
- Understand what live transcription does and when to use it
- Set up the prerequisites before starting
- Start a recording session with live transcription active
- Read and interpret the transcript panel while recording
- Understand how automatic speaker detection works
- Troubleshoot the most common connection and display issues
What Is Live Transcription?
Live transcription converts the audio from your recording into text while you record, producing a timestamped, speaker-labelled transcript in real time.
Think of it like this: Imagine a typist sitting beside you in every meeting, instantly writing down everything said—labelling each person's words and noting the exact time they spoke. That transcript is available the moment the meeting ends. No transcription delay. No "processing your audio" spinner.
Live transcription runs alongside your recording session. The moment you start recording:
- An AI engine begins listening
- Words appear in the Transcript panel within seconds of being spoken
- Speaker labels ("Speaker 1", "Speaker 2") are assigned automatically
- Timestamps mark where in the recording each segment falls
When you stop recording, the complete transcript is saved automatically alongside the audio/video file.
Before You Begin
Live transcription requires two things to be configured before your first session:
1. AI Features Enabled
- Open Settings (gear icon ⚙️ in the top-right corner)
- Navigate to the AI category
- Confirm the AI Features toggle is on (blue)
If the toggle is grey or the AI category is missing, contact your account administrator—AI features may require an active subscription.
2. API Key Configured
Still in Settings → AI:
- Look for the API Key field
- Enter your Gemini API key (see Chapter 31 for how to obtain one)
- Click Save
A green checkmark confirms the key is valid. A red warning means the key is incorrect or has expired.
Note: You need an active internet connection during the recording. Live transcription cannot run offline.
How to Start a Live Transcription Session
Starting live transcription is identical to starting any recording—there is no separate "transcription mode" to enable. If AI Features are on and an API key is configured, live transcription activates automatically.
Step-by-step:
1. Click the red record button 🔴 (or use your keyboard shortcut: Ctrl+Alt+A on Windows, Cmd+Shift+A on macOS)
   - What you see: The button pulses red. The recording timer starts counting up.
2. Watch the Transcript panel appear
   - What you see: A panel slides into view on the right side of the main window (or below the player, depending on your layout). It shows "Connecting…" briefly.
3. Speak normally
   - What you see: After 2–5 seconds, text begins appearing. The most recent phrase shows a subtle animation while it's still being processed.
4. Continue your meeting or recording as usual
   - What you see: Completed segments stack up chronologically, each tagged with a speaker label and a timestamp.
5. Stop recording when you're done
   - What you see: The button returns to its idle state. A "Saving transcript…" notice flashes briefly, then disappears. The transcript is stored.
What You See While Recording
The transcript panel has three main areas:
┌─────────────────────────────────────────────┐
│ Transcript 🟢 Connected │
├─────────────────────────────────────────────┤
│ Speaker 1 0:00:12 │
│ "Good morning everyone, let's get started" │
│ │
│ Speaker 2 0:00:24 │
│ "Thanks for joining on short notice" │
│ │
│ Speaker 1 0:00:31 │
│ "Of course. First item on the agenda…" │
├─────────────────────────────────────────────┤
│ Now Speaking… ████████░░░░ │
│ "…is the Q3 budget review" │
└─────────────────────────────────────────────┘
What each element means:
| Element | Meaning |
|---|---|
| Speaker label | Who is speaking — assigned automatically ("Speaker 1", "Speaker 2") |
| Timestamp | When in the recording this segment starts (hours:minutes:seconds) |
| Completed text | Finalised words — these do not change |
| "Now Speaking…" preview | The current utterance still being processed — may change slightly |
| Status indicator | 🟢 Connected · 🟡 Connecting · 🔴 Error |
Connection Status Indicator
The indicator in the top-right corner of the panel tells you whether the AI engine is reachable:
- 🟢 Connected — Transcription is running normally
- 🟡 Connecting — Establishing connection (normal at startup, takes 2–5 seconds)
- 🔴 Error — Connection lost (see Troubleshooting below)
If you see 🔴 Error, the recording itself continues safely—only the live transcription is affected.
Automatic Speaker Detection
The AI engine attempts to distinguish between different voices and assign each a label.
How it works:
Recording timeline:
0:00 ──────────────────────────────────────────────────► time
│ │ │ │
Speaker 1 Speaker 2 Speaker 1 Speaker 2
"Morning" "Hello" "Agenda…" "Agreed"
▼ ▼ ▼ ▼
[Seg. 1] [Seg. 2] [Seg. 3] [Seg. 4]
Each time the speaker changes, the system creates a new segment. Segments from the same speaker get the same label.
Initial labels: The first speaker to talk is "Speaker 1", the second new voice is "Speaker 2", and so on. These are placeholders—you can rename them later (see Chapter 29).
Speaker refinement: As the recording progresses, the AI may refine earlier assignments if it becomes confident that two segments belong to the same voice. This is normal. The text of completed segments does not change; only the speaker label on past segments may be updated.
Tip: For the most accurate speaker separation, use headphones rather than loudspeakers. Loudspeaker output picked up by your microphone can confuse the detector.
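The labelling rule described above (the first voice heard becomes "Speaker 1", the next new voice becomes "Speaker 2", and so on) can be sketched in a few lines of Python. This is an illustration only, not SeaMeet's implementation: `assign_labels` is a hypothetical helper, and the `voice_ids` values stand in for whatever internal voice fingerprint the AI engine actually uses.

```python
def assign_labels(voice_ids):
    """Map each voice ID to "Speaker N" in order of first appearance.

    voice_ids: an iterable of opaque identifiers, one per segment
    (hypothetical stand-ins for the engine's internal voice fingerprints).
    Returns a list with one label per segment.
    """
    labels = {}   # voice ID -> "Speaker N"
    result = []
    for voice in voice_ids:
        if voice not in labels:
            # A voice we have not heard before gets the next free number.
            labels[voice] = f"Speaker {len(labels) + 1}"
        result.append(labels[voice])
    return result


# Four segments from two alternating voices, as in the timeline above.
print(assign_labels(["a", "b", "a", "b"]))
# → ['Speaker 1', 'Speaker 2', 'Speaker 1', 'Speaker 2']
```

Segments from the same voice always receive the same label, which is why renaming "Speaker 1" later updates every segment that voice produced.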
After the Recording Stops
When you click stop:
- The "Now Speaking…" preview finalises any in-progress sentence
- The complete transcript is saved alongside your recording file automatically
- No manual action is required
Where to find the transcript:
- Open the recording in your Recording Library
- Click AI Insights in the detail panel
- Select the Transcript tab
The transcript is also available for export as SRT (subtitle format) or JSON from the AI Insights tab. See Chapter 28 for export details.
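To give a feel for what an SRT export contains, here is a minimal Python sketch that turns transcript segments into SRT cues. It is not SeaMeet's exporter. The segment fields (`speaker`, `start`, `text`), the `[Speaker]: text` cue style, and the rule that each cue ends where the next begins are all assumptions made for illustration; SeaMeet's real JSON field names and cue timing may differ.

```python
def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, last_duration=3.0):
    """Render segments as SRT text.

    Each segment is a dict with "speaker", "start" (seconds) and "text"
    (hypothetical field names). A cue ends where the next one starts;
    the final cue gets `last_duration` seconds, an arbitrary choice.
    """
    cues = []
    for i, seg in enumerate(segments):
        if i + 1 < len(segments):
            end = segments[i + 1]["start"]
        else:
            end = seg["start"] + last_duration
        cues.append(
            f"{i + 1}\n"
            f"{srt_time(seg['start'])} --> {srt_time(end)}\n"
            f"[{seg['speaker']}]: {seg['text']}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


example = [
    {"speaker": "Speaker 1", "start": 12, "text": "Good morning everyone"},
    {"speaker": "Speaker 2", "start": 24, "text": "Thanks for joining"},
]
print(to_srt(example))
```

Each cue consists of a sequence number, a start and end time separated by `-->`, and the caption text, which is the structure subtitle players expect.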
Limitations
Understanding these limitations helps set realistic expectations:
| Limitation | Detail |
|---|---|
| Requires internet | Live transcription cannot run offline. The audio is processed by an AI engine over the network. |
| Timestamp accuracy | Timestamps are approximate (±3 seconds). Use them for navigation, not legal documentation. |
| Pauses in recording | If you pause the recording, transcription also pauses. Paused segments are not transcribed. |
| Accuracy varies | Accuracy is highest with clear speech, one speaker at a time, and a good microphone. Heavy accents, background noise, or cross-talk reduce accuracy. |
| Language | Transcription language can be set to Auto Detect (recommended) or a specific language in Settings → AI → SeaMeet Integration. Auto Detect handles multilingual meetings automatically. |
| No real-time editing | You cannot edit the transcript while recording. Editing is available after the recording stops. |
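Because timestamps are approximate, a script that jumps to a segment may want to allow for the ±3 second margin from the table above. The following Python sketch shows one way to do that; `parse_timestamp` and `seek_window` are hypothetical helpers written for this example, not part of SeaMeet.

```python
TIMESTAMP_TOLERANCE_S = 3  # from the Limitations table: roughly ±3 seconds


def parse_timestamp(stamp):
    """Convert an "H:MM:SS" transcript timestamp to whole seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seek_window(stamp, tolerance=TIMESTAMP_TOLERANCE_S):
    """Return (earliest, latest) seconds in which the segment may start."""
    t = parse_timestamp(stamp)
    # Clamp at zero so a segment near the start never yields a negative time.
    return max(0, t - tolerance), t + tolerance


print(seek_window("0:00:12"))  # → (9, 15)
```

Starting playback at the earliest value of the window makes it unlikely that the first words of a segment are cut off.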
Caption Overlay During Playback
When you play back a recording that has a live transcript, SeaMeet can display captions directly on the video — like closed captions on a TV.
How captions work:
- Caption text is overlaid on the video preview at the bottom of the frame
- Each segment shows the speaker name (colour-coded per speaker) and the spoken text
- Captions are synced to the playback position — they advance as the recording plays
- Captions automatically use the Gemini Live transcript from the session
Speaker colours: Each speaker is assigned a consistent colour across all captions and transcript panels. The colours are determined automatically and remain consistent throughout the recording.
Caption format:
[Speaker 1]: Good morning everyone, let's get started.
Captions appear and disappear as the matching transcript segment plays.
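Keeping captions in sync with playback comes down to finding the segment that covers the current position. A minimal Python sketch of that lookup follows. It assumes segments are sorted by start time and that a segment stays active until the next one begins; the segment fields and both helper functions are hypothetical and do not describe SeaMeet's internals.

```python
from bisect import bisect_right


def active_segment(segments, position):
    """Return the segment to caption at `position` seconds, or None.

    `segments` must be sorted by "start". None is returned before the
    first segment begins. A segment is treated as active until the next
    one starts, which is an assumption of this sketch.
    """
    starts = [seg["start"] for seg in segments]
    index = bisect_right(starts, position) - 1
    return segments[index] if index >= 0 else None


def caption_text(segment):
    """Format a segment in the "[Speaker]: text" caption style."""
    return f"[{segment['speaker']}]: {segment['text']}"


segments = [
    {"speaker": "Speaker 1", "start": 12, "text": "Good morning everyone, let's get started."},
    {"speaker": "Speaker 2", "start": 24, "text": "Thanks for joining on short notice."},
]
current = active_segment(segments, 15)
print(caption_text(current))
# → [Speaker 1]: Good morning everyone, let's get started.
```

A binary search keeps the lookup fast even for long recordings with thousands of segments, which matters when the caption is refreshed many times per second.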
Two-Column Video Layout
When watching a video recording with a live transcript available, SeaMeet uses a two-column layout:
┌─────────────────────────────────────────────────────┐
│ Video Preview │ Transcript Panel │
│ │ │
│ [video with captions] │ Speaker 1 0:00:12 │
│ │ "Good morning..." │
│ │ │
│ │ Speaker 2 0:00:24 │
│ │ "Thanks for joining" │
│ │ [⤢ Max] │
└─────────────────────────────────────────────────────┘
- Left column: Fixed-width video with caption overlay
- Right column: Scrolling transcript panel, synced to playback position
- Maximize button (⤢): Expands the transcript panel to a full-screen overlay for easier reading during long recordings
The two-column layout only appears for video recordings with live transcripts. Audio-only recordings and recordings without transcripts use the standard single-column layout.
Language Settings for Transcription
You can configure which language SeaMeet expects during live transcription:
- Open Settings (⚙️)
- Navigate to AI → SeaMeet Integration
- Find the Meeting Language selector
- Choose your language:
- Auto Detect (default, recommended) — SeaMeet automatically identifies the spoken language. Best for multilingual meetings or when language varies.
- Manual selection — Choose from 20+ specific languages including English (US/UK), Spanish, French, German, Japanese, Mandarin, Cantonese, Korean, and more.
Tip: Leave language set to Auto Detect unless you have a specific reason to force a language. Auto detection handles accents and mixed-language meetings better than a manually forced setting.
Troubleshooting
"Transcript panel not appearing"
Symptom: You start recording but the transcript panel never shows.
Check these in order:
- Go to Settings → AI and confirm the AI Features toggle is on
- Confirm your API key is valid (green checkmark in Settings → AI)
- Check your internet connection — try loading a web page
- Restart SeaMeet and try again
If the panel still doesn't appear after all four steps, the AI service may be temporarily unavailable. The recording itself is unaffected—try again later.
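The "check in order" logic above can be written down as a small decision function. This Python sketch is purely illustrative: `first_failed_check` is a hypothetical helper, and the three boolean inputs are values you would determine yourself by looking at Settings and testing your connection. SeaMeet does not expose such a function.

```python
def first_failed_check(ai_features_on, api_key_valid, internet_ok):
    """Return advice for the first failing check, or None if all pass.

    The checks run in the same order as the troubleshooting list.
    """
    checks = [
        (ai_features_on, "Turn on AI Features in Settings → AI"),
        (api_key_valid, "Fix the API key in Settings → AI"),
        (internet_ok, "Restore the internet connection"),
    ]
    for passed, advice in checks:
        if not passed:
            return advice
    return None  # everything looks fine: restart SeaMeet and try again


print(first_failed_check(True, False, True))
# → Fix the API key in Settings → AI
```

Working through the checks in a fixed order means the cheapest and most common causes are ruled out first.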
"Connection dropped mid-recording"
Symptom: The status indicator turns 🔴 red during a recording.
What happened: The connection to the AI engine was interrupted. This can happen due to:
- Temporary network interruption
- Wi-Fi switching access points
- The AI service briefly going offline
What to do:
- Don't stop the recording—it continues safely
- Check your internet connection
- The connection usually recovers automatically within 30 seconds
- Words spoken while the connection is down are not added to the live transcript and are not filled in later. The audio is still captured in the recording file, so you can run AI Extraction afterwards (see Chapter 28)
"Speakers not labelled correctly"
Symptom: Multiple people are labelled as "Speaker 1", or one person appears as two different speakers.
What's happening: Speaker detection uses voice characteristics. Accuracy drops when:
- Multiple people talk at the same time
- A speaker's voice changes significantly (laughing, raised voice, poor audio)
- Background noise interferes
What to do:
- After the recording, rename speakers in the Speakers panel (see Chapter 29)
- Use the Merge feature to combine two labels that belong to the same person (Chapter 29)
Best Practices
Follow these practices for the best live transcription results:
One speaker at a time: Cross-talk (two people speaking simultaneously) confuses speaker detection and produces garbled text in the transcript. Encourage participants to take turns.
Quiet recording environment: Background noise (HVAC systems, typing, street noise) is picked up by the microphone and reduces transcription accuracy. A headset microphone placed close to the mouth gives far better results than a built-in laptop microphone.
Good microphone placement: For in-person meetings with multiple participants, position a microphone near the centre of the table, or use individual microphones for each participant.
Stable internet connection: Use a wired connection or a strong Wi-Fi signal. Avoid hotspots or networks with high packet loss, which cause connection drops.
Rename speakers promptly: Do speaker renaming immediately after the recording while you remember who said what. See Chapter 29 for instructions.
Quick Reference
┌────────────────────────────────────────────────────────────┐
│ LIVE TRANSCRIPTION │
│ Quick Reference │
├────────────────────────────────────────────────────────────┤
│ Start │ Record normally — auto-activates │
│ Status: green │ 🟢 Transcription running │
│ Status: yellow │ 🟡 Connecting (wait 5 s) │
│ Status: red │ 🔴 Disconnected — recording safe │
├────────────────────────────────────────────────────────────┤
│ Transcript panel │ Right side of main window │
│ Preview line │ "Now Speaking…" — in progress │
│ Completed lines │ Final — won't change │
├────────────────────────────────────────────────────────────┤
│ After stopping │ Transcript saved automatically │
│ Find it │ Recording → AI Insights → Transcript │
├────────────────────────────────────────────────────────────┤
│ Requires │ Internet + AI Features on + API key │
│ Timestamps │ Approximate ±3 seconds │
│ Pauses │ Not transcribed │
└────────────────────────────────────────────────────────────┘
Last updated: 2026-03-20