Chapter 27: Live Transcription — Words on the Screen as You Speak

Think of a court reporter typing as the session unfolds—every word captured the moment it's spoken, no waiting until after the meeting ends. That's exactly what SeaMeet's live transcription does for your recordings. While you're talking, the transcript panel fills up in real time: speaker labels, timestamps, and the actual words, all appearing as the conversation happens.

No waiting. No upload step. Just words on the screen.

Chapter Objectives

After reading this chapter, you will be able to:

Understand what live transcription does and when to use it
Set up the prerequisites before starting
Start a recording session with live transcription active
Read and interpret the transcript panel while recording
Understand how automatic speaker detection works
Troubleshoot the most common connection and display issues

What Is Live Transcription?

Live transcription converts the audio from your recording into text while you record, producing a timestamped, speaker-labelled transcript in real time.

Think of it like this: Imagine a typist sitting beside you in every meeting, instantly writing down everything said—labelling each person's words and noting the exact time they spoke. That transcript is available the moment the meeting ends. No transcription delay. No "processing your audio" spinner.

Live transcription runs alongside your recording session. The moment you start recording:

An AI engine begins listening
Words appear in the Transcript panel within seconds of being spoken
Speaker labels ("Speaker 1", "Speaker 2") are assigned automatically
Timestamps mark where in the recording each segment falls

When you stop recording, the complete transcript is saved automatically alongside the audio/video file.

Before You Begin

Live transcription requires two things to be configured before your first session:

1. AI Features Enabled

Open Settings (gear icon ⚙️ in the top-right corner)
Navigate to the AI category
Confirm the AI Features toggle is on (blue)

If the toggle is grey or the AI category is missing, contact your account administrator—AI features may require an active subscription.

2. API Key Configured

Still in Settings → AI:

Look for the API Key field
Enter your Gemini API key (see Chapter 31 for how to obtain one)
Click Save

A green checkmark confirms the key is valid. A red warning means the key is incorrect or has expired.

Note: You need an active internet connection during the recording. Live transcription cannot run offline.

How to Start a Live Transcription Session

Starting live transcription is identical to starting any recording—there is no separate "transcription mode" to enable. If AI Features are on and an API key is configured, live transcription activates automatically.

Step-by-step:

Click the red record button 🔴 (or use your keyboard shortcut: Ctrl+Alt+A on Windows, Cmd+Shift+A on macOS)
- What you see: The button pulses red. The recording timer starts counting up.
Watch the Transcript panel appear
- What you see: A panel slides into view on the right side of the main window (or below the player, depending on your layout). It shows "Connecting…" briefly.
Speak normally
- What you see: After 2–5 seconds, text begins appearing. The most recent phrase shows a subtle animation while it's still being processed.
Continue your meeting or recording as usual
- What you see: Completed segments stack up chronologically, each tagged with a speaker label and a timestamp.
Stop recording when you're done
- What you see: The button returns to its idle state. A "Saving transcript…" notice flashes briefly, then disappears. The transcript is stored.

What You See While Recording

The transcript panel has three main areas:

┌─────────────────────────────────────────────┐
│  Transcript                    🟢 Connected  │
├─────────────────────────────────────────────┤
│  Speaker 1   0:00:12                        │
│  "Good morning everyone, let's get started" │
│                                             │
│  Speaker 2   0:00:24                        │
│  "Thanks for joining on short notice"       │
│                                             │
│  Speaker 1   0:00:31                        │
│  "Of course. First item on the agenda…"     │
├─────────────────────────────────────────────┤
│  Now Speaking…  ████████░░░░                │
│  "…is the Q3 budget review"                 │
└─────────────────────────────────────────────┘

What each element means:

Element	Meaning
Speaker label	Who is speaking — assigned automatically ("Speaker 1", "Speaker 2")
Timestamp	When in the recording this segment starts (hours:minutes:seconds)
Completed text	Finalised words — these do not change
"Now Speaking…" preview	The current utterance still being processed — may change slightly
Status indicator	🟢 Connected · 🟡 Connecting · 🔴 Error
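
The hours:minutes:seconds timestamps in the panel can be derived from an offset measured in seconds from the start of the recording. The following is a minimal sketch of that conversion, not SeaMeet's own code; `format_timestamp` is a hypothetical helper name, and the sketch assumes a non-negative offset.

```python
def format_timestamp(offset_seconds: float) -> str:
    """Format an offset from the start of the recording as h:mm:ss."""
    total = int(offset_seconds)                  # drop fractional seconds
    hours, remainder = divmod(total, 3600)       # whole hours, leftover seconds
    minutes, seconds = divmod(remainder, 60)     # whole minutes, leftover seconds
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(format_timestamp(12))    # 0:00:12
print(format_timestamp(3725))  # 1:02:05
```

Minutes and seconds are zero-padded to two digits so that segments line up in the panel, while the hour is left unpadded to match the 0:00:12 style shown above.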

Connection Status Indicator

The indicator in the top-right corner of the panel tells you whether the AI engine is reachable:

🟢 Connected — Transcription is running normally
🟡 Connecting — Establishing connection (normal at startup, takes 2–5 seconds)
🔴 Error — Connection lost (see Troubleshooting below)

If you see 🔴 Error, the recording itself continues safely—only the live transcription is affected.

Automatic Speaker Detection

The AI engine attempts to distinguish between different voices and assign each a label.

How it works:

Recording timeline:

0:00 ──────────────────────────────────────────────────► time
        │           │           │           │
      Speaker 1   Speaker 2   Speaker 1   Speaker 2
      "Morning"   "Hello"     "Agenda…"   "Agreed"
          ▼           ▼           ▼           ▼
      [Seg. 1]    [Seg. 2]    [Seg. 3]    [Seg. 4]

Each time the speaker changes, the system creates a new segment. Segments from the same speaker get the same label.

Initial labels: The first speaker to talk is "Speaker 1", the second new voice is "Speaker 2", and so on. These are placeholders—you can rename them later (see Chapter 29).

Speaker refinement: As the recording progresses, the AI may revise earlier assignments once it becomes confident that two segments belong to the same voice. This is normal. The text of past segments does not change; only their speaker labels may be updated.
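
The segmenting and initial labelling described above can be sketched in a few lines. This is an illustration only, not SeaMeet's detector: it assumes the voice detector has already tagged each utterance with an opaque `voice_id` (a hypothetical field), and the names `Segment` and `build_segments` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    text: str


def build_segments(utterances):
    """Group (voice_id, text) pairs, given in spoken order, into segments."""
    labels = {}    # voice_id -> "Speaker N", numbered by first appearance
    segments = []
    for voice_id, text in utterances:
        # A voice heard for the first time gets the next free number.
        label = labels.setdefault(voice_id, f"Speaker {len(labels) + 1}")
        if segments and segments[-1].speaker == label:
            # Same speaker keeps talking: extend the current segment.
            segments[-1].text += " " + text
        else:
            # Speaker changed: start a new segment.
            segments.append(Segment(label, text))
    return segments


demo = [("a", "Morning"), ("b", "Hello"), ("a", "Agenda"),
        ("a", "first item"), ("b", "Agreed")]
for seg in build_segments(demo):
    print(seg.speaker, "-", seg.text)
# Speaker 1 - Morning
# Speaker 2 - Hello
# Speaker 1 - Agenda first item
# Speaker 2 - Agreed
```

Because labels are keyed by voice rather than by position, a returning speaker gets the same label they had earlier in the recording.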

Tip: For the most accurate speaker separation, listen through headphones rather than loudspeakers. Loudspeaker output picked up by your microphone can confuse the detector.

After the Recording Stops

When you click stop:

The "Now Speaking…" preview finalises any in-progress sentence
The complete transcript is saved alongside your recording file automatically
No manual action is required

Where to find the transcript:

Open the recording in your Recording Library
Click AI Insights in the detail panel
Select the Transcript tab

The transcript is also available for export as SRT (subtitle format) or JSON from the AI Insights tab. See Chapter 28 for export details.
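
For readers unfamiliar with SRT, each cue is a running number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the caption text, and a blank line. The sketch below builds that layout from transcript segments. The segment fields (`start`, `end`, `speaker`, `text`) and the helper names are assumptions made for illustration; the exact content of SeaMeet's exported files may differ.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments (dicts with start, end, speaker, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):   # SRT cues count from 1
        cues.append(
            f"{index}\n"
            f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(cues)   # a blank line separates consecutive cues


segments = [
    {"start": 12.0, "end": 24.0, "speaker": "Speaker 1", "text": "Good morning everyone"},
    {"start": 24.0, "end": 31.0, "speaker": "Speaker 2", "text": "Thanks for joining"},
]
print(to_srt(segments))
```

Note that SRT uses a comma, not a full stop, before the milliseconds.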

Limitations

Understanding these limitations helps set realistic expectations:

Limitation	Detail
Requires internet	Live transcription cannot run offline. The audio is processed by an AI engine over the network.
Timestamp accuracy	Timestamps are approximate (±3 seconds). Use them for navigation, not legal documentation.
Pauses in recording	If you pause the recording, transcription also pauses. Anything said while the recording is paused is not transcribed.
Accuracy varies	Accuracy is highest with clear speech, one speaker at a time, and a good microphone. Heavy accents, background noise, or cross-talk reduce accuracy.
Language	Transcription language can be set to Auto Detect (recommended) or a specific language in Settings → AI → SeaMeet Integration. Auto Detect handles multilingual meetings automatically.
No real-time editing	You cannot edit the transcript while recording. Editing is available after the recording stops.

Caption Overlay During Playback

When you play back a recording that has a live transcript, SeaMeet can display captions directly on the video — like closed captions on a TV.

How captions work:

Caption text is overlaid on the video preview at the bottom of the frame
Each segment shows the speaker name (colour-coded per speaker) and the spoken text
Captions are synced to the playback position — they advance as the recording plays
Captions automatically use the Gemini Live transcript from the session

Speaker colours: Each speaker is automatically assigned a colour that stays the same across all captions and transcript panels for the whole recording.

Caption format:

[Speaker 1]: Good morning everyone, let's get started.

Captions appear and disappear as the matching transcript segment plays.
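
One way to picture the syncing is as a lookup: given the playback position, find the segment whose time range contains it. The sketch below does this with a binary search over segment start times. It is a hedged illustration rather than SeaMeet's implementation; the segment fields (`start`, `end`, `speaker`, `text`) and the function name `caption_at` are assumptions, and segments are assumed to be sorted by start time.

```python
from bisect import bisect_right


def caption_at(segments, position):
    """Return the caption text for a playback position in seconds, or None."""
    starts = [seg["start"] for seg in segments]
    # Index of the last segment that starts at or before the position.
    i = bisect_right(starts, position) - 1
    if i < 0:
        return None                 # before the first segment
    seg = segments[i]
    if position >= seg["end"]:
        return None                 # in a gap between segments
    return f"[{seg['speaker']}]: {seg['text']}"


segments = [
    {"start": 12.0, "end": 20.0, "speaker": "Speaker 1",
     "text": "Good morning everyone, let's get started."},
    {"start": 24.0, "end": 30.0, "speaker": "Speaker 2",
     "text": "Thanks for joining on short notice."},
]
print(caption_at(segments, 13.5))  # [Speaker 1]: Good morning everyone, let's get started.
print(caption_at(segments, 22.0))  # None
```

Returning None in the gaps is what makes a caption disappear when nobody is speaking.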

Two-Column Video Layout

When watching a video recording with a live transcript available, SeaMeet uses a two-column layout:

┌─────────────────────────────────────────────────────┐
│  Video Preview             │  Transcript Panel       │
│                            │                         │
│  [video with captions]     │  Speaker 1   0:00:12   │
│                            │  "Good morning..."     │
│                            │                         │
│                            │  Speaker 2   0:00:24   │
│                            │  "Thanks for joining"  │
│                            │              [⤢ Max]   │
└─────────────────────────────────────────────────────┘

Left column: Fixed-width video with caption overlay
Right column: Scrolling transcript panel, synced to playback position
Maximize button (⤢): Expands the transcript panel to a full-screen overlay for easier reading during long recordings

The two-column layout only appears for video recordings with live transcripts. Audio-only recordings and recordings without transcripts use the standard single-column layout.

Language Settings for Transcription

You can configure which language SeaMeet expects during live transcription:

Open Settings (⚙️)
Navigate to AI → SeaMeet Integration
Find the Meeting Language selector
Choose your language:
- Auto Detect (default, recommended): SeaMeet automatically identifies the spoken language. Best for multilingual meetings or when the language varies.
- Manual selection: Choose from 20+ specific languages, including English (US/UK), Spanish, French, German, Japanese, Mandarin, Cantonese, Korean, and more.

Tip: Leave language set to Auto Detect unless you have a specific reason to force a language. Auto detection handles accents and mixed-language meetings better than a manually forced setting.

Troubleshooting

"Transcript panel not appearing"

Symptom: You start recording but the transcript panel never shows.

Check these in order:

Go to Settings → AI and confirm the AI Features toggle is on
Confirm your API key is valid (green checkmark in Settings → AI)
Check your internet connection — try loading a web page
Restart SeaMeet and try again

If the panel still doesn't appear after all four steps, the AI service may be temporarily unavailable. The recording itself is unaffected—try again later.

"Connection dropped mid-recording"

Symptom: The status indicator turns 🔴 red during a recording.

What happened: The connection to the AI engine was interrupted. This can happen due to:

Temporary network interruption
Wi-Fi switching access points
The AI service briefly going offline

What to do:

Don't stop the recording—it continues safely
Check your internet connection
The connection usually recovers automatically within 30 seconds
Words spoken during the disconnection are not added to the live transcript and cannot be recovered there. The audio remains in the recording file, so you can run AI Extraction afterwards (see Chapter 28)
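
Automatic recovery of this kind is commonly built as a retry loop with increasing delays and an overall deadline. The sketch below shows that general pattern only; SeaMeet's actual reconnection logic is not documented in this chapter. The function name `reconnect`, the `connect` callback, and the doubling delays are all assumptions, and the 30-second deadline simply mirrors the figure quoted above.

```python
import time


def reconnect(connect, deadline_s=30.0, first_delay_s=1.0,
              sleep=time.sleep, clock=time.monotonic):
    """Call connect() until it returns True or deadline_s seconds have passed."""
    start = clock()
    delay = first_delay_s
    while True:
        if connect():
            return True                       # connection restored
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return False                      # give up after the deadline
        sleep(min(delay, remaining))          # never sleep past the deadline
        delay *= 2                            # back off: 1 s, 2 s, 4 s, ...


# Example: a stand-in connection that succeeds on the third attempt.
attempts = []


def flaky_connect():
    attempts.append(1)
    return len(attempts) >= 3


print(reconnect(flaky_connect, first_delay_s=0.01))  # True
```

Doubling the delay keeps early retries quick for brief glitches, such as a Wi-Fi access point switch, without hammering the service during a longer outage.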

"Speakers not labelled correctly"

Symptom: Multiple people are labelled as "Speaker 1", or one person appears as two different speakers.

What's happening: Speaker detection uses voice characteristics. Accuracy drops when:

Multiple people talk at the same time
A speaker's voice changes significantly (laughing, raised voice, poor audio)
Background noise interferes

What to do:

After the recording, rename speakers in the Speakers panel (see Chapter 29)
Use the Merge feature to combine two labels that belong to the same person (Chapter 29)

Best Practices

Follow these practices for the best live transcription results:

One speaker at a time: Cross-talk (two people speaking simultaneously) confuses speaker detection and produces garbled text in the transcript. Encourage participants to take turns.

Quiet recording environment: Background noise (HVAC systems, typing, street noise) is picked up by the microphone and reduces transcription accuracy. A headset microphone placed close to the mouth gives far better results than a built-in laptop microphone.

Good microphone placement: For in-person meetings with multiple participants, position a microphone near the centre of the table, or use individual microphones for each participant.

Stable internet connection: Use a wired connection or a strong Wi-Fi signal. Avoid hotspots and networks with high packet loss, which cause connection drops.

Rename speakers promptly: Rename speakers immediately after the recording, while you still remember who said what. See Chapter 29 for instructions.

Quick Reference

┌────────────────────────────────────────────────────────────┐
│                  LIVE TRANSCRIPTION                        │
│                   Quick Reference                          │
├────────────────────────────────────────────────────────────┤
│  Start             │ Record normally — auto-activates      │
│  Status: green     │ 🟢 Transcription running              │
│  Status: yellow    │ 🟡 Connecting (wait 5 s)              │
│  Status: red       │ 🔴 Disconnected — recording safe      │
├────────────────────────────────────────────────────────────┤
│  Transcript panel  │ Right side of main window             │
│  Preview line      │ "Now Speaking…" — in progress         │
│  Completed lines   │ Final — won't change                  │
├────────────────────────────────────────────────────────────┤
│  After stopping    │ Transcript saved automatically        │
│  Find it           │ Recording → AI Insights → Transcript  │
├────────────────────────────────────────────────────────────┤
│  Requires          │ Internet + AI Features on + API key   │
│  Timestamps        │ Approximate ±3 seconds                │
│  Pauses            │ Not transcribed                       │
└────────────────────────────────────────────────────────────┘

Last updated: 2026-03-20
