Lumen's transcription is powered by Scribe v2 — the same engine ElevenLabs uses for real-time captioning. It handles 90+ languages with state-of-the-art accuracy.

Pick the right format

Timestamps

Sentence-level timestamps are enough for editing and summarising. Word-level timestamps are what you need for karaoke captions, music videos, or alignment work.
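
To see why word-level timing matters for captions, here is a minimal Python sketch that groups timed words into short caption lines and formats their times in SRT style. The input shape (a list of dicts with `text`, `start` and `end` in seconds) and the names `WORDS`, `group_words` and `srt_time` are assumptions for illustration, not Lumen's documented export format.

```python
# Hypothetical word-level data: this shape is assumed for illustration,
# it is not taken from Lumen's export format.
WORDS = [
    {"text": "Never", "start": 0.00, "end": 0.35},
    {"text": "gonna", "start": 0.35, "end": 0.60},
    {"text": "give", "start": 0.60, "end": 0.85},
    {"text": "you", "start": 0.85, "end": 1.00},
    {"text": "up", "start": 1.00, "end": 1.40},
]


def group_words(words, max_words=3):
    """Split timed words into caption lines of at most max_words words."""
    lines = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        lines.append({
            "text": " ".join(w["text"] for w in chunk),
            "start": chunk[0]["start"],  # line starts with its first word
            "end": chunk[-1]["end"],     # and ends with its last word
        })
    return lines


def srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


if __name__ == "__main__":
    for line in group_words(WORDS):
        print(srt_time(line["start"]), "-->", srt_time(line["end"]), line["text"])
```

With sentence-level timestamps only the start and end of the whole sentence would be known, so lines this short could not be timed individually.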

Speaker diarisation

Turn this on for interviews, panels and any audio with multiple voices. Lumen labels each speaker (Speaker 1, Speaker 2…), and you can rename them in the project afterwards.
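
If you work with a diarised transcript outside Lumen, the same renaming can be sketched in a few lines of Python. The segment shape (dicts with `speaker` and `text`) and the function name `rename_speakers` are hypothetical, chosen only to illustrate the idea.

```python
def rename_speakers(segments, names):
    """Return new segments with generic labels swapped for real names.

    Labels missing from `names` are left unchanged.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


# Hypothetical diarised segments, assumed for illustration only.
SEGMENTS = [
    {"speaker": "Speaker 1", "text": "Welcome to the show."},
    {"speaker": "Speaker 2", "text": "Thanks for having me."},
    {"speaker": "Speaker 3", "text": "Hello from the audience."},
]

if __name__ == "__main__":
    renamed = rename_speakers(SEGMENTS, {"Speaker 1": "Host", "Speaker 2": "Guest"})
    for seg in renamed:
        print(f'{seg["speaker"]}: {seg["text"]}')
```

The function builds new dicts rather than editing in place, so the original labels stay available if a mapping turns out to be wrong.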

Punctuation

Leave this on unless you are processing the transcripts further with your own NLP.