Upload the video file
Drag & drop MP4, MOV, MKV, WMV, AVI or WEBM — up to 1 GB and 3 hours. Audio is extracted automatically.
Drop an MP4, MOV, or AVI (up to 1 GB) — transcribe video to text with speaker labels and a frame-aligned SRT for Premiere, DaVinci, or Final Cut.
Upload the video file, let the AI work on the audio track, then tune the export.
The model converts speech to text with 95–98% accuracy, timestamps every line, and assigns speaker labels.
Set paragraph length for editorial review and frame-aligned phrase timestamps for the timeline, then drop the SRT into Premiere or DaVinci or send DOCX to a translator.
Video files often carry confidential interviews, depositions, medical sessions, or internal recordings. Here's exactly how we handle the upload.
Uploads of up to 1 GB ride the same TLS encryption banks use for transactions. Audio extraction and transcript fetch both stay on HTTPS, with no plaintext anywhere on the wire.
Your MP4 plus the derived transcript are scheduled for deletion 3 days after upload. Hit Delete on the project page and both vanish on the next sweep.
Your video footage doesn't feed any model. We extract audio, transcribe, and discard — no archival of footage, no inclusion in training sets, no behavioural fingerprinting.
EU clients have full data rights on every upload. Each video transcript lives at an unguessable URL accessible only from the project's owning account.
Your video content stays private, encrypted, and entirely under your control.
The audio gets extracted, transcribed, and segmented to caption-friendly line lengths — your video editor reads the SRT directly.
Any common container — MP4, MOV, MKV, WMV, AVI, WEBM, QT. Up to 1 GB and 3 hours per file. Resolution doesn't matter — we only read the audio track.
We strip the audio (no re-encode of the picture), transcribe it with 95–98% accuracy, label up to 8 speakers, and generate timestamps tied to the original clip's timeline.
Lines pre-segmented to caption-friendly length (≤ 42 chars). Frame-aligned timestamps in HH:MM:SS,ms SRT or HH:MM:SS.ms VTT — both round to your edit's frame rate without drift.
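To make the two timestamp notations and the 42-character limit concrete, here is a minimal Python sketch of how a caption cue could be assembled. It is an illustration under stated assumptions, not the service's own code: the function names `format_timestamp`, `wrap_caption`, and `srt_cue` are hypothetical, and times are assumed to arrive as integer milliseconds.

```python
import textwrap


def format_timestamp(ms: int, vtt: bool = False) -> str:
    """Render milliseconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    sep = "." if vtt else ","  # the decimal separator is the only difference
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def wrap_caption(text: str, max_chars: int = 42) -> list[str]:
    """Split caption text into lines no longer than max_chars."""
    return textwrap.wrap(text, width=max_chars)


def srt_cue(index: int, start_ms: int, end_ms: int, text: str) -> str:
    """Build one SRT cue: index, time range, then the wrapped text lines."""
    time_range = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    return "\n".join([str(index), time_range, *wrap_caption(text)]) + "\n"
```

Calling `srt_cue(1, 0, 1500, "Hello")` yields a three-line cue with the range `00:00:00,000 --> 00:00:01,500`; passing `vtt=True` to `format_timestamp` swaps the comma for a period.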
Frame-rate aware — works with 24, 25, 29.97, and 30 fps timelines without timestamp drift.
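One way to picture frame-rate awareness is snapping each millisecond timestamp to the nearest frame boundary. The sketch below assumes 29.97 fps means the NTSC rate 30000/1001 and uses exact fractions so rounding error does not accumulate over a long clip. The names `FRAME_RATES`, `snap_to_frame`, and `frame_to_ms` are hypothetical, and drop-frame timecode is not handled.

```python
from fractions import Fraction

# Exact frame rates; 29.97 is assumed to be the NTSC rate 30000/1001.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}


def snap_to_frame(ms: int, fps: Fraction) -> int:
    """Return the index of the frame nearest to a millisecond timestamp."""
    return round(Fraction(ms, 1000) * fps)


def frame_to_ms(frame: int, fps: Fraction) -> int:
    """Return the start time of a frame, rounded to whole milliseconds."""
    return round(Fraction(frame) / fps * 1000)


def align(ms: int, fps: Fraction) -> int:
    """Move a timestamp onto the nearest frame boundary."""
    return frame_to_ms(snap_to_frame(ms, fps), fps)
```

Because each timestamp is snapped independently from its absolute time, an error in one cue never carries into the next.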
Every common video container — every common transcript format on the way out.
MP4, MOV, MKV, WMV, AVI, WEBM, QT
up to 1 GB, up to 3 hours, any resolution
TXT, DOCX, PDF, SRT, VTT, CSV, Clipboard
Most video transcript generators dump a single wall of text. Ours emits frame-aligned SRT and VTT for any NLE plus paragraph-tuned DOCX for editorial review.
The biggest win for video. Paragraph timestamps for editorial cut points; phrase timestamps frame-aligned to the picture for line-by-line conforming. Both when scripting; off for translator hand-off.
Paragraphs → cut points, Phrases → frame-aligned, Both, Off
Critical for documentary interviews and panel footage. Auto-labelled by voice, then rename per shot in the editor (Director, Subject A, Subject B) or merge consecutive turns when one speaker dominates a take.
Speaker names, Merge by speaker, Hide
Auto-detect or fix every paragraph at 1, 2, 3, 4, or 8 lines. Tight rhythm for caption drafts; longer paragraphs for editorial review or for handing the transcript to a translator.
Auto, 1 line, 2 lines, 3 lines, 4 lines, 8 lines
One toggle strips timestamps, labels, and formatting. The result drops cleanly into translation memory tools (Trados, MemoQ), a scriptwriter's draft, or an AI summarizer.
Plain text mode
Two more controls, pause-threshold breaks and one-click clipboard, round out the panel. See all 6 on the hub.
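Merging consecutive turns by speaker and breaking on a pause threshold can be pictured as a single pass over the transcript segments. This is a rough sketch only: the `Segment` shape, the `merge_turns` name, and the 1.5-second default are assumptions for illustration, not the product's actual settings.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the clip
    end: float
    text: str


def merge_turns(segments: list[Segment], pause_threshold: float = 1.5) -> list[Segment]:
    """Join consecutive segments from one speaker unless a long pause separates them."""
    merged: list[Segment] = []
    for seg in segments:
        prev = merged[-1] if merged else None
        same_speaker = prev is not None and prev.speaker == seg.speaker
        if same_speaker and seg.start - prev.end < pause_threshold:
            # Extend the previous paragraph instead of starting a new one.
            merged[-1] = Segment(prev.speaker, prev.start, seg.end, f"{prev.text} {seg.text}")
        else:
            # New speaker or a pause at or above the threshold: start a new paragraph.
            merged.append(seg)
    return merged
```

A lower threshold produces more, shorter paragraphs; a higher one keeps a speaker's take together even across hesitations.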
These languages return reliably across documentary, interview, conference, and field-recorded video. Auto-detect picks the right one; multilingual cuts work too.
Field-recorded multilingual interview? Run a short clip on the free tier before committing the full footage.
One uploaded MP4 — every downstream workflow that needs the words on screen.
Drop the Zoom or Teams recording, get a searchable transcript with speaker labels — perfect for action items and minutes.
Transcribe recorded lectures, seminars, and on-demand courses into study notes — with timestamps for quick reference.
A video transcriber that extracts dialogue and B-roll narration from interviews and documentaries — straight from MP4 to text on the editorial timeline.
Transcribe depositions, hearings, and recorded testimony with timestamps — line-citable, audit-ready.
The video transcript generator converts qualitative research video sessions into coding-ready transcripts — speakers separated, ready for analysis tools.
Generate caption-ready SRT/VTT for any editor or platform: Premiere, DaVinci, Final Cut, YouTube Studio.
Free video transcription on every account — test the engine on your own footage before committing. No credit card. Top up only when you need more minutes.
The questions we hear most from people transcribing video files — answered straight.