SRT Generator

Generate accurate .srt subtitle files from any audio or video file. AI transcription with word-level timestamps. Edit, sync, and export in minutes.

VexaScribe (formerly NovaScribe) automatically generates SRT subtitle files from audio or video using OpenAI's Whisper Large-v3 model. Upload an MP3, WAV, MP4, MOV, or any of 13 other formats, up to 5 GB per file. The AI transcribes with word-level timestamps at millisecond resolution, and you can review and edit timing in the built-in subtitle editor before downloading the .srt file. Transcription works in 99 languages, and a 60-minute video typically completes in 5–10 minutes. The free tier includes 30 minutes; paid plans start at $2/month for 200 minutes. SRT files are widely compatible: upload them to YouTube, Vimeo, Premiere Pro, Final Cut Pro, DaVinci Resolve, or any other video platform that supports caption files.

30 minutes free · No credit card · 99 languages · Word-level timestamps

How to Generate an SRT File

Three steps from upload to finished .srt file. No software to install, works in any browser.

  1. Upload audio or video

    Drag and drop an MP3, WAV, M4A, MP4, MOV, or any of 17 supported formats. Up to 5 GB and 10 hours per file. We extract audio from video automatically.

  2. AI transcribes with timestamps

    VexaScribe runs Whisper Large-v3 to transcribe and align each word to the millisecond. Multi-speaker recordings get speaker labels automatically.

  3. Edit and download .srt

    Review subtitles in the editor, adjust timing if needed, then click Download as SRT. The file is ready for YouTube, Premiere Pro, Final Cut, or DaVinci Resolve.

What Is an SRT File?

SRT (SubRip Text) is a plain-text file with the .srt extension that pairs numbered cues with start/end timecodes and dialogue text, allowing video players to display synchronized captions. The format originated with SubRip, a Windows subtitle-extraction tool, in the late 1990s and has since become the de facto interchange format for captioning.

Every major video platform supports SRT — YouTube, Vimeo, Twitch, TikTok, Instagram, Facebook, plus desktop video editors like Adobe Premiere Pro, Final Cut Pro, DaVinci Resolve, OBS, and Camtasia. YouTube's official help lists SRT alongside SBV, VTT, and TTML as the supported caption upload formats, with UTF-8 encoding required.

Anatomy of an SRT file

Every SRT cue has four parts: a sequential index, a timecode line in HH:MM:SS,mmm --> HH:MM:SS,mmm format, one or more dialogue lines, and a blank line that ends the cue. Here's a 3-cue sample:

1
00:00:01,200 --> 00:00:04,500
Welcome to the VexaScribe SRT generator demo.

2
00:00:04,800 --> 00:00:08,100
Drop a video here, and we'll auto-transcribe it
in any of 99 languages.

3
00:00:08,400 --> 00:00:12,000
You can edit every cue before exporting.
  • Index (line 1): integer starting at 1, incrementing by 1 per cue.
  • Timecode (line 2): zero-padded HH:MM:SS,mmm for both start and end, separated by an arrow written as two hyphens and a greater-than sign with a space on each side ( --> ). SRT uses a comma decimal separator; VTT uses a period.
  • Dialogue (lines 3-N): one or more lines of subtitle text. Per the Netflix Timed Text Style Guide, keep to 2 lines maximum, ≤42 characters each.
  • Blank line: a single empty line terminates each cue and signals the start of the next one.
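
The four-part cue structure above can be sketched in a few lines of Python. This is a minimal illustration, not VexaScribe's exporter: the function names and the (start_ms, end_ms, text) tuple layout are assumptions made for the example.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timecode: HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize (start_ms, end_ms, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Joining with "\n" leaves exactly one blank line between cues.
    return "\n".join(blocks)


print(build_srt([
    (1200, 4500, "Welcome to the VexaScribe SRT generator demo."),
    (4800, 8100, "Drop a video here, and we'll auto-transcribe it\nin any of 99 languages."),
]))
```

Working in integer milliseconds avoids floating-point rounding when the hours, minutes, and seconds are split out.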

SRT vs VTT vs SBV vs SCC vs ASS

Use SRT for the broadest player support, VTT for HTML5 and HLS streaming, SBV for fast YouTube uploads, SCC for U.S. broadcast TV, and ASS when you need styled or positioned subtitles.

| Format | Decimal separator | Best for | Styling | Encoding |
| SRT (SubRip Text) | comma (,) | Universal: YouTube, social media, video editors | Basic only (<i>, <b>) | UTF-8 |
| VTT (WebVTT, W3C) | period (.) | HTML5 <track>, HLS streaming | CSS classes, positioning, voice tags | UTF-8 |
| SBV (SubViewer) | period (.) | Quick YouTube uploads (legacy) | None | UTF-8 |
| SCC (Scenarist Closed Caption) | drop-frame timecode | U.S. broadcast TV (CEA-608) | Color, positioning | ASCII |
| ASS (Advanced SubStation Alpha) | period (.) | Anime, karaoke, fan-subs | Full (font, color, animation) | UTF-8 |

VexaScribe exports SRT, VTT, and TXT on every paid plan. Pick SRT for universal compatibility; VTT when you need HTML5 styling or HLS streaming. The decimal separator difference is the most common source of "my subtitle file doesn't load" errors — see common encoding pitfalls below.

WebVTT (.vtt) sample

WEBVTT

1
00:00:01.200 --> 00:00:04.500
Welcome to the VexaScribe demo.

2
00:00:04.800 --> 00:00:08.100
Drop a video and we'll generate
subtitles in 99 languages.

Note the WEBVTT header line and period decimal separators, both required by the W3C WebVTT specification.
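
Because the two formats differ mainly in the header and the decimal separator, converting SRT to VTT is mostly mechanical. The sketch below assumes well-formed SRT input; the function name is hypothetical, and it ignores styling tags and positioning.

```python
import re

# Matches an SRT timecode such as 00:00:01,200 and captures the parts around the comma.
_TIMECODE = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header, switch commas to periods."""
    lines = []
    for line in srt_text.lstrip("\ufeff").splitlines():
        if "-->" in line:
            # Only touch timing lines so commas in dialogue are left alone.
            line = _TIMECODE.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines).strip() + "\n"
```

The numeric cue indexes are kept as they are, since WebVTT accepts them as optional cue identifiers.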

SubViewer (.sbv) sample

0:00:01.200,0:00:04.500
Welcome to the VexaScribe demo.

0:00:04.800,0:00:08.100
Drop a video and we'll generate
subtitles in 99 languages.

SBV is a YouTube-specific quick-upload format — comma-separated start/end on a single line, no cue indexes.
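
For comparison with the SRT layout, here is a rough sketch of an SBV writer. It assumes the same hypothetical (start_ms, end_ms, text) tuples and follows the sample above: a single-digit hour field, period separators, and no cue index.

```python
def format_sbv_time(ms: int) -> str:
    """Render milliseconds as an SBV timestamp: H:MM:SS.mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_sbv(cues: list[tuple[int, int, str]]) -> str:
    """Serialize cues as SBV: 'start,end' on one line, then the text."""
    blocks = [
        f"{format_sbv_time(start)},{format_sbv_time(end)}\n{text}\n"
        for start, end, text in cues
    ]
    return "\n".join(blocks)
```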

Subtitle Timing Standards: CPS, WPM, and Reading Speed

Industry standards limit subtitle reading speed to roughly 17 characters per second for adult content and 13 CPS for children's content, with each cue lasting between 5/6 of a second and 7 seconds. These numbers come from the Netflix Timed Text Style Guide and the BBC Subtitle Guidelines — the two most cited references for caption timing in professional production.

| Standard | Recommended value |
| Adult reading speed (Netflix) | Max 17 CPS |
| Children's reading speed (Netflix) | Max 13 CPS |
| Minimum cue duration (Netflix) | 5/6 second (≈ 833 ms) |
| Maximum cue duration (Netflix) | 7 seconds |
| Conversational speech (BBC) | 160–180 WPM |

VexaScribe's built-in editor flags any cue that exceeds 17 CPS so you can split or shorten it before export. Cues shorter than the 833 ms minimum are also surfaced — they're too quick for most viewers to read. For accuracy on a wider range of audio conditions, see Whisper transcription accuracy.
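
The checks described above come down to simple arithmetic: characters divided by duration in seconds. The following is an illustrative sketch using the thresholds from the table, not the editor's actual logic; in particular, counting spaces toward the character total is an assumption.

```python
MAX_CPS = 17.0          # adult reading-speed ceiling from the table above
MIN_DURATION_MS = 833   # roughly 5/6 of a second
MAX_DURATION_MS = 7000


def check_cue(start_ms: int, end_ms: int, text: str) -> list[str]:
    """Return the timing problems found for one cue (empty list if none)."""
    duration_ms = end_ms - start_ms
    if duration_ms <= 0:
        return ["end time is not after start time"]
    problems = []
    # Assumption: every character except line breaks counts, spaces included.
    chars = len(text.replace("\n", ""))
    cps = chars / (duration_ms / 1000)
    if cps > MAX_CPS:
        problems.append(f"reading speed {cps:.1f} CPS exceeds {MAX_CPS:g}")
    if duration_ms < MIN_DURATION_MS:
        problems.append(f"duration {duration_ms} ms is below {MIN_DURATION_MS} ms")
    if duration_ms > MAX_DURATION_MS:
        problems.append(f"duration {duration_ms} ms is above {MAX_DURATION_MS} ms")
    return problems
```

For example, the first cue of the SRT sample (45 characters over 3.3 seconds) comes to about 13.6 CPS and passes.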

Translate an SRT file into another language

You can translate an existing SRT into a different language while preserving the original timing. VexaScribe handles two paths:

  1. Translate the source audio. Upload your audio or video, transcribe in the original language, and translate the transcript into 80+ target languages via the integrated translation step. Export the translated transcript as a new SRT; timing carries over from the source.
  2. Translate an existing SRT. If you already have an English SRT (or any other source) and just need it in another language, upload the SRT and the translator preserves cue numbers and timecodes while replacing the dialogue text.

Whisper Large-v3 transcribes 99 source languages; the LLM-based translator supports 80+ target languages. For full details on the workflow and a tradeoff comparison with human translators, see transcribe and translate audio.
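
The second path, replacing dialogue while leaving indexes and timecodes alone, can be sketched as below. The `translate` argument is a placeholder for whatever translation function you supply; it is not a VexaScribe API, and the parsing assumes well-formed SRT.

```python
import re
from typing import Callable


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Replace dialogue text cue by cue; indexes and timecodes pass through."""
    out_blocks = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            out_blocks.append(block)  # leave anything unexpected untouched
            continue
        index, timing, dialogue = lines[0], lines[1], "\n".join(lines[2:])
        out_blocks.append(f"{index}\n{timing}\n{translate(dialogue)}")
    return "\n\n".join(out_blocks) + "\n"
```

Passing each cue's full dialogue (all of its lines) to the translator in one call gives it more context than translating line by line, though the translated text may need re-wrapping to stay within line-length limits.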

Honest note: machine translation of subtitles is fine for personal viewing, draft cuts, and internal review. For broadcast, theatrical release, or any context where misreading would matter, hire a human subtitler. Common gotchas: idioms (“break a leg”, “piece of cake”), proper nouns, and cultural references all benefit from human review.

Timestamp generator

When you only need timestamps — not the full SRT subtitle pipeline — VexaScribe outputs word-level and segment-level timestamps from any uploaded audio or video. Use these for podcast chapter markers, video editor cue points, YouTube chapter timestamps, lecture notes, or feeding into your own downstream tooling.

Output options

  • SRT-style: 00:00:01,500. Drop straight into a video editor cue list.
  • Short form: 1:30, 14:25. Ideal for podcast show notes and YouTube chapter descriptions.
  • Seconds: 90.5, 865.250. Machine-readable for spreadsheets and APIs.
  • JSON — per-word timestamps with millisecond precision, language metadata, and speaker IDs. Best for custom workflows.

Word-level precision typically lands within ±50 ms on clean studio audio. For longer or noisier recordings, segment-level (sentence) timestamps tend to be more reliable than per-word.
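
If you start from the seconds form, the other two text formats are straightforward to derive. A small sketch with hypothetical helper names; dropping the fractional part in short form (rather than rounding) is an assumption.

```python
def srt_style(seconds: float) -> str:
    """90.5 -> '00:01:30,500'"""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def short_form(seconds: float) -> str:
    """M:SS below one hour, H:MM:SS above. Fractions of a second are dropped."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"
```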

What is open captioning?

Open captions are burned directly into the video pixels — they're always visible and can't be turned off. Closed captions, by contrast, live in a separate file (SRT, VTT) and viewers toggle them on or off in the player.

Use open captions when the viewer has no way to enable them — social media silent autoplay (Instagram, TikTok, LinkedIn), cinema accessibility screenings, in-store displays, conference rooms. Use closed captions for streaming, broadcast TV, YouTube, and any context where regulation (ADA, FCC, CRTC, Ofcom, EAA) requires viewers to control captions independently.

| Aspect | Open captions | Closed captions (SRT/VTT) |
| Toggle on/off | No | Yes |
| Where they live | Burned into the video file itself | Separate sidecar file or embedded as a track |
| File size | No change (text is pixels) | Small (~few KB) |
| Style flexibility | Locked at render time | User-configurable (color, size, font) |
| Best for | Social autoplay, cinema, public displays | Streaming, broadcast, accessibility compliance |

VexaScribe generates the SRT/VTT files used for closed captions. To create open captions, generate the SRT here, then burn it into the video using a tool like FFmpeg, Premiere Pro, DaVinci Resolve, or CapCut. For the full burn-in workflow, see how to add subtitles to a video. For the broader accessibility vs translation distinction, see captions vs subtitles.
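
As a rough illustration of the FFmpeg route, the sketch below builds a burn-in command from Python. It assumes an ffmpeg build that includes the subtitles filter (which depends on libass) and simple filenames without spaces, colons, or quotes; paths with those characters need extra escaping inside the filter argument. The file names are placeholders.

```python
import shlex


def burn_in_command(video: str, srt: str, output: str) -> list[str]:
    """Build an ffmpeg argv that renders the SRT into the video frames."""
    return [
        "ffmpeg",
        "-i", video,
        "-vf", f"subtitles={srt}",  # draws the cues onto each frame
        "-c:a", "copy",             # audio is unchanged, so copy it as-is
        output,
    ]


cmd = burn_in_command("demo.mp4", "demo.en.srt", "demo.captioned.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Burning in captions re-encodes the video stream, so expect the export to take noticeably longer than a plain remux.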

Create a video transcript (without the SRT)

An SRT is a video transcript with timecodes formatted for a video player. If you just want the plain spoken text — for show notes, a blog post, citation, search-indexing, or feeding into an LLM — you don't need the SRT step.

VexaScribe exports the same transcript as TXT (plain text, no timestamps), DOCX (Word with speaker labels and timestamp markers), or JSON (machine-readable, per-word precision). The underlying transcription is identical — only the output format differs.
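
If you already hold an SRT and only want the spoken text, stripping the indexes and timecodes is a short script. This is a sketch assuming well-formed SRT; it joins everything into one paragraph and does not attempt speaker labels.

```python
def srt_to_text(srt_text: str) -> str:
    """Drop cue indexes and timecodes; keep only the spoken text."""
    kept = []
    lines = srt_text.replace("\r\n", "\n").lstrip("\ufeff").split("\n")
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or "-->" in stripped:
            continue
        # A bare number directly above a timecode line is a cue index.
        if stripped.isdigit() and i + 1 < len(lines) and "-->" in lines[i + 1]:
            continue
        kept.append(stripped)
    return " ".join(kept)
```

Checking the following line before discarding a number keeps dialogue that happens to be a bare number, such as "42", in the transcript.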

For the full video-transcript workflow including format support, accuracy expectations, and use cases, see video to text.

Who Uses VexaScribe to Generate SRT Files?

Anyone publishing video benefits from subtitles. Most common workflows:

YouTube creators

Auto-caption every upload. Captioned videos rank better in YouTube search and improve average watch time.

Podcasters with video

Add subtitles to the YouTube and Spotify Video versions of episodes. See the dedicated podcast transcription workflow for show notes + SRT.

Social media creators

TikTok, Instagram Reels, YouTube Shorts. Subtitles increase completion rate by 40-60% on social.

Course creators

Udemy, Teachable, Skillshare, and Thinkific require accessibility-compliant subtitles. SRT is standard.

Video editors

Drop SRT files directly into Adobe Premiere Pro, Final Cut Pro, DaVinci Resolve, or any video editor.

Newsrooms & journalists

Subtitle interview clips for web publication. Translate to other languages for international audiences.

How to Embed Subtitles: HTML5, YouTube, Vimeo, and HLS

HTML5 video players consume .vtt via the <track> element, YouTube and Vimeo accept .srt directly through their CC upload UI, and HLS streams reference WebVTT segments from a subtitle playlist inside the master .m3u8.

HTML5 <video> with <track>

The HTML standard defines the <track> element for native browser subtitle support, and browsers expect the track file to be WebVTT. Use VTT (not SRT): HTML5 players won't parse SRT directly.

<video controls width="720" preload="metadata">
  <source src="/videos/demo.mp4" type="video/mp4" />
  <track
    kind="subtitles"
    src="/videos/demo.en.vtt"
    srclang="en"
    label="English"
    default
  />
  <track
    kind="subtitles"
    src="/videos/demo.es.vtt"
    srclang="es"
    label="Español"
  />
  Your browser does not support HTML5 video.
</video>

YouTube CC Upload

YouTube accepts .srt, .sbv, .vtt, and .ttml per the official YouTube help article. UTF-8 encoding is required.

  1. Open YouTube Studio → Content → select your video.
  2. Click Subtitles → Add Language → choose your video's spoken language.
  3. Click Upload File → "With timing" → select your .srt file.
  4. Click Publish. Captions go live within seconds.

For multi-language uploads, name files with BCP-47 tags: my-video.en.srt, my-video.es.srt, my-video.pt-BR.srt.

Vimeo Subtitle Upload

On Vimeo: open your video → Settings → Distribution → Subtitles → click the + button → choose language and upload your .srt file. Vimeo regenerates its player within ~30 seconds with the new caption track available behind the CC button.

HLS WebVTT Subtitle Track

HTTP Live Streaming uses segmented WebVTT referenced from the master playlist. Per the Apple HLS Authoring Specification, segments are typically ~30 seconds, UTF-8 encoded, served with text/vtt MIME type.

Master playlist (master.m3u8):

#EXTM3U
#EXT-X-VERSION:6

#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="subs/en/index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",DEFAULT=NO,AUTOSELECT=YES,FORCED=NO,LANGUAGE="es",URI="subs/es/index.m3u8"

#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2",SUBTITLES="subs"
video/720p/index.m3u8
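
Each URI in the master playlist above points to a per-language media playlist that lists the WebVTT segments. The sketch below generates one under simplifying assumptions: a VOD stream, equal-length segments, and hypothetical segment file names.

```python
import math


def subtitle_playlist(segment_uris: list[str], segment_seconds: float) -> str:
    """Build a VOD media playlist listing equal-length WebVTT segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:6",
        # Target duration is an integer no smaller than any segment's length.
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_seconds)}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


print(subtitle_playlist(["seg0.vtt", "seg1.vtt"], 30))
```

In a real stream the final segment is usually shorter than the rest, so a production generator would take a duration per segment instead of a single value.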

Multi-Language Subtitle Workflow

Generate the source-language SRT first, then auto-translate the cue text into each target language while preserving the timecodes, and export each language as its own .srt file using the BCP-47 language tag in the filename.

  1. Generate the source SRT in the spoken language (e.g., English).
  2. Run auto-translation, which preserves timecodes and only swaps the dialogue text.
  3. Export each target as video.<lang>.srt using BCP-47 tags (video.es.srt, video.de.srt, video.pt-BR.srt).
  4. Reference all language tracks in your HTML5 player, HLS manifest, or YouTube upload.
File-naming tip: YouTube auto-detects the subtitle language from the BCP-47 suffix, so a file uploaded as my-video.es.srt is automatically labeled Spanish without you needing to pick from a dropdown. See translate the transcript for the full 133-language list.
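
The naming convention is easy to automate when exporting several languages at once. A small sketch with a hypothetical helper; it does not validate that the tag is a real BCP-47 value.

```python
from pathlib import Path


def subtitle_path(video: str, lang: str, ext: str = "srt") -> Path:
    """video.mp4 + 'pt-BR' -> video.pt-BR.srt, next to the video file."""
    source = Path(video)
    return source.with_name(f"{source.stem}.{lang}.{ext}")


for lang in ["es", "de", "pt-BR"]:
    print(subtitle_path("video.mp4", lang))
```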

Common SRT Encoding Pitfalls (and How to Fix Them)

The five most common SRT mistakes are wrong file encoding, using a period instead of a comma in timecodes, missing blank lines between cues, BOM characters at the file start, and overlapping timestamps. Here's what each looks like and how to fix it.

1. Non-UTF-8 encoding (mojibake)

Symptom: Accented and typographic characters appear as garbled sequences such as â€™ (for ’) or ÃŸ (for ß) in the player.

Fix: Re-save the .srt as UTF-8 (without BOM) using VS Code or Notepad++. VexaScribe always exports UTF-8 by default.

2. Period instead of comma in timecodes

Symptom: Player shows the file as broken or skips cues.

Fix: SRT requires a comma decimal separator (00:00:01,200). VTT uses a period. Don't mix them.

3. Missing blank line between cues

Symptom: Multiple cues display merged on screen.

Fix: Every cue must end with a single blank line. A trailing CRLF is fine; a missing line break is not.

4. UTF-8 BOM at file start

Symptom: First cue index appears as ï»¿1 instead of 1; some players reject the file.

Fix: Save without BOM. In VS Code: bottom-right encoding label → 'Save with Encoding' → 'UTF-8'.

5. Overlapping or out-of-order timestamps

Symptom: Two cues compete for the same moment, or cues display in the wrong order.

Fix: Each cue's start time must be greater than or equal to the previous cue's end time. VexaScribe's editor catches this automatically.
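
Several of these fixes can be scripted. The sketch below handles the BOM, a legacy-encoding fallback, period separators, and overlap detection. It is an illustration under stated assumptions: the function names are hypothetical, and non-UTF-8 input is assumed to be Windows-1252, which will not be right for every file.

```python
import re

_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def decode_srt(data: bytes) -> str:
    """Decode as UTF-8 (dropping any BOM), falling back to Windows-1252."""
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: legacy files are Windows-1252.
        text = data.decode("cp1252", errors="replace")
    return text.replace("\r\n", "\n")


def repair_timecodes(text: str) -> str:
    """Rewrite timing lines with comma separators and a canonical ' --> '."""
    def fix(match: re.Match) -> str:
        g = match.groups()
        return f"{g[0]}:{g[1]}:{g[2]},{g[3]} --> {g[4]}:{g[5]}:{g[6]},{g[7]}"
    return _TIMING.sub(fix, text)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def find_overlaps(text: str) -> list[int]:
    """Return 1-based positions of cues that start before the previous cue ends."""
    overlaps, previous_end = [], 0
    for position, match in enumerate(_TIMING.finditer(text), start=1):
        g = match.groups()
        start, end = _to_ms(*g[:4]), _to_ms(*g[4:])
        if start < previous_end:
            overlaps.append(position)
        previous_end = end
    return overlaps
```

Write the repaired text back with `encoding="utf-8"` (not `utf-8-sig`) so the saved file has no BOM.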

Generate SRT Files for Cents Per Minute

All paid plans include unlimited SRT export. No per-export fees, no hidden charges.

Free trial

$0

30 min total

No credit card

Starter

$2/month

200 min/month

Solo creators

Basic

$5/month

1,000 min/month

Regular publishers

Frequently Asked Questions

How do I generate an SRT file from audio?

Upload your audio or video file (MP3, WAV, M4A, MP4, MOV, etc.) to VexaScribe (formerly NovaScribe). The AI transcribes the speech and automatically generates word-level timestamps. Review the transcript in the built-in subtitle editor, adjust any timing if needed, then click Download as SRT. The whole process takes 5-10 minutes for a one-hour file.

What is an SRT file?

SRT (SubRip Text) is the most universally supported subtitle file format. It's a plain text file with the .srt extension containing numbered subtitle blocks, each with a start time, end time (HH:MM:SS,mmm), and the subtitle text. Every major video platform supports SRT — YouTube, Vimeo, Twitch, TikTok, Instagram, plus desktop video editors like Premiere Pro, Final Cut Pro, and DaVinci Resolve.

How accurate are the auto-generated timestamps?

Timestamps are recorded with millisecond resolution, and word-level timing typically lands within ±50 ms on clean studio audio. VexaScribe uses OpenAI's Whisper Large-v3 model to align each word in the transcript to the moment it was spoken. For best results, use clear audio with minimal background noise. You can fine-tune timestamps in the built-in editor before downloading the .srt file.

Is the SRT generator free to use?

Yes, you get 30 minutes of free transcription with no credit card required — generate SRT files for short videos at no cost. Paid plans start at $2/month for 200 minutes (Starter), $5/month for 1,000 minutes (Basic), $10/month for 2,500 minutes (Pro). All paid plans include SRT export.

What's the difference between SRT and VTT?

SRT (SubRip Text) uses a comma decimal separator (00:00:01,200) and supports basic HTML tags only. VTT (WebVTT, a W3C standard) uses a period decimal separator (00:00:01.200), requires a 'WEBVTT' header line, and supports CSS classes, positioning, and voice tags — making it the format of choice for HTML5 video and HLS streams. VexaScribe exports both formats on every paid plan.

Can I edit the SRT file before downloading?

Yes, VexaScribe includes a full subtitle editor. You can correct any text, adjust start and end timestamps, split or merge subtitle entries, and preview the timing against the audio. Changes are saved automatically. When you're satisfied, click Download as SRT to get the final file.

How do I add SRT subtitles to YouTube?

In YouTube Studio, go to Content, select your video, then Subtitles. Click Add Language, choose your language, then Upload File → 'With timing' → select your .srt file. YouTube applies the subtitles immediately. Captioned videos rank better in YouTube search and improve watch time.

What audio and video formats can I upload?

VexaScribe accepts MP3, WAV, M4A, FLAC, OGG, AAC, AIFF, WMA, AMR, OPUS for audio, and MP4, MOV, AVI, MKV, WebM, FLV, WMV for video. Files can be up to 5 GB and 10 hours long. Video files have their audio extracted automatically; the video itself is not retained after transcription.

Can I generate SRT files in non-English languages?

Yes, VexaScribe transcribes in 99 languages with automatic language detection. Upload audio in Spanish, French, German, Japanese, Arabic, or any of the 99 supported languages and download the SRT file in that language. You can also translate the transcript to a different language using the built-in translation feature (133 languages via Google Translate).

What's the maximum reading speed for subtitles?

The Netflix Timed Text Style Guide recommends a maximum of 17 characters per second (CPS) for adult content and 13 CPS for children's content. The BBC Subtitle Guidelines suggest a conversational reading speed of 160-180 words per minute. VexaScribe's editor flags any cue that exceeds 17 CPS so you can split or shorten it before export.

How long should each subtitle cue stay on screen?

Per Netflix's Timed Text Style Guide, the minimum duration is 5/6 of a second (approximately 833 ms) and the maximum is 7 seconds per cue. Cues shorter than 833 ms are too quick to read; cues longer than 7 seconds drift out of sync with the on-screen action. VexaScribe automatically respects these bounds when generating cues.

Do I legally need captions on my videos?

For prerecorded video on most public-facing websites, yes — under WCAG 2.1 Success Criterion 1.2.2 (Level A). U.S. state and local government sites have binding ADA Title II deadlines: April 26, 2027 for entities serving 50,000+ people, and April 26, 2028 for those under 50,000. Higher education and large enterprises are also commonly covered by Section 508 and equivalent international rules.

Why does my SRT file show strange characters in some players?

Encoding mismatch — the file was likely saved as Windows-1252 or another non-UTF-8 encoding. YouTube and most modern players require UTF-8. Re-save the file as UTF-8 (without BOM) using a code editor like VS Code or Notepad++. VexaScribe always exports UTF-8 by default, so this typically only happens after manual edits in older tools.

Will the SRT file work in Premiere Pro, Final Cut Pro, and DaVinci Resolve?

Yes. SRT is the universal subtitle format supported by all major video editors. Just import the .srt file using your editor's caption or subtitle import workflow. The file will sync to your video timeline using the embedded timestamps.

What is open captioning?

Open captions are burned directly into the video pixels — always visible, can't be turned off. Closed captions (SRT/VTT) live in a separate file and viewers toggle them on or off. Use open captions for social media silent autoplay, cinema accessibility, and public displays where viewers can't enable captions themselves. Use closed captions for streaming, broadcast TV, YouTube, and any context where regulation (ADA, FCC, EAA) requires user-controlled captions. VexaScribe generates SRT/VTT files; burn them in with FFmpeg, Premiere, or CapCut if you need open captions.

Can I translate an SRT file into another language?

Yes. Two paths: (1) Upload your source audio or video, transcribe in the original language, then translate the transcript into 80+ target languages and export as a new SRT — timing preserved. (2) Upload an existing SRT and the translator preserves cue numbers and timecodes while replacing the dialogue text. Honest note: machine translation of subtitles is fine for personal viewing, draft cuts, and internal review. For broadcast or theatrical release, hire a human subtitler — idioms, proper nouns, and cultural references benefit from human review.

Can I generate just timestamps without the full SRT file?

Yes. VexaScribe outputs timestamps in four formats: SRT-style (00:00:01,500), short form (1:30), seconds (90.5), and JSON with per-word millisecond precision. Use these for podcast chapter markers, YouTube chapter timestamps, video editor cue points, or custom downstream tooling. Word-level precision typically lands within ±50 ms on clean studio audio.

How do I just get a video transcript without the SRT formatting?

An SRT is a video transcript with timecodes formatted for a video player. If you only want the plain spoken text — for show notes, blog repurposing, citations, or feeding into an LLM — export as TXT (plain text), DOCX (Word with speaker labels), or JSON instead. The underlying transcription is identical; only the output format differs.

Where can I read about the SRT format specification?

The original SRT format started as the output of the SubRip software in the late 1990s and has no formal spec — it's a de facto standard. Wikipedia's SubRip article is the most widely cited reference. The format is plain text: numbered cue, timecode line (HH:MM:SS,mmm --> HH:MM:SS,mmm), one or more text lines, blank line. VTT (WebVTT), in contrast, has a formal W3C specification at w3.org/TR/webvtt1/.

Generate Your First SRT File in 30 Seconds

30 minutes of free transcription, no credit card required. Upload a video and download a finished .srt file in minutes.