One balance: spend it on any voice tier or transcription. Higher-quality voices use more limits per character. Limits are valid for 1 year, and the 1-year timer resets on every top-up.
Standard: 1 limit = 2 chars
Pro: 1 limit = 1 char
HD: 1 limit = 0.5 chars
What You Get
One pool of credits — spend on any voice tier or transcription. All features included in every pack.
Pack                          25K      65K      200K       500K
Standard voices (characters)  50,000   130,000  400,000    1,000,000
Pro voices (characters)       25,000   65,000   200,000    500,000
HD voices (characters)        12,500   32,500   100,000    250,000
Transcription                 180 min  467 min  1,437 min  3,592 min
Included in every pack: Text to Speech
✓ AI voices: 5,000+ voices available
✓ Languages & accents: 150+ with regional variants
✓ Commercial license: Monetize your content
✓ Smart Cache: Free repeats, zero cost
✓ Multi-speaker dialogues: Multiple voices in one file
✓ SSML editor: Fine-tune pauses & intonation
✓ Export formats: MP3, WAV, OGG
✓ PDF & DOCX to speech: Upload & convert
✓ API access: Automate at scale
Included in every pack: Transcription (7 features)
✓ File upload: Up to 1 GB / 3 hours
✓ Languages: 150+ supported
✓ Speaker diarization: Who said what
✓ Timestamps: Word-level timing
✓ Subtitle export: SRT, VTT
✓ Bulk export: Multiple formats at once
✓ Input formats: MP3, WAV, YouTube, video
How to read this table: Each pack gives you a fixed number of limits — think of them as universal credits. You can spend them on any voice tier, transcription, or a mix of everything. Higher-quality voices cost more limits per character:
Standard voices are the most efficient (1 limit = 2 characters), Pro voices are the baseline (1 limit = 1 character), and HD voices deliver the best quality at 2 limits per character.
The numbers above show the maximum you get if you use only one voice tier.
In practice, most users mix tiers — and the balance carries over for up to 1 year.
Audio duration is approximate, based on average English text at ~140 words per minute.
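The tier arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated rates, not SpeechGen's billing code: the function names are hypothetical, and how fractional limits are rounded on a real invoice is not stated on this page.

```python
from fractions import Fraction

# Limits charged per character, taken from the rates stated above.
LIMITS_PER_CHAR = {
    "standard": Fraction(1, 2),  # 1 limit = 2 characters
    "pro": Fraction(1),          # 1 limit = 1 character
    "hd": Fraction(2),           # 1 limit = 0.5 characters
}


def max_characters(limits: int, tier: str) -> int:
    """Most characters a balance can buy if it is spent on one tier only."""
    return int(limits / LIMITS_PER_CHAR[tier])


def limits_needed(chars_by_tier: dict[str, int]) -> Fraction:
    """Total limits for a mix of tiers (hypothetical helper, no rounding applied)."""
    return sum(
        (LIMITS_PER_CHAR[tier] * chars for tier, chars in chars_by_tier.items()),
        Fraction(0),
    )
```

For example, `max_characters(65_000, "hd")` gives 32,500, matching the HD row for the 65K pack, while a mixed project of 10,000 Standard, 5,000 Pro, and 2,000 HD characters needs 14,000 limits.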
Why pay monthly when you don't use it monthly?
Other text-to-speech services charge $22–$99/month — whether you use them or not. SpeechGen packs stay active for up to 1 year. Pay only when you need more.
               SpeechGen                Monthly Subscription
Monthly fee    None (buy when needed)   $22–$99/month
Unused limits  Carry over for 1 year    Lost every month
Smart Cache    Free repeats, zero cost  Counts against quota
Commitment     None                     Monthly lock-in
PayPal         Yes                      Usually no
What our users say
★★★★★
"My audience thought I hired a professional narrator. It was a Pro voice from SpeechGen — the 65K pack cost me less than a coffee."
Alex R., Podcaster
★★★★★
"Smart Cache is a game-changer. I regenerate the same IVR prompts weekly and it costs zero extra."
Maria S., Business Owner
★★★★★
"The pay-as-you-go model is perfect. I only need voiceovers for YouTube once a month — why would I pay monthly?"
James K., Content Creator
700M+ files generated · 1M+ users · 150+ languages · 4.8★ average rating
Pricing FAQ
You pay once for a limit pack — no subscription, no auto-renewal. Your limits remain available for up to 1 year. When you run out, simply buy another pack. No monthly fees, ever.
All three tiers draw from the same limit balance — the difference is quality and cost. Standard voices are the most economical (1 limit = 2 chars) and cover the widest language range. Pro voices use neural networks for more natural speech (1 limit = 1 char). HD voices are the most expressive and lifelike (1 limit = 0.5 chars). Choose the tier that fits your project — or mix all three in one pack.
Invoices are available in your profile after payment. You can customize the company name, address, and VAT number for accounting purposes.
SpeechGen converts text sentence by sentence. Every generated sentence is cached. When you edit your project and add new text, only the new sentences cost limits — everything already generated is served from cache at zero cost. This means you can freely revise, extend, and re-export projects without re-paying for unchanged parts.
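The sentence-level caching described above can be modelled with a toy sketch. Everything here is an assumption made for illustration: the regex sentence splitter, the cache key of voice plus sentence text, and the `SentenceCache` class itself are hypothetical and say nothing about how SpeechGen implements Smart Cache internally.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class SentenceCache:
    """Toy model: a sentence is billed only the first time it is generated."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def billable_chars(self, text: str, voice: str) -> int:
        """Return characters to bill for this run and remember what was generated."""
        total = 0
        for sentence in split_sentences(text):
            key = (voice, sentence)  # assumption: cache entries are per voice
            if key not in self._seen:
                self._seen.add(key)
                total += len(sentence)
        return total
```

In this model, generating "Hello there. Welcome back." bills both sentences; regenerating the project after appending "See you soon." bills only the new sentence.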
The "Audio" tab in the table above shows approximate duration based on average English speech at ~140 words per minute (~840 characters per minute). Actual duration varies by voice, language, speed settings, and content. Billing is always per character, not per minute — the editor shows the exact count before each generation.
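As a rough illustration of the duration estimate above, the conversion is a single division. The constant comes from the stated ~840 characters per minute; the function name is hypothetical, and real output length will differ by voice, language, and speed.

```python
CHARS_PER_MINUTE = 840  # ~140 words/min at ~6 characters per word, as stated above


def estimated_minutes(characters: int) -> float:
    """Approximate audio length in minutes for average English text."""
    return characters / CHARS_PER_MINUTE
```

Under this estimate, the 65,000 Pro characters in a 65K pack come to roughly 77 minutes of audio.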
Limits are valid for 1 year from the date of purchase. If you buy a new pack before the year is up, your remaining balance is added to the new pack and the 1-year timer resets from the new purchase date. So if you top up regularly, nothing ever expires — your balance simply grows with each purchase.
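The top-up rule above can be expressed as a small sketch. It assumes "1 year" means 365 days and that limits remain usable through the expiry date itself; neither detail is confirmed on this page, and the `Balance` class is hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

VALIDITY = timedelta(days=365)  # assumption: "1 year" modelled as 365 days


@dataclass
class Balance:
    limits: int = 0
    expires_on: Optional[date] = None

    def remaining(self, today: date) -> int:
        """Limits still usable today; zero once the validity period has passed."""
        if self.expires_on is None or today > self.expires_on:
            return 0
        return self.limits

    def top_up(self, pack_limits: int, today: date) -> None:
        """Add a pack to whatever is left and restart the 1-year timer."""
        self.limits = self.remaining(today) + pack_limits
        self.expires_on = today + VALIDITY
```

For instance, buying a 25K pack on 1 January, spending 5,000 limits, and then buying a 65K pack on 1 June leaves 85,000 limits valid for a year from 1 June.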
Billing is per character (including spaces and punctuation). For English text, the average word is about 5 characters plus a space — so 1 word ≈ 6 characters. Example: 1,000 characters ≈ 170 words, 65,000 characters ≈ 10,800 words. Other languages may differ. The character counter in the editor always shows the exact count before you generate.
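To make the character counting concrete, here is a minimal sketch. It assumes the billed count equals the length of the text as Python measures it (one unit per Unicode code point), which may not match the editor's counter exactly; both function names are hypothetical.

```python
CHARS_PER_WORD = 6  # ~5 letters plus a space, per the English rule of thumb above


def billable_characters(text: str) -> int:
    # Spaces and punctuation count, so this is simply the string length.
    return len(text)


def approx_words(characters: int) -> int:
    """Rough English word count for a given number of characters."""
    return round(characters / CHARS_PER_WORD)
```

With these assumptions, "Hello, world!" counts as 13 characters, and 65,000 characters works out to about 10,833 words, in line with the rounded figure quoted above.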
Limits are universal: you can spend them on text-to-speech, audio-to-text transcription, or any combination. There's no separate balance for each service.