16-09-2025
Open the language dropdown and select the language of your text. Supported languages: Over 150 languages (AI voices library).
After selecting the language, a list of voices will appear. Listen to the samples and choose your favorite.
Copy your text into the text box or upload a file (DOCX, PDF). For converting subtitles to speech, use the dedicated SRT to voice page.
Wait for processing to finish, then download your ready audio file.
That's it! Your first voiceover is ready in just a couple of minutes.
💡 Tip: When copying from PDF files, check the pasted text carefully: invisible characters may come along with it and ruin the audio!
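If you prepare your text with a script before pasting it, invisible characters can be removed automatically. Here is a minimal sketch in Python, assuming the troublemakers are Unicode control and format characters (soft hyphens, zero-width spaces, byte order marks) plus non-standard spaces. The function name `clean_pdf_text` is our own and not part of SpeechGen.

```python
import unicodedata

def clean_pdf_text(text: str) -> str:
    """Drop invisible characters and normalize unusual spaces before pasting."""
    cleaned = []
    for ch in text:
        category = unicodedata.category(ch)
        # Cc = control characters, Cf = format characters (soft hyphen,
        # zero-width space/joiner, BOM, direction marks). Keep newlines and tabs.
        if category in ("Cc", "Cf") and ch not in "\n\t":
            continue
        # Zs = space separators; turn non-breaking and thin spaces into plain spaces.
        cleaned.append(" " if category == "Zs" else ch)
    return "".join(cleaned)

sample = "Hel\u00adlo\u200b wor\u00a0ld\ufeff"
print(repr(clean_pdf_text(sample)))
```

Some languages rely on zero-width joiners for correct spelling (Persian, for example), so for those texts you may want to keep the characters your script needs instead of dropping every format character.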
Maximum per generation: 2,000,000 characters (≈ 285,000-330,000 words). That is enough to convert long-form content, such as an entire book or extensive documentation, in a single generation.
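The word estimate follows from simple arithmetic: the range above corresponds to roughly 6 to 7 characters per word, spaces included (this ratio is inferred from the numbers, not an official figure). The sketch below shows that calculation and one possible way to split a text that exceeds the limit into several generations at paragraph boundaries. The helper names are hypothetical.

```python
MAX_CHARS = 2_000_000  # per-generation limit stated in this guide

def estimate_words(chars: int, chars_per_word: float) -> int:
    """Rough word count for a given average word length (spaces included)."""
    return round(chars / chars_per_word)

def split_for_generation(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    breaking at blank lines (paragraph boundaries) where possible."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is cut at the limit.
        while len(paragraph) > limit:
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks

print(estimate_words(MAX_CHARS, 7), estimate_words(MAX_CHARS, 6))
```

Splitting at paragraph boundaries keeps sentences intact, so each chunk can be generated separately without cutting a phrase in half.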
⚠️ Important: First select the correct language for your text.
After selecting the language, a list of available voices will open. Listen to samples by clicking the play button for each voice to find the one that best suits your needs. You'll see different voice types available: Regular voices offer standard quality, PRO voices provide improved quality and naturalness, and Multi-language voices (marked with language codes like Ava_US, Ava_ES) allow you to maintain voice consistency across different languages. Take time to preview each voice as they vary significantly in tone, emotion, and character.
Below the text box, above the generate button, you can adjust the pause settings:
Click the "Generate Speech" button below the text box to start the conversion process. The processing time depends on your text length - shorter texts complete in seconds while longer documents may take a few minutes. Once generation is complete, you'll be able to listen to the result directly in the browser to ensure it meets your expectations.
After generation completes, a "Download" button will appear. By default, you can simply download the file as MP3. However, if you need a different format (WAV or OPUS) or want to change the audio quality (sample rate from 8000 to 44000 Hz), you'll need to first select these options from the dropdown menus, regenerate the speech with your chosen settings, and then download the file with your preferred specifications.
Speed scale:
Why this scale: Fractional values less than 1 slow down speech, greater than 1 speed up. This allows precise tempo selection for your audience.
Speed recommendations:
Pitch range: from -20 to +20 with step 2
Why step 2: A step of 2 units provides noticeable but not sharp pitch change. Smaller steps would be unnoticeable, larger steps too dramatic.
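To see what the scale looks like in practice, here is a small sketch that lists the allowed pitch values and snaps an arbitrary number to the nearest one. It only illustrates the range and step described above; the function names are our own.

```python
PITCH_MIN, PITCH_MAX, PITCH_STEP = -20, 20, 2

def valid_pitches() -> list[int]:
    """All selectable pitch values: -20, -18, ..., 18, 20."""
    return list(range(PITCH_MIN, PITCH_MAX + 1, PITCH_STEP))

def snap_pitch(value: float) -> int:
    """Clamp to the range, then round to the nearest multiple of the step.
    Exact ties (odd integers) follow Python's round-half-to-even rule."""
    clamped = max(PITCH_MIN, min(PITCH_MAX, value))
    return int(round(clamped / PITCH_STEP) * PITCH_STEP)

print(len(valid_pitches()), snap_pitch(3.2), snap_pitch(25))
```

With a step of 2 there are 21 positions on the scale, which keeps the choice manageable while each move remains audible.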
Pitch influence:
Applications:
Pauses between sentences: 300ms (default)
Pauses between paragraphs: 400ms (default)
These settings can be changed in dropdown menus from 150ms to 30 seconds.
Through interface:
Through tags:
Insert tag <break time="200ms"/> or <break time="2s"/> at the desired location
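If you insert many pauses from a script, a tiny helper can produce the tag in either of the two forms shown above. This is a sketch only: `break_tag` is a hypothetical name, and it does not check the value against any minimum or maximum, since this guide only states limits for the dropdown menus.

```python
def break_tag(milliseconds: int) -> str:
    """Build a <break/> tag, using seconds when the pause is a whole number of seconds."""
    if milliseconds <= 0:
        raise ValueError("pause length must be positive")
    if milliseconds % 1000 == 0:
        return f'<break time="{milliseconds // 1000}s"/>'
    return f'<break time="{milliseconds}ms"/>'

text = "First point." + break_tag(200) + " Second point." + break_tag(2000) + " Conclusion."
print(text)
```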
Pause rules:
When to use pauses:
The dialogue function allows using different voices in one text.
The multi-voice dialogue feature opens up creative possibilities beyond just character voices. Foreign language teachers, for instance, can use this function to demonstrate the same phrase at multiple speeds for language learning, helping students grasp pronunciation at different comprehension levels. For detailed techniques and classroom applications, see our guide on using text-to-speech for foreign language teaching.
Voices with language codes (e.g., Ava_US, Ava_ES, Ava_DE) are designed to stay recognizable as the same voice across different languages. These multi-language voices let you create a unified style for multilingual content, because the same voice character can speak several languages. This is particularly useful in dialogue mode, where you can switch between languages while keeping the same voice personality throughout your audio project.
SpeechGen allows you to split your generated audio into multiple segments within a single synthesis project, making it perfect for video editors who need separate audio files for different scenes or chapters. This feature is particularly useful for creating voiceovers for YouTube videos, online courses, or any project requiring precise audio synchronization.
To split your audio, simply place your cursor where you want to divide the text and click the cut button in the menu panel. This inserts a <cut/> tag at that position. You can also manually type or copy-paste this tag throughout your text. For custom filenames, use this format:
<cut name="your-filename"/>
This feature helps you organize segments with meaningful names like:
<cut name="intro"/>
<cut name="chapter-1"/>
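When segment names come from chapter titles, a short script can turn each title into a tag. In this sketch the slug rule (lowercase letters, digits, and hyphens) is an assumption chosen to match the examples above; SpeechGen may accept other characters. The function `cut_tag` is hypothetical.

```python
import re

def cut_tag(title: str = "") -> str:
    """Return <cut/> or <cut name="..."/> with the title reduced to a filename-friendly slug."""
    # Replace every run of characters other than a-z and 0-9 with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f'<cut name="{slug}"/>' if slug else "<cut/>"

print(cut_tag("Intro"))
print(cut_tag("Chapter 1"))
```

Note that this simple rule removes non-Latin letters, so titles in other alphabets would need a different approach.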
Once you've added at least one segment tag, a "download segments" button appears after generation. Click it to download all segments at once, or use the "more" button on the audio player to access individual segments. Each file is automatically named with a unique ID, sequence number, and descriptive title (e.g., "7054789_1_first-sentence"), making it easy to identify and organize your audio files in your editing software.
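Because the file names follow a predictable pattern, you can sort downloaded segments by sequence number before importing them into an editor. The sketch below assumes the name (without extension) always has the form ID, underscore, sequence number, underscore, title, as in the example above. The `Segment` type and `parse_segment_name` function are our own.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    project_id: str
    sequence: int
    title: str

def parse_segment_name(stem: str) -> Segment:
    """Parse a name such as '7054789_1_first-sentence' (file extension already removed)."""
    # Split on the first two underscores only, so a title may itself contain underscores.
    project_id, sequence, title = stem.split("_", 2)
    return Segment(project_id, int(sequence), title)

names = ["7054789_10_outro", "7054789_2_chapter-1", "7054789_1_first-sentence"]
ordered = sorted(names, key=lambda n: parse_segment_name(n).sequence)
print(ordered)
```

Sorting by the parsed number rather than by the raw text keeps segment 10 after segment 2.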
For larger projects, split the text across multiple generations. For comprehensive instructions, advanced techniques, and video tutorials, visit our complete audio segmentation documentation.
Some voices have intonation graphs:
Intonation graphs are available on voices that display a settings icon next to the voice name. This feature is found on more than half of the voices in the library, including both regular and PRO options.
Select the sentence in which you want to adjust the intonation and press the intonation button. The intonation graph interface will appear.
SpeechGen uses an intelligent caching system that significantly reduces how much of your character limit you spend. Each sentence (up to 100,000 characters) is saved in the cache for 7 days. When you regenerate your audio, any unchanged sentences are retrieved from the cache for free, so you only pay for new or edited sentences. This means you can make incremental edits to your text without consuming your entire character allowance each time. Project history is stored for 30 days, and files you add to favorites are kept permanently.
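To get a feel for how much an edit might cost, here is a simplified model of sentence-level caching. It is only a sketch under stated assumptions: sentences are split naively at ., ! or ? followed by whitespace, and a sentence counts as cached only if it appears unchanged in the previous text. How SpeechGen actually detects sentence boundaries, counts characters, and keys its cache is not described in this guide.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def billable_chars(previous: str, edited: str) -> int:
    """Characters in sentences of `edited` that do not appear verbatim in `previous`."""
    cached = set(split_sentences(previous))
    return sum(len(s) for s in split_sentences(edited) if s not in cached)

before = "Hello there. How are you? Fine."
after = "Hello there. How are you today? Fine."
print(billable_chars(before, after))
```

In this model only the edited sentence is counted, while the two untouched sentences cost nothing.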
Storage periods:
Voice sounds unnatural:
Incorrect pronunciation:
Unnatural pauses:
SSML errors:
For expert voice control, use SSML tags:
⚠️ Attention: Different voices support different sets of SSML tags. Test functionality for each specific voice.
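Many SSML errors come from tags that are not properly closed or quoted. Before pasting, you can check this with a standard XML parser. The sketch below wraps the text in a temporary `<speak>` root element purely for the check; it does not assume SpeechGen requires that element, and it cannot tell you whether a particular voice supports a particular tag.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def check_markup(text: str) -> Optional[str]:
    """Return None if the tags in `text` are well-formed XML, else the parser's error message."""
    try:
        # The wrapper gives the parser a single root element to work with.
        ET.fromstring(f"<speak>{text}</speak>")
    except ET.ParseError as error:
        return str(error)
    return None

print(check_markup('Hello <break time="2s"/> world'))
print(check_markup('Hello <break time="2s"> world'))
```

A strict XML parser also rejects a bare & or < in the text. Whether the service itself needs those characters escaped is not covered here, so treat such reports as a hint, not a verdict.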
An API is available for developers to integrate SpeechGen.io into their own applications and services.
First, check that your file is in a supported format (DOCX, PDF, or TXT). Make sure the file isn't corrupted and try uploading again. If the issue persists, copy the text manually and paste it directly into the text box. Also verify that your file size doesn't exceed the platform limits.
Your project history is automatically saved for 30 days. The smart cache (for sentence-level savings) lasts 7 days. To keep files permanently, add them to your favorites. This ensures your important audio projects are never lost and remain accessible in your profile.
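If you track projects in your own notes or scripts, the two retention periods translate into simple date arithmetic. This sketch assumes both periods are counted in whole days from the generation date; the exact cutoff times, and whether regenerating resets the cache period, are not specified in this guide.

```python
from datetime import date, timedelta

CACHE_DAYS = 7      # smart cache for individual sentences
HISTORY_DAYS = 30   # project history

def retention_dates(generated_on: date) -> dict[str, date]:
    """Estimate when cached sentences and project history would expire."""
    return {
        "cache_until": generated_on + timedelta(days=CACHE_DAYS),
        "history_until": generated_on + timedelta(days=HISTORY_DAYS),
    }

print(retention_dates(date(2025, 9, 16)))
```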
Yes! SpeechGen offers multi-voice audio generation (dialogue mode). You can assign different voices to different text sections, making it perfect for audiobooks with multiple characters, educational dialogues, or podcasts with multiple speakers. You can even use multi-language voices to switch between languages while maintaining character consistency.
PRO voices offer superior quality and naturalness compared to regular voices. They typically have better emotional expression, more accurate pronunciation, and some support advanced features like intonation graphs. For professional projects like audiobooks, courses, or business presentations, PRO voices are recommended.
It depends on which settings you change. Adjusting speech speed or pitch requires full regeneration and will consume your character limits, as these changes affect the entire voice synthesis. However, you can freely modify pauses between sentences and paragraphs without any limit consumption. Additionally, SpeechGen uses smart caching: if you generate a large text, then edit just one sentence and regenerate, the system will only charge you for that single changed sentence, not the entire text. This caching system saves your unchanged sentences for 7 days, making iterative editing very economical.
Get help from our community! Ask your questions in our Telegram chat: https://t.me/speechgen