Synthesize Voice from Text Without Extra Costs Thanks to Exclusive Smart Caching Technology

24-07-2024

Speechgen offers a unique economical caching feature that significantly reduces the time and cost of text-to-speech conversion. In this article, we will explore how this feature works, what its benefits are, and how it saves you money when you revise a voiceover.

How Economical Caching Works

When you synthesize speech, Speechgen remembers the result of each sentence. For example:

  • You voiced 30 sentences.
  • Then you added 10 more and voiced them again.
  • Speechgen will take the 30 already voiced sentences from memory, voice the 10 new ones, and combine them into one file.


This gives you:

  • Less time for voiceover
  • Savings on limits for already voiced sentences
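
The behaviour described in this section can be sketched as a sentence-level cache. This is only an illustration under stated assumptions, not Speechgen's actual implementation: `split_sentences`, `synthesize`, `voice`, and the in-memory dictionary are hypothetical names, the sentence splitter is deliberately naive, and limits are assumed to be counted in characters.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for the real text-to-speech call."""
    return f"<audio:{sentence}>".encode()

def voice(text: str, cache: dict[str, bytes]) -> tuple[bytes, int]:
    """Return the combined audio and the number of characters billed."""
    parts, billed = [], 0
    for sentence in split_sentences(text):
        if sentence not in cache:        # new or changed sentence
            cache[sentence] = synthesize(sentence)
            billed += len(sentence)
        parts.append(cache[sentence])    # cached sentences cost nothing
    return b"".join(parts), billed

cache: dict[str, bytes] = {}

# First run: 30 sentences, all of them are voiced and billed.
first = " ".join(f"Sentence number {i}." for i in range(30))
_, billed_first = voice(first, cache)

# Second run: the same 30 sentences plus 10 new ones.
extra = " ".join(f"Extra sentence {i}." for i in range(10))
_, billed_second = voice(first + " " + extra, cache)

print(billed_first, billed_second)
```

On the second run only the 10 new sentences reach `synthesize`; the other 30 are read from the dictionary and merged into the same output.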

Usage Example

Imagine you are working on voicing an educational course with 20 lessons. After completing the work, you decide to add a brief introduction to each lesson. With a regular service, you would have to voice the entire material again, leading to significant costs. With Speechgen, you will only pay for voicing the new introductions, saving resources and time.

Here’s a comparison of Speechgen with other services:


                                     Other TTS    Speechgen
Example #1: 30 sentences             100% cost    100% cost
Example #2: 30 sentences + 10 new    100% cost    25% cost

With other speech synthesis services, every voiceover run is billed in full, including the text you have already voiced. With Speechgen, only new or changed sentences are voiced. As seen in the table, on the repeated voiceover Speechgen used only 25% of the total character count instead of 100%: 10 of the 40 sentences were new, and the remaining 75% of the text was taken from previously voiced content.
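
The 25% figure is simple arithmetic, assuming all sentences are roughly the same length (real shares depend on the character count of each sentence). The helper name below is illustrative only.

```python
def billed_share(cached_units: int, new_units: int) -> float:
    """Fraction of the full text that has to be voiced, and paid for, again."""
    total = cached_units + new_units
    return new_units / total if total else 0.0

share = billed_share(cached_units=30, new_units=10)
print(f"{share:.0%} billed, {1 - share:.0%} taken from cache")
# prints "25% billed, 75% taken from cache"
```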

This means you don't need to worry about repeated costs when revising your text. You can return to your text later and work with it.

Terms and Limitations

  • Text Volume: Up to 100,000 characters for the same settings and speaker.
  • Storage Duration: Economical cache is stored for 1 week.
  • Caching Unit: Whole sentences are saved, not individual words.

Detailed Operation

Text up to 100,000 characters

Above this limit, Speechgen switches to a book mode that voices large texts faster by processing large blocks of text instead of individual sentences. Speechgen can voice up to 2,000,000 characters at once, but economical caching applies only to texts of up to 100,000 characters.
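
As a rough sketch, the two limits amount to a threshold check on the text length. The function name, the mode labels, and the inclusive boundaries are assumptions made for illustration; only the two numbers come from the limits stated above.

```python
CACHE_LIMIT = 100_000    # economical caching works up to this many characters
MAX_CHARS = 2_000_000    # largest text that can be voiced at once

def processing_mode(char_count: int) -> str:
    """Pick the (hypothetical) processing mode for a text of the given length."""
    if char_count > MAX_CHARS:
        raise ValueError("text is too long to voice in one run")
    # Sentence mode caches each sentence; book mode works on large blocks.
    return "sentence" if char_count <= CACHE_LIMIT else "book"

print(processing_mode(80_000))    # sentence
print(processing_mode(500_000))   # book
```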

Economical Cache Stored for 1 Week

Voiced sentences are stored in memory for only 1 week. You have 7 days to supplement or revise the voiceover.

Additionally, in your profile, the complete voiceover history is stored for 30 days. This means that within 30 days you can download the text and file in their entirety. However, the cache itself will be stored for only 7 days.

If you decide, for example, to add to the voiceover after 25 days, the limits will be deducted again for the entire project. By saving the voiceover to favorites, you can keep the audio with the text forever, but the cache will still only be stored for 7 days.

Your text and audio file are saved in your profile, but not the cache, so please keep this in mind when working.
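
The two retention periods can be kept apart with a small check like the one below. It is a sketch under assumptions: the function and key names are hypothetical, and treating the 7-day and 30-day boundaries as inclusive is a guess.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(days=7)       # economical cache lifetime
HISTORY_TTL = timedelta(days=30)    # voiceover history in the profile

def availability(voiced_at: datetime, now: datetime,
                 favorite: bool = False) -> dict[str, bool]:
    """Report what is still available for a voiceover made at `voiced_at`."""
    age = now - voiced_at
    return {
        # Favorites keep the text and audio, but never extend the cache.
        "cache": age <= CACHE_TTL,
        "download": favorite or age <= HISTORY_TTL,
    }

voiced = datetime(2024, 7, 1)
print(availability(voiced, voiced + timedelta(days=25)))
# prints {'cache': False, 'download': True}
```

After 25 days the file can still be downloaded, but any addition is billed for the whole project because the cache has expired.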

What Constitutes a Text Change

Cache works only for unchanged sentences. If you change even one letter or remove a comma in a sentence, it is considered new by the system.


Adding a New Sentence:

Original Text:

  • Artificial intelligence is changing the world.
  • Technology is advancing at incredible speed.
  • The future, which we awaited, has arrived.

Adding a new sentence:

  • We must be ready for changes.

Result: Speechgen takes the first three sentences from the cache and voices only the fourth one. Costs are incurred only for the fourth sentence.

Changing One Word:

Original Text:

  • Artificial intelligence is changing the world.
  • Technology is advancing at incredible speed.
  • The future, which we awaited, has arrived.

Changing one word in the second sentence:

  • Technology is advancing at surprising speed.

Result: Speechgen takes the first and third sentences from the cache but voices the second one again.

Removing a Comma:

Original Text:

  • Artificial intelligence is changing the world.
  • Technology is advancing at incredible speed.
  • The future, which we awaited, has arrived.

Removing the commas in the third sentence:

  • The future which we awaited has arrived.

Result: Speechgen will re-voice the third sentence, and take the first and second sentences from the cache. The third sentence is considered changed due to the removal of commas.

Additional Changes

Adding <break>

If you add a new pause tag, such as break, it is also considered a change to the sentence. The system will reanalyze and revoice it.

<break time="200ms"/>

In fact, sentences are retrieved from the economical cache based on a complete match, character by character. If there is any new character or if a character is missing in the sentence, the program will not be able to match it exactly.
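
Exact matching means the lookup does no normalization at all. The sketch below uses the sentences from the examples above; the helper name and the set used as a cache are hypothetical.

```python
original = [
    "Artificial intelligence is changing the world.",
    "Technology is advancing at incredible speed.",
    "The future, which we awaited, has arrived.",
]
cached = set(original)  # sentences voiced earlier with the same settings

def sentences_to_revoice(sentences: list[str], cached: set[str]) -> list[str]:
    """Keep only sentences with no character-for-character match in the cache."""
    return [s for s in sentences if s not in cached]

edited = [
    original[0],                                                          # unchanged
    'Technology is advancing <break time="200ms"/> at incredible speed.', # tag added
    "The future which we awaited has arrived.",                           # commas removed
]
revoice = sentences_to_revoice(edited, cached)
print(len(revoice))  # 2
```

Only the first sentence is reused; the added tag and the removed commas both produce strings that are not in the cache.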

Changing Speed, Tone, and Speaker

If you change the speed or tone settings, it will be a completely new voiceover, and the economical cache will not work. When you change the speed or tone, the neural network revoices the text with these new parameters. This is not a software speed-up or tone change; it is a full revoice.

Changing the speaker also results in a complete revoice. Here, the neural network does all the work again. Therefore, if you are adjusting the voice, do this for 1-2 sentences, and once you are satisfied with the speed and tone, voice the entire desired text.
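
One way to picture this is a cache key that contains the voice settings as well as the sentence text. The class, field names, and speaker identifier below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VoiceSettings:
    speaker: str
    speed: float = 1.0
    tone: float = 0.0

def cache_key(sentence: str, settings: VoiceSettings) -> tuple[VoiceSettings, str]:
    """Same text with different settings yields a different key."""
    return (settings, sentence)

base = VoiceSettings(speaker="speaker-a")   # hypothetical speaker id
cache = {cache_key("Hello there.", base): b"audio"}

faster = replace(base, speed=1.2)
other_voice = replace(base, speaker="speaker-b")

print(cache_key("Hello there.", base) in cache)         # True
print(cache_key("Hello there.", faster) in cache)       # False
print(cache_key("Hello there.", other_voice) in cache)  # False
```

This is why it pays to tune speed, tone, and speaker on one or two sentences first: every change of these values starts with an empty cache.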

What Can Be Changed

Speeding Up and Slowing Down Voice in Subtitles

On the special subtitles page, you can voice subtitles. To fit the required timing, speech often has to be sped up. In this case, the economical cache still works, because Speechgen first voices the subtitle and then speeds it up programmatically.

Changing Pauses in Settings

You can change the pauses in the settings under the voicing field, and the cache will work perfectly. We save entire sentences to memory, and the system then combines them into audio. This way, you can adjust pauses between sentences or paragraphs without additional costs.

Changing Format

If you select a different format (ogg, wav, opus) and press revoice, the system will not charge you any limits. This is free. If you voiced a text and then realized you needed a different format, change it without fear of paying twice.

Changing Sample Rate

If you change the Sample Rate in the settings and press revoice again, the system will not charge you any limits. This is free.
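
The rules from the last two sections can be summarized as a check on which settings changed between two runs over the same text. The setting names and the dictionary layout are assumptions for illustration, and any setting not listed as free is treated as billed to stay on the safe side.

```python
FREE_CHANGES = {"format", "sample_rate", "pauses"}   # reuse cached sentences
BILLED_CHANGES = {"speaker", "speed", "tone"}        # force a full revoice

def changed_settings(old: dict, new: dict) -> set[str]:
    """Names of settings whose values differ between two runs."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}

def revoice_is_free(old: dict, new: dict) -> bool:
    """True if the text is unchanged and only free settings were modified."""
    return changed_settings(old, new) <= FREE_CHANGES

before = {"speaker": "speaker-a", "speed": 1.0, "format": "ogg", "sample_rate": 24000}
as_wav = {**before, "format": "wav", "sample_rate": 48000}
faster = {**before, "speed": 1.2}

print(revoice_is_free(before, as_wav))   # True
print(revoice_is_free(before, faster))   # False
```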


Speechgen's economical caching system offers significant advantages:

  • Resource Savings: Pay only for new content, not the entire text again.
  • Faster Work: Repeat voiceovers are much quicker.
  • Flexibility: Experiment with your text without worrying about additional costs.

Speechgen saves your resources and provides tools for more efficient work with audio content, making it an ideal choice for those who value efficiency and quality in speech synthesis.
