Chinese Text to Speech

Convert text to natural Mandarin speech — 100+ AI voices, free MP3 download.

100+ Mandarin Voices — Tones, Pinyin & Simplified Script

Turn any Chinese text into natural Mandarin speech with correct tones and pinyin handled automatically. The engine reads four lexical tones plus the neutral tone, applies third-tone sandhi where two falling-rising syllables meet, and voices the retroflex initials (zh, ch, sh, r) that define standard Putonghua. Pick a narrator like Yunyang (Pro Neural, male) or Xiaoxiao (Pro Neural, female) and download your audio in seconds.

The catalogue spans Pro Neural and HD tiers across male and female speakers. HD voices like Yunxi HD and Xiaoxiao HD unlock emotional styles — cheerful, newscast, angry, customer-service — selectable from the style dropdown in the editor. Useful for Douyin and YouTube voiceover, audiobook narration of classical novels, business presentations to Mandarin-speaking partners, and tone-drilling for learners preparing for proficiency exams. First 1,000 characters free, no account needed.

  • 100+ Mandarin voices — Pro Neural & HD
  • Four tones + neutral, auto sandhi
  • Adjustable speed & pitch
  • Download MP3, WAV, FLAC, OGG
  • Free — 1,000 chars, no signup

Chinese Voice Gallery — Mandarin Speakers

Click to preview · 100+ voices total

These are 4 featured speakers. Browse all 100+ on the voices page — filter by zh-CN.

Voice Styles — Cheerful vs Newscast

HD voices unlock emotional and situational styles on top of the default neutral register. Same text, same speaker — Xiaoxiao HD reads the line below twice: once in a cheerful tone and once as a broadcast anchor.

Text | Cheerful: typical use | Newscast: typical use
"今天天气真好,我们出去走走吧!" ("The weather is lovely today, let's go out for a walk!") | Ads, product launches, kids' content | Bulletins, formal reports, corporate video
"这个消息让人非常兴奋!" ("This news is really exciting!") | Celebrations, social media clips | Financial summaries, documentary narration

Emotional styles are available on HD voices: Yunxi HD (cheerful, chat, angry, complaining, depressed) and Xiaoxiao HD (cheerful, chat, angry, customer-service, excited). The remaining 100 voices read in their default neutral register — ideal for narration, e-learning, and everyday voiceover work. Select the style from the dropdown in the editor.

Mandarin Phonetic Highlights — Tones & Pinyin in Practice

Eight phrases that show how the engine handles Chinese pronunciation challenges — tonal contours, tone sandhi, retroflexes, and neutral tones. Click play to listen.

Phrase | Pinyin | Meaning | Phonetic Note
你好 | nǐ hǎo | Hello | Tone sandhi: 3+3 becomes 2+3 in connected speech
谢谢 | xiè xie | Thank you | Second syllable reduces to a neutral tone
我爱你 | wǒ ài nǐ | I love you | Three-tone contour: 3-4-3 across three syllables
中国 | Zhōngguó | China | Retroflex zh- initial + rising tone on guó
普通话 | pǔtōnghuà | Standard Mandarin | Aspirated p-, three distinct tones: 3-1-4
四是四,十是十 | sì shì sì, shí shì shí | Four is four, ten is ten | Classic tongue-twister: s- vs sh- with tonal contrast
再见 | zàijiàn | Goodbye | Both syllables fall sharply (double fourth tone)
不客气 | bú kèqi | You're welcome | Tone change: bù shifts to bú before a fourth-tone syllable
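
The tone patterns listed in the Phonetic Note column can be read straight off the pinyin diacritics. The following is a minimal Python sketch of that idea, not part of SpeechGen: the helper names `tone_of` and `tones` are hypothetical, the number 5 is used here as a stand-in for the neutral tone, and syllables are assumed to be separated by spaces.

```python
import unicodedata

# Combining diacritics used as pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {
    "\u0304": 1,  # macron: first tone (high level)
    "\u0301": 2,  # acute: second tone (rising)
    "\u030c": 3,  # caron: third tone (falling-rising)
    "\u0300": 4,  # grave: fourth tone (falling)
}


def tone_of(syllable: str) -> int:
    """Return the tone number of one pinyin syllable; 5 stands for neutral."""
    # NFD splits a letter such as "ǎ" into "a" plus a combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5


def tones(phrase: str) -> list[int]:
    """Tone numbers for a phrase whose syllables are separated by spaces."""
    return [tone_of(s) for s in phrase.split()]


print(tones("nǐ hǎo"))    # [3, 3]
print(tones("xiè xie"))   # [4, 5]
print(tones("wǒ ài nǐ"))  # [3, 4, 3]
```

Words written without spaces, such as Zhōngguó, would need to be split into syllables first; this sketch does not attempt that.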

Mandarin — Formatting & Input Conventions

How you format your source text affects the spoken output. Four conventions worth knowing when pasting simplified characters:

Numbers

3.14 → "三点一四": the engine reads the digits after the decimal point one by one. Large figures are grouped by the Chinese unit wàn (万, ten thousand), so 50,000 reads as "五万" (wǔ wàn), literally "five wàn", rather than as "fifty thousand".
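
To make the number rules concrete, here is a small Python sketch of how such a reading could be produced. It is an illustration under stated assumptions, not SpeechGen's implementation: the function names are hypothetical, and the sketch only covers non-negative values below 100 million.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands


def read_below_10000(n: int) -> str:
    """Read 1..9999, inserting a single 零 for skipped middle places."""
    out = []
    pending_zero = False
    for power in (3, 2, 1, 0):
        digit = n // 10**power % 10
        if digit == 0:
            # A zero only matters if something was already spoken.
            pending_zero = bool(out)
        else:
            if pending_zero:
                out.append("零")
            out.append(DIGITS[digit] + UNITS[power])
            pending_zero = False
    return "".join(out)


def read_integer(n: int) -> str:
    """Read an integer in [0, 100 million) using the 万 unit."""
    if not 0 <= n < 10**8:
        raise ValueError("this sketch only covers 0 <= n < 100,000,000")
    if n == 0:
        return "零"
    high, low = divmod(n, 10_000)
    spoken = ""
    if high:
        spoken += read_below_10000(high) + "万"
    if low:
        if high and low < 1000:
            spoken += "零"  # e.g. 50,001 -> 五万零一
        spoken += read_below_10000(low)
    # 10-19 are spoken 十, 十一, ... rather than 一十, 一十一, ...
    if spoken.startswith("一十"):
        spoken = spoken[1:]
    return spoken


def read_number(text: str) -> str:
    """Read a numeral string such as '3.14' or '50,000'."""
    whole, _, frac = text.replace(",", "").partition(".")
    spoken = read_integer(int(whole or "0"))
    if frac:
        # Digits after the point are read one by one.
        spoken += "点" + "".join(DIGITS[int(d)] for d in frac)
    return spoken


print(read_number("3.14"))    # 三点一四
print(read_number("50,000"))  # 五万
```

The integer part is split into groups of four digits because the spoken unit changes every 10,000, not every 1,000 as in English.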

Currency

¥399 → "三百九十九元". The yuan sign ¥ is read as "元" (yuán). For an informal tone, write "块" in the source text and the voice says "kuài" instead, which is how native speakers usually talk about money in daily life.

Dates & Time

2026年4月12日 → "二零二六年四月十二日". Year-month-day order is the standard, and the year is read digit by digit. For times, 14:30 reads as "十四点三十分"; the 24-hour format sounds natural in a Mandarin context.
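
The date and time readings above follow a simple pattern that can be sketched in a few lines of Python. This is only an illustration of the two examples given, with hypothetical function names; it does not claim to reproduce how the engine treats edge cases such as minutes below ten.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_two_digit(n: int) -> str:
    """Read 0..99 the way months, days, hours, and minutes are spoken."""
    if not 0 <= n <= 99:
        raise ValueError("expected a value from 0 to 99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[ones] if ones else "")


def read_date(text: str) -> str:
    """Read a date written as YYYY年M月D日; the year goes digit by digit."""
    match = re.fullmatch(r"(\d{4})年(\d{1,2})月(\d{1,2})日", text)
    if match is None:
        raise ValueError("expected a date like 2026年4月12日")
    year, month, day = match.groups()
    year_spoken = "".join(DIGITS[int(d)] for d in year)
    return (
        f"{year_spoken}年"
        f"{read_two_digit(int(month))}月"
        f"{read_two_digit(int(day))}日"
    )


def read_time(text: str) -> str:
    """Read a 24-hour time written as HH:MM."""
    hour, minute = (int(part) for part in text.split(":"))
    return f"{read_two_digit(hour)}点{read_two_digit(minute)}分"


print(read_date("2026年4月12日"))  # 二零二六年四月十二日
print(read_time("14:30"))          # 十四点三十分
```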

Punctuation

Use full-width punctuation for natural pausing: , (comma), 。 (period), ! (exclamation). The engine handles both full-width and half-width, but full-width produces more accurate breath-group pauses in longer passages.
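
If a script was typed with half-width punctuation, it can be normalised before pasting. The Python sketch below is one possible approach, not a SpeechGen feature: the name `to_full_width` is hypothetical, and the choice to convert a mark only when it directly follows a Chinese character is an assumption made so that decimals like 3.14 and times like 14:30 stay intact.

```python
import re

# Half-width marks and their full-width counterparts.
FULL_WIDTH = {
    ",": "\uff0c",  # full-width comma
    ".": "\u3002",  # ideographic full stop
    "!": "\uff01",  # full-width exclamation mark
    "?": "\uff1f",  # full-width question mark
    ";": "\uff1b",  # full-width semicolon
    ":": "\uff1a",  # full-width colon
}

# Match a half-width mark only when the previous character is a hanzi
# from the basic CJK Unified Ideographs block.
_AFTER_HANZI = re.compile(r"(?<=[\u4e00-\u9fff])[,.!?;:]")


def to_full_width(text: str) -> str:
    """Replace half-width punctuation that directly follows a hanzi."""
    return _AFTER_HANZI.sub(lambda m: FULL_WIDTH[m.group(0)], text)


print(to_full_width("今天天气真好,我们出去走走吧!"))
print(to_full_width("他说3.14是圆周率."))
```

A mark that follows a digit or a Latin letter is left alone, so some sentence-final punctuation may still need a manual check.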

What You Can Build with Chinese TTS

Content Creation & Voiceover

Add a Mandarin voiceover to YouTube videos, Douyin clips, and podcast episodes. Pick a voice, paste your script in simplified characters, and export the audio to drop into Premiere, DaVinci, or CapCut — no recording booth, no voice actor, done in under a minute.

Mandarin Learning & Tone Practice

Paste vocabulary lists or textbook dialogues and listen at 0.75x speed to catch every tone. Useful for drilling third-tone sandhi rules, preparing for proficiency exams, or simply building listening confidence before a trip. Slow it down, repeat, and ramp back up once you can follow along.

Business Presentations & Corporate

Voice a quarterly report, investor pitch, or onboarding video for a Mandarin-speaking audience. The narration-professional style reads financial figures, company names, and technical terms clearly. Export the file and embed it directly in PowerPoint, Keynote, or a corporate LMS.

Audiobooks & Classical Literature

Turn novels, web fiction, or classical texts into audio. A warm narrator reads chapter after chapter with natural breath pauses, correct readings of measure words, and consistent tone accuracy throughout, whether the source is a modern thriller or a passage from the Four Great Classical Novels.

How It Works — 3 Steps

Three steps to generate Mandarin audio online. No software, no signup.

01

Paste or type your text

Type directly or paste up to 1,000,000 characters in simplified or traditional script. Upload DOCX, PDF, or SRT files. The engine auto-detects hanzi and applies the right pronunciation rules.

02

Choose a Mandarin voice

Pick from 100+ speakers. Filter by gender and quality tier — Pro Neural or HD. Adjust speed and pitch to match your project. HD voices also offer emotional styles in the style dropdown.

03

Listen & download free

Click Convert to Speech, preview the result, and download as MP3, WAV, or FLAC. First 1,000 characters free — no account needed. No watermark on any plan.

Mandarin Spotlight — Tone Sandhi, Cantonese & Dialect Variants

Three things that make this language unique for text-to-speech — and how SpeechGen handles each one.

Tones & Tone Sandhi

Mandarin uses four lexical tones plus a neutral tone. When two third tones appear in sequence, the first shifts to a second tone, a rule known as third-tone sandhi. The engine applies this shift automatically: paste "你好" and the output already reads ní hǎo, not nǐ hǎo. Similar context-dependent changes apply to "不" (bù → bú before tone 4) and "一" (yī → yí before tone 4, yì before tones 1, 2, and 3).
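
The three sandhi rules described above can be expressed as a short rule table. The Python sketch below is a simplified illustration, not the engine's actual logic: the `Syllable` tuple layout and the function name `apply_sandhi` are hypothetical, 5 stands for the neutral tone, ordinal and counting uses of 一 (where it keeps its first tone) are not handled, and longer runs of third tones, which in real speech depend on phrasing, are treated pairwise.

```python
# (hanzi, toneless pinyin, citation tone 1-5, where 5 means neutral)
Syllable = tuple[str, str, int]


def apply_sandhi(syllables: list[Syllable]) -> list[Syllable]:
    """Return the syllables with simplified tone sandhi applied."""
    result = []
    for i, (char, pinyin, tone) in enumerate(syllables):
        # Look at the citation tone of the following syllable, if any.
        next_tone = syllables[i + 1][2] if i + 1 < len(syllables) else None
        if char == "不" and tone == 4 and next_tone == 4:
            tone = 2  # bù -> bú before a fourth tone
        elif char == "一" and tone == 1 and next_tone == 4:
            tone = 2  # yī -> yí before a fourth tone
        elif char == "一" and tone == 1 and next_tone in (1, 2, 3):
            tone = 4  # yī -> yì before tones 1, 2, and 3
        elif tone == 3 and next_tone == 3:
            tone = 2  # third tone -> second tone before another third tone
        result.append((char, pinyin, tone))
    return result


ni_hao = [("你", "ni", 3), ("好", "hao", 3)]
print(apply_sandhi(ni_hao))  # [('你', 'ni', 2), ('好', 'hao', 3)]

bu_keqi = [("不", "bu", 4), ("客", "ke", 4), ("气", "qi", 5)]
print(apply_sandhi(bu_keqi)[0])  # ('不', 'bu', 2)
```

The hanzi is carried alongside the pinyin because the 不 and 一 rules apply to those specific words, not to every syllable spelled bu or yi.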

Dialects We Cover

The primary voice catalogue targets Standard Mandarin (Putonghua, zh-CN). Beyond that, SpeechGen also provides Cantonese (Yue) voices for Hong Kong and Guangdong audiences, and the library is expanding toward Wu (Shanghainese), Southwestern Mandarin (Sichuanese), Jilu Mandarin (Shandong region), and Zhongyuan Mandarin (Henan). If you previously used a sub-dialect page, it now redirects here — all regional variants live under one roof.

Simplified vs Traditional Script

The voices read both simplified characters (简体, used in mainland China) and traditional characters (繁體, used in Taiwan, Hong Kong, Macau). Paste either script and the engine recognises the character set without manual switching. For Taiwan Mandarin intonation, look for cmn-TW voices in the catalogue — they carry slightly different prosody compared with the zh-CN default.

Chinese TTS — Frequently Asked Questions

How many Mandarin voices does SpeechGen offer?

More than 100 speakers across two quality tiers — Pro Neural and HD. The roster covers male and female voices with varied registers from formal broadcast to everyday conversational. HD speakers add emotional styles you can switch from the dropdown: cheerful, newscast, angry, chat, and more.

Does the engine handle tones correctly?

Yes. The phonology engine recognises all four lexical tones plus the neutral tone and applies standard sandhi rules — third-tone pairs, bù/yī alternation — automatically. You do not need to add tone marks or phonetic markup; paste hanzi and the output follows Putonghua pronunciation norms.

What is the difference between Mandarin and Cantonese voices?

Mandarin (Putonghua) voices speak zh-CN Standard Mandarin with four tones. Cantonese (Yue) voices target the six-tone system spoken in Hong Kong and Guangdong. Both are available in the catalogue — filter by zh-CN for Mandarin or zh-HK / yue for Cantonese. The two language varieties are not mutually intelligible, so pick the one your audience actually speaks.

Can I paste both simplified and traditional characters?

Yes. The engine accepts simplified (简体) and traditional (繁體) characters without any manual toggle. For Taiwan-accented Mandarin, select a cmn-TW voice — the prosody and certain vowel realisations differ slightly from zh-CN mainland voices.

Is there a way to practise tones for proficiency exams?

Paste your vocabulary list or textbook dialogue, set playback speed to 0.75x, and listen phrase by phrase. The engine reads each syllable with the correct tonal contour, which makes it practical for ear-training before listening sections. You can also compare your own pronunciation with the generated audio by playing it back repeatedly.
