Chinese Text to Speech
Convert text to natural Mandarin speech — 100+ AI voices, free MP3 download.
100+ Mandarin Voices — Tones, Pinyin & Simplified Script
Turn any Chinese text into natural Mandarin speech with correct tones and pinyin handled automatically. The engine reads four lexical tones plus the neutral tone, applies third-tone sandhi where two falling-rising syllables meet, and voices the retroflex initials (zh, ch, sh, r) that define standard Putonghua. Pick a narrator like Yunyang (Pro Neural, male) or Xiaoxiao (Pro Neural, female) and download your audio in seconds.
The catalogue spans Pro Neural and HD tiers across male and female speakers. HD voices like Yunxi HD and Xiaoxiao HD unlock emotional styles — cheerful, newscast, angry, customer-service — selectable from the style dropdown in the editor. Useful for Douyin and YouTube voiceover, audiobook narration of classical novels, business presentations to Mandarin-speaking partners, and tone-drilling for learners preparing for proficiency exams. First 1,000 characters free, no account needed.
- 100+ Mandarin voices — Pro Neural & HD
- Four tones + neutral, auto sandhi
- Adjustable speed & pitch
- Download MP3, WAV, FLAC, OGG
- Free — 1,000 chars, no signup
Chinese Voice Gallery — Mandarin Speakers
These are 4 featured speakers. Browse all 100+ on the voices page — filter by zh-CN.
Voice Styles — Cheerful vs Newscast
HD voices unlock emotional and situational styles on top of the default neutral register. Same text, same speaker — Xiaoxiao HD reads the line below twice: once in a cheerful tone and once as a broadcast anchor.
Emotional styles are available on HD voices: Yunxi HD (cheerful, chat, angry, complaining, depressed) and Xiaoxiao HD (cheerful, chat, angry, customer-service, excited). The remaining voices read in their default neutral register, which suits narration, e-learning, and everyday voiceover work. Select the style from the dropdown in the editor.
Mandarin Phonetic Highlights — Tones & Pinyin in Practice
Eight phrases that show how the engine handles Chinese pronunciation challenges — tonal contours, tone sandhi, retroflexes, and neutral tones. Click play to listen.
Mandarin — Formatting & Input Conventions
How you format your source text affects the spoken output. Four conventions worth knowing when pasting simplified characters:
Numbers
3.14 → "三点一四": the engine reads the digits after the decimal point one by one. Figures of 10,000 and above are grouped by the unit 万 (wàn, ten thousand), so 50,000 reads as "五万" (wǔ wàn), not "fifty thousand".
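To make the number convention concrete, here is a minimal Python sketch of the same reading rules. It is an illustration only, not the engine's actual code: the function names are hypothetical, and the sketch assumes plain non-negative figures below 100,000,000 with optional thousands separators.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def _read_group(n: int) -> str:
    """Read 1 <= n <= 9999, inserting a single 零 for interior zeros."""
    out = []
    pending_zero = False
    for power in range(3, -1, -1):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            # Only zeros that follow a spoken digit can produce a 零.
            pending_zero = bool(out)
            continue
        if pending_zero:
            out.append("零")
            pending_zero = False
        out.append(DIGITS[digit] + UNITS[power])
    return "".join(out)


def read_integer(n: int) -> str:
    """Read an integer using the 万 (ten-thousand) grouping."""
    if not 0 <= n < 10 ** 8:
        raise ValueError("this sketch only covers 0 to 99,999,999")
    if n == 0:
        return "零"
    high, low = divmod(n, 10000)
    text = ""
    if high:
        text = _read_group(high) + "万"
        if 0 < low < 1000:
            text += "零"
    if low:
        text += _read_group(low)
    # A leading 一十 is spoken as plain 十 (12 is 十二, not 一十二).
    return text[1:] if text.startswith("一十") else text


def read_number(text: str) -> str:
    """Read a figure such as '3.14' or '50,000' as Chinese numerals."""
    whole, _, fraction = text.replace(",", "").partition(".")
    spoken = read_integer(int(whole))
    if fraction:
        # Digits after the point are read one by one.
        spoken += "点" + "".join(DIGITS[int(d)] for d in fraction)
    return spoken


print(read_number("3.14"))
print(read_number("50,000"))
```

The two calls print 三点一四 and 五万, matching the examples above.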
Currency
¥399 → "三百九十九元". The yuan sign ¥ is read as "元" (yuán). For informal tone, write "块" in the source text and the voice says "kuài" instead — exactly how native speakers talk about money in daily life.
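If your script comes from a spreadsheet or price list, you may prefer to rewrite the yuan sign before pasting. The helper below is a hypothetical pre-processing sketch (the name `normalize_yuan` and the `informal` flag are assumptions, not part of the editor): it moves the unit after the amount and lets you pick "元" or the colloquial "块".

```python
import re

# Matches the half-width (¥) or full-width (¥) yuan sign followed by an amount.
_YUAN = re.compile(r"[¥¥](\d+(?:\.\d+)?)")


def normalize_yuan(text: str, informal: bool = False) -> str:
    """Rewrite '¥399' as '399元', or '399块' when informal is True."""
    unit = "块" if informal else "元"
    return _YUAN.sub(lambda match: match.group(1) + unit, text)


print(normalize_yuan("售价¥399"))
print(normalize_yuan("只要¥399", informal=True))
```

The first call gives 售价399元 and the second 只要399块; the digits themselves are left for the voice to read.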
Dates & Time
2026年4月12日 → "二零二六年四月十二日". The year is read digit by digit, and year-month-day order is standard. For times, 14:30 reads as "十四点三十分"; the 24-hour format is natural in Mandarin contexts.
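The date and time readings follow two simple rules: years are spelled digit by digit, while months, days, hours, and minutes are read as ordinary numbers. Here is a rough Python sketch of those rules. The function names are hypothetical, and reading a single-digit minute with a leading 零 (14:05 as 十四点零五分) is an assumption about common usage, not a documented engine behaviour.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_two_digit(n: int) -> str:
    """Read 0..99 as spoken in months, days, hours, and minutes."""
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "十" if tens == 1 else DIGITS[tens] + "十"
    return prefix + (DIGITS[ones] if ones else "")


def read_date(text: str) -> str:
    """Read 'YYYY年M月D日' with the year spelled digit by digit."""
    match = re.fullmatch(r"(\d{4})年(\d{1,2})月(\d{1,2})日", text)
    if match is None:
        raise ValueError("expected a date like 2026年4月12日")
    year, month, day = match.groups()
    year_spoken = "".join(DIGITS[int(d)] for d in year)
    return f"{year_spoken}年{read_two_digit(int(month))}月{read_two_digit(int(day))}日"


def read_time(text: str) -> str:
    """Read a 24-hour 'HH:MM' time."""
    match = re.fullmatch(r"(\d{1,2}):(\d{2})", text)
    if match is None:
        raise ValueError("expected a time like 14:30")
    hour, minute = int(match.group(1)), int(match.group(2))
    spoken = read_two_digit(hour) + "点"
    if minute:
        # Assumption: single-digit minutes take a leading 零.
        spoken += ("零" if minute < 10 else "") + read_two_digit(minute) + "分"
    return spoken


print(read_date("2026年4月12日"))
print(read_time("14:30"))
```

These print 二零二六年四月十二日 and 十四点三十分, the same readings shown above.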
Punctuation
Use full-width punctuation for natural pausing: , (comma), 。 (period), ! (exclamation mark). The engine handles both full-width and half-width, but full-width produces more accurate breath-group pauses in longer passages.
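If your source text was typed with a Latin keyboard layout, a quick pre-processing pass can swap half-width marks for their full-width forms before you paste. The sketch below is one possible approach, with a hypothetical function name. It deliberately leaves marks between two digits alone so that figures like 3.14, 10,000, and 14:30 are not broken.

```python
import re

FULL_WIDTH = {",": ",", ".": "。", "!": "!", "?": "?", ";": ";", ":": ":"}

# First alternative: a mark between two digits (kept as-is).
# Second alternative: any other half-width mark (captured and converted).
_PUNCT = re.compile(r"(?<=\d)[,.:](?=\d)|([,.!?;:])")


def to_fullwidth_punctuation(text: str) -> str:
    """Convert half-width punctuation to full-width, except inside numbers."""
    def swap(match: re.Match) -> str:
        mark = match.group(1)
        return FULL_WIDTH[mark] if mark else match.group(0)

    return _PUNCT.sub(swap, text)


print(to_fullwidth_punctuation("你好,世界!"))
print(to_fullwidth_punctuation("圆周率是3.14."))
```

The second example keeps the decimal point in 3.14 and converts only the sentence-final period. Mixed Chinese and English text would need extra care, since this pass also converts punctuation inside English sentences.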
What You Can Build with Chinese TTS
Content Creation & Voiceover
Add a Mandarin voiceover to YouTube videos, Douyin clips, and podcast episodes. Pick a voice, paste your script in simplified characters, and export the audio to drop into Premiere, DaVinci, or CapCut — no recording booth, no voice actor, done in under a minute.
Mandarin Learning & Tone Practice
Paste vocabulary lists or textbook dialogues and listen at 0.75x speed to catch every tone. Useful for drilling third-tone sandhi rules, preparing for proficiency exams, or simply building listening confidence before a trip. Slow it down, repeat, and ramp back up once you can follow along.
Business Presentations & Corporate
Voice a quarterly report, investor pitch, or onboarding video for a Mandarin-speaking audience. The narration-professional style reads financial figures, company names, and technical terms clearly. Export the file and embed it directly in PowerPoint, Keynote, or a corporate LMS.
Audiobooks & Classical Literature
Turn novels, web fiction, or classical texts into audio. A warm narrator reads chapter after chapter with natural breath pauses, correctly read measure words, and consistent tone accuracy throughout, whether the source is a modern thriller or a passage from the Four Great Classical Novels.
How It Works — 3 Steps
Three steps to generate Mandarin audio online. No software, no signup.
Paste or type your text
Type directly or paste up to 1,000,000 characters in simplified or traditional script. Upload DOCX, PDF, or SRT files. The engine auto-detects hanzi and applies the right pronunciation rules.
Choose a Mandarin voice
Pick from 100+ speakers. Filter by gender and quality tier — Pro Neural or HD. Adjust speed and pitch to match your project. HD voices also offer emotional styles in the style dropdown.
Listen & download free
Click Convert to Speech, preview the result, and download as MP3, WAV, FLAC, or OGG. The first 1,000 characters are free, with no account needed. No watermark on any plan.
Mandarin Spotlight — Tone Sandhi, Cantonese & Dialect Variants
Three things that make this language unique for text-to-speech — and how SpeechGen handles each one.
Tones & Tone Sandhi
Mandarin uses four lexical tones plus a neutral tone. When two third tones appear in sequence, the first shifts to a second tone, a rule called third-tone sandhi. The engine applies this shift automatically: paste "你好" and the output already reads ní hǎo, not nǐ hǎo. Related sandhi rules cover "不" (bù → bú before a fourth tone) and "一" (yī → yí before a fourth tone, yì before a first, second, or third tone).
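For learners who want to see the sandhi rules spelled out, here is a simplified Python sketch. It is not SpeechGen's implementation. It assumes the input is already a list of (character, numbered pinyin) pairs such as ("你", "ni3"), it treats runs of three or more third tones pairwise (real speech groups them by phrasing), and it ignores cases where "一" keeps its first tone, such as counting and ordinals.

```python
def apply_sandhi(syllables: list[tuple[str, str]]) -> list[str]:
    """Return numbered pinyin with basic tone sandhi applied.

    Each input item is (character, pinyin ending in a tone digit 1-5).
    """
    result = []
    for i, (char, pinyin) in enumerate(syllables):
        base, tone = pinyin[:-1], pinyin[-1]
        # Look at the citation tone of the following syllable, if any.
        next_tone = syllables[i + 1][1][-1] if i + 1 < len(syllables) else None

        if tone == "3" and next_tone == "3":
            tone = "2"  # third-tone sandhi: ni3 hao3 -> ni2 hao3
        elif char == "不" and next_tone == "4":
            tone = "2"  # bu4 -> bu2 before a fourth tone
        elif char == "一" and next_tone is not None:
            if next_tone == "4":
                tone = "2"  # yi1 -> yi2 before a fourth tone
            elif next_tone in "123":
                tone = "4"  # yi1 -> yi4 before tones 1, 2, 3

        result.append(base + tone)
    return result


print(apply_sandhi([("你", "ni3"), ("好", "hao3")]))
print(apply_sandhi([("不", "bu4"), ("是", "shi4")]))
print(apply_sandhi([("一", "yi1"), ("起", "qi3")]))
```

The three calls print ['ni2', 'hao3'], ['bu2', 'shi4'], and ['yi4', 'qi3']. Comparing this output with what you hear in the generated audio is a handy way to check your understanding of each rule.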
Dialects We Cover
The primary voice catalogue targets Standard Mandarin (Putonghua, zh-CN). Beyond that, SpeechGen also provides Cantonese (Yue) voices for Hong Kong and Guangdong audiences, and the library is expanding toward Wu (Shanghainese), Southwestern Mandarin (Sichuanese), Jilu Mandarin (Shandong region), and Zhongyuan Mandarin (Henan). If you previously used a sub-dialect page, it now redirects here — all regional variants live under one roof.
Simplified vs Traditional Script
The voices read both simplified characters (简体, used in mainland China) and traditional characters (繁體, used in Taiwan, Hong Kong, Macau). Paste either script and the engine recognises the character set without manual switching. For Taiwan Mandarin intonation, look for cmn-TW voices in the catalogue — they carry slightly different prosody compared with the zh-CN default.
Chinese TTS — Frequently Asked Questions
More than 100 speakers across two quality tiers — Pro Neural and HD. The roster covers male and female voices with varied registers from formal broadcast to everyday conversational. HD speakers add emotional styles you can switch from the dropdown: cheerful, newscast, angry, chat, and more.
Yes. The phonology engine recognises all four lexical tones plus the neutral tone and applies standard sandhi rules — third-tone pairs, bù/yī alternation — automatically. You do not need to add tone marks or phonetic markup; paste hanzi and the output follows Putonghua pronunciation norms.
Mandarin (Putonghua) voices speak zh-CN Standard Mandarin with four tones. Cantonese (Yue) voices target the six-tone system spoken in Hong Kong and Guangdong. Both are available in the catalogue — filter by zh-CN for Mandarin or zh-HK / yue for Cantonese. The two language varieties are not mutually intelligible, so pick the one your audience actually speaks.
Yes. The engine accepts simplified (简体) and traditional (繁體) characters without any manual toggle. For Taiwan-accented Mandarin, select a cmn-TW voice — the prosody and certain vowel realisations differ slightly from zh-CN mainland voices.
Paste your vocabulary list or textbook dialogue, set playback speed to 0.75x, and listen phrase by phrase. The engine reads each syllable with the correct tonal contour, which makes it practical for ear training before the listening sections of proficiency exams. You can also compare your own pronunciation with the generated audio by playing it back repeatedly.