Vietnamese Text to Speech

Convert text to natural Vietnamese speech — 89+ AI voices, free MP3 download.

89 Vietnamese Neural Voices — Six Tones, Northern & Southern Accents

Hear any Vietnamese passage read with correct tones and natural rhythm. The engine handles all six tones, from the level ngang to the heavy nặng, across 89 voices trained on native tiếng Việt pronunciation. Pick a speaker such as Mon (Neural, female) or Dang (Neural, male) and download your audio file in seconds.

The catalogue covers standard Northern (Hanoi) delivery, the register most broadcast media in Vietnam uses, alongside voices shaped by Southern speech patterns. It is useful for content creators localising into the Southeast Asian market, language learners drilling the difference between mả and the other tones of ma, audiobook producers working with modern prose, and developers integrating a Vietnamese AI voice into apps. The first 1,000 characters are free: no account, no watermark.

  • 89 native voices — Neural & HD tiers
  • All six tones handled natively
  • Adjustable speed & pitch
  • Download MP3, WAV, FLAC, OGG
  • Free — 1,000 chars, no signup

Vietnamese Voice Samples — Male & Female, Neural & HD

Click to preview · 89 native voices total

These are 4 featured speakers. Browse all 89 Vietnamese voices on the voices page — filter by vi-VN.

Vietnamese Pronunciation — Six Tones & Diacritic Vowels

Vietnamese is a tonal language — the same syllable changes meaning entirely depending on its tone mark. Click play to hear each phrase read aloud by a native voice.

| Phrase | Approx. Sound | What It Shows |
| --- | --- | --- |
| ma má mà mả mã mạ | ma (level) · má (rising) · mà (falling) · mả (dipping) · mã (broken) · mạ (heavy) | Six tones on one syllable: ghost, mother, which, tomb, horse, rice seedling |
| Xin chào, rất vui được gặp bạn. | sin CHOW, rut VOO-ee duhk GAP ban | “Hello, nice to meet you.” Common greeting with mixed tones |
| Cảm ơn rất nhiều. | KAM uhn rut NYEW | “Thank you very much.” Dipping tone on cảm, nasal final |
| phở bò, bún chả, bánh mì | FUH baw, BOON chah, BAN mee | Iconic foods: hook-above tone on phở, palatal ch, full diphthong in bánh |
| ăn, âm, ê, ô, ơ, ư | an, uhm, eh, oh, uh, ew | Six extra vowels (ă â ê ô ơ ư), each distinct from standard Latin |
| Tiếng Việt không khó. | TYENG vyet KHONG kho | “Vietnamese is not hard.” Nasal finals (-ng, -nh) with tone contrast |
| Hà Nội là thủ đô của Việt Nam. | ha NOY la too DOH koo-a vyet NAM | “Hanoi is the capital of Vietnam.” Formal sentence with mixed tones |

What Makes Vietnamese Challenging for TTS

  • Six lexical tones: level (ngang), rising (sắc), falling (huyền), dipping-rising (hỏi), creaky-rising (ngã), and heavy (nặng). Mispronounce one and the word changes meaning completely.
  • Diacritic vowels: ă, â, ê, ô, ơ, and ư are not decorative. Each produces a different mouth shape and duration, and our voices distinguish them all.
  • Nasal finals: the digraphs -ng and -nh each spell a single nasal consonant that closes the syllable with a distinct resonance. The engine preserves the nasal quality instead of clipping it.
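
For developers who want to inspect tones in their own text, the tone of a written syllable can be read off its diacritics. The sketch below is an illustration only, not the engine's method: it assumes one syllable per call and relies on Unicode NFD decomposition from Python's standard library. The helper name tone_of is made up for this example.

```python
import unicodedata

# Combining marks that spell the five marked tones; ngang carries no mark.
TONE_MARKS = {
    "\u0301": "sắc",    # acute accent
    "\u0300": "huyền",  # grave accent
    "\u0309": "hỏi",    # hook above
    "\u0303": "ngã",    # tilde
    "\u0323": "nặng",   # dot below
}


def tone_of(syllable: str) -> str:
    """Return the tone name of one written Vietnamese syllable."""
    # NFD splits each letter into a base letter plus its combining marks,
    # so vowel marks (breve, circumflex, horn) and tone marks separate.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"


print([tone_of(s) for s in "ma má mà mả mã mạ".split()])
# ['ngang', 'sắc', 'huyền', 'hỏi', 'ngã', 'nặng']
```

Vowel-quality marks such as the horn on ơ are deliberately left out of the table, so phở is reported as hỏi rather than confused by its second diacritic.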

Vietnamese Text — Formatting & Conventions

Small details in how you format the source text change how it comes out aloud. Four conventions worth knowing:

Numbers

1.500.000 reads as “một triệu năm trăm nghìn”. Vietnam uses the period as a thousands separator and the comma for decimals — the opposite of English. Enter numbers in this format for correct output.
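
If your numbers come from a spreadsheet or database, you can convert them to this format before pasting. A minimal sketch in Python, assuming plain numeric input; the function name format_vi_number is illustrative and not part of any product API.

```python
def format_vi_number(value: float, decimals: int = 0) -> str:
    """Format a number with '.' for thousands and ',' for decimals."""
    # Python's format spec produces English-style grouping, e.g. "1,234.50".
    english_style = f"{value:,.{decimals}f}"
    # Swap the two separators in a single pass.
    return english_style.translate(str.maketrans(",.", ".,"))


print(format_vi_number(1500000))    # 1.500.000
print(format_vi_number(1234.5, 2))  # 1.234,50
```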

Currency

50.000đ reads as “năm mươi nghìn đồng”. Place the đ symbol after the number. For dollar amounts, write $10 and the engine reads “mười đô la”.
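
Prices pulled from a product feed can be written out in the same style. The sketch below assumes whole amounts and only the two currencies mentioned above; format_price and its currency codes are hypothetical names chosen for the example.

```python
def format_price(amount: int, currency: str = "VND") -> str:
    """Write a whole amount the way the conventions above describe."""
    if currency == "VND":
        # Period as thousands separator, đ placed after the number.
        return f"{amount:,}".replace(",", ".") + "đ"
    if currency == "USD":
        # Dollar sign placed before the number.
        return f"${amount}"
    raise ValueError(f"unsupported currency: {currency!r}")


print(format_price(50000))      # 50.000đ
print(format_price(10, "USD"))  # $10
```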

Dates & Time

15/04/2026 reads as “ngày mười lăm tháng tư năm hai không hai sáu”, following the day-month-year order that is standard in Vietnam. The 24-hour clock is the default: 14:30 becomes “mười bốn giờ ba mươi”.
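
When dates are generated by code, they can be rendered in this order before the text is submitted. A small sketch using Python's standard datetime module; the helper name format_vi_datetime is an assumption made for illustration.

```python
from datetime import datetime


def format_vi_datetime(moment: datetime) -> tuple[str, str]:
    """Return (date, time) as day/month/year and 24-hour clock strings."""
    return moment.strftime("%d/%m/%Y"), moment.strftime("%H:%M")


date_text, time_text = format_vi_datetime(datetime(2026, 4, 15, 14, 30))
print(date_text, time_text)  # 15/04/2026 14:30
```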

Tone Marks

Always include diacritics: phở (soup) vs pho (toneless, mispronounced). Missing marks force the engine to guess, and some syllables have four or five valid readings. The more accurate your input, the better the output.
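
A quick way to catch text whose marks were stripped somewhere along the way is to measure how many words carry a non-ASCII letter. This is only a rough heuristic under the assumption that the input is Vietnamese: plenty of valid words (ma, nam, ban) have no marks at all, so a low score is a warning rather than proof. The function name diacritic_ratio is invented for this sketch.

```python
import unicodedata


def diacritic_ratio(text: str) -> float:
    """Return the share of words containing at least one non-ASCII letter."""
    # NFC turns "a + combining mark" sequences into single accented letters.
    text = unicodedata.normalize("NFC", text)
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    if not words:
        return 0.0
    marked = sum(
        1 for w in words if any(c.isalpha() and ord(c) > 127 for c in w)
    )
    return marked / len(words)


print(diacritic_ratio("Hà Nội là thủ đô của Việt Nam."))  # 0.875
print(diacritic_ratio("Ha Noi la thu do cua Viet Nam."))  # 0.0
```

A score near zero on a long passage suggests the diacritics were lost and the source text should be restored before conversion.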

When to Use Vietnamese TTS

Content Creation & Voiceover

Add a Vietnamese voiceover to YouTube videos, TikTok clips, and Facebook Reels. Vietnam is one of the most active creator-economy markets in Southeast Asia — pick a natural voice, export the audio file, and drop it into Premiere, DaVinci, or CapCut.

Language Learning & Pronunciation

Six tones make this one of the hardest languages to learn by ear alone. Slow the playback to 0.75× and listen to each syllable until you can tell mả from the other tones of ma. Helpful for Việt Kiều reconnecting with the language, expats preparing for life in Hanoi or Saigon, and university students studying Southeast Asian linguistics.

Audiobooks & Narration

Turn manuscripts into sách nói (audiobooks) with a natural narrator. Modern prose, ghost stories, and contemporary fiction are growing fast on platforms like Voiz FM and Fonos. Use Dialog Mode to assign distinct speakers to characters and produce a full-cast recording.

Business Presentations & E-Learning

Voice corporate decks and onboarding modules for the fast-growing Vietnamese market. Fintech firms, tech startups, and international companies with offices in Ho Chi Minh City or Hanoi use narrated slides for internal training and investor updates. Export the track and embed it directly in your slide deck or learning-management system.

How to Generate Vietnamese Voice in 3 Steps

Three steps to convert text to voice. No software, no signup.

01

Paste or type your text

Type directly or paste up to 1,000,000 characters. Upload DOCX, PDF, or SRT files. Works with any text in quốc ngữ script — articles, screenplays, study notes, product descriptions.

02

Choose a voice

Pick from 89 native speakers. Filter by gender and quality tier — Neural or HD. Adjust speed and pitch to fine-tune the reading for your project.

03

Listen & download free

Click Convert to Speech, preview the result, and download as MP3, WAV, FLAC, or OGG. The first 1,000 characters are free, with no account needed. No watermark on any plan.

What Makes Vietnamese Unique — and Why It Matters for TTS

Three features of the language that any text-to-speech engine must handle correctly:

Tonal System (6 tones)

Every syllable carries one of six tones marked by diacritics — dấu sắc (rising), dấu huyền (falling), dấu hỏi (dipping), dấu ngã (broken), dấu nặng (heavy), and unmarked ngang (level). A misread tone turns “mother” into “horse”. The neural voices apply all six correctly.

Latin Script with Diacritics

Unlike Chinese or Japanese, Vietnamese uses the Latin alphabet (quốc ngữ) augmented with diacritics for tones and vowel modifications. The engine parses ă â ê ô ơ ư natively — no romanisation step needed, just paste the original text.
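
Text copied from some editors or PDFs stores each accented letter as a base letter plus separate combining marks. Whether the engine treats both forms identically is not stated here, so normalising to the precomposed NFC form before pasting is offered only as a precaution. A sketch with Python's standard library; to_nfc is a name chosen for the example.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text with accented letters in precomposed (NFC) form."""
    return unicodedata.normalize("NFC", text)


decomposed = "pho\u031b\u0309"  # o + combining horn + combining hook above
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 5 3
print(composed == "ph\u1edf")          # True
```

Both strings look the same on screen, which is why length-based character counts can differ between tools.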

Northern vs Southern Accent

Hanoi (Northern) and Saigon (Southern) speech differ in consonant onsets, tone contours, and everyday vocabulary. Most voices in the catalogue use standard Northern pronunciation, the norm for broadcast media. That suits formal narration; Southern speakers may notice slight differences in casual phrases.

Vietnamese Text to Speech — FAQ

Do you have a Vietnamese female voice?

Yes. The catalogue includes both female and male speakers across the Neural and HD tiers. Mon (Neural, female) is the most popular native voice in the catalogue. Achernar VN (HD, female) delivers studio-level quality. Filter by gender on the voices page to browse the full list.

Is there a free way to use this tool?

Yes. The first 1,000 characters are free with no account, no card, and no watermark. Create a free account for an additional 3,000 characters a day for seven days. Paid plans raise monthly limits and unlock extras, but commercial use is included in every tier — even the free one.

Can I use your voices via an API?

Yes. Vietnamese text to speech is available through a hosted API with documentation, code samples, and per-request billing. If you need to integrate speech synthesis into a mobile app, chatbot, or e-learning platform, the API supports all 89 voices and the same speed and pitch controls as the web editor. Open-source research projects exist as alternatives, but the hosted service offers higher quality and zero setup.
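
As a rough idea of what an integration might look like, the sketch below builds (but does not send) a JSON request. Everything specific in it is an assumption made for illustration: the URL, the field names, the bearer-token header, and the function build_tts_request are hypothetical, so check the actual API documentation for the real endpoint and parameters.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from the real API docs.
API_URL = "https://api.example.com/v1/tts"


def build_tts_request(
    text: str,
    voice: str = "Mon",
    speed: float = 1.0,
    pitch: int = 0,
    api_key: str = "YOUR_API_KEY",
) -> urllib.request.Request:
    """Build a POST request carrying the text and voice settings as JSON."""
    # Field names below are illustrative guesses, not a documented schema.
    payload = {
        "text": text,
        "language": "vi-VN",
        "voice": voice,
        "speed": speed,
        "pitch": pitch,
        "format": "mp3",
    }
    # ensure_ascii=False keeps Vietnamese letters readable in the body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_tts_request("Xin chào, rất vui được gặp bạn.")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out because the endpoint here is a placeholder.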

Do your voices use Northern (Hanoi) or Southern (Saigon) pronunciation?

The majority of voices follow standard Northern pronunciation — the register used by national broadcasters in Vietnam. This means consonant onsets like tr- and r- sound distinctly Northern. Southern listeners will understand the output without any issue; the difference is comparable to British vs American English — noticeable but clear.

What formats can I download?

The default export is MP3. You can also choose WAV, FLAC, or OGG. All four deliver the same audio quality and ship watermark-free with a commercial licence. Pick MP3 for web and social media, WAV for video editors that need uncompressed input, FLAC for archival.
