Skip to editor

Japanese Text to Speech

Convert any Japanese text to speech — 100+ AI voices, pitch accent, free MP3.

ja-JP
Style
speed:1.0
pitch:0
Volume:100%
File
Pause
Clear
Step backward
Step forward
Ssml
Cut
Sound Selection

Japanese TTS — 100+ AI Voices with Tokyo Pitch Accent & Mora Timing

Paste any text and hear it read with the correct Tokyo pitch accent — the L-H-L pattern that distinguishes (bridge) from (chopsticks), the silent vowels in です /des/ and ます /mas/, the small sokuon mora, and the alveolar flap R that sits between English R and L. Hiragana, katakana, and kanji mix freely in one input. Pick a voice like Daichi (PRO Neural, male) or Akemi (PRO Neural, female) and download your MP3 in seconds.

For studio-grade output, Achird JP (HD, male) and Achernar JP (HD, female) deliver broadcast quality. The catalogue covers JLPT N5–N1 listening prep, anime dub and visual-novel character work, YouTube voice-over, audiobook narration of literary classics, and tourism audio guides for temples, shrines, and city tours. First 1,000 characters free — no account, no watermark.

  • 100+ native voices — Standard, PRO, HD
  • Hiragana, katakana & kanji mixed
  • Tokyo pitch accent & mora timing
  • Download MP3, WAV, FLAC, OGG
  • Free — 1,000 chars, no signup

Japanese AI Voices — Voice Samples

Click to preview · 100+ voices total

These are 4 featured speakers. Browse all 100+ on the voices page — filter by ja-JP.

Voice Styles — 3 Expressive Registers

Select PRO Neural voices unlock expressive styles on top of the default neutral register. Same sentence, same speaker — Nanami, a Japanese female PRO Neural voice — reads the line below in three different moods.

Style Listen Typical Use
cheerful Kids' content, upbeat announcements, promo spots.
chat Vlogs, casual explainers, podcast intros, friendly conversation.
customerservice IVR greetings, support lines, polite announcements, formal tone.

All three samples above read the same Japanese sentence. Nanami is the only ja-JP voice with multiple expressive styles (cheerful, chat, customer-service). The remaining 100+ Japanese voices read in their default neutral register.

Japanese Pronunciation Guide & Pitch Accent

Japanese pronunciation is defined by mora timing, pitch accent, and three writing systems working together. These six features are where TTS quality separates native-sounding audio from robotic output — hear how SpeechGen handles each.

Word / Phrase Romanization + Audio Feature What to Know
日本語 ni·HO·N·go Tokyo pitch accent Tokyo standard: low on first mora, high from second, then drops. にほんご = L-H-H-L pattern. Pitch accent is not stress — volume stays even while pitch changes.
おかあさん o·ka·A·san Long vowels (長音) The double あ in おかあさん (mother) holds for two morae. Compare おかさん — a meaningless shortening. Long vowels are written as ā in romaji. In TTS: use the correct kana and the engine handles length automatically.
がっこう ga·k·KO·u Mora timing (促音) The small っ (sokuon) is a silent mora — a brief stop before the following consonant. がっこう (school) has a closure before the k. Miss it and the word sounds unnatural. Every mora takes equal time in Japanese.
です des (not de·su) Silent vowels (無声化) In standard Tokyo Japanese, the vowels い and う are devoiced (whispered or silent) between voiceless consonants or at word end. です sounds like "des". Also: ます → "mas", in many words. AI voices handle this correctly.
らりるれろ ra·ri·ru·re·ro Japanese R (弾き音) The Japanese R is an alveolar flap — a single tongue tap against the roof of the mouth. It is neither English R nor L. Closest to the quick "d" in American "butter". Daichi and Akemi produce this sound correctly without any SSML adjustment needed.
橋 vs 箸 ha·SHI vs HA·shi Pitch minimal pair (bridge) = L-H pattern; (chopsticks) = H-L pattern. Same kana はし, different pitch — different meaning. This is why pitch accent matters in japanese pronunciation. The AI voice resolves ambiguity from kanji context automatically.

Why Pitch Accent Matters for TTS

  • Pitch, not stress — Japanese is a pitch-accent language, not a stress-accent language like English. Volume stays even; only the high/low pitch pattern shifts between morae. A wrong pitch pattern sounds foreign even when every sound is perfect.
  • Kanji resolves ambiguity — Many homophone pairs differ only in pitch (橋/箸, 雨/飴). When you input kanji, the AI voice selects the correct pitch pattern from context. Use kanji in your text for the most natural-sounding audio output.
  • Three writing systems, one engine — hiragana, katakana, and kanji can mix freely in the same input. Foreign loanwords in katakana (コーヒー, テレビ, パソコン) and brand names in romaji are all read correctly without manual phoneme intervention.

Formatting & Conventions for TTS

When preparing Japanese text for the voice generator, these formatting rules affect how the engine reads your content:

Numbers & Counters

Write numbers in kanji for the most natural reading: 三つ、五冊、二人. The language uses counter words (助数詞) that change by object type: 一本 (long objects), 一枚 (flat objects), 一匹 (small animals). Arabic numerals work too — 3 → さん — but kanji counters sound more native.

Currency

¥1,500 → "せんごひゃくえん". The yen sign is read automatically. For large amounts: 一万円 (10,000 yen) → "いちまんえん". Japanese uses 万 (10,000) as a unit — the engine handles 3万円 correctly without manual pronunciation markup.

Dates & Time

Japanese date order: year → month → day. 2024年3月15日 → "にせんにじゅうよねん さんがつ じゅうごにち". Time: 14時30分 → "じゅうよじ さんじゅっぷん". Write with kanji date markers (年・月・日・時・分) for correct reading.

Formality (敬語 Keigo)

Japanese has three registers: casual (だ/である), polite (です/ます), and honorific (keigo). Use です・ます endings for professional content, だ・だよ for casual voiceover. The voice engine reads both registers correctly — the choice of formality level is yours.

What You Can Create

Study desk with hiragana charts, JLPT textbook and headphones

Language Learning & Pitch Accent

Paste any sentence and hear exactly how the pitch contour rises and falls between morae. Slow playback to 0.75× to catch silent vowels and the small sokuon. Ideal for JLPT N5–N1 listening prep, shadowing drills against a native model, and drilling kanji vocabulary with the correct contextual reading.

Dark gaming desk with anime character on screen, RGB keyboard and manga volumes

Anime, Visual Novels & Character Voices

Cast character dialogue for anime dubs, gaming NPCs, cosplay reels, and visual novel scenes. Lower pitch by 4–6 semitones for villains and senior characters; raise it slightly for younger or energetic personas. Use Dialog Mode to assign distinct voices across multi-character scripts. Drop into Premiere, DaVinci, Unity, or Ren'Py.

Home studio with video editing timeline and voiceover waveform, teleprompter note

Content Creation & Voiceover

Add professional narration to YouTube videos, podcasts, and social-media reels in seconds. Achernar JP (HD) delivers broadcast-quality female narration; Daichi (PRO Neural) covers clear male delivery for explainers and product walkthroughs. Export as MP3 and sync into Premiere, DaVinci, CapCut, or any editor.

Travel flat-lay with torii gate figurine, Japan map, earbuds and shinkansen ticket

Tourism & Audio Guides

Build audio guides for temples and shrines (Kyoto, Nara, Nikko), city walking tours (Tokyo, Osaka, Sapporo), and ryokan welcome announcements. Generate Shinkansen and metro direction prompts, museum exhibit descriptions, and restaurant introductions. Download as MP3 and deploy on any device offline.

How It Works — 3 Steps

From text to audio in seconds. No software, no signup required.

01

Paste your text

Type directly or paste up to 1,000,000 characters. The engine handles hiragana, katakana, kanji, and mixed scripts in a single pass. Upload DOCX or PDF files for long documents.

02

Choose a voice

Pick from 100+ native speakers. Filter by gender, quality tier (Standard, PRO Neural, HD), and ja-JP. Adjust speed for pitch-accent practice, or set pitch for character voice styles in dub work.

03

Listen & download free

Click Convert to Speech, preview the result, and download as MP3, WAV, or FLAC. First 1,000 characters free — no account needed. No watermark on any plan.

Frequently Asked Questions

Which Japanese voice sounds most natural?

For broadcast and audiobook work, Achernar JP (HD, female) and Achird JP (HD, male) deliver the cleanest, most natural delivery — broadcast-level clarity with accurate Tokyo pitch contours. For everyday content and language learning, Daichi (PRO Neural, male) and Akemi (PRO Neural, female) carry a warm, conversational tone. All four handle hiragana, katakana, and kanji mixed in one input without phoneme markup.

How do I practice pitch accent with TTS?

Paste a word or sentence, generate the audio, and listen for the high-low contour between morae. Slow playback to 0.75× — pitch changes are easier to spot at reduced speed. For minimal pairs like (bridge, L-H) vs (chopsticks, H-L) — same kana はし, different pitch — generate both as separate clips and compare. Type kanji rather than kana so the engine resolves the correct pitch from context.

Can I use these voices for anime and character work?

Yes. Pick a male or female voice, paste the dialogue, adjust pitch by 4–6 semitones to shape character — lower for villains and senior figures, higher for young or energetic personas. Export the MP3 and sync into Premiere, DaVinci, Unity, or Ren'Py. Dialog Mode lets you assign distinct voices to characters across a multi-line script in one session — useful for fan dubs, visual novels, and indie game NPC chains.

Is it really free to download the MP3?

Yes. The first 1,000 characters are free with no account, no card, no watermark — just paste, generate, and download. Create a free account and you get an additional 3,000 characters per day for seven days. Every file ships with a commercial licence built into every plan, so the audio works on monetised YouTube, podcasts, indie games, and client work without extra fees.

Does the engine handle pitch accent correctly?

Yes. The PRO Neural and HD voices are trained on standard Tokyo Japanese and reproduce correct pitch patterns for common vocabulary. Kanji input resolves homophone ambiguity from context — and read differently even though both romanise as "hashi". For rare words, proper nouns, or specialist vocabulary, drop in an SSML <phoneme> tag to specify pronunciation explicitly.

We use cookies to ensure you get the best experience on our website. Learn more: Privacy Policy

Accept Cookies