Google Text-to-Speech


Google Text-to-Speech is a speech synthesis engine developed by Google for its Android operating system. It enables applications to read aloud the text on the screen, with support for many languages. Text-to-Speech may be used by apps such as Google Play Books, to read books aloud; Google Translate, to read translations aloud and show how words are pronounced; TalkBack and other spoken-feedback accessibility applications; and third-party apps. Users must install voice data for each language.

Supported languages

Google Text-to-Speech Android application

Bengali, Bengali, Cantonese, Chinese, Chinese, Czech, Danish, Dutch, English, English, English, English, English, Estonian, Filipino, Finnish, French, French, German, Greek, Gujarati, Hindi, Hungarian, Indonesian, Italian, Japanese, Javanese, Kannada, Khmer, Korean, Malayalam, Marathi, Nepali, Norwegian Bokmål, Polish, Portuguese, Portuguese, Romanian, Russian, Sinhala, Slovak, Spanish, Spanish, Sundanese, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese

Google Cloud Text-to-Speech

Arabic, Bengali, Burmese, Czech, Danish, Dutch, English, English, English, English, Filipino, Finnish, French, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Norwegian, Polish, Portuguese, Portuguese, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian and Vietnamese

Evolution

Some developers have adapted their Android Auto apps to include Text-to-Speech; Hyundai, for example, did so in 2015. Apps such as textPlus and WhatsApp use Text-to-Speech to read notifications aloud and provide voice-reply functionality.
Cloud Text-to-Speech is powered by WaveNet, software created by DeepMind, Google's UK-based AI subsidiary. Since acquiring DeepMind in 2014, Google has been exploring ways to turn the company's AI talent into tangible products. Integrating WaveNet into its cloud service is significant because Google is trying to win cloud business away from Amazon and Microsoft and presents its AI expertise as its differentiating factor.
DeepMind's voice synthesis technology is notably realistic. Most voice synthesizers use concatenative synthesis, in which a program stores individual syllables (sounds such as “ba,” “sht,” and “oo”) and pieces them together to form words and sentences. WaveNet instead uses machine learning to generate speech: it analyzes waveforms from a database of human speech and re-creates them at a rate of 24,000 samples per second. The resulting voices include subtleties such as lip smacks and accents. When Google first unveiled WaveNet in 2016, it was too computationally intensive to work outside research environments, but it has since been slimmed down significantly, a clear example of research moving into a product. Google Cloud Text-to-Speech converts text into human-like speech in more than 180 voices across more than 30 languages and variants. It applies DeepMind's speech synthesis research and Google's neural networks to produce high-fidelity audio.
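The contrast between the two approaches can be sketched in Python. This is a toy illustration under stated assumptions, not Google's or DeepMind's implementation. The "recorded" units are hypothetical sine tones standing in for stored syllables. `concatenative` joins stored units end to end, as concatenative synthesizers do. `autoregressive` generates audio one sample at a time from the samples before it, which is how WaveNet produces its output, but a fixed linear predictor stands in for WaveNet's neural network. All function and variable names are hypothetical. Only the 24,000-samples-per-second rate comes from the text.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # samples per second, the output rate cited for WaveNet


def tone(freq_hz: float, duration_s: float, rate: int = SAMPLE_RATE) -> list[float]:
    """Hypothetical stand-in for a recorded speech unit: a plain sine tone."""
    n = round(duration_s * rate)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


# Hypothetical unit inventory. A real concatenative system stores many recorded
# fragments of human speech; sine tones keep this sketch self-contained.
UNITS = {
    "ba": tone(220.0, 0.10),
    "sht": tone(440.0, 0.05),
    "oo": tone(330.0, 0.15),
}


def concatenative(sequence: list[str]) -> list[float]:
    """Concatenative synthesis: look up each stored unit and join them end to end."""
    out: list[float] = []
    for unit in sequence:
        out.extend(UNITS[unit])
    return out


def autoregressive(seed: list[float], n: int, freq_hz: float = 220.0,
                   rate: int = SAMPLE_RATE) -> list[float]:
    """Sample-by-sample generation: each new sample is predicted from earlier ones.

    WaveNet uses a deep neural network for this prediction; here a fixed
    two-tap linear recurrence (a digital resonator) stands in for it.
    """
    coeff = 2 * math.cos(2 * math.pi * freq_hz / rate)
    out = list(seed)
    for _ in range(n):
        # Next sample depends only on the two previous samples.
        out.append(coeff * out[-1] - out[-2])
    return out


def to_wav_bytes(samples: list[float], rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1, 1] as a mono 16-bit PCM WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))
    return buf.getvalue()


if __name__ == "__main__":
    joined = concatenative(["ba", "sht", "oo"])
    # Seed with the first two samples of "ba" and generate one second in total.
    generated = autoregressive(UNITS["ba"][:2], SAMPLE_RATE - 2)
    print(len(joined), len(generated))  # 7200 24000
```

With these units, joining "ba", "sht" and "oo" yields 7,200 samples (0.3 seconds at 24 kHz), while one second of generated audio is 24,000 samples. A real system would also smooth the joins between units; the sketch omits that.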
Cloud Text-to-Speech offers exclusive access to more than 90 WaveNet voices, and Google has said it will add more over time. According to Google, DeepMind's machine-learning models generate speech that mimics human voices and sounds more natural, reducing the gap with human performance by 70%.

Version history

November 2013