RUTH TTS or TTS+ (a Puretalk.ai product)
Text-to-Speech (TTS), or speech synthesis, models are becoming increasingly difficult to distinguish from human speech.
In “A Survey on Neural Speech Synthesis,” Xu Tan, Tao Qin, Frank Soong, and Tie-Yan Liu describe the complexity of “key components such as text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS, etc.”
Methods, libraries & software used to compare voice quality: Documentation
Comparison Below
Below, you’ll find voice data from a real human speaker as well as from various TTS models, including Puretalk.ai TTS+. This collection helps compare our in-house model (TTS+) with our competitors’ models.
Here are some samples
“Sphinx of black quartz judge my vow, the July sun caused a fragment of black pine wax to ooze on the velvet quilt. While the vixen jumped quickly on her foe, barking with zeal.”
Voice Name | Clip |
---|---|
iOS 16 Siri | |
Google C - US-Standard | |
Amazon Polly Joanna | |
Human Speaker | |
RUTH TTS+ | |
RUTH TTS+ Male | |
WellSaid Labs Alana | |
WellSaid Labs Ramona | |
Microsoft Sara Neural | |
Microsoft Aria | |
IBM Kevin | |
IBM Female | |
Google Wavenet | |
Microsoft Nancy Standard | |
Technical features
The F0 (fundamental frequency) and intensity values below were measured with Praat on the clips above, in which each voice reads the first two sentences of the article (~10-second clips each).
Voice Name | Average F0 (Hz) | Average Intensity (dB) | Synthesis model | Source |
---|---|---|---|---|
iOS 16 Siri | 116.8 | 67.1 | TBD | |
Google C - US-Standard | 133.2 | 74.7 | WaveNet | https://cloud.google.com/text-to-speech/docs/wavenet |
Human 1 | 126.9 | 68.1 | N/A | N/A |
Human speaker | 185.7 | 72.9 | N/A | N/A |
RUTH TTS+ | 184.6 | 67.4 | N/A | N/A |
iOS | 166.3 | 77.5 | TBD | |
Judy GL1 | 188.7 | 76.5 | Tacotron + Griffin Lim | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Judy GL2 | 197.3 | 72.7 | Tacotron2 + Griffin Lim | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Judy W1 | 187.3 | 76.9 | Tacotron + WaveRNN | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Judy W2 | 195.5 | 78.0 | Tacotron2 + WaveRNN | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
LJ Speech | 215.4 | 73.4 | Tacotron + Griffin Lim | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Mac Default | 113.6 | 65.6 | TBD | |
Nancy 1 | 197.7 | 75.2 | Tacotron + Griffin Lim | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Nancy 2 | 189.0 | 75.9 | Tacotron2 + WaveRNN | https://github.com/mozilla/TTS/wiki/Mean-Opinion-Score-Results |
Polly Joanna | 155.3 | 72.6 | TBD | |
Polly Matthew | 99.6 | 72.8 | TBD | |
Polly Sally | 192.2 | 73.1 | TBD | |
Voicery Nichole | 194.0 | 68.2 | TBD | |
Windows Zira | 176.9 | 66.1 | TBD | |
Windows David | 91.9 | 66.7 | TBD | |
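For readers who want a feel for what the two columns above measure, here is a minimal sketch of an average F0 and an average intensity computed from raw samples. It is not Praat's algorithm and will not reproduce the numbers in the table: the F0 estimate is a plain autocorrelation peak search, and the function names, the 75–500 Hz search range, and the 40 ms frame length are all assumptions made for illustration. The intensity follows the common convention of dB relative to 2e-5 Pa, treating sample values as sound pressure.

```python
import math

# Assumption: intensity is reported in dB relative to 2e-5 Pa, with the
# sample values treated as sound pressure.
REFERENCE_PRESSURE = 2e-5


def mean_intensity_db(samples):
    """Return mean intensity in dB, computed from the mean energy of the samples."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 10.0 * math.log10(mean_square / REFERENCE_PRESSURE ** 2)


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate the F0 (Hz) of one frame from its strongest autocorrelation peak.

    Returns 0.0 when no positive peak is found (treated here as unvoiced).
    The f0_min/f0_max search range is a hypothetical default.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: dividing by the full frame length
        # favors the shortest matching period over its multiples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / len(frame)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def mean_f0(samples, sample_rate, frame_seconds=0.04):
    """Average the per-frame F0 estimates over frames judged voiced."""
    frame_len = int(sample_rate * frame_seconds)
    estimates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        f0 = estimate_f0(samples[start:start + frame_len], sample_rate)
        if f0 > 0.0:
            estimates.append(f0)
    return sum(estimates) / len(estimates) if estimates else 0.0


if __name__ == "__main__":
    # Synthetic 200 Hz tone standing in for a real clip.
    rate = 16000
    tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 10)]
    print(f"mean F0: {mean_f0(tone, rate):.1f} Hz")
    print(f"mean intensity: {mean_intensity_db(tone):.1f} dB")
```

On real recordings a robust pitch tracker matters much more than it does on a clean tone, which is one reason to rely on a dedicated tool such as Praat for the reported values.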
Did we get something wrong? If you were involved in the development of any of these voices or notice an error, please let us know by filing an issue or submitting a pull request so we can correct it. We’d appreciate it!
Cite our work
BibTeX coming soon!