I read the papers for you: Comparing Bark and Tortoise TTS for text-to-speech applications (lemmit.online)

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Successful-Western27 on 2023-08-09 14:20:43.

If you're creating voice-enabled products, I hope this will help you choose which model to use!

I read the papers and docs for Bark and Tortoise TTS - two text-to-speech models that seemed pretty similar on the surface but are actually pretty different.

Here's what Bark can do:

It can synthesize natural, human-like speech in multiple languages.
Bark can also generate music, sound effects, and other audio.
The model supports generating laughs, sighs, and other non-verbal sounds to make speech more natural and human-sounding. I find these really compelling; the imperfections make the speech sound much more real. Check out an example here (scroll down to "pizza.webm").
Bark allows control over tone, pitch, speaker identity and other attributes through text prompts.
The model learns directly from text-audio pairs.
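
To make the non-verbal cues concrete, here's a minimal sketch of prompting Bark from Python. It assumes the `bark` package from the suno-ai/bark repository plus `scipy` are installed, and that cues are written inline as bracketed tags such as `[laughs]` or `[sighs]`, with `♪` around sung text, as described in Bark's README. `build_bark_prompt` and `synthesize_with_bark` are hypothetical helper names invented for this example, not part of Bark itself.

```python
def build_bark_prompt(text, cues=(), sung=False):
    """Hypothetical helper: decorate plain text with Bark-style prompt cues.

    cues -- non-verbal sounds appended as bracketed tags, e.g. ["laughs"]
    sung -- wrap the text in music notes to hint that it should be sung
    """
    body = f"♪ {text} ♪" if sung else text
    tags = "".join(f" [{cue}]" for cue in cues)
    return body + tags


def synthesize_with_bark(prompt, out_path="bark_output.wav", speaker=None):
    """Generate audio for `prompt` and save it as a WAV file.

    Assumes `bark` and `scipy` are installed. The imports are lazy so the
    prompt helper above still works without them.
    """
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    # `speaker` is an optional voice preset name (assumed format: "v2/en_speaker_6")
    audio = generate_audio(prompt, history_prompt=speaker)
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path


if __name__ == "__main__":
    prompt = build_bark_prompt("And, uh, I like pizza.", cues=["laughs"])
    print(prompt)
    # synthesize_with_bark(prompt)  # uncomment once Bark is installed
```

The control surface here is the prompt text itself: tone and non-verbal sounds are requested in the prompt rather than through separate parameters, which matches the "control through text prompts" point above.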

Whereas for Tortoise TTS:

It excels at cloning voices using just short audio samples of a target speaker. This makes it easy to produce speech in many distinct voices (like celebrities). I think voice cloning is the best use case for this tool.
The quality of the synthesized voices is pretty high.
Tortoise supports fine-grained control of speech characteristics like tone, emotion, and pacing through priming text.
Tortoise is only trained on English and it's not capable of producing sound effects.
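
For comparison, here's a rough sketch of voice cloning with priming text in Tortoise. It assumes the `tortoise-tts` package and `torchaudio` are installed, that reference clips are loaded at 22,050 Hz and output is saved at 24,000 Hz (the values used in the Tortoise README and scripts), and that bracketed priming text steers the delivery without being spoken. `prime_text` and `clone_voice` are hypothetical helper names for this example, and the clip paths are placeholders.

```python
def prime_text(text, emotion_hint=None):
    """Hypothetical helper: prefix a bracketed priming phrase to the text.

    Assumption: Tortoise redacts bracketed text from the spoken output but
    lets it influence the emotion of the delivery.
    """
    if not emotion_hint:
        return text
    return f"[{emotion_hint}] {text}"


def clone_voice(text, clip_paths, out_path="tortoise_output.wav", preset="fast"):
    """Speak `text` in the voice heard in the reference clips.

    Assumes `tortoise-tts` and `torchaudio` are installed. The imports are
    lazy so the priming helper above still works without them.
    """
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # A few short clips of the target speaker
    reference_clips = [load_audio(path, 22050) for path in clip_paths]
    tts = TextToSpeech()
    speech = tts.tts_with_preset(text, voice_samples=reference_clips, preset=preset)
    torchaudio.save(out_path, speech.squeeze(0).cpu(), 24000)
    return out_path


if __name__ == "__main__":
    primed = prime_text("Please feed me.", "I am really sad,")
    print(primed)
    # clone_voice(primed, ["clip1.wav", "clip2.wav"])  # placeholder paths
```

Unlike the Bark sketch, the speaker identity here comes from audio samples rather than a text prompt, which is why voice cloning is the natural use case for Tortoise.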

Here's how they compare to the other speech-related models I've taken a look at so far:

I have a full write-up here if you want to read more; it's about a 10-minute read. I also looked at the model inputs and outputs and speculated on some products you can build with each tool.
