Please think about ElevenLabs? It's a new age for TTS honestly

For about 2 years now, I have paid $5/mo for ElevenLabs (and others) as part of a package with AwesomeTTS, an audio add-on for Anki cards. It is honestly SO UNBELIEVABLY GOOD that the dream of having a human voice read all the sentences I manually input to study no longer matters much.

As a lifetime Clozemaster user, … even if I could plug in my own API key or something and switch the audio over, … wow. Please consider it :wink:

There may be other AI TTS products that are similarly good - I’m not stuck on this one - it is simply the one I know exists.

I totally agree - the voices we get on Clozemaster do sound a bit dated… :sweat_smile:

ElevenLabs is excellent, but at the scale of voice work that Clozemaster requires, it would be massively expensive.

@Lernen_und_Fahren Are you considering caching at all in this response? I mean, depending on the approach, there’s no reason any sentence ever needs to be “read” a second time.

Yes, of course, Clozemaster wouldn’t have to make a new API request to ElevenLabs each time a sentence is viewed. Almost certainly they would pre-generate the audio for each sentence exactly once and then serve it as static data on every user request. But that doesn’t change the fact that they would have to pre-render literally thousands of audio snippets per supported language. I don’t have access to the numbers, but I would guess it’s easily in the tens of thousands of sentences per language, and maybe even into the hundreds of thousands in total. That would cost a lot of money.
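To make the "generate once, serve forever" idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `get_audio` and `cache_key` are hypothetical helper names, the `synthesize` callable stands in for whatever TTS provider is used (no real ElevenLabs call is shown), and the `.mp3` extension is assumed.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Hypothetical signature for a TTS call: (language, voice, sentence) -> audio bytes.
Synthesizer = Callable[[str, str, str], bytes]


def cache_key(language: str, voice: str, sentence: str) -> str:
    """Build a stable file name from everything that changes the audio."""
    raw = "\x1f".join((language, voice, sentence)).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def get_audio(cache_dir: Path, language: str, voice: str, sentence: str,
              synthesize: Synthesizer) -> bytes:
    """Return cached audio if present; otherwise synthesize once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(language, voice, sentence) + ".mp3")
    if path.exists():
        return path.read_bytes()  # cache hit: no paid request is made
    audio = synthesize(language, voice, sentence)  # cache miss: one paid request
    path.write_bytes(audio)
    return audio
```

With a cache like this, the number of paid requests equals the number of distinct (language, voice, sentence) combinations, which is exactly why the total sentence count, not the number of views, drives the cost.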

Which languages and voices have you used / do you like? We tried it and found it would sometimes sound like an English person saying something in the target language, for example, or garble the audio entirely. It happened only occasionally, but it was enough to give us pause. That was 6+ months ago at this point, so perhaps we can revisit. Agreed, when it works it’s awesome.

@mike I’m glad it gave you pause as the current voices for Italian are great, particularly Giorgio and Carla who have been with me since Day 1 of my learning.

There is definitely a problem - apparently it has “multilingual” mode always on. When I generate German sentences, any number or date comes out in English for no reason. Otherwise the German voice was absolutely amazing. Someone on Reddit gave me a good idea: use AI to spell out all numbers beforehand. After doing this, it properly read the spelled-out numbers in German.
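The Reddit suggestion was to let an AI rewrite the numbers; a rule-based pre-processing pass is a simpler stand-in for the same idea. A minimal sketch in Python, assuming only plain cardinal numbers from 0 to 9999: `german_cardinal` and `spell_out_numbers` are hypothetical helper names, decimals like "3,5" are left untouched, and ordinals, dates and years (which German reads differently) are not handled correctly.

```python
import re

SMALL = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben",
         "acht", "neun", "zehn", "elf", "zwölf", "dreizehn", "vierzehn",
         "fünfzehn", "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = ["", "", "zwanzig", "dreißig", "vierzig", "fünfzig", "sechzig",
        "siebzig", "achtzig", "neunzig"]


def _under_100(n: int) -> str:
    if n < 20:
        return SMALL[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]
    # German puts the unit first: 21 -> "einundzwanzig" ("ein", not "eins").
    return ("ein" if unit == 1 else SMALL[unit]) + "und" + TENS[tens]


def _under_1000(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    words = ""
    if hundreds:
        words = ("ein" if hundreds == 1 else SMALL[hundreds]) + "hundert"
    if rest:
        words += _under_100(rest)
    return words


def german_cardinal(n: int) -> str:
    """Spell out a cardinal number from 0 to 9999 in German."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n == 0:
        return "null"
    thousands, rest = divmod(n, 1000)
    words = ""
    if thousands:
        words = ("ein" if thousands == 1 else SMALL[thousands]) + "tausend"
    if rest:
        words += _under_1000(rest)
    return words


# Standalone runs of 1-4 digits; skips longer numbers and decimals like "3,5".
_NUMBER = re.compile(r"(?<![\d.,])\b\d{1,4}\b(?![.,]\d)")


def spell_out_numbers(text: str) -> str:
    """Replace standalone digits with German words before sending text to TTS."""
    return _NUMBER.sub(lambda m: german_cardinal(int(m.group())), text)
```

For example, `spell_out_numbers("Ich habe 21 Katzen.")` gives `"Ich habe einundzwanzig Katzen."`, so the voice never sees a digit it could misread in English.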

I hope they get this stuff together and realize there needs to be an API parameter to intentionally pick ONE language.

I really want to see it used for Indonesian, but unfortunately I have not tested it extensively for that language and cannot share any details. I only know it is amazing for German minus the numbers issue.

I agree that better audio, especially in less ‘popular’ languages and for custom sentences, is the most important missing improvement for Clozemaster (which otherwise is the best vocabulary app to supplement in-person classes from A2 level onwards). Emphasis, intonation, rhythm and other aspects of prosody are often very wrong in the current audio, and from B1 onwards that starts to get quite important. I’m learning Dutch and there is also the issue of Netherlands Dutch vs. Flemish. The quality of voices varies, even on ElevenLabs, which is another argument for why being able to choose the voice would be the perfect solution. Would it be feasible to have an integration that semi-automatically generates audio for new sentences once, using the user’s own ElevenLabs (or another quality provider) account, and then keeps that audio locally on the user’s device for future use?

The ‘multilingual’ voices indeed do not seem to be the best - they look more like a gimmick than an improvement. However, there are quite a few single-language voices with no ‘multilingual’ option, and those can be very good.