How Russian-from-English pronunciation is broken and how to fix it

alanf_us · November 3, 2024, 3:53pm

Over the nearly five and a half years I’ve been at Clozemaster, I’ve reported literally dozens of mistakes (more than 80 in 2024 alone) in the pronunciation field for sentences in Russian from English. On some of these occasions, and in a thread from January 2021, “Pronunciation” field useless for Russian, but could be made useful, I’ve touched in a more general way on the fundamental problems with Clozemaster’s approach toward pronunciation in Russian. Today I am going to post an even more comprehensive overview, complete with (1) examples, (2) a description of how I’ve reported problems so far, and (3) a request for action.

Russian is a nearly phonetic language, meaning that there is a nearly one-to-one correspondence between letters and sounds. Monosyllabic words are generally unstressed (at the sentence level), and there is always a single accented syllable in a multisyllable word (with only one exception I’ve ever found, namely the word трёхно́гая). These factors should make pronunciation simple. However, there is one catch: the placement of the accented syllable is extremely hard for a non-native speaker to guess. This, coupled with the fact that “о” is pronounced “o” in an accented syllable, but “a” in an unaccented syllable, means that you cannot rely on the spelling of a word to tell you how it is pronounced. Note that there is one letter, “ё” (pronounced “yo”), that always receives the accent when it appears in a word. This should simplify matters. However, the dots are often omitted, at Clozemaster and elsewhere, making the letter look like “е” (pronounced “yeh”). This brings us back to confusion. Finally, there are many instances of multiple words with the same spelling that differ in the placement of the accent. In order to pronounce the word right, you need to understand the meaning and grammar of the sentence and its constituent words.

The handling of pronunciation at Clozemaster is inconsistent across sentences, with the result that the pronunciation field for a particular Russian sentence may contain any of the following:

(1) no pronunciation at all
(2) a pronunciation in Latin characters, which is of little value; it shows what anyone with even a few weeks’ acquaintance with the language should know (the correspondence between a Russian letter and its English counterpart), but fails to show the really important information, namely where the accent falls
(3) a pronunciation in Cyrillic characters, with accent marks, of the base/dictionary/lemma forms of the individual words in a sentence, which is also generally useless because the accent may shift when the word is declined
(4) a pronunciation in Cyrillic characters of the words in the sentence, with accents on some of the words, but with the accent mark omitted for key words (such as the cloze)
(5) a pronunciation in Cyrillic characters of all of the words in the sentence, with accent marks on all of them, even the monosyllabic ones, leaving the mistaken impression that those words are stressed at the sentence level
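
As a rough illustration of how these cases could be told apart automatically, here is a minimal Python sketch. It is not anything Clozemaster is known to run; the function name `flag_pronunciation`, the flag names, and the heuristics (for example, treating a "/" as the sign of a lemma list) are all my own assumptions.

```python
import re

ACUTE = "\u0301"  # combining acute accent, used here as the stress mark
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
# Runs of Cyrillic letters, allowing an embedded combining accent.
WORD_RE = re.compile("[А-Яа-яЁё\u0301]+")


def flag_pronunciation(field):
    """Return a set of hypothetical problem flags for one pronunciation field."""
    if not field.strip():
        return {"empty"}  # case (1): nothing there at all
    flags = set()
    if re.search("[A-Za-z]", field):
        flags.add("latin")  # case (2): Latin-character transcription
    if "/" in field:
        flags.add("lemma_list")  # case (3): heuristic for "lemma / lemma / ..."
    for word in WORD_RE.findall(field):
        vowel_count = sum(1 for ch in word if ch in VOWELS)
        stressed = ACUTE in word or "ё" in word.lower()
        if vowel_count > 1 and not stressed:
            flags.add("unmarked_polysyllable")  # case (4)
        if vowel_count == 1 and ACUTE in word:
            flags.add("accented_monosyllable")  # case (5)
    return flags
```

A field that comes back with an empty set has no Latin letters, no lemma separator, a stress mark (or ё) on every multisyllable word, and none on monosyllables. That says nothing about whether the marks are in the right place.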

Finally, it looks pretty clear that at Clozemaster, an automatic stress-guesser with no knowledge of meaning or grammar has been used to provide pronunciations for words. When it encounters a word that could have either of two accented syllables, depending on its meaning or case, it often either guesses the wrong one or refuses to indicate the stress at all. Figuring out the error rate is not easy, but I would estimate the percentage of cases where the wrong stress is indicated for a particular word that I’m interested in could be as high as 15%. (@mike reported the rate of sentences with any pronunciation error at 6%. But given the fact I’m focusing on particular words/sentences, there might be a higher rate.) With such a high likelihood of error, I can’t trust the stress-guesser at all, which means that I need to look up the word at Wiktionary.

As a result, I have to go through the following time-consuming procedure every time I see a word whose pronunciation is not given or I don’t feel I can trust:

(1) Look it up at Wiktionary (which may take several steps, depending on whether the inflected form has a top-level page of its own).
(2) Copy the accented form into my clipboard.
(3) Determine whether the Clozemaster form matches the Wiktionary one.
(3.1) If the pronunciation is correct, copy the accented form, followed by the words “is correct”, into the pronunciation field for the Clozemaster sentence.
(3.2) If the pronunciation is not correct:
(3.2.1) Report the problem with the “Report” button and
(3.2.2) Report the problem in the Clozemaster forum and
(3.2.3) Edit the Clozemaster sentence itself in a way that lets me know both the right answer and that I’ve already reported an error. (My way of doing this is to copy the word into both the pronunciation field and the “Notes” field.)

As you can imagine, this is very time-consuming, even on a computer (as opposed to a phone), and even though I have set up shortcuts to make all this work as efficient as possible.

Why do I go to the trouble of reporting the problem with the “Report” button and posting about it on the forum? I suspect that if and when Clozemaster gets around to fixing these sentences, it will be more efficient to process the ones reported with the “Report” button. However, I have no way of tracking the sentences I’ve reported with this button, so I also post about them on the forum, which lets me see them in one place, together with the dates of the posts.

Now that I’ve described the problem, and how much trouble it’s causing me, here’s my call to action for @mike:

(1) Get someone to go through all the pronunciation mistakes that have been reported and fix the corresponding sentences. The work of finding the problems has been done for Clozemaster for free. Now take advantage of it. I haven’t received notification of a problem I’ve reported on a Russian sentence since July 2022, more than two years ago.
(2) Get rid of the Latin-character transcriptions. As I mentioned, they’re useless, and the fact that there’s a haphazard mix of these transcriptions with Cyrillic ones just (a) makes the site look less professional and (b) complicates the task of finding missing pronunciations.
(3) Get a human to provide Cyrillic prons, including accent marks, where a sentence has none. If it’s too time-consuming to do this for every word in the sentence, do it for the key words (the cloze and/or words where the stress is not obvious). Do not place accent marks on monosyllabic words.
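
The last point in item 3 (no accent marks on monosyllabic words) is mechanical enough to sketch in code. This is only an illustration under my own assumptions: the stress mark is the combining acute accent (U+0301), a "monosyllabic" word is one with exactly one vowel letter, and the function name is made up.

```python
import re

ACUTE = "\u0301"  # combining acute accent used as the stress mark
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
WORD_RE = re.compile("[А-Яа-яЁё\u0301]+")


def strip_monosyllable_accents(text):
    """Remove stress marks from words that contain exactly one vowel."""
    def fix(match):
        word = match.group(0)
        if sum(1 for ch in word if ch in VOWELS) == 1:
            return word.replace(ACUTE, "")
        return word  # multisyllable words keep their marks
    return WORD_RE.sub(fix, text)
```

Applied to “О́н за́мер о́т стра́ха.”, this would leave the marks on за́мер and стра́ха and drop them from он and от.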

I realize that these steps, especially item 3, will take time and possibly money. But Clozemaster is a service that people like me are paying for. If there’s time and money available for changing the appearance of the user interface, there should be time and money for ensuring that the information presented is correct.

I will post separately in this thread two tables that I’ve compiled. The first shows the pronunciation errors I’ve reported over the course of a month, while the second shows the posts in which I’ve talked about systemic errors.

If any native Russian speakers see errors in the pronunciations I found at Wiktionary, please let me know. I’m also curious whether other Russian learners have been aware of the bad pronunciations and if so, how they’ve dealt with them.

Thanks for reading through this long post.

alanf_us · November 3, 2024, 3:55pm

Pronunciation mistakes I reported during the month from 2024-09-27 to 2024-10-26, with corrected forms taken from Wiktionary:

Day Reported	Sentence	Pronunciation
2024-10-26	Когда увидите её в следующий раз, передайте ей мои наилучшие пожелания.	наилучшие → наилу́чшие
2024-10-23	Я купил набор совершенно новых колонок.	колоно́к → коло́нок
2024-10-23	На поле собрались много ворон.	во́рон → воро́н
2024-10-23	Собака на цепи?	цепи́ → це́пи (note: a native speaker tells me that цепи́ is actually correct)
2024-10-23	Она суверенная королева.	суве́ренная → сувере́нная
2024-10-23	Тигрёнок выглядел как большой котёнок.	Тигре́нок → Тигрёнок
2024-10-23	Отче наш, сущий на небесах, да освятится имя Твоё.	Твое́ → Твоё
2024-10-23	Фильм популярен среди молодёжи.	молоде́жи → молодёжи
2024-10-23	Её синие туфли хорошо подходят к платью.	платью → пла́тью
2024-10-22	Радуга образует в небе дугу.	ду́гу → дугу́
2024-10-22	Я чувствую себя отчуждённым.	отчужде́нным → отчуждённым
2024-10-22	Сколько у нас есть мисок?	мисо́к → ми́сок
2024-10-22	Собака трехногая.	трехнога́я → трёхно́гая (the only exception I’ve found so far to the rule that a Russian word only contains a single accented syllable)
2024-10-22	Они вас не обидят.	обидя́т → оби́дят
2024-10-17	Том каждый месяц кладет немного денег на свой сберегательный счет.	кла́дет → кладёт
2024-10-16	Результат оказался разочаровывающим.	разоча́ровывающим → разочаро́вывающим
2024-10-16	Её голова взрывалась от новых идей.	взры́валась → взрыва́лась
2024-10-15	Она с легкостью забралась на лошадь.	ле́гкостью → лёгкостью
2024-10-15	Он был освобождён от должности присяжного.	освобожде́н → освобождён
2024-10-14	Остановите отсчет.	отсче́т → отсчёт
2024-10-14	Это хороший источник белка?	бе́лка → белка́ (different meaning)
2024-10-10	Все, кто пришли, были схвачены.	Vse, kto prishli, byli shvacheny.
2024-10-08	Все тайное станет явным в своё время.	Все́ → Всё
2024-10-07	Ваша машина превысила скорость.	превыси́ла → превы́сила
2024-10-07	Она превысила лимит по своей кредитной карте.	превыси́ла → превы́сила
2024-10-07	Новые офисные здания, похоже, появляются по всему городу.	офи́сные → о́фисные
2024-10-06	Новостные станции используют вертолеты для освещения дорожной ситуации.	вертоле́ты → вертолёты
2024-10-06	Пруд замёрз.	заме́рз → замёрз
2024-10-06	У него распространенное имя.	распростра́ненное → распространённое
2024-10-04	Ты обладаешь даром предвидения?	предвидения → предви́дения
2024-10-03	Она хвастается, что хорошо готовит.	хвастается → хва́стается
2024-10-02	Пожалуйста, скажи шеф-повару, что было вкусно.	повару́ → по́вару
2024-10-01	Напряженность между двумя странами растёт.	напря́женность → напряжённость
2024-09-28	Мальчик выкопал могилу для своего мёртвого питомца.	питомца́ → пито́мца
2024-09-27	У нас было три самолёта.	самоле́та → самолёта
2024-09-27	Это платье ей очень идет.	иде́т → идёт
2024-09-27	Сколько дней потребуется, чтобы отёк спал?	оте́к → отёк
2024-09-27	Бедные становятся ещё беднее.	бе́днее → бедне́е

Note that a native speaker confirmed that all the Wiktionary forms except one (це́пи) were correct.

alanf_us · November 3, 2024, 4:00pm

Some of the posts in which I commented on pronunciation mistakes in depth:

Day Reported	Sentence	Comments
2024-09-10	Фактически, я изучил все содержание.	Pronunciation: все́ → всё. This is one of those cases where whatever person or software is writing the pronunciation needs to know that содержание is singular neuter and thus все must be, too. This means that it’s actually всё rather than все (the plural form), even though when the two dots are omitted (as they sometimes are), the words look the same.
2024-08-27	Он замер от страха.	Also, as I’ve mentioned for other sentences, putting an accent on the one-syllable words он and от is at best unnecessary and at worst misleading because these words are not stressed in comparison to the other words in the sentence.
2024-08-12	Он сдает кровь.	… I always report them through the “Report” button, and sometimes I report them here through the “discussion” button as well. However, I haven’t received notification of a fixed sentence in Russian since July 2022, and I haven’t been notified of a fixed sentence in any language since February of this year. Update: Here are two similar cases: Завтра нас ждёт большое приключение. Мы нашли муравьёв в корзине для пикника. However, in both of them, the cloze is spelled correctly while the corresponding term in the pronunciation is wrong. Specifically, they’re using accented е (е́) (“yeh”) in the pronunciation where they should be using ё (“yo”).
2024-08-09	Я невысокого мнения о ней.	The listed pronunciation is “Ya nevysokogo mneniya o nej.” As I’ve mentioned elsewhere, @mike, Latin-character text does a poor job of representing Russian pronunciation in general, especially since it fails to indicate the accented syllables. In this case, it’s especially erroneous to represent -ого as “ogo” because the “г” is actually pronounced as “v”, not “g”. In addition, unstressed “о” is pronounced like “a” in most varieties of Russian. If whatever automatic pronunciation generator you’re using doesn’t know the location of the accented syllables, you’re better off not including a pronunciation at all.
2024-08-08	Он сдает кровь.	The pronunciation is сдаёт, not сда́ет. It’s sad to see that even sentences belonging to the new Fast Track, which is supposed to be of higher quality, have incorrect pronunciations.
2024-06-12	Я не понимаю такого хода мысли.	This is another case where the automatic accent generator was faced with two possible stress patterns and chose the wrong one because it couldn’t figure out the case. The pronunciation should say хо́да, not хода́. I reported this via the “Report” button as well.
2024-01-26	Объясни конкретно, каковы причины.	This is another example of a case where providing the pronunciation of the dictionary forms of the words, rather than of the declined forms, is useless. Here’s the pronunciation: Ob#yasni konkretno, kakovy prichiny. ‧ объясни́ть / конкре́тно / како́в / причи́на What I really want to know is where the accent falls in the word каковы. But I need to go outside Clozemaster (to Wiktionary) to do it. Where the accent falls in the dictionary form каков is useless information in pronouncing the word каковы.
2023-11-01	Большинство людей не признают своих ошибок.	This is an example of a sentence where the lack of an accent mark in the pronunciation field is problematic. According to Wiktionary (признают), призна́ют is perfective, while признаю́т is imperfective. So the pronunciation is important, but as in many cases where the syllable stressed differentiates between a perfective and an imperfective form, Clozemaster simply omits the accent mark on this word, even though it’s the cloze word and therefore the most important one in the sentence.

BIG_JEFF · November 4, 2024, 2:50pm

wow

we really need native checked/native voiced content for russian. thanks for the write up.

mike · November 5, 2024, 10:52am

@alanf_us this is great! Thanks for posting! Super helpful.

Good points!

The new Fast Track collections have pronunciations that have all been proofread and approved by a native speaker who also did most of the translations for those collections. For other collections an automatic stress-guesser was used.

We’ll look into why you’re not receiving notifications. Any chance they’re ending up in spam given the volume?

It looks like we are caught up on your reports through October 26. For example it looks like you reported “Когда увидите её в следующий раз, передайте ей мои наилучшие пожелания.” on October 26 which has since been updated to “Когда́ уви́дите её в сле́дующий ра́з, переда́йте е́й мои́ наилу́чшие пожела́ния.” Please note if you updated the pronunciation field on your end, that will take precedence in what you see.

All time, we’ve received 3631 reports for all of Russian. 3408 of those (93.9%) are currently resolved. All that’s just to say we are actively working on reported sentences despite the lack of notifications on your end. FYI @BIG_JEFF

Thanks for this!

(1) As mentioned above - I’m not yet sure why you haven’t received any notifications, but we are actively fixing reported sentences, including ones you’ve reported as recently as October 26.

(2) We have removed the Latin-character transcriptions as far as I can tell. The catch is that, at some point, when you edited a sentence, all the fields were saved, even if you only edited the text, for example. From then on, when you see that sentence, you only see the attributes that you saved. So if you played sentence X and edited the text, the pronunciation was saved too. From then on, when you see X, you see the text and pronunciation that were saved, even if the underlying pronunciation for sentence X has been updated.

I’ll have to check if it still works that way. The ideal is likely that if some field matches the underlying sentence attribute when a sentence is updated, then we don’t save that field in the way I described.
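
To make the "ideal" behavior concrete, here is a minimal Python sketch of saving only the fields that differ from the underlying sentence. It is not Clozemaster's actual code; the function names, the dict-based data model, and the sample field values are assumptions for illustration.

```python
def fields_to_save(edited_fields, base_fields):
    """Keep only the fields whose value differs from the underlying sentence."""
    return {
        name: value
        for name, value in edited_fields.items()
        if base_fields.get(name) != value
    }


def resolve(base_fields, user_overrides):
    """What the user sees: the underlying sentence, with their own edits on top."""
    merged = dict(base_fields)
    merged.update(user_overrides)
    return merged


# Hypothetical example: the user edits only the text of a sentence.
base = {"text": "Он сдает кровь.", "pronunciation": "On sdaet krov'."}
edited = {"text": "Он сдаёт кровь.", "pronunciation": "On sdaet krov'."}
overrides = fields_to_save(edited, base)  # only "text" is stored

# Later the underlying pronunciation is corrected; the user now sees the fix.
base["pronunciation"] = "Он сдаёт кро\u0301вь."
current_view = resolve(base, overrides)
```

With this approach the user's edited text still takes precedence, but an untouched pronunciation keeps tracking the underlying sentence instead of being frozen at the time of the edit.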

In any case, you’ve been playing a long time, which is very much appreciated of course, but which has allowed the bug to propagate. A quick check in the database looks like you have ~1800 sentences with one or more latin characters in the pronunciation, vs a total of just 33 sentences with one or more latin characters in the pronunciation field across all collections.

This action item should already be resolved for newer users. In your case - I wanted to check before removing all pronunciations with one or more latin characters. If you’d like us to do that please let me know. Also please note, you mentioned adding “is correct” - I’m not seeing any pronunciation attributes with “is correct”, but I do see 344 notes attributes with “is correct.”

All this said - please let me know in which collections you’re seeing the latin transliterations in the pronunciation fields and for which sentences and I can double check.

(3) We have had a human go through all stress marks for the new Fast Track, but it sounds like there are still some issues. We’ll have to give some more thought to a better process, like I describe below, before moving forward.

Nice jab.

Agreed! We’ll have to find a new proofreader and/or try a new approach. We machine generated the stress marks, then had them proofread. Perhaps it’d be better to simply have the proofreader add the stress marks from scratch. Not sure which might be more prone to errors.

How do you feel with respect to machine generated stress marks vs no stress marks at all?

If you had to pick one, would you prefer native speaker audio recordings or improved stress marks / pronunciations?

Thanks again for this post! Anything I missed please let me know.

BIG_JEFF · November 5, 2024, 4:06pm

@mike thanks for addressing this.

yottapolyglot · November 5, 2024, 8:09pm

Impressive work @alanf_us!

I am aware of the bad pronunciations, only thanks to your posts on the subject.

Very sad to say, but I decided not to use clozemaster for Russian, for this very reason.

It is also the reason why I have been anxiously waiting for the new FT to be completed.

I’m not sure what you mean by Latin-character transcriptions, but I’ll say that despite Russian being mostly a phonetic language, I still would like to have the IPA transcriptions if possible.

I know this question was not addressed to me, so I hope you guys (@alanf_us, @mike) won’t mind if I share my opinion.

If I had to pick one, it would be native speaker audio recordings, and it’s not close.

alanf_us · November 6, 2024, 12:15am

@mike, thank you so much for your detailed reply! I really appreciate it.

I didn’t know that. I still play a lot of review sentences from collections that predate the new Fast Track, so I continue to see many automatic stress-guessed pronunciations.

I had no idea my reports were being acted upon. That’s very good news. I did check my spam folder, but I didn’t see any notifications there. I also checked the folder I put these notifications into, but I only saw notifications for the other languages I play, not for Russian. However, today I did receive a notification for a Russian sentence that I believe I reported recently. As far as I can tell, that was the first notification of an updated Russian sentence that I’ve seen in years.

Today I reported a sentence that has both a Latin transcription and accented lemmas in its pronunciation field, but no accented declined words. I don’t think I ever saved this sentence since there’s nothing in the pronunciation or notes field that suggested I did.

I’m not sure what you mean by “you have”, and I’m not sure what you mean by “across all collections”. Do you mean something along the lines that my copies of sentences still have Latin-character pronunciations that have already been removed from the originals?

Yes, I would really appreciate that.

That’s what I meant – I add “<X> is correct” in the notes field.

I was trying to make a straightforward comment rather than an obnoxious one, but it might have come across wrong.

My guess is that proofreading auto-generated stress marks would be more prone to errors.

If the error rate is not vanishingly small, I’d rather have none.

Personally speaking, I would prefer the latter because:
(1) audio takes longer to play, whereas looking at the accent in a word is almost instantaneous
(2) I am often in an environment where I can’t listen to audio

Having said that, I do realize that audio might have a more direct pathway to one’s memory, so I can imagine that many people would place more importance on it.

@BIG_JEFF and @yottapolyglot, thanks for the kind words.

They’re not IPA transcriptions. Here’s an example:

The first part, “Kejt poshla na vecherinku, chtoby uvidet’sya so svoimi druz’yami”, is the Latin-character transcription I’m talking about. “пойти́ / на / вечери́нка / что́бы / ви́деться / с / свой / друг” is an example of accented lemmas. Since the accent often falls in a different place from the one where declined forms are stressed, their usefulness is limited.

Fnirk1 · November 6, 2024, 2:56am

I agree.
Better to have no stress mark than potential wrong ones.

yottapolyglot · November 6, 2024, 11:14am

I guess it all depends on how good “the machine” is.
Given the current state of affairs, I agree with @alanf_us.

Definitely ! +1

I see, thanks for the example. I agree with you, I don’t care much for this type of transcriptions either.

I mentioned IPA because I think it could help with the accuracy (or lack thereof) issue. It’s more formal/powerful and can be used for all languages, in theory…

That said, I am yet to find an online site that is both exhaustive and reliable, when it comes to Russian IPA transcriptions.

Besides, there may actually be a better solution for Russian specifically. I have not read the whole thing yet, but there seems to be some interesting discussion on the subject here:

czisol · November 7, 2024, 3:52pm

I wholly agree that the provided pronunciation guide is unhelpful.

Guessing accent marks from Wiktionary will get it wrong about half the time for words which can be pronounced in two different ways (стОит/стоИт, зАмок/замОк, etc). A quick fix would be to accent words with only one pronunciation, and to leave unaccented those for which there is more than one possibility. Maybe this was even implemented, I’m not sure. For a few words the stress was almost always shown wrongly, an example being пора.
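
The quick fix could look something like the following Python sketch. The tiny `STRESS_FORMS` table is a made-up stand-in for a real stress dictionary, and the function name is hypothetical; I don't know what Clozemaster's stress-guesser actually does.

```python
# Hypothetical mini-lexicon: bare spelling -> all attested stressed forms.
# "\u0301" is the combining acute accent placed after the stressed vowel.
STRESS_FORMS = {
    "замок": {"за\u0301мок", "замо\u0301к"},
    "стоит": {"сто\u0301ит", "стои\u0301т"},
    "собака": {"соба\u0301ка"},
}


def mark_if_unambiguous(word):
    """Add a stress mark only when exactly one stressed form is known."""
    forms = STRESS_FORMS.get(word.lower(), set())
    if len(forms) != 1:
        return word  # unknown or ambiguous: leave unmarked rather than guess
    (form,) = forms
    # Restore an initial capital if the input had one.
    return form[0].upper() + form[1:] if word[:1].isupper() else form
```

Here собака gets its mark, while замок and стоит are left alone because meaning and grammar are needed to choose between the two readings.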

The handling of ё is also uneven. Usually it is rendered as е with an accent where it would be more usefully rendered as ё. Distinctions such as всё/все & чем/чём are generally ignored and if the cloze word is пойдём it seems a lottery as to whether пойдём, пойдем, or both will be accepted.
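
A simple way to take the lottery out of answer checking would be to compare answers with ё folded to е. The sketch below is only an assumption about how such a check might work; the function names are made up. Its known weakness is that it also treats всё and все as the same answer, so it helps with accepting input but does nothing for the pronunciation field.

```python
def fold_yo(text):
    """Replace ё/Ё with е/Е so that both spellings compare equal."""
    return text.replace("ё", "е").replace("Ё", "Е")


def answers_match(given, expected):
    """Accept an answer regardless of whether ё is written with its dots."""
    return fold_yo(given).casefold() == fold_yo(expected).casefold()
```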

Once I realized the accenting and occasionally the TTS were inadequately reliable, I started checking accents and the TTS pronunciation myself as routine for every new sentence, with a procedure similar to that of @alanf_us. At one time I made a few reports, but it was unclear whether anything was happening as a result, so I stopped.

I also add accepted translations, such as the alternative masculine or feminine of verbs for which the sentence does not imply the gender. In my opinion if a word can be written correctly with ё or е then the version with ё should always be accepted and if necessary I add that as an alternative translation. For some time I was also removing accents from single-syllable unaccented words, but it just got too time-consuming. I also made a collection of sentences where I thought the TTS was wrong.

I also make sure I understand the new sentence and for example why a perfective or imperfective verb was used. Where useful I make explanatory notes for myself in the Notes field, and add hints. For example if the answer is секрет my hint may be “Not тайна”, if that would otherwise be a plausible answer.

All of this information is saved and I see it next time I encounter the sentence. As I have built up a lot of information like that I absolutely want to continue getting back what I have saved.

I agree that:
Pronunciation guides in Latin characters are little or no help
The stressing of the specific case of a word is required, and not of the dictionary form
Wiktionary is a very reliable guide to stress marks
Reliably correct stress marks will be more useful than native speaker recordings

Nobody is immune from errors. One way to reduce errors might be to have more than one person independently stress-mark everything and compare the results. Double or triple checking of machine-generated marks may be more productive than having them manually generated.
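
Comparing two independent sets of stress marks is easy to automate. Below is a minimal Python sketch under my own assumptions: both annotators mark the same sentence, words are separated by whitespace, and the function name is hypothetical. Text is normalized to NFC first so that ё typed as one character and ё typed as е plus combining dots do not show up as a false disagreement.

```python
import unicodedata


def disagreements(annotation_a, annotation_b):
    """Return (index, word_a, word_b) for every word the two annotators mark differently."""
    words_a = unicodedata.normalize("NFC", annotation_a).split()
    words_b = unicodedata.normalize("NFC", annotation_b).split()
    if len(words_a) != len(words_b):
        raise ValueError("annotations do not cover the same words")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(words_a, words_b))
        if a != b
    ]
```

Only the words returned here would need a third opinion; everything the annotators agree on can be accepted without further review.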

IPA is an interesting idea but might take more work to implement. Reliably correct IPA could be used to improve the TTS.

I hope the above is helpful.

BIG_JEFF · November 7, 2024, 7:33pm

great discussion going on here. let’s keep the momentum going! i want learning russian on clozemaster to be the best possible.

yottapolyglot · November 8, 2024, 1:45am

Likewise.

I have seen this site recommended a few times:

Starting from 0, I have no idea what I’m looking at, let alone how reliable/accurate it is.

And then there is this one:

Easier for a complete beginner / English speaker.

This looks good to me, I actually did test a few other examples listed by @alanf_us, and as far as I can tell, the results seemed accurate.

Are any of you familiar with these?

alanf_us · November 9, 2024, 2:14pm

@yottapolyglot, I salute you for your interest in pronunciation to the point where you find IPA transcriptions useful, but I would be very surprised if IPA transcriptions were ever added to Clozemaster. Learning the IPA is a daunting task. According to Wikipedia, it contains “107 segmental letters, an indefinitely large number of suprasegmental letters, 44 diacritics (not counting composites), and four extra-lexical prosodic marks.” This is not like learning the set of 33 symbols in the Russian alphabet, which is well-defined, smaller, and intrinsic to the task of understanding the language. I would venture to say that most Clozemaster users, including myself, are not familiar enough with the IPA that it would help them in Russian, if they know it at all. Nor is there a reason to learn it, since there aren’t that many words whose pronunciation cannot be adequately predicted by the Russian spelling plus the accent mark, and those words can be learned individually. I think we can’t assume, either, that the Russian speakers who add and correct entries at Clozemaster would know the IPA. You ask a few questions about some sites that were recommended to you or mentioned in a Stack Overflow page. It was interesting to look at them, but they didn’t give me anything that I’m looking for that I haven’t found at Wiktionary. I did note that most of them use the Russian spelling of the word plus either an acute accent mark or color to show the accented vowel.

@mike, it occurs to me that I didn’t mention a third reason why I favor improved stress marks over audio recordings: sometimes it’s not easy or even possible to determine the stress from a recording. On a different subject, if notifications of updated sentences are getting lost, and you’d like me to help you investigate this on my side, please let me know.

@czisol, I am impressed that you go to the trouble of adding alternate translations. I wonder whether you’re using the “Text Box Size: Changes” and “Typing Color Hint: On” settings, as I am. When you do this, alternate translations become less useful because you can try out the various possibilities that would fit before you submit. Of course, these settings have other consequences that you might not like. For instance, they let you figure out a word incrementally if you know (or guess) the first few characters correctly. However, it’s possible to mitigate this with the new “Sentence Text Initially Hidden: On” setting, which lets you try to guess the Russian without seeing how long each individual word is before you see the text box.

czisol · November 11, 2024, 2:41am

Thanks for mentioning the various settings, which I have tried in various combinations. I turned them off on my windows PC because I felt I was cheating myself. If I don’t know the word I should just let the algorithm give me repetitions. However I still have them on my phone. I find the “Sentence Text Initially Hidden: On” setting adds another useful dimension to the practice since obviously the main objective is to know not only the words but also to use them.

I have been using voice input typing on the PC, and the alternative translations were originally for that. I have found it more effective than a Clozemaster voice input tool that I also tried (not sure if it still exists). I would like as much as possible of my attention to be focused on the language rather than on the mechanics of using the practice tools. When the voice typing is accurate enough one can also go very fast, at least with words that are known. (I am worried however that the PC may be training itself to recognize my mispronunciations.) A downside is that one may not be learning the exact spelling, but if a PC can understand me then probably so can a native.

yottapolyglot · November 12, 2024, 9:45pm

I wouldn’t know

I do know more about the IPA than about the Russian alphabet, but then again, I know nothing about the Russian alphabet.

What I do know is a very small subset of the IPA, and I do like the fact that the IPA is very formal / doesn’t leave much room for interpretation.

That said, I don’t feel strongly about the IPA per se, and you do make very good points.

I think the main challenge is finding an authoritative / exhaustive source on Russian word stress / pronunciation, whether it be based on IPA transcriptions or some other system.

On native speaker audio recordings vs. improved stress marks:

First, I should say that the 3 reasons you listed, explaining why you’d prefer improved stress marks, all make sense to me.

Still, I do feel like native speaker audio recordings are more valuable, and I’ll try to explain why.

My reasoning is:

Stress marks

If we find an authoritative/exhaustive source with IPA / [insert whatever system] transcriptions, then we don’t need a “stress-guesser bot” anymore.
We could automate, by comparing the stress marks on clozemaster to the stress marks indicated by the “source of truth” (SOT)
Basically, a database lookup: if match, do nothing / if no match, use SOT
It may not be perfect, but if the SOT is good, this could be done with a high degree of accuracy

Essentially, automating what you have been doing manually, using Wiktionary as your SOT.
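
Here is a minimal Python sketch of that lookup, to show what I have in mind. The `SOT` table is a made-up two-entry stand-in for a real source, the function names are hypothetical, and input words are assumed to be lowercase. One addition to the "match / no match" idea: when the SOT lists more than one stressed form for a spelling, the word is sent to a human instead of being kept or replaced automatically.

```python
ACUTE = "\u0301"  # combining acute accent used as the stress mark

# Hypothetical source of truth: bare spelling -> known stressed forms.
SOT = {
    "молодежи": ["молодёжи"],
    "белка": ["бе\u0301лка", "белка\u0301"],
}


def bare_key(word):
    """Strip stress marks and fold ё to е to get the lookup key."""
    return word.lower().replace(ACUTE, "").replace("ё", "е")


def audit(marked_word):
    """Return (action, form), where action is unknown, review, keep, or replace."""
    candidates = SOT.get(bare_key(marked_word))
    if not candidates:
        return ("unknown", marked_word)   # the SOT has no entry
    if len(candidates) > 1:
        return ("review", marked_word)    # homograph: needs grammar or meaning
    if marked_word == candidates[0]:
        return ("keep", marked_word)      # match: do nothing
    return ("replace", candidates[0])     # no match: use the SOT form
```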

Native speaker audio recordings

No good substitute yet: TTS / AI voices can’t replace audio recordings by native speakers, at least not for Russian yet
As a complete beginner, when I listen to an AI, I have no idea if what I’m hearing is actually correct (and even if I did… → see next point)
I believe I am/(would be) much more likely to pick up / develop bad habits by listening to incorrect audio recordings
Even more so when coupled with a feature like chorusing (Request for Chorusing functionality - #4 by MikeInTaiwan) – which I really really like btw – I think this could actually be quite bad

Ultimately, I guess which option people prefer, will depend on how they learn a language and how advanced they are. My approach is very much “audio-focused”, so I do feel strongly about native speaker audio recordings.

EatAtJoes · July 3, 2025, 8:20pm

There is a vast amount of existing native Russian speaking audio out there, with machine-generated transcriptions. While not ideal, I think it would be a useful stopgap to augment the existing Cloze tools to be able to upload audio along with the cloze when creating a collection. Ideally in bulk, by adding (say) a json-formatted upload with “audio_base64” as a mechanism for data transfer…
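
A rough Python sketch of what such a bulk upload file might contain follows. Only the “audio_base64” field name comes from the idea above; the other field names, the top-level “sentences” key, and the function name are invented for illustration, and no such upload endpoint is known to exist.

```python
import base64
import json


def build_upload(entries):
    """Build a JSON document from (text, cloze, audio_bytes) tuples."""
    items = []
    for text, cloze, audio_bytes in entries:
        items.append({
            "text": text,    # hypothetical field name
            "cloze": cloze,  # hypothetical field name
            # Raw audio bytes are base64-encoded so they can travel inside JSON.
            "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        })
    # ensure_ascii=False keeps the Cyrillic readable in the output file.
    return json.dumps({"sentences": items}, ensure_ascii=False)
```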

alanf_us · June 7, 2026, 8:06pm

I would like to bring this topic up again.

I still regularly find accent marks in the wrong place, or missing where they are present in other words in the same sentence, and I still report them both with the “Report” button and in the Russian forum. Today, for example, I reported an instance where the pronunciation of the word “узнаю” (which means “will know” when accented on the second syllable and “know” when accented on the third) was missing an accent mark. When I searched for sentences with this word as the cloze, I discovered that of the 17 results, 12 were missing the accent mark, 3 had it in the correct place, and 2 had it in the wrong place. Most of the instances (including the 2 that had the accent mark in the wrong place) were in a collection (“3,000 Most Common”) of which I had only played 21 sentences in total, none of which contained “узнаю” as the cloze. So it couldn’t be that my edits to the sentences caused the accent-mark-less or the wrongly-accented pronunciation to get stuck in my “local” copy.
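
For anyone who wants to run the same kind of count, here is a small Python sketch of the tally. The function names are my own, and the input is a list of (form found in the pronunciation field, form expected from context) pairs that would have to be collected by hand, since the correct stress for “узнаю” depends on the sentence.

```python
from collections import Counter

ACUTE = "\u0301"  # combining acute accent used as the stress mark


def stress_status(found, expected):
    """Classify one occurrence as missing, correct, or wrong."""
    if ACUTE not in found:
        return "missing"
    return "correct" if found == expected else "wrong"


def tally(pairs):
    """Count statuses over (found, expected) pairs."""
    return Counter(stress_status(found, expected) for found, expected in pairs)
```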

I save all my “updated sentence” notification emails in a folder, and other than one from November 5, 2024 (which I mentioned earlier in the thread), I don’t see any emails relating to Russian among them unless I go back to July 31, 2022. By contrast, I have gotten notification emails for updated sentences in other languages all along.