Classes of errors in the pronunciation field of Russian sentences
I find it very helpful that the pronunciation field for Russian sentences is filled in more often than not, usually with correct information, and that I can choose to always display this field after pressing the “Submit” button. However, errors are still quite common, and I thought it would be useful to classify the scenarios I have run into:
(1) There are two possible places to put the stress, depending on the case and/or number, and the automatic accent marker picks the wrong one.
(2) There are two possible places to put the stress, depending on the case and/or number, and the automatic accent marker chooses neither, so the accent mark is missing entirely.
(3) There is a “ё” character (always stressed, and pronounced “yo”) that is written in the sentence as “е”, a common practice in written Russian that is not designed for language learners. The pronunciation field also writes it as a plain “е” (pronounced “ye”, which may be either stressed or unstressed). As a result, it’s unclear both where the stress falls and whether the letter is “ё” or “е”.
(4) There is a “ё” character (always stressed, and pronounced “yo”) that is written in the sentence as “е”. The pronunciation field writes it as “е́”, which does show where the stress falls, but leads the reader to think, incorrectly, that the letter is pronounced “ye”.
(5) The pronunciation is written in Latin characters, without accent marks. Beginners in the very first stage of learning the language may find this useful, because they are still struggling to master the 33 letters of the Cyrillic alphabet and generally aren’t even trying to put the stress on the correct syllable, but anyone who has gotten past that stage will find it useless.
(6) The pronunciation field is empty.
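Some of these categories can be spotted from the pronunciation field alone, so a first pass could probably be automated. The sketch below is only an illustration under my own assumptions: it assumes stress is marked with the Unicode combining acute accent (U+0301) placed after the vowel, and the function names `classify` and `unstressed_words` are hypothetical. It flags (6) empty fields, (5) Latin characters, and words with two or more vowels that carry neither a stress mark nor “ё”, which is what categories (2) and (3) look like. Categories (1) and (4) look well-formed on the surface, so they would need a dictionary to catch.

```python
import re
import unicodedata
from typing import Optional

# Assumption: stress is marked with U+0301 COMBINING ACUTE ACCENT after the vowel.
STRESS_MARK = "\u0301"
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
LATIN = re.compile(r"[A-Za-z]")
# Word characters plus the combining stress mark, which \w does not match.
WORD = re.compile(r"[\w\u0301]+")


def unstressed_words(pronunciation: str) -> list[str]:
    """Hypothetical helper: words with 2+ vowels and neither a stress mark nor ё."""
    flagged = []
    for word in WORD.findall(pronunciation):
        vowel_count = sum(ch in VOWELS for ch in word)
        has_yo = any(ch in "ёЁ" for ch in word)  # ё is always stressed
        if vowel_count >= 2 and STRESS_MARK not in word and not has_yo:
            flagged.append(word)
    return flagged


def classify(pronunciation: Optional[str]) -> list[str]:
    """Hypothetical first-pass check of one pronunciation field."""
    # NFC keeps ё and й as single characters, so the vowel set still matches.
    text = unicodedata.normalize("NFC", (pronunciation or "").strip())
    if not text:
        return ["(6) empty"]
    if LATIN.search(text):
        return ["(5) Latin characters"]
    issues = []
    missing = unstressed_words(text)
    if missing:
        issues.append("(2) or (3) no stress mark: " + ", ".join(missing))
    return issues
```

Single-syllable words are skipped on purpose, since they are often left unmarked even in carefully accented text.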
I think targeting these categories one at a time could be the best way to proceed. Ideally, you could produce a list of words with multiple possible stress patterns to help you identify sentences that belong to category (1) or (2). But I’m not sure how best to build such a list. I did find a Stack Overflow post about retrieving information from Wiktionary, though it seems that doing this is not simple. I’m not surprised: when I search Wiktionary myself, it’s hard to predict whether a particular word will take me to a page for that particular (inflected) form or to a page telling me to do a full search, in which case I have to go to the declension table to find the accented forms. However you proceed, you will need speakers with some knowledge of the language, and some of the work they do will have to be manual.
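One way to build that list, assuming you already have accented word forms from somewhere (declension tables are one possible source, but that is an assumption about the data, not something I have tried), is to group the forms by the spelling they would have in an ordinary sentence: no stress mark, and “ё” written as “е”. Any spelling that maps to more than one accented form is a candidate for categories (1) through (4). For example, “ру́ки” and “руки́” both collapse to “руки”, and “не́бо” (sky) and “нёбо” (palate) both collapse to “небо”. The names `bare_spelling` and `ambiguous_forms` are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Assumption: stress is marked with U+0301 COMBINING ACUTE ACCENT.
STRESS_MARK = "\u0301"


def bare_spelling(form: str) -> str:
    """Hypothetical helper: the spelling a form would have in an ordinary
    sentence, i.e. lowercase, no stress mark, and ё written as е."""
    form = unicodedata.normalize("NFC", form)  # make ё a single character
    return form.replace(STRESS_MARK, "").lower().replace("ё", "е")


def ambiguous_forms(stressed_forms: list[str]) -> dict[str, list[str]]:
    """Group accented forms by bare spelling; keep spellings with 2+ variants."""
    groups: dict[str, set[str]] = defaultdict(set)
    for form in stressed_forms:
        groups[bare_spelling(form)].add(unicodedata.normalize("NFC", form))
    return {
        bare: sorted(variants)
        for bare, variants in groups.items()
        if len(variants) > 1
    }
```

The resulting keys could then be matched against the words of each sentence (after the same normalization) to pick out sentences worth checking by hand. It only finds ambiguity; deciding which stress is correct in context would still need a speaker.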
I am already reporting problems and updating my “Notes” field every time I see a problem. If there were a way for me to edit the “Pronunciation” field and have my changes go into the main database, I’d be glad to do it for you. I would only want to deal with the cloze word, though. Adding the accents for every word in each sentence would be too much work for me, so I would not want to tackle sentences in category (5) or (6) unless it were acceptable to fill in just the cloze word, in Cyrillic with its stress mark.