Hindi sentences can be tricky to split into words. It may be that the process we use sometimes breaks the sentence into words incorrectly. Some of the resulting “words” are then not actual words and so do not appear on the frequency list, which causes the algorithm to select a word that is on the list but is more common.
As a programming geek, I’m curious why splitting the sentence is difficult. That is, words with dashes may be hard, but I’m not sure what else doesn’t work. Presumably all the mushed together letters in Hindi are handled by whatever parser you’re using; you can still find spaces.
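To make the question concrete, here is a minimal sketch of one way splitting could go wrong and one way around it. This is a guess, since I don't know what Clozemaster actually runs. In Python's built-in `re` module, `\w` covers letters and digits but not combining marks, so a pattern like `\w+` can cut a Devanagari word at its vowel signs and viramas. The hypothetical `split_hindi_words` below avoids that by splitting on whitespace and only trimming punctuation (including the danda, `।`) from the ends of each token.

```python
import unicodedata


def split_hindi_words(sentence):
    """Split on whitespace, then trim punctuation from both ends of each token.

    Combining marks (matras, virama, chandrabindu) are never treated as
    boundaries, so a word stays in one piece. Hyphens inside a token are kept.
    """
    words = []
    for token in sentence.split():
        start, end = 0, len(token)
        # Unicode categories starting with "P" are punctuation (the danda is "Po").
        while start < end and unicodedata.category(token[start]).startswith("P"):
            start += 1
        while end > start and unicodedata.category(token[end - 1]).startswith("P"):
            end -= 1
        if start < end:
            words.append(token[start:end])
    return words


print(split_hindi_words("मैं घर जा रहा हूँ।"))
```

Words with dashes come out as a single token here, which may or may not match how the frequency list stores them, so that case would still need its own rule.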
Another option is to copy sentences to a custom collection where you can change the cloze word to any text you select.
I’ve started three collections - easy, advanced beginner, and hard. I’m partly doing it to know which sentences I’ll want to get back to in the future. The fact that I picked the right cloze out of four options doesn’t truly mean I understand the words and grammar of a sentence. But I’m also doing it in the hopes that (if I have the stamina to make it through a couple thousand sentences) other Hindi learners could start with the easy stuff. I was intimidated by some of the sentences I got from Random on my first day. While I’m at it, I’m changing a bunch of the cloze words in the collections I copy to. I haven’t been changing the cloze words in the Random collection, though maybe I should.
We may also eventually allow selecting a new cloze word for all collections. Curious to hear what you all think of this option.
Do you mean a button to change a sentence in all collections? That’s a neat option.
And actually, as I’m looking at Hindi, we should be able to make some significant improvements within the next few months - hopefully improving the cloze word selection like you mentioned, as well as adding Fast Track and Most Common Word groupings.
That would be amazing! Seeing 9750 sentences that I have to go through in random order is a bit overwhelming. It seems that you could get common words automatically from some combination of the online lists of common words and the words in the sentence collection itself. 10K sentences is enough that a ton of words will appear more than once, right? Some combination of “number of words in the sentence” and “number of words in the sentence weighted by how uncommon they are” has to be a better metric of difficulty than completely random.
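Here is a rough sketch of the kind of metric I mean, not a claim about how Clozemaster would do it. The function names are made up, word frequencies come only from the sentence collection itself (an online frequency list could be merged in), and words are split on whitespace for simplicity. Each word contributes the negative log of its smoothed relative frequency, so both longer sentences and rarer words push the score up.

```python
import math
from collections import Counter


def build_counts(sentences):
    """Count how often each whitespace-separated word appears in the collection."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts


def difficulty(sentence, counts):
    """Sum of per-word rarity: -log of the add-one-smoothed relative frequency."""
    total = sum(counts.values())
    vocab = len(counts)
    score = 0.0
    for word in sentence.split():
        # Add-one smoothing keeps unseen words from producing log(0).
        p = (counts[word] + 1) / (total + vocab + 1)
        score += -math.log(p)
    return score


def sort_by_difficulty(sentences):
    """Return the sentences ordered from easiest to hardest under this score."""
    counts = build_counts(sentences)
    return sorted(sentences, key=lambda s: difficulty(s, counts))


collection = ["यह घर है", "यह मेरा घर है", "वह विश्वविद्यालय में पढ़ता है"]
for sentence in sort_by_difficulty(collection):
    print(sentence)
```

Summing rather than averaging is deliberate: it folds “number of words” and “how uncommon they are” into a single number, at the cost of always ranking long sentences as harder.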
I could put this in the suggestions channel, but I wonder whether you could have a 1-5 difficulty rating for each sentence that users could optionally select as they’re playing the sentences. It would only take an extra second per sentence. It looks like Hindi only has a few dozen people actively going through it right now, so you won’t have thousands of ratings per sentence. But even ten ratings is better than nothing, and over time you should have statistically significant ratings. (If you want to be fancy, don’t turn the ratings on until a user has leveled up a few times, since all sentences may look difficult at first.)
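With only a handful of ratings per sentence, one common trick is to pull each sentence's average toward a neutral prior until enough votes arrive. The sketch below is only an illustration: `smoothed_rating`, the prior mean of 3.0, and the prior weight of 5 are all assumptions of mine, not anything Clozemaster has.

```python
def smoothed_rating(ratings, prior_mean=3.0, prior_weight=5):
    """Average of 1-5 ratings, shrunk toward prior_mean.

    prior_weight acts like that many imaginary votes for prior_mean, so a
    sentence with one or two ratings stays near the middle of the scale.
    """
    total = prior_mean * prior_weight + sum(ratings)
    return total / (prior_weight + len(ratings))


print(smoothed_rating([]))       # no ratings yet: falls back to the prior
print(smoothed_rating([5, 5]))   # two votes move it only part of the way up
```

As more users rate a sentence, the prior's share shrinks and the result approaches the plain average.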
Thanks for your response. More broadly, big thanks to “Mike and the Team at Clozemaster” for all the work you’ve done in putting this great resource together!