Questions about Most Common Words

If I choose the 5,000 most common words, does that mean I’m choosing the 4,001st to 5,000th most common words? The option before it is the 4,000 option, so I’m assuming the 5,000 option excludes the 4,000 most common words and I’m really focusing on 4,001 to 5,000, i.e. 1,000 words, right?

  1. If my understanding is correct, I’m wondering why it’s named as such, rather than say “4,001 to 5,000”. (I’m doing Spanish-English)

  2. If my interpretation is wrong and it’s actually sentences focused on any of the top 5,000 words (which means it could show the 10th most common word in blanks), then I’m not sure why this is so because I wouldn’t want to learn such a common word.

  3. For Spanish, I see that the 50,000 Most Common Words option has only 10k sentences. Why is that? It means I wouldn’t get to test all of the 20,001 to 50,000 range (30k words).

Thanks.

PS: I have so many questions about Clozemaster. Is there some kind of introduction or help/support page? I’m sure my questions should be addressed for everyone, not just in a forum. I may have missed a webpage with all the FAQs and explanations of how Clozemaster works.

1 Like

Yes. The 5,000 Most Common Words collection covers the 4,001st to 5,000th most common words.

2 Likes

Thanks, in that case, for Spanish-English:

  1. Why does the “50,000” most common words collection have only 10,000 sentences? It’s meant to cover the 20,001 to 50,000 most common words, right, which would be 30,000 sentences? Will the complete 30k list of words be added eventually?

  2. How about the “20,000” most common words? It should have 10,000 sentences from the 10,001 to 20,000 most common words. There are 10,000 sentences. Are there any “repeats”, or does every one of those sentences address a different word?

Thanks

1 Like

@mike can you help answer this?

2 Likes

Another question I hope I can get a response to (Again, if there’s some kind of FAQ page or a page that explains exactly how all these work, do let me know).

I’ve been going through one of the Most Common Words collections and I see a number of sentences repeating themselves. Why is this so? Does every sentence repeat itself automatically? I’m talking about repeated sentences, not words. Thanks.

2 Likes

Yep! The current Most Common Words collections breakdown is simply the default we came up with since most language pairings don’t have many sentences in that range of the frequency list. Eventually we’ll break those down further for the languages that can support it, and ideally add something like a “part 2” of the Fast Track with more difficult sentences.

There may be repeats, meaning multiple sentences with the same missing word. The Most Common Words collections allow up to 10 sentences with the same missing word. Sentences are selected randomly up to 10k sentences total. The 10k limit exists to keep the collections reasonably doable and downloadable.
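To make the selection rule above concrete, here is a minimal sketch of random selection with a per-word cap and a total cap. It is not Clozemaster's actual code: the function name `sample_collection`, the `(sentence, missing_word)` pair format, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def sample_collection(sentences, per_word_cap=10, total_cap=10_000, seed=0):
    """Pick sentences at random, allowing at most `per_word_cap` sentences
    per missing word and at most `total_cap` sentences overall.

    `sentences` is a list of (sentence_text, missing_word) pairs
    (a hypothetical shape chosen for this sketch).
    """
    rng = random.Random(seed)
    shuffled = sentences[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    used = defaultdict(int)          # missing word -> sentences picked so far
    picked = []
    for text, word in shuffled:
        if len(picked) >= total_cap:
            break
        if used[word] < per_word_cap:
            used[word] += 1
            picked.append((text, word))
    return picked

# 25 candidate sentences for one word, 3 for another
pool = [(f"sentence {i}", "cierres") for i in range(25)]
pool += [(f"sentence {i}", "paremos") for i in range(25, 28)]
picked = sample_collection(pool)
print(len(picked))  # 13: ten "cierres" sentences plus three "paremos"
```

Under these assumptions, a word with many candidate sentences contributes at most 10 of them, which is why a 10,000-sentence collection can cover far fewer than 10,000 distinct words.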

A given sentence should only occur once in the Most Common Words collections. If you get a sentence incorrect, and depending on your review interval settings, you may see the same sentence multiple times while playing that collection. If you don’t think this explains what you’re seeing, please let us know the sentence and language pairing and we can double check.

2 Likes

In French at least, you can get what appears to be the same sentence appearing over and over if it exists in Tatoeba in various forms for plurals, formality and gender. This is a glaring example (glaring because I don’t really know what it means and it sounds potentially offensive!):

1 Like

Right on / good point - thanks! Although agreed that example doesn’t sound great - working on checking with a native speaker and will of course remove if it is indeed offensive.

3 Likes

  1. I don’t click the “Review” button. Yet I’m still seeing the same sentences when I do the most common words. Is that meant to be the way it works?

  2. Another thing, here’s the image for the Spanish 10,000 most common words I’m doing.

So this section contains 10,000 sentences from the “5,001 to 10,000” most common words in Spanish right?

Do I know exactly how many of the “5,001 to 10,000” are actually used? Because there could be some words that have 3 sentences and some 5 and some 1 right? So it may be that I am not going through the full 5,000 different words (of the 5,001 to 10,000 most common spanish words).

Also, what does the “456” mean? It means the number of sentences of the 10,000 I’ve gone through right? And what’s the green (0%) in the last column refer to?

Thanks.

1 Like

I don’t click the “Review” button. Yet I’m still seeing the same sentences when I do the most common words. Is that meant to be the way it works?

The default settings will show you some reviews in every round. If you want to see only new sentences, you can change that under Review Settings on your dashboard.

So this section contains 10,000 sentences from the “5,001 to 10,000” most common words in Spanish right?

Correct.

Do I know exactly how many of the “5,001 to 10,000” are actually used?

Not to my knowledge.

Because there could be some words that have 3 sentences and some 5 and some 1 right? So it may be that I am not going through the full 5,000 different words

That’s correct. Clozemaster draws its sentences from Tatoeba, so if a word isn’t in Tatoeba it won’t be here.

Also, what does the “456” mean? It means the number of sentences of the 10,000 I’ve gone through right? And what’s the green (0%) in the last column refer to?

456 is the number of sentences in that category that you’ve played. The green % is for the sentences you’ve fully mastered (played once and reviewed a further three times, or manually set to mastered).

2 Likes

Thanks for your help. I’m not sure who you are, but I don’t understand why all these pieces of information aren’t in some section of the website (I’m assuming it isn’t since I’ve hinted at this in my previous messages). Someone who works for Clozemaster should be writing all this up as people do have questions about all this!

Also, I’m wondering how often more words are put into Clozemaster. Do we get updates if that’s so? I’m interested to find if more and more words are being added for Spanish.

1 Like

So even though I have not finished the 10,000 words, I tried the 20,000 Most Common Words section in Spanish, which I guess should be the 10,001 to 20,000 most common words in Spanish.

I got these words: “cierres” - subjunctive of “Cerrar” (close) and “Paremos” - imperative (I think) of “Parar” (stop)

I’m wondering if these are really that “rare” - i.e. outside the most common 10,000 words in Spanish. I am an ESL teacher so I know that “stop” and “close” in English are both very common - in fact, in the top 2,500 most commonly used words according to Macmillan.

Of course, Spanish isn’t English, but I am just expressing my doubts as to whether “cerrar” and “parar” are outside the top 10,000 most commonly used Spanish words. I’m assuming the way words are ranked includes all forms (however conjugated, and whether imperative or subjunctive, etc.). If so, I can’t imagine these two Spanish words being so uncommon.

I may be wrong but I’d love someone’s comment. Unless my understanding of how Clozemaster works is wrong. Thanks

1 Like

I don’t know anything about the source code of Clozemaster, but all of the sentences are from Tatoeba and I think there is no code for conjugation in Clozemaster. The occurrence count seems to be the number of occurrences of exactly this byte sequence of the word in the sentences imported from Tatoeba.

3 Likes

Is it possible to get those working with Clozemaster to answer this question?

How could I go about doing that? Write into support?

1 Like

AFAIK a word in clozemaster is generally a unique set of characters, so you will get multiple words for each conjugated verb, or adjective, etc. Rarer conjugations of common verbs may be in uncommon word collections.
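As a rough illustration of why a rare conjugation of a common verb lands far down the list, here is a minimal sketch of counting surface forms (exact character sequences) rather than dictionary forms. The function name `surface_form_counts`, the lowercasing step, and the three example sentences are assumptions for this sketch, not anything taken from Clozemaster or Tatoeba.

```python
import re
from collections import Counter

def surface_form_counts(sentences):
    """Count each distinct written form separately, with no lemmatization.

    Lowercasing is an assumption of this sketch; a strict byte-for-byte
    comparison would also treat "Cerrar" and "cerrar" as different words.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(re.findall(r"\w+", sentence.lower()))
    return counts

corpus = [
    "Voy a cerrar la puerta.",
    "No quiero cerrar la ventana.",
    "Espero que cierres la puerta.",
]
counts = surface_form_counts(corpus)
print(counts["cerrar"], counts["cierres"])  # 2 1
```

Here "cerrar" and "cierres" get separate counts even though they belong to the same verb, so the subjunctive form ranks lower than the infinitive.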

3 Likes

This! The frequency list we use includes conjugated/declined/inflected forms, so less common conjugations of common verbs are further down the list / considered more difficult.

Nope, and right 🙂 Clozemaster uses the most “difficult” word in the sentence as the missing word (the least common according to a frequency list for that language), so it may also be that you end up seeing a number of words that are in a given Most Common Words range but are never used as the missing word.
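A minimal sketch of that rule, combined with the collection ranges discussed earlier in the thread, might look like the following. It is only an illustration: the names `pick_cloze_word` and `collection_for_rank`, the rank numbers, the list of collection bounds, and the choice to skip words missing from the frequency list are all assumptions, not Clozemaster's implementation.

```python
import re
from bisect import bisect_left

# Upper bounds of the collections mentioned in this thread (assumed, not a full list).
COLLECTION_BOUNDS = [4_000, 5_000, 10_000, 20_000, 50_000]

def pick_cloze_word(sentence, ranks):
    """Return the least common ranked word in the sentence, or None.

    `ranks` maps a written form to its frequency rank (1 = most common).
    Words absent from `ranks` are skipped, which is an assumption here.
    """
    words = [w for w in re.findall(r"\w+", sentence.lower()) if w in ranks]
    return max(words, key=lambda w: ranks[w]) if words else None

def collection_for_rank(rank, bounds=COLLECTION_BOUNDS):
    """Return the smallest collection bound that is >= rank, or None."""
    i = bisect_left(bounds, rank)
    return bounds[i] if i < len(bounds) else None

# Hypothetical ranks, invented for this example
ranks = {"la": 2, "que": 3, "puerta": 700, "espero": 900, "cierres": 12_500}

word = pick_cloze_word("Espero que cierres la puerta.", ranks)
print(word, collection_for_rank(ranks[word]))  # cierres 20000
```

In this sketch "puerta" and "espero" never become the missing word for this sentence, because "cierres" outranks them in difficulty, so the sentence is filed under the collection that covers the rank of "cierres".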

Agreed! 🙂 Thanks for all the questions - work in progress on a knowledge base / improved FAQ. And ideally we’ll keep improving the UI to make the answers to these questions more obvious from the UI itself / just by playing.

Not often at the moment, though we’re working on changing that. As I mentioned above, we may split some of the Most Common Words collections and add a “part 2” to the Fast Track. We may send out an update, but ideally we’ll have it automated such that it occurs every few weeks/months and you’ll simply see more sentences in the collection you’re playing 🙂

Also it’s worth noting that for Spanish there should already be plenty of content to keep you busy for a looong time 🙂 It sounds like you’re looking at the 5,000/10,000/20,000 Most Common Words collections, which for Spanish have nearly 24,000 sentences combined - the equivalent of reading the first Harry Potter novel nearly 4 times. 🤯 And by the time you get through it all (and perhaps the 50k Most Common collection) you’ll probably want to move on to more native-level content - reading books, watching movies, etc.

Any other questions or anything I missed please be sure to let us know!

2 Likes

My understanding is that this, sadly, will sometimes make valuable words not end up as clozes, when they are masked by some even more difficult word in a sentence. And, that “difficult” word may be a name or something quite irrelevant.

1 Like

This is the part I don’t understand. Most frequency word lists don’t list different conjugations as separate words. And I guess this is the reason I’m seeing such common words (in a less common conjugated form) in a list that should contain less common words.

1 Like

Might you be able to share any of these lists? We likely won’t change the current approach, but we love a good frequency list as much as anyone and perhaps we can use them in other ways.

I’d counter that it might be a bit much to see the past tense subjunctive in the 100 Most Common Words collection when you’re presumably a beginner looking to work through the most common words, unless of course it really is a form you’re likely to see daily. 🙂

Thanks again for all the feedback!

1 Like

I don’t see how a language learning tool would be very useful if it didn’t require you to conjugate verbs in many different forms. I needed a TON of repetition of reading, hearing and reproducing the various French verb forms for them to start seeming natural.

Plurals can be frustrating, in that words like “teachers” and “elephants” sometimes occur in the less frequent word categories, when you’ve already covered “teacher” and “elephant” earlier. However, sentences like that are a small percentage of the overall collection. You can mark them as “ignore” so you won’t see them again after the first time.

1 Like