Minimum number of sentences for a language pairing to be useful?

What do you think is the minimum number of sentences that a language pairing needs to have to be useful?

For language pairings with fewer than 10,000 sentences, we’re only adding the Random Collection at the moment (as opposed to the Fast Track and Most Common Words collections). We’re thinking 5,000 as the minimum number of sentences needed to add a new language pairing, but curious to hear what you think. Would fewer still be useful?

Even as few as 1,000 sentences (or fewer) for a given language pair would still be pretty useful, assuming:

a) Neither the original sentence nor its translation is substandard.
b) The sentences cover a decent range of difficulty levels (even in a Random Collection).

So if I had to pick a number, I’d say 1,000, but I wouldn’t mind playing language pairs with fewer sentences than that. On top of that, the number of sentences can always grow as more are added to Tatoeba and imported into Clozemaster, so I wouldn’t worry too much about a small initial set.

I would say that a month feels like the minimum amount of time you’d want to give someone something to do. At 20 new sentences a day (2 rounds of 10 sentences), 1,000 sentences would last 50 days, and with reviews they would last even longer. That’s worth something, particularly in terms of motivating someone to find other means of getting to the next stage. It can also help the Clozemaster team gauge how much interest there is in a particular language pairing, so you can decide whether it’s worth the effort to gather more sentences.

It would be nice if those sentences had a decent amount of variety, not only in difficulty, but also in content. But if it’s hard to get the sentences in the first place, maybe it’s not so easy to get a diverse set.

Thanks for the feedback! And good points :+1: We’ll likely start with language pairings that have more than 1,500 sentences on Tatoeba, so that we end up with ~1,000 sentences on Clozemaster (after removing sentences that are too short or too long, and any with other quality issues). That should let us add a couple hundred more language pairings :star_struck: Exciting!
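
Going from 1,500 raw sentences to ~1,000 usable ones implies that roughly a third get filtered out. Here is a minimal sketch of what that kind of filter might look like. It is not Clozemaster’s actual pipeline: the word-count bounds (MIN_WORDS, MAX_WORDS), the function name filter_pairs, and the duplicate check are all hypothetical, since the thread only says “too short/long” and “other quality issues”.

```python
from typing import Iterable, List, Tuple

# Hypothetical bounds: the thread doesn't say what the real limits are.
MIN_WORDS = 3
MAX_WORDS = 20

Pair = Tuple[str, str]  # (original sentence, translation)


def filter_pairs(pairs: Iterable[Pair],
                 min_words: int = MIN_WORDS,
                 max_words: int = MAX_WORDS) -> List[Pair]:
    """Keep sentence pairs whose original sentence has an acceptable length,
    dropping pairs with an empty translation and repeated sentences."""
    seen = set()
    kept: List[Pair] = []
    for sentence, translation in pairs:
        n_words = len(sentence.split())
        if not (min_words <= n_words <= max_words):
            continue  # too short or too long
        if not translation.strip():
            continue  # missing translation
        if sentence in seen:
            continue  # same sentence already kept with another translation
        seen.add(sentence)
        kept.append((sentence, translation))
    return kept
```

A real filter would probably also look at the translation’s length and at language-specific issues (for example, word counts mean little for languages written without spaces), so treat the thresholds as placeholders.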

I would like to add that even with a fairly small number of sentences, you can get many more useful clozes by mining each sentence multiple times, since a sentence often contains more than one memorable word. I think that could also be useful for languages that already have a reasonable number of sentences.
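
To make the idea concrete, here is a rough sketch of mining several clozes from one sentence. Which words count as “memorable” is an assumption here: the caller just passes in a set of lowercase target words (it could come from a frequency list, for instance). The names Cloze, make_clozes, and max_per_sentence are hypothetical, not anything from Clozemaster itself.

```python
from dataclasses import dataclass
from typing import List, Set

# Punctuation to peel off a token before comparing it to the target words.
PUNCTUATION = ".,!?;:\"'¡¿()"


@dataclass(frozen=True)
class Cloze:
    sentence_id: int  # shared by every cloze mined from the same sentence
    text: str         # the sentence with the target word replaced by a blank
    answer: str       # the word the learner has to fill in


def make_clozes(sentence_id: int, sentence: str, targets: Set[str],
                max_per_sentence: int = 3) -> List[Cloze]:
    """Mine up to max_per_sentence clozes from one sentence, one per distinct
    target word it contains. `targets` holds lowercase words worth testing."""
    tokens = sentence.split()
    clozes: List[Cloze] = []
    seen: Set[str] = set()
    for i, token in enumerate(tokens):
        core = token.strip(PUNCTUATION)  # "mat." -> "mat"
        word = core.lower()
        if not core or word not in targets or word in seen:
            continue
        seen.add(word)
        blanked = tokens.copy()
        blanked[i] = token.replace(core, "___", 1)  # keep surrounding punctuation
        clozes.append(Cloze(sentence_id, " ".join(blanked), core))
        if len(clozes) >= max_per_sentence:
            break
    return clozes
```

For example, `make_clozes(1, "The cat sat on the mat.", {"cat", "mat"})` yields two items, "The ___ sat on the mat." and "The cat sat on the ___.", both tagged with sentence_id 1 so a scheduler can tell they come from the same sentence.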

It’s true that a single sentence can be mined for multiple clozes. However, if the sentence comes up twice in quick succession (for instance, in the same round or in two successive rounds), the first instance can interfere with the second: it shows you, as ordinary non-cloze text, the very word that will later be the cloze. I can imagine software logic that would prevent the same sentence from appearing with two different clozes in the same round, but it would be harder to keep it from appearing in successive rounds.
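
Here is one way that logic could look, as a sketch only. It assumes the whole queue of items is known up front, which is a simplification: if rounds are built one at a time on demand, the scheduler would also need to remember what was in the previous round. The names Item, build_rounds, and round_size are hypothetical. Items that would repeat a sentence within a round, or reuse one from the round just before, are deferred to a later round.

```python
from typing import List, NamedTuple, Set, Tuple


class Item(NamedTuple):
    sentence_id: int  # which source sentence the cloze came from
    cloze_id: int     # which word in that sentence is blanked out


def _fill(pending: List[Item], round_size: int,
          blocked: Set[int]) -> Tuple[List[Item], List[Item]]:
    """Greedily pick up to round_size items whose sentences are neither
    blocked nor already used in this round. Returns (round, leftovers)."""
    current: List[Item] = []
    used: Set[int] = set()
    leftover: List[Item] = []
    for item in pending:
        sid = item.sentence_id
        if len(current) < round_size and sid not in used and sid not in blocked:
            current.append(item)
            used.add(sid)
        else:
            leftover.append(item)  # deferred, original order kept
    return current, leftover


def build_rounds(items: List[Item], round_size: int = 10) -> List[List[Item]]:
    """Split items into rounds so no sentence appears twice in one round
    or (whenever possible) in two consecutive rounds."""
    rounds: List[List[Item]] = []
    pending = list(items)
    prev: Set[int] = set()
    while pending:
        current, rest = _fill(pending, round_size, prev)
        if not current:
            # Every remaining item shares a sentence with the previous round,
            # so the consecutive-round rule can't hold; relax it.
            current, rest = _fill(pending, round_size, set())
        rounds.append(current)
        prev = {item.sentence_id for item in current}
        pending = rest
    return rounds
```

The trade-off is that some rounds come out shorter than round_size; a real scheduler would probably top them up with clozes from other sentences instead.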
