Additional sentences and word frequency list (free for personal use)

I have two announcements to make.

Additional sentences for cloze test
I shared my personal sentence collections for additional drills on Clozemaster.

1. As of Mar. 31, 2024, 1,264 new sentence sets are available in total.
2. Most of the cloze-words in my collections don’t overlap with those taught by Duolingo and Clozemaster’s ready-made (official) collections (e.g., Most Common Words Collections). Once you’re done with Clozemaster’s ready-made collections, try mine!
3. My collections consist of two series: Tatoeba Add-ons for Cloze Test (TACT) and Pulau Bahasa for Cloze Test (PBCT).

  • About TACT: Clozemaster’s ready-made collections imported their sentence sets from Tatoeba in the second half of 2015. No new sentences have been imported since then. Moreover, some old sentences were imported but never used, due to improperly lemmatized words that ranked beyond 50K. TACT additionally imports such unused yet useful sentences from Tatoeba.
  • About PBCT: Unlike the sentences from Tatoeba, most of the Indonesian sentences in PBCT were written directly in Indonesian and used in real-world contexts. I found some Indonesian sentences from Tatoeba unnatural because they were awkwardly translated from English or Japanese. I believe PBCT contains fewer unnatural sentences. PBCT is more difficult than TACT.

Learn more about TACT and PBCT here.

Word frequency list

As a sister product, I have also published a CEFR-graded word frequency list called “PBWL”, which is free for personal use.

  • PBWL is much better than well-known word frequency lists. Clozemaster’s Most Common Words Collections, for example, are based on an OpenSubtitles-based (OS) word frequency list published on Wiktionary. I found that 60% of the top 50K words on the OS frequency list are “garbage” due to improper lemmatization. Because many of the high-ranking words are overrated, the OS frequency list is unrealistic. PBWL corrects these errors.
  • PBWL contains 27K+ words/lemmas (equivalent to 8K+ root words), which cover the B2 level. Its effective vocabulary size is larger than that of the top 50K from the OS frequency list once the improperly lemmatized entries are discounted.
  • PBWL also shows which words are taught by Duolingo, Clozemaster’s ready-made collections, and my shared collections.
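
For anyone curious about the size comparison above, here is a rough back-of-the-envelope check. It assumes that the 60% "garbage" share applies evenly to the top 50K of the OS list and that every remaining entry is a usable lemma; both are simplifications for illustration, not figures taken from the data sets. The function and constant names are made up for this sketch.

```python
# Figures quoted in the post. How they are combined here is an assumption.
OS_TOP_N = 50_000        # top 50K entries of the OS frequency list
GARBAGE_PERCENT = 60     # share reported as improperly lemmatized
PBWL_ENTRIES = 27_000    # lower bound of "27K+" words/lemmas


def usable_entries(total: int, garbage_percent: int) -> int:
    """Entries left after discarding the garbage share (integer arithmetic)."""
    return total * (100 - garbage_percent) // 100


os_usable = usable_entries(OS_TOP_N, GARBAGE_PERCENT)
print(os_usable)                 # 20000
print(PBWL_ENTRIES > os_usable)  # True
```

Under these assumptions, roughly 20K usable entries remain in the OS top 50K, which is fewer than the 27K+ entries in PBWL.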

Let me know if you have any questions about the data sets.