Spoonfed Chinese is an Anki deck that costs a couple of dollars and has received high praise. It includes quality recordings for each sentence. With a bit of Google searching and manipulation, you can turn it into something that works similarly to Clozemaster (except there is no cloze, it’s the full sentence, lol).
I was interested in the potential of this resource to offer a solid base of sentences for HSK6 and potentially HSK5, areas where Clozemaster by itself is currently insufficient.
As I couldn’t find this information anywhere, I bought the deck and ran a couple of scripts using the jieba tokenizer. Here are the results, in case someone is interested in Spoonfed Chinese for HSK preparation:
HSK6: 791/2500 (32%) unique words, 1280 sentences including such a word (1.6 sent/word)
HSK5: 747/1200 (62%) unique words, 1932 sentences including such a word (2.6 sent/word)
HSK4: 470/600 (78%) unique words, 2067 sentences including such a word (4.3 sent/word)
A sentence that contains both an HSK5 and an HSK4 word is counted in both categories. I have not computed statistics for words that aren’t in any HSK list.
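For anyone who wants to run a similar count on their own deck, the idea can be sketched roughly as follows. This is a minimal sketch and not the original scripts: the function names (`jieba_tokenize`, `hsk_coverage`, `summarize`) are made up for illustration, and it assumes you already have the deck's sentences as plain strings and one HSK level's word list as a set. `jieba` is a third-party package and is only imported when the default tokenizer is actually used.

```python
from typing import Callable, Iterable, List, Set, Tuple


def jieba_tokenize(sentence: str) -> List[str]:
    """Segment a Chinese sentence with jieba (third-party: pip install jieba)."""
    import jieba  # imported lazily so the rest of the sketch runs without it
    return jieba.lcut(sentence)


def hsk_coverage(
    sentences: Iterable[str],
    hsk_words: Set[str],
    tokenize: Callable[[str], List[str]] = jieba_tokenize,
) -> Tuple[int, int]:
    """Return (unique HSK words found, sentences containing at least one)."""
    found: Set[str] = set()
    matching_sentences = 0
    for sentence in sentences:
        # Words of this sentence that belong to the given HSK list.
        hits = set(tokenize(sentence)) & hsk_words
        if hits:
            matching_sentences += 1
            found |= hits
    return len(found), matching_sentences


def summarize(level: str, found: int, total: int, n_sentences: int) -> str:
    """Format one result line: coverage percentage and sentences per word."""
    pct = round(100 * found / total)
    ratio = n_sentences / found if found else 0.0
    return (
        f"{level}: {found}/{total} ({pct}%) unique words, "
        f"{n_sentences} sentences ({ratio:.1f} sent/word)"
    )
```

Running `hsk_coverage` once per level gives the overlap behaviour described above: a sentence with both an HSK4 and an HSK5 word is counted for both levels. Note that the result depends on how the tokenizer splits words, so a different segmenter may give slightly different counts.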
I’m not sure where Clozemaster stands in comparison, as I don’t know how to export the pro collections (to a .txt file for instance). My gut tells me it does better for HSK4 and 5, but I’m not sure for HSK6. At any rate, Spoonfed Chinese seems to be an interesting resource as well.
It’s a pity that no collection yet exists that has great audio and fully covers the HSK5 and 6 levels.
Also, if Mike is reading this: you could add Tatoeba’s sentences, as we discussed in my other thread. That would be a big improvement!
This is still top of mind, will aim to get at least HSK 5/6 updated within the next few weeks.
HSK levels not representing every words - #16 by Ilraon - Mandarin Chinese - Clozemaster
As a postscript, a solution I’m currently looking into is using this database:
And extracting the sentences pertaining to HSK5 and HSK6. If the database has good coverage, the only thing missing would be audio (text-to-speech), but that might just be a matter of digging a bit on the (Chinese) internet.
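The extraction step could look something like this. Again, this is only a sketch under assumptions: it assumes the database can be read as one sentence per string and that the HSK5/HSK6 word lists are available as sets. `filter_by_level` and `words_by_level` are hypothetical names, and the tokenizer is passed in (for example `jieba.lcut`) rather than hard-coded.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple


def filter_by_level(
    sentences: Iterable[str],
    words_by_level: Dict[str, Set[str]],
    tokenize: Callable[[str], List[str]],
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (sentence, matching levels) for sentences with at least one listed word."""
    for sentence in sentences:
        tokens = set(tokenize(sentence))
        # Levels whose word list shares at least one token with the sentence.
        levels = sorted(
            level for level, words in words_by_level.items() if tokens & words
        )
        if levels:
            yield sentence, levels
```

With jieba installed, a call would be along the lines of `filter_by_level(lines, {"HSK5": hsk5_words, "HSK6": hsk6_words}, jieba.lcut)`, where `lines`, `hsk5_words` and `hsk6_words` are whatever you loaded from the database and the word lists.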