HSK levels not representing every word

@mike As I mentioned in “Improve quality of Traditional Chinese” (Questions, Suggestions, Feedback), I have done some work to improve word segmentation of Chinese. Although I have mainly focused on Traditional Chinese, these results can also be used for Simplified Chinese. You can see the results in my online dictionary, which I use to help me learn vocabulary.

For Traditional Chinese I have created a few custom collections, such as TOCFL6 and TOCFL6 missing words (TOCFL is the Taiwanese version of the HSK). You can see that for TOCFL6, more than 50% of the words do not have a corresponding sentence in the Tatoeba corpus. I expect the results for HSK to be similar.

If you’re interested, I can help you improve the Chinese tracks of Clozemaster. Perhaps this file could be useful: it contains the full list of Tatoeba sentences (extracted some time ago), segmented in both Traditional and Simplified Chinese characters.
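
A minimal sketch of the kind of coverage check described above, assuming the sentences are already segmented with spaces between words. The function names and the toy data are hypothetical and are not taken from the actual file or dictionary:

```python
def missing_words(word_list, segmented_sentences):
    """Return the words from word_list that occur in no segmented sentence."""
    seen = set()
    for sentence in segmented_sentences:
        # Assumption: each sentence is pre-segmented, words separated by spaces.
        seen.update(sentence.split())
    return [word for word in word_list if word not in seen]


def coverage(word_list, segmented_sentences):
    """Fraction of word_list that has at least one example sentence."""
    if not word_list:
        return 0.0
    missing = missing_words(word_list, segmented_sentences)
    return 1 - len(missing) / len(word_list)


# Toy data for illustration only (not real TOCFL or Tatoeba content).
words = ["符合", "學習", "詞彙", "句子"]
sentences = ["我 喜歡 學習 中文", "這 個 句子 很 長"]

print(missing_words(words, sentences))
print(coverage(words, sentences))
```

Running the same check against an HSK word list and the Simplified segmentation would show how many HSK words lack a sentence in the corpus.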

Thanks a lot Mike, I see you’ve updated the HSK5 collection. Much much appreciated! I’m going to try it out a bit and let you know if the new imports went smoothly. From a quick browse through the new sentences, it looks like many of the imports are for clozewords that were already included in the initial 2000 sentences. However, it also seems that all the words that were missing up to now have been added :) yay!

I haven’t taken an HSK test yet, but I’ve been following the HSK4 course of an official institute (Confucius Institute). In my experience, doing the Clozemaster collections up to HSK4 was extremely helpful in understanding the course. Other skills I’ve had to develop were being more spontaneous orally, being able to read somewhat longer unfamiliar texts, and a couple of grammar points. I did so both by doing language tandems and by paying attention to the teacher’s explanations. There are free online resources for those grammar points as well, but I believe they are protected and couldn’t be used by Clozemaster.

Once I’m done with HSK5, in about a month and a half, I will come back to this topic and will hopefully give a more informed and thought-through answer to your question.

Have a nice day


@Ilraon, I saw on the “Learning Mandarin Chinese” thread that you only started learning simplified Mandarin in Nov 2019. I’m really impressed that you’re taking the HSK5 after such a short time.

I don’t know if I’ll stick with it long enough to get that far, but if I do, I’ll be glad that you prompted improvements to the HSK collections! I’m taking the HSK1 next month just for fun and am currently partway through the HSK2 collection.

Thank you! It’s indeed a fun detail. All in all, having fun or finding something interesting along the way matters more than speed in all cases! So I don’t think my case is necessarily any more impressive than that of someone who learnt over ten years.

Hello, I was supposed to take the HSK5 today, but because I messed up my PCR test certificates, I can’t sit it.
I do have a couple of remarks on the update:

  • Some sentences are pretty long. For instance, a dozen or so sentences are excerpts from speeches by US presidents. Listening to each of them can take about a minute.

  • Some are duplicates: rarely with exactly the same clozeword, most often with a different clozeword in the same sentence, which is still interesting.

  • Some words that were listed as “not in Tatoeba” in your other posts actually appear in the update.

  • A few characters are traditional rather than simplified, as was already noted elsewhere on Clozemaster, which messes up the audio (the TTS usually skips those characters altogether). This issue really perplexes me: converting traditional characters to simplified ones doesn’t appear to be difficult, so I wonder why these mistakes exist in the first place.

  • Some sentences have no pinyin for some reason.

  • Most sentences are at a really well-suited level (they use multiple HSK5 words and are appropriate for preparing for the test).

  • As the update only covers HSK5, there may still be some missing words in HSK4, though I believe not many (one example, to my knowledge, is 符合). I might check all the HSK4 words to make sure.
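
To illustrate the traditional-character remark above, here is a minimal sketch of how stray traditional characters in a simplified sentence could be flagged and converted. The mapping is a tiny hand-made sample and the function names are hypothetical; this is not how Clozemaster processes its sentences, and a real check would need a full conversion table and some context handling, since not every character converts one-to-one:

```python
# Tiny illustrative sample of traditional -> simplified pairs (assumption:
# a real tool would load a complete table instead of this hand-made dict).
TRAD_TO_SIMP = {"學": "学", "習": "习", "說": "说", "這": "这", "個": "个"}


def find_traditional(sentence):
    """Return (index, character) pairs for known traditional characters."""
    return [(i, ch) for i, ch in enumerate(sentence) if ch in TRAD_TO_SIMP]


def to_simplified(sentence):
    """Replace known traditional characters, leaving everything else as is."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in sentence)


mixed = "我喜欢學习中文"  # one traditional character left in a simplified sentence
print(find_traditional(mixed))
print(to_simplified(mixed))
```

A check like this only flags characters that are in the table, so it would miss anything the sample mapping does not cover.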

In terms of how well that prepares for the HSK5, even though I didn’t take it, I did practice for it, and my weakest areas were reading and writing. This is logical considering I always work in audio mode. There is another reason, though: HSK5 reading actually uses vocabulary beyond the scope of HSK5 - I think they expect you to have a full command of the HSK5 vocabulary and to understand a bit more. Besides, Clozemaster doesn’t fully train you to write - for this it’s still better to just exchange with real people who’ll correct you. My grammar was also not quite up to the level.
All in all, I’d say having fully mastered Clozemaster’s HSK5 gives a good chance of passing the test, but only if one pays attention to the overall context of the sentences, and not just the clozewords. The best approach would probably be to follow an HSK5 course in parallel with self-study on Clozemaster.

/edit : one such example

So sorry to hear you didn’t get to take the test!