HSK6 deck that works similarly to Clozemaster

Hi there,

I built an HSK6 deck in Anki that works kinda like Clozemaster. It covers roughly 2080/2500 HSK6 words, which I think is enough before moving on to your own material/more genuine stuff. It largely builds on work done by other people; I mostly just adapted stuff.
I haven’t really tested it yet so there might be problems I’m unaware of, but I’m more likely to remember all the steps right now than in a month. In any case it seems to work technically.

I can’t share the deck directly because I partly used SpoonfedChinese as a basis (which is not free), but I’ll indicate some of the steps in case someone would be interested. It’s a bit technical but if you’ve studied a bit of IT it should be ok.

Use the database here (see Leguan’s #3 post), which covers about 2000/2500 HSK6 words.

You can mix it with Spoonfed for more variety. Leguan’s database on the thread has 23 000 sentences. Getting Leguan’s text files into a form that can be imported into Anki takes a bit of manipulation, but it’s ok.
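For the import step, Anki can read plain text files with one note per line and fields separated by tabs. Here is a minimal sketch of writing such a file. I’m assuming you have already parsed the sentences into (sentence, translation) pairs; the sample rows and the function name `write_anki_tsv` are made up for illustration, and the real layout of Leguan’s files may need different parsing.

```python
import csv
import io

def write_anki_tsv(rows, out):
    """Write (sentence, translation) pairs as tab-separated lines, one note per line."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for sentence, translation in rows:
        # Strip stray whitespace so the fields import cleanly.
        writer.writerow([sentence.strip(), translation.strip()])

# Hypothetical sample data, not taken from Leguan's database.
rows = [
    ("我喜欢学习中文。", "I like studying Chinese."),
    ("他很谦虚。 ", "He is very modest."),
]
buffer = io.StringIO()
write_anki_tsv(rows, buffer)
tsv_text = buffer.getvalue()
```

In practice you would pass a file opened with `encoding="utf-8"` instead of the `StringIO` buffer, then pick that file in Anki’s import dialog.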

Clearly 23 000 sentences (+ Spoonfed! = 31 000) is a bit much, so it’s best to edit it down a little.
In this spirit, you’d want to keep only the sentences that feature at least one HSK6 word.

To do that, it’s good to use the tokenizer jieba, which you can find on GitHub. Tokenize each sentence, then compare its tokens against a list of HSK6 words. I ended up with about 10 000 sentences (roughly 5 sentences per word on average). If you’re like me, you might prefer cutting the collection into smaller chunks/decks. Personally I cut it into 10 and tried to keep each HSK6 word’s representative sentences in the same chunk. (I.e. when nearing the end of a chunk, you’ll be familiar with all of its HSK6 words, and when starting the next chunk, the majority of HSK6 words should be new.)
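The filtering and chunking could look roughly like the sketch below. It is not the exact script I used. The tokenizer is passed in as a function so the example runs without anything installed; with jieba you would pass `jieba.lcut`. The word list and sentences are toy data (pre-split with spaces so that `str.split` can stand in for the tokenizer), and the chunking rule (each sentence goes to the chunk of its first matched word) is just one simple assumption about how to group things.

```python
from collections import defaultdict

def filter_sentences(sentences, hsk6_words, tokenize):
    """Return (sentence, matched_words) for sentences with at least one HSK6 word."""
    kept = []
    for sentence in sentences:
        matched = sorted(set(tokenize(sentence)) & hsk6_words)
        if matched:
            kept.append((sentence, matched))
    return kept

def split_into_chunks(kept, n_chunks):
    """Assign each HSK6 word to one chunk; a sentence follows its first matched word."""
    words = sorted({w for _, matched in kept for w in matched})
    # Spread the words evenly over the chunks, in sorted order.
    word_chunk = {w: i * n_chunks // len(words) for i, w in enumerate(words)}
    chunks = defaultdict(list)
    for sentence, matched in kept:
        chunks[word_chunk[matched[0]]].append(sentence)
    return [chunks[i] for i in range(n_chunks)]

# Toy data: not the real HSK6 list, and pre-segmented with spaces.
hsk6 = {"谦虚", "操劳"}
sentences = ["他 很 谦虚", "我 喜欢 茶", "母亲 一生 操劳"]
kept = filter_sentences(sentences, hsk6, str.split)  # real use: tokenize=jieba.lcut
chunks = split_into_chunks(kept, 2)
```

A sentence with several HSK6 words only lands in one chunk here, so some words will still show up outside “their” chunk.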

Personally I like to work with Clozemaster’s audio mode (which is especially neat for memorizing tones). So I replicated the behaviour as best I could by modifying the note/card structure (starting from Spoonfed Chinese), keeping only the audio on the front and everything else on the back.

I also used the “Automatically flip cards” add-on, an Anki add-on you can find easily online. It shows the back of the card automatically after the audio is played. It then also doesn’t wait for your answer, which is pesky. You can change that by adding a waiting time of up to 20 seconds before it flips to the back of the card. Alternatively, if you want Anki to wait indefinitely for your command like Clozemaster does in audio mode, you can look into the add-on’s init file and raise the time limit yourself to something really high.

Finally there’s the problem that Leguan’s database doesn’t have audio (unlike Spoonfed, which has fantastic audio). I looked online and used zhtts (again, it can be found on GitHub), which is quite primitive but does a better job than Microsoft’s voice IMO and doesn’t sound too robotic. Acceptable all in all. (Better TTS probably exists on Chinese websites but I found it a bit hard to navigate there.) I’m unsure if it’s possible to run a script directly when Anki shows cards, so I generated all the audio files in advance and put them in the media folder.
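Pre-generating the audio can be sketched like this. The synthesizer is passed in as a function `synth(text, path)` so the sketch doesn’t depend on any particular TTS library; as far as I remember from the zhtts README, `zhtts.TTS().text2wav` has that shape, but treat that as an assumption and check it. The names `audio_filename` and `generate_audio` are mine, and `media_dir` would be your Anki profile’s `collection.media` folder.

```python
import hashlib
from pathlib import Path

def audio_filename(sentence, ext="wav"):
    """Stable file name derived from the sentence text, so reruns reuse files."""
    digest = hashlib.md5(sentence.encode("utf-8")).hexdigest()[:12]
    return f"tts_{digest}.{ext}"

def generate_audio(sentences, synth, media_dir):
    """Call synth(text, path) once per sentence; return {sentence: Anki sound tag}."""
    media_dir = Path(media_dir)
    tags = {}
    for sentence in sentences:
        name = audio_filename(sentence)
        target = media_dir / name
        if not target.exists():  # skip audio left over from an earlier run
            synth(sentence, str(target))
        # Anki plays a media file when a field contains [sound:filename].
        tags[sentence] = f"[sound:{name}]"
    return tags
```

The returned tags go into the audio field of each note before importing.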
zhtts generates WAV files, which Anki accepts, but they might take up a bit of space. If you’re short on disk space or just dislike the idea of wasting it, you might want to convert the files to MP3. This can be done with a Python module as well (pydub), in which case you’ll also have to install ffmpeg.
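A minimal sketch of the WAV to MP3 conversion with pydub, assuming pydub and ffmpeg are both installed. The function names and the `delete_wav` option are my own choices for illustration. pydub is imported inside the function so the file still loads on a machine without it.

```python
from pathlib import Path

def convert_one(wav_path):
    """Convert a single WAV file to an MP3 next to it and return the new path."""
    from pydub import AudioSegment  # lazy import; needs pydub plus ffmpeg
    mp3_path = wav_path.with_suffix(".mp3")
    AudioSegment.from_wav(str(wav_path)).export(str(mp3_path), format="mp3")
    return mp3_path

def convert_folder(folder, delete_wav=False):
    """Convert every .wav file in a folder; optionally remove the originals."""
    converted = []
    for wav_path in sorted(Path(folder).glob("*.wav")):
        converted.append(convert_one(wav_path))
        if delete_wav:
            wav_path.unlink()
    return converted
```

If you convert after building the notes, remember that any `[sound:….wav]` references in the cards have to be changed to `.mp3` as well.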

Leguan’s database also doesn’t have pinyin, but this was a less critical issue for me. Anyhow, I used another module whose name I can’t remember right now. I still have to check whether its output is accurate.

The review settings of Anki can probably be edited to something similar to Clozemaster, that’s up to personal preference.

Limits:

  • doesn’t have clozes for the words. I like to have some words highlighted so I put the HSK6 words in blue. Personally, not having clozes doesn’t bug me all that much: I usually listen to the sentence and start writing the characters that sound challenging to my ears on a sheet of paper. If you need to look at the sentence, it’s a sign you’re not fully familiar with it.

  • sentences are short thanks to Leguan’s excellent work, but they might be a bit complicated (lots of words outside of HSK bounds). Well, at HSK6, I’d say it’s high time! Besides, exams don’t feature only the official vocab lists.
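Putting the HSK6 words in blue can be done when building the sentence field, since Anki fields accept HTML. A rough sketch, assuming the sentence is already tokenized (e.g. by jieba); the function name `highlight` and the inline style are just one way to do it, and the word set is toy data.

```python
import html

def highlight(tokens, hsk6_words, color="blue"):
    """Join tokens back into a sentence, wrapping HSK6 words in a coloured span."""
    parts = []
    for token in tokens:
        safe = html.escape(token)  # keep any stray <, > or & from breaking the card
        if token in hsk6_words:
            parts.append(f'<span style="color: {color};">{safe}</span>')
        else:
            parts.append(safe)
    return "".join(parts)

# Toy example: the word set is illustrative, not the real HSK6 list.
field = highlight(["他", "很", "谦虚"], {"谦虚"})
```

When importing a file with such fields, HTML has to be allowed in Anki’s import dialog, otherwise the tags show up as plain text.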

PS. After a day of testing, works fine, but note that:

  1. sentences can have more than one HSK6 word (words showing up in blue)
  2. sentences are fairly complex (transition after Clozemaster’s HSK5 is pretty rough).