I am wondering whether you “stem” words (collapse inflected forms into a single lemma or base form) or treat each inflected form separately when you calculate the word frequencies used to select sentences.
Why do I care? Because I’d like to see more distinct roots rather than more inflected forms. For instance, in English, “walk”, “walks”, “walking”, and “walked” are all common words, and if each one were counted separately, they could crowd out other words. I’d rather see a greater variety of verbs, so I hope all the “walk*” forms go into one bin (though when sentences are then chosen from that bin, I’d hope that no single form crowds out all the others).
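To make the request concrete, here is a rough Python sketch of the behavior I have in mind. It is only an illustration under assumptions: `LEMMAS` is a hypothetical hand-made lemma table (a real tool would need a lemmatizer or a frequency list that already groups forms by lemma), and `lemma_frequencies` and `pick_sentences` are names I made up, not anything from your tool. The first function sums the counts of all inflected forms into one bin per lemma; the second picks sentences from one bin by rotating through the forms, so no single form dominates.

```python
from collections import Counter

# Hypothetical lemma table (assumption): maps inflected forms to a base form.
# Words not listed are treated as their own lemma.
LEMMAS = {
    "walks": "walk", "walking": "walk", "walked": "walk",
    "runs": "run", "ran": "run", "running": "run",
}


def lemma(word: str) -> str:
    """Return the base form of a word, falling back to the word itself."""
    w = word.lower()
    return LEMMAS.get(w, w)


def lemma_frequencies(form_counts: dict[str, int]) -> Counter:
    """Sum the counts of all inflected forms into one bin per lemma."""
    totals = Counter()
    for form, count in form_counts.items():
        totals[lemma(form)] += count
    return totals


def pick_sentences(sentences_by_form: dict[str, list[str]], n: int) -> list[str]:
    """Choose up to n sentences from one lemma's bin, taking one sentence
    per form in turn (round-robin) so that no single form crowds out the rest."""
    queues = {form: list(s) for form, s in sentences_by_form.items() if s}
    picked = []
    while queues and len(picked) < n:
        for form in list(queues):  # iterate over a copy so we can delete
            if len(picked) >= n:
                break
            picked.append(queues[form].pop(0))
            if not queues[form]:
                del queues[form]
    return picked


if __name__ == "__main__":
    counts = {"walk": 500, "walks": 120, "walking": 300,
              "walked": 250, "run": 400, "ran": 200}
    totals = lemma_frequencies(counts)
    print(totals["walk"], totals["run"])  # 1170 600

    bin_walk = {"walk": ["a", "b", "c"], "walked": ["d"], "walking": ["e", "f"]}
    print(pick_sentences(bin_walk, 4))  # ['a', 'd', 'e', 'b']
```

The counts here are made up purely to show the grouping; the point is that “walk” competes with other verbs as one entry (1170) instead of as four separate ones.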
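Your description does mention that you use frequency lists from Wiktionary, but I think those lists may be compiled in different ways (some may count each inflected form separately, others may group forms by lemma), so that alone doesn’t answer the question.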