Sentence updates are getting worse

I received 150+ update notifications for user reports in the last 30 days, and I found fundamental technical issues in approx. 20% of those updates. These issues may also affect other language courses. Over the past six months, updates have been arriving faster, but their quality has been declining. I wish the user report handlers would pause updates for a while and spend more time reviewing their process.

Summary of Issues:

  1. False notification – The email notification says that the English translation was updated, and the general search returns the updated translation. However, none of the collections is actually updated.
  2. Mismatch with Tatoeba – In some cases, the sentence in the target language (TL) had already been updated on Tatoeba to reconcile a mismatch between the TL and the English. However, the Clozemaster admin ignored that update and changed the English translation instead. As a result, the resulting sentence pair cannot be found on Tatoeba.
  3. No update info in the email – Some email notifications don’t say which part Clozemaster updated, although most notifications show both the “pre-” and “post-” update versions. Without the comparison, I cannot see which part was changed: the TL or the translation.
  4. Ignoring word frequency – Some updated TL sentences ignore the word frequency list, and their {{cloze-words}} have been changed to very easy ones.

Here are some screenshots for your reference.

Issue #1: False notification

  • ID: Dia menjadi kegirangan setelah mendapatkan SIM.
  • EN (old): He is bent on getting the driving license.
  • EN (new): He became overjoyed after getting his driver’s license.
  • The pair currently belongs to 50K Most Common, Random Collection (RC), and Fast Fluency Track (FFT)

Image 1: Email notification for the update on the translation

Image 2: Looked it up in the general sentence search

Image 3: After pushing each of the three collections

Image 4: Another similar glitch

  • ID: Tambahkan gula dan vanili pada krim kemudian kocok sampai krimnya mengental.
  • EN (old): Add the sugar and vanilla to the cream and beat vigorously until the cream thickens.
  • EN (new): Add the sugar and vanilla to the cream then beat until the cream thickens.
  • The pair belongs to 50K and FFT.

Issue #2: Mismatch with Tatoeba
Image 1: “going through Boston”

  • The original author on Tatoeba fixed the error in the target language (TL) on April 4, 2022.
  • I shared the update via the Clozemaster sentence discussion and filed the error report on June 20, 2022.
  • Clozemaster ignored it and changed the English translation instead on September 25, 2022.
  • ID (Tatoeba version): Tom dengan mobilnya akan pergi melewati Boston dalam perjalanan mengunjungi Mary.
  • ID (Clozemaster version): Tom akan melewati Boston dalam perjalanan mengunjungi Mary.
  • EN (Tatoeba version): Tom will be driving through Boston on his way to visit Mary.
  • EN (Clozemaster version): Tom was going through Boston on his way to visit Mary.

Image 2: “looks and money”

  • The original author on Tatoeba fixed the error in TL on April 4, 2022.
  • I shared the update with Clozemaster via the error report.
  • Clozemaster ignored it and changed the English translation instead on September 25, 2022.
  • ID (Tatoeba version): Dunia itu hanyalah tentang paras dan materi.
  • ID (Clozemaster version): Dunia itu hanyalah tentang uang dan kehormatan.
  • EN (Tatoeba version): Only looks and money count in this world.
  • EN (Clozemaster version): The world is all about money and honor.

Issue #3: No update info in the email
Image 1

Image 2


Issue #4: Ignoring word frequency
Image 1: The cloze-word for 50K Most Common Collection is now “mother” (ibu)
==> See the sentence discussion.

Image 2: The cloze-word for 50K Most Common Collection is now “chair” (kursi)
==> See the sentence discussion.

Image 3: The cloze-word for 50K Most Common Collection is now “am/are/is” (adalah)
==> See the sentence discussion.


Re: #2 (mismatch with Tatoeba), this is particularly inconvenient for mobile app users. The iOS app, at least, doesn’t provide a hyperlink to the source (Tatoeba) in the sentence search, so mobile users have no choice but to look the sentence up directly on Tatoeba. As far as I know, however, Tatoeba doesn’t let us search for old versions of sentences. So, the new sentence pair is not searchable.
This is also technically a copyright infringement. All sentences on Tatoeba are published under the CC BY 2.0 license, which requires external users to display a hyperlink to fulfill the attribution right of the original author (BY). I raised this issue five months ago, and @mike said that it was a “work in progress”. Five months is probably longer than a standard “grace period”.


Re: #4 (word frequency), the more sentences CM updates, the more disorganized the Most Common Word Collections become. The same issue was reported by @zeiphon (Tagalog learner) two months ago, and @mike promised that the team would work on a fix. However, I guess the issue was not shared with the user report handlers, and they keep spreading the same problem to other languages.
Furthermore, such random changes in cloze-word selection break users’ own edits. For example, some PRO users added hints for the original cloze-words, and those hints no longer make sense with the new cloze-words.
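
To illustrate the kind of check I have in mind for #4, here is a minimal sketch. It assumes that cloze-words in a Most Common Word Collection are meant to track a frequency-rank list; I don't know how Clozemaster actually selects them. The ranks, the sample sentence, the function names, and the `tolerance` parameter are all made up for illustration.

```python
import re

# Hypothetical frequency ranks (1 = most common). These numbers are
# invented for illustration and are not taken from any real word list.
FREQ_RANK = {"di": 5, "ibu": 120, "kursi": 900, "duduk": 1500}
UNRANKED = 10**9  # words missing from the list are treated as the rarest


def rank(word):
    """Return the frequency rank of a word, case-insensitively."""
    return FREQ_RANK.get(word.lower(), UNRANKED)


def rarest_word(sentence):
    """Pick the least common word in the sentence as a cloze candidate."""
    words = re.findall(r"\w+", sentence)
    return max(words, key=rank)


def cloze_got_easier(old_cloze, new_cloze, tolerance=2.0):
    """Flag an update whose new cloze-word is far more common than the old one.

    Returns True when the old word's rank is more than `tolerance` times
    the new word's rank (a lower rank means a more common word).
    """
    return rank(new_cloze) * tolerance < rank(old_cloze)


if __name__ == "__main__":
    print(rarest_word("Ibu duduk di kursi"))
    print(cloze_got_easier("duduk", "ibu"))
```

A check like `cloze_got_easier` could run before an update is saved, so that a moderator is warned when the new cloze-word no longer fits the collection.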

These are just the tip of the iceberg.


Hello @MsFixer! Thanks for all this! It’s super helpful.

  1. False notification - there are two things going on here. One is that, at the moment, a reported sentence is only updated in the collection for which it was reported and in the general search. We should probably extend that so at least translations are automatically updated across collections when the sentence text is the same. Automatically updating the sentence text across collections, however, seems like it would risk introducing issues with Grammar Challenge collections, Most Common Word collections, etc., so we’ve left it at that for now.

    The second is a potential bug: when a sentence note is added, for example, everything submitted in the form at that time is preferred whenever the sentence is loaded in the future, including the translation. For the example you included, Dia menjadi kegirangan setelah mendapatkan SIM., it looks like a note was added to the sentence that was reported and updated, so when it’s then loaded, the translation at the time of the note is preferred. In this particular case I’ve updated the sentence for you so that you can see it was updated in the 50,000 Most Common collection. It seems like we could update the sentence edit functionality so that translations are saved only if they’re different from the original sentence translation.

    The same applies to “Sentence updates not applied to other collections” and “Some sentences are not searchable after updates by the admin” (apologies for the slow reply on those!).

  2. Mismatch with Tatoeba - the sentences may diverge from Tatoeba when they’re updated on Clozemaster. The moderation team refers to Tatoeba and uses or makes changes there when it can, but ultimately tries to resolve reported sentences in the best way possible for Clozemaster. Please let me know of course if you think there’s some issue with the translation change examples you included.

  3. No update info in the email - we’ll get this fixed, thanks for letting us know!

  4. Ignoring word frequency - thanks for letting us know! We’ll aim to get better at updating clozes to fit the collection when these issues come up, and thanks for the sentence discussion posts on those as well as the re-reports.
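
To make the two changes discussed in point 1 more concrete, here is a rough sketch under a deliberately simplified data model. It is not our actual code: the `CollectionSentence` class, its fields, and both function names are hypothetical and only illustrate the idea of (a) propagating a translation to every collection with identical sentence text and (b) storing a user's translation alongside a note only when it differs from the shared one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CollectionSentence:
    """One sentence entry inside one collection (hypothetical model)."""
    collection: str
    text: str
    translation: str
    # Per-user override saved with a note; None means "use the shared translation".
    user_translation: Optional[str] = None

    def displayed_translation(self) -> str:
        if self.user_translation is not None:
            return self.user_translation
        return self.translation


def propagate_translation(entries: List[CollectionSentence],
                          text: str, new_translation: str) -> List[str]:
    """Update the translation in every collection whose sentence text matches exactly.

    Only the translation is touched; the sentence text and cloze are left alone.
    Returns the names of the collections that were updated.
    """
    updated = []
    for entry in entries:
        if entry.text == text:
            entry.translation = new_translation
            updated.append(entry.collection)
    return updated


def save_note(entry: CollectionSentence, submitted_translation: str) -> None:
    """Store the submitted translation only if it differs from the shared one."""
    if submitted_translation != entry.translation:
        entry.user_translation = submitted_translation


if __name__ == "__main__":
    text = "Dia menjadi kegirangan setelah mendapatkan SIM."
    old = "He is bent on getting the driving license."
    new = "He became overjoyed after getting his driver’s license."
    entries = [CollectionSentence(name, text, old)
               for name in ("50K Most Common", "Random Collection", "Fast Fluency Track")]
    save_note(entries[0], old)  # note added with an unchanged translation
    propagate_translation(entries, text, new)
    print([e.displayed_translation() == new for e in entries])
```

With this shape, a note saved with an unchanged translation no longer pins the old translation, so a later moderation update shows up in every collection that shares the sentence text.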

Curious to hear about the rest of the iceberg. Resolving the reported sentences is best effort and we’re always working to improve the process. Thanks again!


Thank you @mike for your reply.
I’m afraid my point #2 (Mismatch with Tatoeba) was misunderstood. Clozemaster is violating copyright law. That’s the real problem. You should stop updating sentences until a revision history system is implemented. Please make sure that you and all members of your report moderation team read the attribution requirements of CC BY 2.0.

Re: the rest of the iceberg, it’s impossible to report all of it here. I received ~200 updates in the last 30 days. As I said, approx. 20% of recent updates have at least one of the four issues mentioned above. Furthermore, I keep receiving the same problematic notifications even after filing this error report. My email inbox is clogged with junk notifications.
