I see that there are not many direct translations between Japanese and Italian: only around 5000 sentences.
How many sentence pairs would be considered enough to set up a new course?
I ask because I got into the habit of adding an Italian translation to Tatoeba sentences when I feel confident I understand the meaning (so, for relatively simple sentences), and maybe at some point, that would be enough to prepare at least a basic course (like, for example, Danish from Italian, which has 5522 random sentences).
@mike-lima
Thank you so much for contributing to the Clozemaster Italian and Japanese forums as well as to Tatoeba.
5,000 direct sentence pairs are enough to develop a new course. However, it’s not the right time to add a new Japanese/Italian course (or a course pairing Japanese with any other language) to Clozemaster.
As a native Japanese speaker, I have played 900 sentences of the Japanese/English course so far. As you probably noticed, I have reported many errors on the forum and also to the admin by pushing the “report” button. I estimate that the error rate is around 20%, which is way too high. However, most of my reports have not been handled yet. It’s better to fix the critical errors in the Japanese/English course first, and then expand to Japanese/Italian, in order to minimize the maintenance costs.
FYI: I have also reported many errors in my target language, Indonesian. Unlike in the Japanese course, the Indonesian report handlers have updated 150+ sentences in response to my error reports since February 2022. For the Japanese/English course, on the other hand, I have received only 2 update email notifications. I guess there is no active error report handler for Japanese at the moment.
I am hoping that the recent change to the maximum number of lessons per day for non-PRO users will speed up improvements to the Japanese/English course. It costs much more to hire a good report handler for Japanese/English than for Indonesian/English. The recent change will definitely bring in more PRO subscribers, and part of the additional revenue can be allocated to hiring costs.
Thanks @MsFixer for your interesting opinion on this.
I understand your points, and they are valid. Japanese sentences have many errors, and as a learner I might miss them.
Not all errors are the same, however:
I think the most important thing is that the sentence must be correct and natural in the target language.
If the translation is not accurate, it is less of a problem, I think, as it is hard to translate between the languages in a way that is both accurate and succinct.
I hope Clozemaster will have the resources to act on the problem reports more promptly, but I have noticed that some of my reports do get taken care of from time to time (though I got no mail notification).
Anyway, I do not need the new course right away. It was more an inquiry to find out whether it could be in the cards, and whether my contributing to Tatoeba could be helpful.
Regarding Italian and Tatoeba, the good news is that there are about 3300 Japanese sentences with a direct translation to Italian (there were around 2000 when I started a month or so ago), and about 5300 Italian sentences with a direct Japanese translation. So the set doesn’t carry a lot of Tanaka Corpus baggage, which going forward could produce a better sentence set. At the beginning, I mostly tried to translate the sentences I encountered while practicing here, but lately I try to translate sentences that are tagged OK. There are not a lot of those, though.
Thank you @mike-lima for your reply. It’s good to know that the Japanese/Italian pairs on Tatoeba are less influenced by the Tanaka Corpus (田中コーパス). I really hope that the Japanese/English CM course will remove the pairs that come from the Tanaka Corpus and import more reliable ones from other sources.
Here is the summary of errors I detected. I played 931 sentences out of 80K, spread evenly across all the Most Common Words collections. The error rate is extremely high…
- Note 1: “Somehow unnatural in JP” means “understandable, but I would definitely recommend rephrasing it with other expressions”.
- Note 2: “Translation unmatched” means that the JP and EN sentences both sound natural, but they mean different things (e.g. “This is an apple” in Japanese and “That is an orange” in English).
- Note 3: “Unnatural word split” is like this: suppose the target sentence is “This is a calendar”. I sometimes find a cloze word like {{lend}} instead of {{calendar}}; those are totally different words.
- Note 4: I don’t double-count errors. If a single sentence pair has more than one error, I count it only under the most critical category.
I will keep playing the lessons as a sort of “alpha tester” and reporting errors, though progress will be very slow.