Currently, when we select a word from a sentence that we have already submitted an answer for, a dialog is shown that looks like this:
I use this dialog all the time to look up the word at Wiktionary, search for occurrences of the word at Tatoeba, or listen to the pronunciation at Forvo. But I’d like to suggest one more link, namely to Reverso Context for the given language pair. Here’s the link to the search results for “butterflies” for the English-Russian pair:
I find that Reverso Context complements Tatoeba well. The sentence pairs at Reverso Context are collected from real-world sources, while the ones at Tatoeba are mostly composed intentionally for the site. Both are useful for language learners.
That could be a valuable complement to the Tatoeba corpus, especially for those language pairings that have very few contributors and thus tend to be somewhat idiosyncratic.
Just to make it clear: the question of whether we should consider incorporating these sentences/translations into Clozemaster is separate from my original request, where I was asking for a link from the sentence dialog to a Reverso Context query. Incorporating Reverso Context sentence pairs into Clozemaster wouldn’t make those links any more or less valuable. The point of a link to a Reverso Context query is that it allows you to see, at once, a whole group of sentence pairs for the word you’ve selected. I hope we can do that regardless of whether we’ve imported some of those sentences from the site.
As for incorporating Reverso Context sentences into Clozemaster, it depends, first and foremost, on whether their license permits this. If it does, I would say it makes sense, but only if done via a careful manual process. Several issues make manual selection and proofreading a necessity:
Some Reverso Context sentence pairs are rather loose in terms of a match between the original and translation.
Reverso Context text is sometimes more fragmentary than Tatoeba text (where individual sentences are meant to stand alone).
Reverso Context sentences are generally less closely proofread than Tatoeba text (which itself is not free from errors).
But just as importantly, we should make this an opportunity to address gaps in Tatoeba coverage. Strengthening content for language pairs that are insufficiently covered at Tatoeba would be nice, but there are far fewer language pairs covered at Reverso Context (though it’s possible that some of them are not covered sufficiently at Tatoeba). An especially important aspect of coverage that can be readily quantified in an automated way is vocabulary. (It’s harder to determine coverage for grammar.) The worst case, at least for language pairs that already have decent coverage in Tatoeba, would be if a bunch of new sentences were added from Reverso Context that were effectively no different from sentences already in Clozemaster that came from Tatoeba. If sentence A from Reverso Context is only trivially different from sentence B from Tatoeba that someone has already mastered, adding sentence A at 0% mastered would just mean they’d have to work harder (mark sentence A as 100% known, or review it again and again) to get it out of the way in order to find sentences with new vocabulary.
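To make the vocabulary point concrete, here's a rough sketch of how candidate sentences could be screened automatically. Everything in it is my own assumption: the function names are made up, the tokenizer is deliberately crude (lowercased words, no lemmatization, so "butterfly" and "butterflies" count as different words), and I have no idea how Clozemaster actually stores or imports sentences. The idea is simply to keep a candidate only if it contains at least one word not already covered.

```python
import re


def tokenize(sentence):
    """Return the set of lowercased words in a sentence (crude: no lemmatization)."""
    # [^\W\d_]+ matches runs of Unicode letters; the optional group keeps
    # straight-apostrophe contractions such as "don't" together.
    return set(re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", sentence.lower()))


def new_vocabulary(existing_sentences, candidate_sentences):
    """Keep only candidates that add at least one word not seen so far.

    Hypothetical helper: returns a list of (sentence, sorted_new_words) pairs.
    Selection is greedy, so once a candidate is accepted its words count as
    known and a later candidate cannot be accepted for the same words.
    """
    known = set()
    for sentence in existing_sentences:
        known |= tokenize(sentence)

    accepted = []
    for sentence in candidate_sentences:
        fresh = tokenize(sentence) - known
        if fresh:
            accepted.append((sentence, sorted(fresh)))
            known |= fresh
    return accepted


existing = ["The cat sat on the mat."]
candidates = [
    "The cat sat on the rug.",   # adds "rug"
    "The mat sat on the cat.",   # nothing new, skipped
    "A rug is on the mat.",      # adds "a" and "is"
]
for sentence, words in new_vocabulary(existing, candidates):
    print(sentence, words)
```

A real version would need proper per-language tokenization and lemmatization, and it says nothing about grammar coverage, but even something this simple would flag the "only trivially different" sentences before they get imported.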
Also, the fact that we currently derive all our sentences from a single source, Tatoeba, and that Tatoeba handles sentences in such a transparent, open-source way, makes the “feedback” workflow simpler. If we find errors, we can report them, and we know the way that corrections are made at Tatoeba. It’s possible to report errors at Reverso Context, but it’s not clear to me whether there’s a way to track the fixes (if they’re ever made). And if we started getting sentences from two sources, the user interface would have to reflect that, so where it now gives you a way to report your error at Tatoeba, it would have to give you a way to report it at either of the two places it might have come from. Certainly doable, but more complicated.
Reverso is half ridiculous. They use open subtitles, which are only as good as the submissions (which are half insane). It’s better than nothing, but there are other options (and they don’t remove errors as often as they should)… There is no comparison for colloquial language and slang definitions, but it’s still not great. If given the choice between the two (Reverso vs. custom/crowdsharing options), I’d choose to be able to share our created collections with others.
A language group I am part of is planning to make a collection now, using sentences from the English Sherlock Holmes short stories and an actual published translation. Sherlock Holmes and others like it are, depending on the edition, out of copyright. As is Alice in Wonderland. Agatha Christie.
Also -
Most publishers offer sample chapters of their ebooks, including the translated editions. I bet you could do a deal with a bookseller or publisher to use the first chapter of some books and their translations. They would get links to their books for purchase, while you would get free professionally translated content and, depending on the link, the potential for affiliate fees. Maybe with PocketBook e-readers as well: they offer a slew of language keyboards/interfaces (which Kindle, etc. do not) and also free books in multiple languages if you buy an e-reader or use their app.
A lot of us would have no problem volunteering, if you had permission, to align our preferred target-language translation with the English version of, say, the Harry Potter books or similar (books with a lot of translated versions).
TL;DR - here are ways to improve clozes, make users much happier AND allow Clozemaster to make extra money
(Because Tatoeba is insanely bad, and not worthy of the awesomeness that is Clozemaster).
@mike
I’m very skeptical about the legality of Reverso Context.
Some sentences on Reverso are sourced from OpenSubtitles.org, which has now been added to “piracy site blocklists” in some countries, such as Australia and Greece.
Then use the other half. Much of its content comes from sources that have nothing to do with open subtitles. I find it an invaluable place for seeing how words are used in the real world in the target language, despite the presence of errors in the text and translations.
One of Tatoeba’s main purposes is to serve projects like Clozemaster, which draws its sentences from there. If there were no Tatoeba, Clozemaster would not exist in its current form, and perhaps not at all. How “insanely bad” can a site be if it makes an “awesome” site possible?
@alanf_us
I don’t think it’s a good idea for Clozemaster to use Reverso Context. Some of the sentences in European languages on Reverso are sourced from ParaCrawl, which is run by several universities and co-funded by the European Union. Clozemaster can import real-life sentences directly from ParaCrawl, so that it can eliminate illegal sentences from OpenSubtitles.
How can a sentence be illegal?
And anyway, how can CM be doing something illegal just by adding a link that redirects you to Reverso Context?
For me, Reverso Context is a very useful place to understand how a language uses certain words or expressions. I use it all the time, and it would be nice to have a faster way to check something in Reverso Context.
@Adrianxu
It’s obviously copyright infringement. A movie is a copyrightable work. It cannot be translated into any language without permission from the copyright holder, because a translation is a derivative work, and the right to create derivative works is part of copyright. That’s why OpenSubtitles was judged to be a piracy site by an Australian court. Distributing illegal content is also copyright infringement, and such content should be taken down. One of the well-known take-down processes is called “DMCA” (an American copyright law). In order to avoid being sued by copyright holders, online service operators should remove content from their websites if it violates copyright laws. That’s why YouTube, for example, deletes thousands of videos every day.
How useful it is doesn’t matter. Your argument is like this: “I know the maker of this product violates child labor law, but Clozemaster should sell the product because I love it.”