Could we split Spanish into at least Spain / Latin America? It’s annoying having to manually go through tons of vocabulary with zero relevance to my life.
I do understand your frustration/annoyance at this; the same issue exists in other languages such as Portuguese (Brazilian vs European) and English (American vs British or Australian, etc.).
Unfortunately, since Clozemaster sources these sentences from Tatoeba, and Tatoeba doesn’t make any such distinction, I don’t think there is an easy fix for this. Note also that most (if not all) popular word frequency lists don’t distinguish between dialects and regional variations either, as the sources for their word lists are usually very mixed in their place of origin (for example, subtitles from films and TV shows).
Therefore, the most likely way of fixing this would be to hire a native speaker for each language to go through the sentences and sort out those that are specific to one dialect or regional form of the language. There are other parts of Clozemaster where a native speaker would be very helpful too (see, for example, the posts from @alanf_us about stress markings to aid Russian pronunciation).
However, there then comes the question of whether Clozemaster has enough Pro subscribers (yet!) to support a business case for doing this on a large scale for each language, i.e. for whole collections as opposed to responding to individual user queries. (I don’t know how many Pro subscribers Clozemaster has at the moment, or whether there are enough Pro subscribers learning a specific language such as Spanish to justify this on a single-language basis.)
I do note, though, that issues such as filtering on regional variations (and verifying stress markings in Russian) would only need to be addressed once per target language, as opposed to once for each language pairing.
If there are any tools out there that could do these things automatically for these languages, then that would be a different matter, and I would be wholeheartedly in favour!!
If it were possible to set up separate streams for the dialect/regional variations of the target languages, there is also the question of which sentences this would apply to.
I personally think that just having separate collections for the Fast Tracks would be very helpful, even if the source languages stayed in mixed form (e.g. a Latin American vs a Castilian Spanish Fast Track, or a Brazilian Portuguese vs a European Portuguese Fast Track; these separate tracks would still use an unfiltered source language, for example English, rather than separating out American vs British/Australian English, etc.).
If this Fast Track filtering approach were possible, a new user could click on Spanish from XXX and then have two or more additional Fluency Fast Track collections to choose from if they wished to learn one particular stream of the language.
In summary, I support what you are saying, I’m just not sure how easily it could be achieved.
@zzcguns provided a better response than I could have and hit all the key points. While having different collections, or even different language pairings, for different dialects would be awesome, we’re still likely a ways off from getting it implemented. It’d also be especially useful for Portuguese (Brazilian vs European).
What’s the vocabulary you have in mind? Which Spanish are you learning? If it’s a small set of words that keep popping up, you could use the Manage Collection feature to search for those words and ignore them.
Hey Mike, just following up on this from search. A prompt popped up about a week ago asking for feedback on whether Castilian Spanish and LATAM Spanish (I think Mexico?) should be split.
I think that’s a fantastic idea and was wondering if that is being worked on?
I have every use for Mexican/Latam Spanish, and virtually none for Castilian.
Thanks!
I have an AI program and, interestingly, it has Portuguese (Brazilian) and Portuguese (European) as options… but only a single Spanish.
From my perspective, it would probably be best to leave it as is (Spanish). While there are obvious differences (vosotros, different pronunciations, different meanings for certain words, etc.), there are many similarities. In fact, Spanish from Argentina (and Uruguay, Paraguay, and parts of Chile: the ‘southern cone’) has as many differences from most Latin American countries as Spain does.
Plus, some Spanish-speaking countries use ‘vos’ whereas most do not.
English, like Spanish, is spoken a bit differently throughout the English-speaking world. Both languages are diverse in how they are spoken. Trying to separate them into different categories may be a bit of overkill, IMO.
Languages should be taught in their entirety, I feel, especially as there aren’t that many differences between Castilian and Latin American Spanish. The English spoken in Ireland is different from US or Canadian English. Are we suggesting that speakers of the same language shouldn’t try to understand one another?
People put far too much emphasis on the difference between the Spanish in Spain and the Spanish in Latin America. Latin American Spanish doesn’t even exist as a single thing; there are just as many differences among the Spanish-speaking countries themselves.
I wonder if this is an interesting problem to point an AI agent at. I can imagine that asking ChatGPT to classify each sentence as neutral, clearly European, or clearly Latin American would be well within its ability. Whether it’s worth the compute time and cost, though, is another question.
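For what it’s worth, here is a minimal sketch of what that classification could look like, assuming the OpenAI Python client and an API key in the environment; the model name, prompt wording, and label set are my own illustrative placeholders, not anything Clozemaster actually uses.

```python
# Sketch only: classify one sentence at a time by regional variety.
# Assumes `pip install openai` (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

LABELS = {"neutral", "european", "latin_american"}

def classify_sentence(sentence: str) -> str:
    """Ask the model to label a Spanish sentence as neutral / european / latin_american."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder choice; any capable model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the Spanish sentence as exactly one of: "
                    "neutral, european, latin_american. "
                    "Answer with the label only."
                ),
            },
            {"role": "user", "content": sentence},
        ],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "neutral"  # fall back to neutral on doubt

if __name__ == "__main__":
    for s in ["¿Cuánto piden por estas chanclas?",
              "Vosotros tenéis que coger el autobús."]:
        print(s, "->", classify_sentence(s))
```

Running something like this over the whole Tatoeba-derived corpus is where the compute cost question comes in.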
Obviously, coming very late to this party. Regarding the motivation for splitting Spain/LA Spanish, for me it is a question of efficiency. I’d love to have a vocabulary of 100,000 words (and am working on it :-) and know the native food words or kitchen appliance words in every country, but I’m not there yet. So I’d rather prioritize a complete vocabulary for the LA region (understanding there are still many regional variations). When I talk to my Venezuelan daughter-in-law, many of the Spain Spanish words are new to her.

Like MikeInTaiwan, I think ChatGPT could do a good job of categorization. And if that exercise yielded only a small number of sentences with a difference, then we’d put this to rest. If not, either a filter for regional variations or dedicated sentence collections could be a possible outcome. In the meantime, I agree with several who say to use the tools available to manage this. No one is forcing me to learn a new word I don’t consider relevant.
My view on this is that since no other Internet resource that I know of (RAE, Wiktionary, SpanishDict, Linguee, Word Reference, etc.) has made a similar split, I don’t think Clozemaster should, or even could, do it either.
What could be done is to enable tagging of clozes and filtering on tags. But would I trust an AI bot to set those tags? I’m afraid that an over-confident AI would enthusiastically tag even neutral sentences as one or the other.
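Just to make the tagging idea concrete, here is a tiny sketch of how filtering on region tags might work; the data shape and tag names are invented for illustration and are not Clozemaster’s actual model.

```python
# Hypothetical: each cloze carries a set of region tags, and a learner
# filters the collection down to the tags they care about.
clozes = [
    {"text": "¿Qué harás el fin de semana?", "tags": {"neutral"}},
    {"text": "Vosotros tenéis que coger el autobús.", "tags": {"es-ES"}},
    {"text": "¿Cuánto piden por estas chanclas?", "tags": {"es-MX"}},
]

def filter_clozes(clozes, allowed_tags):
    """Keep clozes that carry at least one of the allowed tags."""
    return [c for c in clozes if c["tags"] & allowed_tags]

# A learner focused on Latin American Spanish might request:
print(filter_clozes(clozes, {"neutral", "es-MX"}))
```

The hard part, of course, is not the filtering but getting trustworthy tags in the first place.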
I think splitting Spanish as a whole at Clozemaster would not be a good idea, but there’s no reason that it can’t offer, in addition to the other collections, ones that are specialized for geographical areas. For instance, in Portuguese, there’s already a “Brazilian Slang and Internet Expressions” collection.
For Spanish from English, there is the “Mexican Spanish” collection, which contains a lot of idioms that sound non-Castilian to me, but also sentences that could be Castilian, for all I know:
¿El precio ya incluye el IVA? Does the price already include VAT? (IVA is definitely used in Spain.)
¿Qué harás el fin de semana? What are you doing this weekend? (fin de semana is, to my knowledge, used in every region.)
¿Cuánto piden por estas chanclas? How much are these sandals? (I don’t know, maybe chanclas is shared between Spain and Mexico, but not used in some other parts of Latin America?)
This only goes to show how difficult it is to do the classification, when you have a number of sometimes overlapping subsets.
morbrorper, I agree with you about not trusting AI with any task of importance. There are articles describing the problem that when an AI doesn’t know an answer, it makes one up.