In languages that have grammatical gender, there’s an issue with audio: male and female voices are selected at random, regardless of the speaker’s gender as implied by the grammar of the sentence.
I’ve seen people discuss this in the past, but nowadays this problem could easily be fixed with very little manual labour by having AI go through every sentence and tag it as “said by a man” or “said by a woman”, so I really can’t see any reason not to do this. Even if it weren’t 100% accurate, it would still be way better than the 50% you get from picking a voice at random, and the errors could then be fixed manually in little time.
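To make the idea concrete, here is a minimal Python sketch of how a per-sentence tag could drive voice selection. The tag values (`male`, `female`, anything else) and the voice names are made up for illustration; nothing here reflects how Clozemaster actually stores sentences or picks voices.

```python
import random

# Hypothetical voice names grouped by speaker gender (illustration only).
VOICES = {
    "male": ["male_voice_1"],
    "female": ["female_voice_1"],
}


def pick_voice(tag, rng=random):
    """Pick a voice that matches the sentence's speaker tag.

    Sentences tagged "male" or "female" get a matching voice. Any other
    tag (gender-neutral or not yet tagged) falls back to a random voice,
    which is the behaviour described as the current one.
    """
    if tag in VOICES:
        return rng.choice(VOICES[tag])
    all_voices = [voice for group in VOICES.values() for voice in group]
    return rng.choice(all_voices)
```

With tags in place, a manually corrected tag simply overrides the AI-generated one, so the playback code would not need to change.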
It would be a great experiment if somebody proficient with AI could throw a bunch of sentences in a few languages at a model and ask it to classify them as “spoken by a male”, “spoken by a female”, or “gender-neutral”.
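For anyone who wants to try that experiment, a rough Python sketch of the plumbing could look like this. It only builds the prompt and parses the answer; `ask_model` is a placeholder for whatever function sends text to the model you use and returns its reply, and the prompt wording and the `<number>: <label>` answer format are my own assumptions, not a tested recipe.

```python
from typing import Callable, List

LABELS = ("male", "female", "neutral")


def build_prompt(sentences: List[str], language: str) -> str:
    """Build one prompt asking for a label per numbered sentence."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, 1))
    return (
        f"For each {language} sentence below, decide from its grammar whether "
        "the speaker is male, female, or could be either. "
        "Answer with one line per sentence in the form '<number>: <label>', "
        f"where <label> is one of: {', '.join(LABELS)}.\n\n{numbered}"
    )


def parse_reply(reply: str, count: int) -> List[str]:
    """Turn the model's reply into a list of labels.

    Sentences with a missing or unrecognised answer stay "unknown",
    so they can be queued for manual review.
    """
    labels = ["unknown"] * count
    for line in reply.splitlines():
        number, sep, label = line.partition(":")
        number = number.strip()
        label = label.strip().lower()
        if sep and number.isdecimal() and label in LABELS:
            index = int(number) - 1
            if 0 <= index < count:
                labels[index] = label
    return labels


def classify(
    sentences: List[str], language: str, ask_model: Callable[[str], str]
) -> List[str]:
    """Classify sentences by speaker gender using the supplied model function."""
    reply = ask_model(build_prompt(sentences, language))
    return parse_reply(reply, len(sentences))
```

Keeping the model call behind a plain function makes it easy to compare several models on the same sentences, and the “unknown” label marks exactly the sentences that would need a human to look at them.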
All the major multilingual models are able to classify Hindi sentences by the gender of the speaker. See, for example, this test I ran with ChatGPT 4o on 10 random sentences from Clozemaster:
@morbrorper And with a slightly different prompt, you can get a less verbose, more organized answer (this last one made a partial mistake by classifying the first sentence as spoken by a female rather than gender-neutral; however, it’s not a full mistake, since gender-neutral sentences can be spoken by a female. The other 9 sentences were classified correctly):