As I understand it, Chorusing is saying the words directly on top of the recording (as heard through the speakers or earphones), with the goal of syncing your tone curve and inflection with the speaker's. Shadowing is similar but follows with a delay, so it gets you used to speaking the words without specifically matching the tone and inflection curves. I think Chorusing is specifically for accent, pronunciation, and rhythm work, as opposed to Shadowing's focus on familiarity with the phrase and words.
In Clozemaster's case, because we have the text in front of us as a crutch, Chorusing seems doable.
I think I would prefer to set the number of repeats in advance, so I know beforehand how many times it will repeat and have made a mental commitment to the process. So I would prefer a "set it for x times (3, 5, 10, 15, 20)" spinner rather than having to press stop.
I don't think a counter will be required if the user has already committed to a chosen number of reps. It might be nice to have, but it would make the UX more complicated. (I would prefer a quick and easy change in settings to the behavior of the play button, and to see whether the function gets used before changing the main screen UI at all.)
As you mention, pressing next should immediately kill it and move on to the next sentence.
There will need to be a small break between plays, and its length will depend on whether the speech files include any buffer or are trimmed exactly. Something like 1/2 to 1 second, I think, but I'm guessing and this will need experimentation. I am assuming it can be a single global setting rather than something complicated or variable.
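To make the proposed play-button behavior concrete, here is a minimal sketch, assuming an async playback hook. Everything named here is hypothetical: `play_once` stands in for whatever plays the sentence audio once, `REPEAT_CHOICES` mirrors the spinner values above, and the 0.75 s default gap is just a guess within the 1/2 to 1 second range. "Next" is modelled as cancelling the running task.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical spinner values, taken from the options suggested above.
REPEAT_CHOICES = (3, 5, 10, 15, 20)

# Assumed single global pause between plays (somewhere in the 0.5-1 s range),
# to be tuned by experiment depending on how tightly the clips are trimmed.
DEFAULT_GAP_SECONDS = 0.75


async def chorus(
    play_once: Callable[[], Awaitable[None]],
    repeats: int,
    gap_seconds: float = DEFAULT_GAP_SECONDS,
) -> int:
    """Play the sentence audio `repeats` times with a fixed pause between plays.

    `play_once` is a hypothetical hook that plays the clip once and returns
    when playback ends. Cancelling the task running this coroutine (e.g. when
    the user presses Next) interrupts whichever play or pause is in progress.
    Returns the number of completed plays.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    completed = 0
    for i in range(repeats):
        await play_once()
        completed += 1
        # Pause only between plays, not after the last one.
        if i < repeats - 1:
            await asyncio.sleep(gap_seconds)
    return completed
```

In an app, the play button would start something like `asyncio.create_task(chorus(play, 5))`, and pressing Next would call `cancel()` on that task; a real `play_once` would also need to stop the audio itself when cancelled. Keeping the count and gap as plain settings matches the idea of changing only the play button's behavior rather than the main screen UI.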
With the current course I'm doing, we record a track of ourselves on every play and then go back to listen to it. It is valuable and instructive but comes with a bunch of complications. The user's sound curves will not line up in time with the voice file, because of technical latency (Bluetooth, for example) and human response time. Matching those curves is tricky, and something of an art. Recording, aligning, and matching audio levels well is in a completely different realm of complexity from simply repeating the play 5 times.
Using Audacity, I spend a bunch of time aligning the recordings on the timeline, adjusting the track volumes so I can clearly hear myself, and putting the speaker at about the same volume, and all of that depends on the devices used, the speaker's natural volume, the volume of the sound file, etc. I think that is far beyond the scope (and the Pareto curve of value) for Clozemaster.
There is a big quick win here. Recording would be nice, but it has too many downsides for usability, compactness, programming, etc.