fatoldgit
So I have written (in a ksh script) a voice control interface for my music playback GUI and it works well except for one situation.
The issue I have is that all the English models (across every voice recognition tool I have tested) are based on US English.
But I am a Kiwi (NZ'er) and every one of these available Linux voice recognition tools (VRTs), while they work fine for keywords like "up", "down", "left", "right", "next", "previous" etc., do a crap job with single-letter utterances (A, B, C, D etc.).
They can't tell my "A" from my "I", my "B" from my "E" etc. I even tried non-English models (I don't care what text they spit out as long as it's unique).
Some models appear to support training, but aside from the complexity of that (it involves lots of discrete steps), you literally have to install many GB (sometimes > 10 GB) of "stuff" to do this.
Also, a primary concern is that whatever I use must still be installable in 10, 20 or 30 years' time and must be "offline" (i.e. it must not need a cloud resource).
So the requirement is simple: I have, say, 50 utterances that need to be recognized, and I can easily set up a loop that listens and records from a mic and trims out the leading and trailing silence.
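For readers following along, the silence trimming step can be sketched in C roughly as below. This is only an illustration under stated assumptions: the samples are already in memory as signed 16-bit mono PCM, and the function name `trim_silence` and the fixed amplitude threshold are hypothetical. A per-sample threshold is sensitive to clicks, so a real capture loop would more likely threshold on short-frame energy.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: locate the non-silent span of a 16-bit PCM buffer.
 * A sample counts as "silence" when its absolute value is <= threshold.
 * Stores the offset of the first non-silent sample in *start and returns
 * the length of the trimmed span (0 if the whole buffer is silent). */
size_t trim_silence(const int16_t *pcm, size_t n, int threshold, size_t *start)
{
    size_t first = 0;
    size_t last = n;            /* one past the last non-silent sample */

    /* Skip leading silence. */
    while (first < n && abs(pcm[first]) <= threshold)
        first++;
    if (first == n) {
        *start = 0;
        return 0;
    }

    /* Skip trailing silence. */
    while (last > first && abs(pcm[last - 1]) <= threshold)
        last--;

    *start = first;
    return last - first;
}
```

Quiet samples inside the utterance are kept; only the leading and trailing runs are dropped.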
What I haven't been able to find is a simple, durable set of Linux CLI commands that can compare a captured utterance against a master WAV file (the loop is simple: grab the sound from the mic, then loop through comparing it against my master set of WAV file utterances).
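The selection half of that loop can be sketched independently of whichever comparison tool ends up producing the numbers. The sketch below assumes each master WAV has already been scored against the captured utterance as a distance (lower means more similar). The function name `pick_template` and both thresholds are hypothetical. The margin check is there because of the "A" versus "I" problem: if the runner-up is nearly as close as the winner, it is probably safer to reject than to guess.

```c
#include <stddef.h>

/* Hypothetical helper: choose the best-matching template.
 * dist[i] is the distance between the captured utterance and template i.
 * Returns the index of the closest template, or -1 when
 *   - there are no templates,
 *   - the closest one is still further away than max_dist, or
 *   - another template is within min_margin of the closest (ambiguous). */
int pick_template(const double *dist, size_t n, double max_dist, double min_margin)
{
    if (n == 0)
        return -1;

    /* Find the smallest distance. */
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (dist[i] < dist[best])
            best = i;

    /* Reject if nothing is close enough. */
    if (dist[best] > max_dist)
        return -1;

    /* Reject if any other template is almost as close. */
    for (size_t i = 0; i < n; i++)
        if (i != best && dist[i] - dist[best] < min_margin)
            return -1;

    return (int)best;
}
```

Both thresholds would need tuning against real recordings; the values used in any test are placeholders.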
No issue with having to compile from source. In fact I prefer that, as it means I can support it well into the future (my entire playback stack is compiled from C/C++ source code).
So what I need is something that can compare two WAV files and produce (as a stdout value) a "confidence" level that they are or aren't the same. Whether that is a direct compare or needs to produce a secondary file (a "fingerprint") which is then compared, I don't care.
Any help greatly appreciated.
Peter
PS. You can probably tell from my use of ksh, C and C++ that I am an old-school Unix (latterly Linux) dev with some 45+ years of experience, so these are my "go to" tools for backend software development (popen is very robust for integrating C/C++ with a ksh script where you need to interrogate the return values, and with ksh you have the whole world of the Linux command line at your disposal [find, awk, sed, cat, sort, grep, xdotool etc.]).
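As an illustration of the popen pattern mentioned in the PS, the sketch below runs a shell command and parses the first line of its stdout as a number, which is the shape a "print a confidence value" comparison tool would need on the C side. The function name `read_score` is hypothetical, and the command passed in is whatever script or binary produces the value. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose under strict -std modes */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: run cmd through the shell and parse the first line
 * of its stdout as a double. Returns 0 and stores the value in *out on
 * success; returns -1 if the command cannot be started, prints nothing,
 * exits with a non-zero status, or prints something that is not a number. */
int read_score(const char *cmd, double *out)
{
    char line[128];
    FILE *p = popen(cmd, "r");
    if (!p)
        return -1;

    int got_line = fgets(line, sizeof line, p) != NULL;
    int status = pclose(p);     /* wait for the command and collect its status */
    if (!got_line || status != 0)
        return -1;

    char *end;
    double value = strtod(line, &end);
    if (end == line)            /* no digits were consumed */
        return -1;

    *out = value;
    return 0;
}
```

Because the command string goes through the shell, it should be built only from trusted file names.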