07-11-2017 03:55 PM
Hi,
I'm new to LabVIEW and am working on a program that, if successful, should be able to take an audio file of a person saying different numbers and reproduce what the person is saying as text. From what I've read so far, I need to build some sort of dictionary of pre-recorded sounds that can be compared with the recording, but I'm not sure where to start with that or whether it's even the best approach. Any help at all would be very appreciated!
07-12-2017 04:36 AM
The challenge is that sound is not 100% identical each time. If you say "Please" several times, the sound your body produces will be slightly different each time, even if you say it with the same speed and pitch. So a direct comparison by "equal" is not sufficient.
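To illustrate the point, here is a minimal Python sketch (not LabVIEW, and not taken from any real recording). The `synth_utterance` helper is a hypothetical stand-in for a recorded word: a decaying tone with a small timing offset and some noise. Two "takes" of the same word fail an equality check, while a similarity score such as normalized correlation still shows they are close.

```python
import math
import random


def synth_utterance(shift, noise, seed, n=400):
    """Hypothetical stand-in for a recorded word: a decaying tone plus noise."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        t = (i - shift) / n  # small timing offset between takes
        tone = math.sin(2 * math.pi * 8 * t) * math.exp(-3 * abs(t))
        samples.append(tone + rng.gauss(0.0, noise))
    return samples


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sample lists (range -1 to 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


# Two takes of the "same word": slightly different timing and noise.
take_1 = synth_utterance(shift=0, noise=0.02, seed=1)
take_2 = synth_utterance(shift=2, noise=0.02, seed=2)

exactly_equal = take_1 == take_2
similarity = normalized_correlation(take_1, take_2)

print("exactly equal:", exactly_equal)
print("similarity:", round(similarity, 3))
```

Note that even this score drops quickly when the timing offset grows, which is one reason real speech recognition needs far more than a sample-by-sample comparison.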
I consider that a PhD-level task at least. It definitely goes far beyond the bounds of a short-term project. What is your situation?
Please refer to speech recognition for more information on the complexity.
07-12-2017 09:03 AM
Thanks for your reply! I'm interning at a lab and this was the project assigned to me, and while I have a mentor available to help me, the expectation is that I complete the project almost entirely on my own. Voice recognition programs have been made in many different contexts, so I was hoping to learn from those and use them as a guide to make my own program suited to my needs. I just started and I'm still learning, but so far I'm unsure what exactly I need to build the program, since the only thing it would need to recognize is a series of numbers from 1 to 10, but from various people and under different signal-to-noise conditions. Would each speaker need to record their own dictionary of numbers to compare later recordings against, or would I just need many different pre-recorded voices to build a general reference for how each number sounds, which I could then compare with other, different voices? Any advice you could give me would be greatly appreciated!
07-12-2017 09:32 AM
If this is a Windows PC, you can always take advantage of the Microsoft Speech API in LabVIEW. There is at least one example out there, but I can't seem to find it at the moment.
07-12-2017 09:33 AM
I strongly recommend that you talk to your mentor. This topic is very complex and goes well beyond the scope of any internship.
If you can define enough constraints (e.g. the system is trained directly on the voice to be recognized, no background noise, equal level/pitch, ...), it is rather simple, as a waveform envelope will do ('predefined pattern match'). But anything beyond that is more than an internship can realistically achieve.
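As a rough illustration of the 'predefined pattern match' idea under those constraints, here is a minimal Python sketch (not LabVIEW). Everything in it is an assumption made for illustration: the frame size, the `synth_word` test signals (one tone burst for "one", two bursts for "seven"), and the choice of frame-wise RMS as the envelope. It stores one envelope per word as a template and picks the template closest to a new recording.

```python
import math
import random

FRAME = 50  # samples per envelope frame (arbitrary choice for this sketch)


def envelope(samples, frame=FRAME):
    """Frame-wise RMS of the signal, scaled so the loudest frame is 1.0."""
    env = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        env.append(math.sqrt(sum(x * x for x in chunk) / frame))
    peak = max(env) or 1.0  # avoid dividing by zero on pure silence
    return [e / peak for e in env]


def distance(a, b):
    """Euclidean distance between two equal-length envelopes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(samples, templates):
    """Return the label of the stored envelope closest to the recording."""
    env = envelope(samples)
    return min(templates, key=lambda label: distance(env, templates[label]))


def synth_word(bursts, seed, n=1000):
    """Hypothetical test signal: tone bursts over (start, end) sample ranges."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        active = any(start <= i < end for start, end in bursts)
        amp = 1.0 if active else 0.0
        out.append(amp * math.sin(2 * math.pi * i / 20) + rng.gauss(0.0, 0.01))
    return out


# Made-up syllable layouts: "one" has one burst, "seven" has two.
ONE = [(200, 700)]
SEVEN = [(100, 400), (600, 900)]

# "Training": store one envelope per word.
templates = {
    "one": envelope(synth_word(ONE, seed=1)),
    "seven": envelope(synth_word(SEVEN, seed=2)),
}

# New recordings of the same words (same timing, different noise).
result_one = classify(synth_word(ONE, seed=3), templates)
result_seven = classify(synth_word(SEVEN, seed=4), templates)
print(result_one, result_seven)
```

This only works because the test signals keep the same timing and level as the templates. A different speaker, a different speaking rate, or background noise would break the match, which is exactly where the constraints above stop applying.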
Please refer to this outdated discussion for some information.
However, there is good news: Instead of going into the details of sound analysis, you might have the option to include and interface existing speech recognition software.
07-12-2017 09:55 AM
Thanks for your timely response! I'll definitely consult my mentor, but one last question: if I were to integrate an open-source program written in a different programming language (for example, Java or Python), would it be possible to somehow put it into a LabVIEW program?
07-12-2017 09:58 AM
For C/C++ you can use the Call Library Function Node (calls into C/C++ DLLs).
For .NET you can use the .NET nodes (constructor, property, invoke).
07-14-2017 01:31 PM
Hi, thanks for your answer! How would I be able to integrate the Microsoft Speech API and what would it do?
07-16-2017 10:45 AM - edited 07-16-2017 10:51 AM
Well, I couldn't find the link to the example - but I do have the example on my computer!
I've found that MS Speech likes to guess at commands a lot, and this example adds your set of commands to the existing MS command list, so some of the guesses can range from amusing to annoying to downright dangerous. Unfortunately, it's a lot more difficult to create your own instance of the MS Speech API, and I haven't been able to do that successfully yet.
05-10-2018 03:57 PM
You should check this out...