Simple solution: Write a VI with a button ("Press to Start"). Station a volunteer with a pair of headphones to listen to the audio and press the button when they hear human speech.
Much more complex solution: Learn a lot about audio signals, including how to characterize them in the time and frequency domains, and how to use that information to decide (perhaps) whether a signal represents human speech. Once you know how to do this "theoretically", then start worrying about how to implement the algorithm.
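To give a feel for what "characterizing the signal" might mean, here is a rough sketch in Python (text code is easier to post than a block diagram). It is only an illustration under my own assumptions, not a real speech detector: it computes the RMS level of one frame (time domain) and the fraction of spectral power that falls in the traditional 300-3400 Hz telephone voice band (frequency domain), then compares both against thresholds. The function names, the band limits, and the threshold values are all hypothetical choices that you would have to tune on real recordings.

```python
import numpy as np

def frame_features(frame, fs, band=(300.0, 3400.0)):
    """Return (rms, band_ratio) for one frame of audio samples.

    rms        : time-domain level of the frame
    band_ratio : fraction of spectral power inside `band` (Hz)
    The default band is an assumption (telephone voice band).
    """
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))

    # A Hann window reduces spectral leakage before taking the FFT.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = power.sum()
    band_ratio = power[in_band].sum() / total if total > 0 else 0.0
    return rms, band_ratio

def might_be_speech(frame, fs, rms_threshold=0.01, ratio_threshold=0.6):
    """Crude guess: loud enough AND mostly voice-band energy.

    Both thresholds are placeholder values, not tuned on real data.
    """
    rms, ratio = frame_features(frame, fs)
    return bool(rms > rms_threshold and ratio > ratio_threshold)

if __name__ == "__main__":
    fs = 16000                      # assumed sample rate, Hz
    t = np.arange(512) / fs         # one 32 ms frame
    print(might_be_speech(0.5 * np.sin(2 * np.pi * 1000 * t), fs))
    print(might_be_speech(0.5 * np.sin(2 * np.pi * 6000 * t), fs))
    print(might_be_speech(np.zeros(512), fs))
```

Note that a test like this is easily fooled: a steady 1 kHz tone, or music, passes it just as well as a voice does. Telling speech apart from other in-band sounds is exactly the part that takes the "learn a lot about audio signals" step.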
Bob Schor