
Example Code

Using LabVIEW for Voice Signal Analysis

Products and Environment

This section reflects the products and operating system used to create the example.

To download NI software, including the products shown below, visit ni.com/downloads.

    Software

  • LabVIEW

Code and Documents

Overview


Voice signals can carry pathologic information and therefore play an important role in today’s clinical diagnosis. You can use the LabVIEW Advanced Signal Processing Toolkit to design voice signal analysis applications. This article describes some practical examples of voice signal analysis that use the Advanced Signal Processing Toolkit.

Voice Signal Fundamentals

When you pronounce a vowel or a voiced consonant, the vocal cords periodically vibrate to generate glottal flow. The glottal flow is composed of glottal pulses. The period of a glottal pulse is the pitch period. The reciprocal of the pitch period is the pitch, also known as the fundamental frequency. The vocal tract acts as a time-varying filter to the glottal flow. The characteristics of the vocal tract include the frequency response, which depends on the position of organs, such as the pharynx and tongue. The peak frequencies in the frequency response of the vocal tract are formants, also known as formant frequencies.
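The reciprocal relationship between pitch period and pitch is simple arithmetic; the following snippet illustrates it with a hypothetical pitch period:

```python
# Pitch (fundamental frequency) is the reciprocal of the pitch period.
# Example: a glottal pulse period of 8 ms corresponds to a 125 Hz pitch.
pitch_period_s = 0.008            # pitch period in seconds (hypothetical value)
pitch_hz = 1.0 / pitch_period_s   # fundamental frequency in Hz
```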

In signal processing terms, a voiced signal is the convolution of a time-varying stimulus with the impulse response of a time-varying filter. The time-varying stimulus is the glottal flow, and the time-varying filter is the vocal tract.

The following figure shows the formation of a voiced signal in signal processing.

Figure 1. Signal Processing View of a Voiced Signal
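The source-filter formation in Figure 1 can be sketched numerically: a periodic impulse train stands in for the glottal pulses, and a fixed all-pole filter stands in for the vocal tract. This is a simplified sketch in Python with NumPy and SciPy; a real vocal tract is time varying, and the sampling rate, pitch, and filter coefficients below are illustrative assumptions, not measured values.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                        # sampling rate in Hz (assumed)
pitch_hz = 125                   # assumed fundamental frequency
n = fs                           # one second of samples

# Glottal flow approximated as a periodic impulse train:
# one pulse per pitch period.
period = fs // pitch_hz          # pitch period in samples
excitation = np.zeros(n)
excitation[::period] = 1.0

# Vocal tract approximated as a fixed all-pole (IIR) filter; the resonances
# of 1/A(z) play the role of the formants. Coefficients are illustrative.
A = [1.0, -1.3, 0.845]
voiced = lfilter([1.0], A, excitation)   # convolve excitation with the filter
```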

Researchers study formant tracks and pitch contour to understand how formants and pitch evolve over time. This article describes how to detect formant tracks and pitch contours from voice signals by using the National Instruments LabVIEW graphical development environment. This article includes the Voice Signal Analysis demonstration VI, which is a voice signal analysis application built with LabVIEW and the Advanced Signal Processing Toolkit. The figures in this article are results from this VI.

Voice Signal Analysis for Clinical Diagnosis

Many researchers conclude that pitch and formants are two of the most important characteristics of voice signals. The following studies are some examples that support the importance of studying pitch and formants for clinical diagnosis of diseases that affect the voice:

• Rehan A. Kazi[1] concludes that the formant frequencies of laryngectomy patients are higher than the formant frequencies of normal subjects.

• Margaret Skinner[2] shows the importance of formants to the speech recognition process.

• Gregory K. Sewall[3] claims that patients with Parkinson’s-related dysphonia usually have reduced pitch ranges and increased vocal tremor.

• Iain R. Murray[4] indicates that pitch varies when people experience different emotions.

You can observe changes in voice signals, such as changes in formant tracks and pitch contours, during clinical diagnosis of certain diseases.

For example, in the following intensity graphs, you can observe a noticeable difference in the formant tracks of a male patient with Obstructive Sleep Apnea Syndrome (OSAS) before and after a tonsillectomy and uvulopalatopharyngoplasty. Figures 2(a) and 2(b) show the formant tracks of the patient pronouncing a single vowel before and after the operations, respectively.

Figure 2. Formant Tracks of an OSAS Patient Pronouncing a Single Vowel

Courtesy of Jack Jiang/UW-Madison and Yi Zhang

In the previous figure, the dark areas indicate the amplitude of the frequency response of the vocal tract; the darker the area, the higher the amplitude, so a completely black area represents a higher amplitude than a gray area. The red areas indicate the positions of the formants.

According to Figure 2, the OSAS patient has different formant tracks than a person without OSAS. The different formant tracks indicate that the OSAS patient has difficulty adjusting the vocal tract.

In the following graphs, you can observe the pitch contours of a patient with Unilateral Vocal Cord Paralysis (UVCP) before and after an Autologous Fat Injection (AFI) operation. Figures 3(a) and 3(b) show the pitch contour of the patient pronouncing a single vowel before and after the operation, respectively.

Figure 3. Pitch Contours of a UVCP Patient Pronouncing a Single Vowel

Courtesy of Jack Jiang/UW-Madison and Yi Zhang

According to the previous figures, when the UVCP patient pronounces voiced sounds, the patient is unable to maintain a stable pitch. An AFI operation helps stabilize the pitch. Therefore, you might be able to analyze the pitch variance after an AFI operation to evaluate the success of the AFI operation.
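Pitch stability of the kind discussed above can be quantified with simple statistics on the pitch contour, for example its variance and the mean absolute change between consecutive pitch samples (a jitter-like measure). The following is a minimal sketch with hypothetical contour values, not clinical data:

```python
import numpy as np

def pitch_stability(contour_hz):
    """Return (variance, mean absolute cycle-to-cycle change) of a pitch
    contour given as a sequence of pitch estimates in Hz over time."""
    f = np.asarray(contour_hz, dtype=float)
    variance = np.var(f)                    # overall spread of the pitch
    jitter = np.mean(np.abs(np.diff(f)))    # short-term instability
    return variance, jitter

# Hypothetical contours: a steady voice versus an unstable one
stable = [120, 121, 120, 119, 120, 121]
unstable = [120, 140, 105, 150, 95, 130]

var_stable, jitter_stable = pitch_stability(stable)
var_unstable, jitter_unstable = pitch_stability(unstable)
```

A lower variance and jitter after an operation would be consistent with a more stable pitch.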

Using LabVIEW to Detect Formants and Pitch

You can detect formant tracks and pitch contour by using several methods. The most popular method is the Linear Prediction Coding (LPC) method. This method[5] applies an all-pole model to simulate the vocal tract. Figure 4 shows the flow chart of formant detection with the LPC method.

Figure 4. Formant Detection with the LPC Method[5]

In Figure 4, applying the window w(n) breaks the source signal s(n) into signal blocks x(n). The LPC method then estimates the coefficients A(z) of an all-pole vocal tract model from each signal block x(n). After the discrete Fourier transform (DFT) converts the coefficients A(z) into A(k), peak detection on 1/A(k) produces the formants.
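The flow chart in Figure 4 can also be approximated in a textual language. The sketch below uses Python with NumPy and SciPy as a stand-in for the LabVIEW VIs: it windows a block, estimates the all-pole coefficients A(z) with the autocorrelation LPC method, evaluates |1/A(k)| with a DFT, and picks the peaks as formant candidates. The synthetic test signal, model order, and FFT length are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import find_peaks, lfilter

def lpc(x, order):
    """Autocorrelation-method LPC; returns A(z) = [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])
    return np.concatenate(([1.0], a))

def formants(block, fs, order=10, nfft=1024):
    """Formant candidates (Hz) from one signal block, per the Figure 4 flow."""
    x = block * np.hamming(len(block))             # apply window w(n)
    A = lpc(x, order)                              # all-pole coefficients A(z)
    spectrum = 1.0 / np.abs(np.fft.rfft(A, nfft))  # DFT -> |1/A(k)|
    peaks, _ = find_peaks(spectrum)                # peak detection
    return peaks * fs / nfft                       # bin indices -> Hz

# Synthetic check: white noise driving a single resonance near 700 Hz
fs = 8000
pole = 0.97 * np.exp(2j * np.pi * 700 / fs)
a_true = np.real(np.poly([pole, np.conj(pole)]))
noise = np.random.default_rng(0).standard_normal(2048)
sig = lfilter([1.0], a_true, noise)
estimates = formants(sig, fs)
```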

Figure 5 shows the flow chart of pitch detection with the LPC method. This method[6][7] uses inverse filtering to separate the excitation signal from the vocal tract response and then detects the pitch from the real cepstrum of the residual signal.

Figure 5. Pitch Detection with the LPC Method[6][7]

In Figure 5, the source signal s(n) first passes through a low-pass filter (LPF), and the window w(n) then breaks the filtered signal into signal blocks x(n). The LPC method estimates the coefficients of an all-pole vocal tract model from each signal block x(n), and these coefficients form an inverse filter that is applied to x(n). The resulting residual signal e(n) passes through a system that calculates the real cepstrum. Finally, peak detection on the real cepstrum yields the pitch.
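The Figure 5 flow can likewise be sketched in Python with NumPy and SciPy, again as a stand-in for the LabVIEW VIs: low-pass filter, window, LPC, inverse filtering, real cepstrum, and a peak search restricted to a plausible pitch range. The cutoff frequency, model order, and pitch search range are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import butter, filtfilt, lfilter

def lpc(x, order):
    """Autocorrelation-method LPC; returns A(z) = [1, a1, ..., ap]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])
    return np.concatenate(([1.0], a))

def pitch_from_block(block, fs, order=12):
    """Pitch estimate (Hz) from one signal block, per the Figure 5 flow."""
    b, a = butter(4, 900 / (fs / 2))           # LPF: keep roughly 0-900 Hz
    x = filtfilt(b, a, block)
    x = x * np.hamming(len(x))                 # apply window w(n)
    A = lpc(x, order)                          # all-pole coefficients
    e = lfilter(A, [1.0], x)                   # inverse filter -> residual e(n)
    # Real cepstrum of the residual
    mag = np.abs(np.fft.rfft(e))
    ceps = np.fft.irfft(np.log(mag + 1e-8 * mag.max()))
    # Search for the cepstral peak in a plausible pitch range (50-400 Hz)
    lo, hi = int(fs / 400), int(fs / 50)
    q = lo + np.argmax(ceps[lo:hi])            # quefrency of the peak (samples)
    return fs / q                              # quefrency -> pitch in Hz

# Synthetic check: 125 Hz impulse train through a simple all-pole filter
fs = 8000
exc = np.zeros(2048)
exc[::fs // 125] = 1.0
voiced = lfilter([1.0], [1.0, -1.3, 0.845], exc)
pitch = pitch_from_block(voiced, fs)
```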

The Advanced Signal Processing Toolkit includes the Modeling and Prediction VIs that you can use to obtain the LPC coefficients or AR model coefficients, as shown in Figure 6 and Figure 7. The LabVIEW Signal Processing VIs include the Scaled Time Domain Window VI and the FFT VI. You can use these VIs to apply a window to the source signal and calculate the DFT of the signal block, respectively.

Figure 6. Formant Detection with the LPC Method by Using LabVIEW

The Advanced Signal Processing Toolkit also includes the Correlation and Spectral Analysis VIs that you can use to calculate the real cepstrum of a signal, as shown in Figure 7.

Figure 7. Pitch Detection with the LPC Method by Using LabVIEW

In the previous figures, you use the WA Multiscale Peak Detection VI to detect the resonance peaks in the cepstrum.

Summary

Voice analysis and the study of voice characteristics, such as formants and pitch, are increasingly important in today’s clinical diagnosis. By using LabVIEW and the Advanced Signal Processing Toolkit, you can calculate formant tracks, pitch contour, and other voice signal-related statistics.

Voice Signal Analysis VI

The Voice Signal Analysis VI requires the LabVIEW Run-Time Engine 8.5. You can download a free copy of this software.

The LabVIEW graphical development environment and the Advanced Signal Processing Toolkit include many analytical VIs which you can use for formant analysis. Refer to the LabVIEW Help for more information about these VIs.

References

[1] Kazi, Rehan A.; Vyas M.N. Prasad; Jeeve Kanagalingam; Christopher M. Nutting; Peter Clarke; Peter Rhys-Evans; and Kevin J. Harrington. 2007. Assessment of the Formant Frequencies in Normal and Laryngectomy Individuals Using Linear Predictive Coding. Journal of Voice 21, no. 6:661-668.

[2] Skinner, Margaret W.; Marios S. Fourakis; Timothy A. Holden; Laura K. Holden; and Marilyn E. Demorest. 1999. Identification of Speech by Cochlear Implant Recipients with the Multipeak (MPEAK) and Spectral Peak (SPEAK) Speech Coding Strategies II. Consonants. Ear & Hearing 20, no. 6:443.

[3] Sewall, Gregory K. MD; Jack Jiang, MD, PhD; and Charles N. Ford, MD. 2006. Clinical Evaluation of Parkinson’s-Related Dysphonia. The Laryngoscope 116:1740-1744.

[4] Murray, Iain R., and John L. Arnott. 2008. Applying an Analysis of Acted Vocal Emotions to Improve the Simulation of Synthetic Speech. Computer Speech and Language 22, no. 2:107-129.

[5] Markel, J. D. 1972. Digital Inverse Filtering—A New Tool for Formant Trajectory Estimation. IEEE Transactions on Audio and Electroacoustics 20, no. 2:129-137.

[6] Noll, A.M. 1967. Cepstrum Pitch Determination. Journal of the Acoustical Society of America 41 (February): 293-309.

[7] Markel, J.D. 1972. The SIFT Algorithm for Fundamental Frequency Estimation. IEEE Transactions on Audio and Electroacoustics 20 (December): 367-377.

Example code from the Example Code Exchange in the NI Community is licensed with the MIT license.