Secrets of listening handout


Hearing the voice problem

When an individual has a problem with their voice, how do we determine the specific problem and then its cause? Because the vocal cords alter pitch, volume and clarity, we can strive to hold all of these parameters constant; then, by varying only one at a time, we will find the problem most efficiently.

Generally, the patient doesn’t even need to speak words during an examination assessing a vocal problem. Singing is far more fruitful for finding a vocal problem than speaking. Perhaps I should say, making various, continuous vowel sounds is most helpful, as some people feel they cannot sing. If the examiner can listen to a sound at various volumes and various pitches, a vocal impairment will be more easily elicited.

Robert Bastian originally described this process for vocal cord swellings. That initial thought process can be extended to all vocal cord impairments. By modifying one parameter at a time, when we hear the problem, we will quite likely know where the problem lies, even before we look at the vocal cords. Then when we look, we will not be distracted by the color of the vocal cords, for instance. We will go straight to where the problem should be and find it.

During all of these tests, the examiner is really an audio engineer, an engineer listening for a signal – a clear tone. The examiner is also searching for any increase in noise. An excellent voice has a high signal-to-noise ratio, that is, a lot of clarity and minimal unwanted, non-harmonic sound.
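The idea of a signal-to-noise ratio can be made concrete with a small sketch. This is only an illustration, not part of the clinical exam described here: the function name `harmonic_snr_db`, its parameters, and the synthetic tones are all assumptions. It treats energy at the pitch and its harmonics as "signal" and everything else as "noise", and it assumes the pitch is known, steady, and that the excerpt spans a whole number of pitch periods.

```python
import math
import random

def harmonic_snr_db(samples, sample_rate, f0, n_harmonics=10):
    """Rough signal-to-noise estimate (dB) for a sustained tone of known pitch.

    Assumes f0 (Hz) is steady and known, and that the excerpt spans a
    whole number of pitch periods.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [v - mean for v in samples]          # remove any DC offset
    total_power = sum(v * v for v in x) / n

    harmonic_power = 0.0
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        if freq >= sample_rate / 2:          # stay below the Nyquist limit
            break
        w = 2 * math.pi * freq / sample_rate
        c = sum(v * math.cos(w * i) for i, v in enumerate(x))
        s = sum(v * math.sin(w * i) for i, v in enumerate(x))
        # A sinusoid of amplitude A has mean power A^2 / 2, and
        # A = 2 * sqrt(c^2 + s^2) / n over a whole number of periods.
        harmonic_power += 2 * (c * c + s * s) / (n * n)

    # Small floors avoid taking the log of zero.
    noise_power = max(total_power - harmonic_power, 1e-12)
    return 10 * math.log10(max(harmonic_power, 1e-12) / noise_power)

# Synthetic example: a 200 Hz tone, alone and with added white noise
# (a stand-in for air leak).
rate, f0, n = 8000, 200, 800
rng = random.Random(0)
clean = [math.sin(2 * math.pi * f0 * i / rate) for i in range(n)]
breathy = [v + rng.gauss(0, 0.3) for v in clean]

clean_db = harmonic_snr_db(clean, rate, f0)
breathy_db = harmonic_snr_db(breathy, rate, f0)
```

The clean tone scores far higher than the noisy one. A real voice drifts in pitch, so a practical tool would need pitch tracking first; this sketch deliberately skips that step.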

The vowel /i/

There are some advantages to using the vowel sound “ee” for most of a vocal exam. In phonetic notation I would write /i/ (or /i:/), as in the word feel or bead. This vowel utilizes the most upright position of the larynx and the most open position of the pharynx, or throat, above the voice box. It makes examination of the vocal cords with an endoscope easier. The /u/ sound is a close second, as in who or boo. The other vowels /æ/, /e/, /o/ tend to move the tongue and epiglottis back, narrowing the throat and making visualization of the vocal cords more difficult.

Remember the benefit of simplification: stabilize all parameters except one, then vary that parameter. I pick one vowel and generally stick with it during my exam, so there is one less variable to deal with. Then I put the voice through a series of tests with this vowel.

Why a series of tests?

A familiar medical analogy comes from cardiology. A person complains of chest pain and an EKG test is performed. If the result appears normal, rather than telling the patient “you are normal,” the cardiologist continues with a stress test, having the patient run on a treadmill while still hooked up to the EKG, perhaps even to the point of reproducing the chest pain during the test. Now the EKG appears abnormal in association with the elicited pain. Many more problems will be found by stressing the system, whether it is the heart pumping blood or the vocal cords making sound.

If our patient just says /i/, there may be no obvious hoarseness at one particular pitch or one particular volume, yet it may appear at another pitch or volume. In fact, most patients, and especially performers with a vocal problem, do everything they can to avoid sounding “bad” in the doctor’s office. It is up to the examiner to stress test the voice. 

Of course, for many people just showing up at the doctor’s office is stressful. Then the thought of singing to the doctor elevates stress to a new height. That is the kind of stress that leads the patient to avoid sounding bad. The goal of our stress test, by contrast, is to elicit vocal impairments so that we hear the actual problem.

Vocal exam


Record the patient’s voice with a headset microphone positioned lateral to and near the front of the mouth. A headset keeps the distance from the mouth relatively constant during an examination and between examinations. Keeping the microphone to the side of the mouth avoids recording the air blown out through the lips during plosives.

An audio recording of vocal capabilities documents the functional status of the larynx, including the motor nerves, muscles and mucosal covering of the vocal cords, and outlines vocal limitations.

A recording has value in several ways:

  1. Sound impairments happen so fast that they can easily be missed. A recording can be reviewed multiple times.
  2. The only way to go back in time for comparison is to have already made a recording.
  3. A recording is the best evidence (legal or medical) that no unintentional change occurred during an intervention, or that a change had already occurred before the intervention.
  4. Physicians who operate near the recurrent and superior laryngeal nerves would, with recordings, have a much better sense of how often the nerves are injured, both temporarily and permanently. They could offer their patients reasonably accurate estimates during a presurgery conference and alter their future surgical techniques based on this feedback.
  5. An audio recording is a far more accurate record for comparison than a physician’s memory, written notes or even a phonetogram without sound.
  6. Recording from a microphone attached to a laptop computer takes little effort and less than 5 minutes, and costs are minimal.

The goal of the recorded evaluation is to explore the capacity of the voice, removing any compensation that may be hiding the impairment the patient is experiencing. A stereotyped exam has the benefit of standardization, so that no information is missed. The following tasks, performed while listening for changes in signal and noise, seem to document a fairly complete vocal exam:

  1. reading aloud, 
  2. maximum phonation time, 
  3. pitch range — low, 
  4. pitch range — high, 
  5. high volume, 
  6. low volume — vocal swelling tests,
  7. vegetative sounds.

Recording digitally leaves a signal that may be viewed as “volume vs time” in a video editing program such as Final Cut Pro.

This is a visual representation, in the editing program Final Cut Pro™, of a typical 4-5 minute audio recording of the above vocal tasks. Volume is on the vertical axis and time on the horizontal axis. This view of volume vs time allows me to quickly identify the various tasks and re-listen to them.
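For readers without an editing program, a comparable volume vs time view can be computed directly from the samples. The sketch below is one possible approach, not the method used in the program above: it takes the root-mean-square (RMS) level of successive short windows. The function name `rms_envelope`, the 0.1 second window, and the synthetic recording are all assumptions chosen for illustration.

```python
import math

def rms_envelope(samples, sample_rate, window_s=0.1):
    """Return (time_in_seconds, rms_level) pairs, one per window."""
    win = max(1, round(sample_rate * window_s))
    envelope = []
    for start in range(0, len(samples), win):
        chunk = samples[start:start + win]
        rms = math.sqrt(sum(v * v for v in chunk) / len(chunk))
        envelope.append((start / sample_rate, rms))
    return envelope

# Synthetic example: one second of silence, then one second of a 50 Hz tone.
rate = 1000
silence = [0.0] * rate
tone = [0.8 * math.sin(2 * math.pi * 50 * i / rate) for i in range(rate)]
envelope = rms_envelope(silence + tone, rate)

# Start time and level of the loudest window, e.g. to jump to a loud task.
loudest_time, loudest_level = max(envelope, key=lambda pair: pair[1])
```

Plotting the second value of each pair against the first gives the same kind of picture: quiet tasks appear as low plateaus and loud tasks as tall spikes.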

For example, the highest spikes in this recording represent the testing of a yell or loud phonation. They are yellow and then red near the tip and nearly reach the top of the recording box. Reaching the red area is a caution to the sound engineer (laryngologist) that the digital recording is approaching saturation and the signal is clipping. This means that when the recording is replayed, the sound will not be truly representative of what the patient produced.
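Clipping can also be checked numerically rather than by eye. The following is a minimal sketch under stated assumptions: samples are floating-point values where ±1.0 is digital full scale, and the helper names `clipped_fraction` and `record`, along with the 0.999 margin, are hypothetical choices for illustration.

```python
import math

def clipped_fraction(samples, full_scale=1.0, margin=0.999):
    """Fraction of samples at or beyond `margin` times full scale."""
    if not samples:
        return 0.0
    limit = full_scale * margin
    return sum(1 for v in samples if abs(v) >= limit) / len(samples)

def record(signal, full_scale=1.0):
    """Mimic a digital recorder: anything beyond full scale is flattened."""
    return [max(-full_scale, min(full_scale, v)) for v in signal]

# Synthetic example: a soft tone that fits, and a "yell" that exceeds full scale.
rate = 8000
soft = record([0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(rate)])
yell = record([1.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(rate)])

soft_clipped = clipped_fraction(soft)   # nothing reaches full scale
yell_clipped = clipped_fraction(yell)   # a large share of samples is flattened
```

Any task with a clipped fraction above zero is a candidate for re-recording at lower gain.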

One trick for getting around this problem is to record two channels simultaneously. Most programs record at least two channels, as in the case of stereo. With a splitter on the microphone into the preamp, amplification is reduced on one channel relative to the other. Then, where the sound has saturated one of the channels, I can listen to the loud sound on the other, less amplified channel alone (without clipping distortion).

With a single microphone, the split signal is recorded at differential volumes to expand the dynamic range of the recording.
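The two-channel trick can also be applied in software after the recording is made. The sketch below assumes the gain ratio between the two channels is known (here a hypothetical factor of 4) and that samples are floating-point values with ±1.0 as full scale. The function name `merge_dual_gain` is an illustration, not a feature of any particular recording program.

```python
import math

def merge_dual_gain(high, low, gain_ratio, full_scale=1.0, margin=0.999):
    """Combine two recordings of one microphone made at different gains.

    `high` is the more amplified channel; `low` is the same sound amplified
    `gain_ratio` times less. Where `high` has saturated, use the rescaled
    `low` sample instead.
    """
    limit = full_scale * margin
    merged = []
    for hi_v, lo_v in zip(high, low):
        if abs(hi_v) >= limit:
            merged.append(lo_v * gain_ratio)  # bring quiet channel up to scale
        else:
            merged.append(hi_v)
    return merged

def clip(signal, full_scale=1.0):
    """Mimic recorder saturation at full scale."""
    return [max(-full_scale, min(full_scale, v)) for v in signal]

# Synthetic example: a loud tone that overloads the high-gain channel only.
rate, gain_ratio = 8000, 4.0
true_sound = [2.0 * math.sin(2 * math.pi * 200 * i / rate) for i in range(rate)]
high = clip(true_sound)                              # saturates at full scale
low = clip([v / gain_ratio for v in true_sound])     # stays within range
merged = merge_dual_gain(high, low, gain_ratio)

worst_error = max(abs(m - t) for m, t in zip(merged, true_sound))
```

The merged signal exceeds the original full scale, so it would need to be stored in a format with more headroom or scaled down before playback.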

Noise & Signal

For each of these tasks, the assessment is in terms of signal (what is functioning correctly) and noise (what is not functioning correctly). Appropriate signal is a pure tone. Any unintentional non-harmonic sound is usually thought of as noise.


  • Air leak (white noise)
  • Roughness (Polyphonia)
    • Diplophonia is most common
      • Harmonic intervals
      • Non-harmonic intervals

Altered signal

  • Onset delays
  • Pitch breaks
  • Neurologic control findings
    • Regular oscillations of pitch
    • Irregular interruptions of sound

Signal when you don't expect it

  • Sound during breathing
    • Inspiratory white noise or tone
    • Expiratory white noise or tone