Chapter 3 The Singularity of Speech


This chapter covers several closely related topics, beginning with the anatomy and physiology involved in producing speech, which entails an explanation of the role of the airway above the larynx, the supralaryngeal vocal tract (SVT), and the nature of "formant frequencies".

It first explains how the formant frequency encoding-decoding process yields the high data transmission rate of speech. It then tackles issues concerning the manner in which humans perceive formant frequencies and how the length of a person's airway affects the formant frequency patterns that differentiate vowels and consonants.

Although human speech has singular properties, the continuity of evolution is evident when we examine the aspects of vocal communication that are common to humans and other species.


1.          In contrast to Muller's work, later nineteenth-century descriptive studies of the larynx focused on its anatomy. The term "vocal folds" is often used to refer to the vocal cords, because they looked like folds when viewed from above by means of a mirror placed at the back of a person's mouth.

2.          The comparative electrophysiologic studies that we have suggest that many species have neural perceptual mechanisms that match the acoustic output of their laryngeal source and their supralaryngeal filter. Peterson showed neural responses in monkeys to stimuli having the formant frequencies of their species-specific vocal sounds. Studies of cats show that they are equipped with a neural device that tracks the fundamental frequency of phonation, F0. This is no mean achievement, because we have yet to make an electronic device or write a computer algorithm that will accurately track fundamental frequency, though hundreds of attempts have been submitted to the U.S. Patent Office since 1936. Systems for the detection of laryngeal pathologies, task-related stress and cognitive impairment that depend on accurate measurements of the fundamental frequency of phonation and other acoustic parameters still involve intervention by skilled human operators who must check the measures provided by automated computer-implemented procedures.

3.          Chapter 4 focuses on the apparent neural bases of human language and relevant experimental data.

4.          Gerstman developed an algorithm for computer-implemented speech recognition that used normalization coefficients derived from the F1s and F2s of a speaker's [i] and [u]. However, Gerstman's algorithm involves a listener's deferring her or his judgment of a vowel's identity until tokens that are known to be examples of [i] and [u] are heard. The algorithms developed by Nearey provided a solution that is a better match to the responses of human listeners. Moreover, as we shall see, the vowel [i] is also less susceptible than other vowels to formant frequency variations deriving from differences in tongue placement. In producing [i], the vocal tract is perturbed to an extreme position. In contrast to other vowels, the intrinsic muscles of the tongue are sometimes employed when producing an [i].

5.          The studies discussed in Chapter 4 suggest that vocal tract modeling most likely involves activating the neural substrate that also regulates overt speech. The effects of vocal tract normalization on speech perception have been noted in other psychoacoustic experiments. May, for example, notes a shift in the boundary for the identification of the fricatives [s] and [ʃ] before the vowel [?]. The boundary shifted to higher frequencies for the stimuli produced with an [?] vowel corresponding to a shorter supralaryngeal vocal tract.

6.          Riede and his colleagues analyzed one type of vocalization of Diana monkeys and showed formant transitions that appear to reflect either the monkeys' starting the call with their lips protruded and somewhat constricted or the opening of their lips. The formant frequencies of the calls deviate from those of rhesus macaques measured by Lieberman and approach those of the vowel [a]; but when the length of the monkeys' airways and the measured shape of the supralaryngeal vocal tracts are compared with those of human speakers, it is evident that they are not producing the vowel [a].
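
Several of the notes above turn on the fact that formant frequencies depend on the length of the airway: a shorter supralaryngeal vocal tract yields higher formants, so formant values can only be interpreted relative to the length of the tract that produced them. A minimal sketch of this relationship, assuming the standard textbook idealization of the vocal tract as a uniform tube closed at the larynx and open at the lips, is given below. The function name, the tract lengths and the speed of sound are illustrative assumptions, not values taken from the chapter, and real vocal tracts are not uniform tubes.

```python
# Idealized uniform-tube (quarter-wavelength) model of the supralaryngeal
# vocal tract: closed at the larynx, open at the lips.
# All numeric values below are illustrative assumptions, not measurements.

SPEED_OF_SOUND_M_PER_S = 340.0  # assumed round value for the speed of sound in air


def uniform_tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND_M_PER_S):
    """Return the first n_formants resonances (Hz) of a uniform tube.

    The n-th resonance of a tube closed at one end is (2n - 1) * c / (4 * L).
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    # Hypothetical tract lengths chosen only to show the scaling with length.
    for label, length_m in [("longer tract, 0.17 m", 0.17),
                            ("shorter tract, 0.13 m", 0.13)]:
        formants = uniform_tube_formants(length_m)
        print(label, [round(f) for f in formants])
```

For the assumed 0.17 m tube the resonances fall at about 500, 1500 and 2500 Hz, and every resonance scales inversely with length, so halving the tube doubles each formant. The sketch shows only this scaling; it does not model the tongue and lip positions that distinguish one vowel from another.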