Chapter 8 Talking Computers

1. The chapter is organized as follows:

8.1 How Writing Must Be Pronounced

8.2 Words and Sounds in Sentences

8.3 Synthesizing Sounds from a Phonetic Transcription


2. Outline and Summary

        A system that turns any kind of written text into synthesized speech is called a text-to-speech (TTS) system. The first step in turning a text into speech is to convert it into a phonetic transcription. This includes making sure that all abbreviations and symbols are properly transcribed.
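        The abbreviation and symbol step can be sketched as a small lookup-based normalizer. This is only an illustration under stated assumptions: the tables ABBREVIATIONS and SYMBOLS and the function normalize are hypothetical names that do not come from the chapter, and reading numbers digit by digit is a deliberate simplification.

```python
import re

# Hypothetical lookup tables; a real TTS front end would need far larger ones.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister", "etc.": "et cetera"}
SYMBOLS = {"%": "percent", "&": "and", "$": "dollars"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Expand abbreviations, symbols, and digits into plain lowercase words."""
    words = []
    for token in text.split():
        # Whole-token abbreviations are checked first, so "Dr." keeps its dot.
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
            continue
        # Split the token into digit runs, letter runs, and single symbols.
        for piece in re.findall(r"[0-9]+|[A-Za-z']+|\S", token):
            if piece[0] in "0123456789":
                # Simplification: read each digit separately ("50" -> "five zero").
                words.extend(DIGITS[int(d)] for d in piece)
            elif piece in SYMBOLS:
                words.append(SYMBOLS[piece])
            elif piece[0].isalpha():
                words.append(piece.lower())
            # Anything else (punctuation) is dropped in this sketch.
    return " ".join(words)


print(normalize("Dr. Smith owns 50% & more"))
```

        The output of such a step is still ordinary spelling; mapping each word to phonetic symbols would need a pronouncing dictionary or letter-to-sound rules, which this sketch leaves out.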

        Once a proper phonetic transcription of the text is available, the problem is to find the corresponding sounds. There are two approaches to this problem. In the first, called parametric synthesis, the target values for each sound (the frequencies of the formants, their intensities, and so on) are stored in a table. However, it is very hard to state rules that give all the details of how the parameters vary in continuous speech when sounds are joined together. The second approach, known as concatenative synthesis, involves storing stretches of recorded speech that are as large as possible, such as whole words or phrases, and then joining them up. The main problem here is that the stored words and phrases do not join together smoothly.
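        A minimal sketch of the parametric idea follows, assuming a table of formant targets and the simplest possible joining rule, a straight-line glide from one target to the next. The names TARGETS and formant_track are hypothetical, and the frequencies are rough illustrative values rather than figures from the chapter. Real speech does not move between targets in straight lines, which is exactly why the joining rules are hard to state.

```python
# Illustrative target table: rough first and second formant frequencies in Hz.
# These values are assumptions for the example, not data from the chapter.
TARGETS = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def formant_track(sounds, frames_per_sound=4):
    """Return a list of (F1, F2) frames gliding linearly between targets."""
    if not sounds:
        return []
    track = []
    for current, following in zip(sounds, sounds[1:]):
        f1_start, f2_start = TARGETS[current]
        f1_end, f2_end = TARGETS[following]
        for k in range(frames_per_sound):
            t = k / frames_per_sound  # fraction of the way to the next target
            track.append((f1_start + t * (f1_end - f1_start),
                          f2_start + t * (f2_end - f2_start)))
    # Finish exactly on the target of the last sound.
    track.append(TARGETS[sounds[-1]])
    return track


for f1, f2 in formant_track(["i", "a"]):
    print(f"F1={f1:.0f} Hz  F2={f2:.0f} Hz")
```

        Each frame would then drive a formant synthesizer. The table itself is easy to build; the interpolation rule is the part that a realistic system would have to replace with much more detailed rules.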

         A common technique for speech synthesis is “diphone synthesis”, in which a computer stores all possible sequences of two sounds in a language. Each diphone consists of the last half of one segment and the first half of the next, and these stored units are used as the building blocks for complete utterances. Major problems that remain include getting the right rhythm, by correctly adjusting the duration of the segments, and the right intonation, by taking into account the punctuation, the syntax, and even the meaning.
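        The way an utterance breaks down into diphone units can be sketched as follows. The function to_diphones, the "a-b" unit labels, and the "_" marker for silence at the edges of the utterance are assumptions made for this illustration, not notation from the chapter.

```python
def to_diphones(phonemes, boundary="_"):
    """List the diphone units needed to build an utterance.

    Each unit "a-b" stands for the last half of sound a joined to the
    first half of sound b; the boundary symbol marks silence.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{first}-{second}" for first, second in zip(padded, padded[1:])]


print(to_diphones(["k", "a", "t"]))
```

        An utterance of n sounds needs n + 1 units, because every join falls in the middle of a sound rather than at its edge. For a hypothetical inventory of 40 sounds, the upper bound on stored units is 40 × 40 = 1,600, before counting the units that begin or end in silence.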

        Although computers can produce speech, it is still far from natural speech.




