Nonsegmental Phonology



I. Introduction

Lexical knowledge at the phonological level of description is usually represented segmentally, in terms of phonemes or feature bundles. A standard feature bundle representation is simply a sequence of segments and contains no explicit temporal information. For particular speech tasks, however, a nonsegmental description based on autosegmental representations, gestural scores or event representations may be more desirable. In the discussion which follows, we extend our descriptions to deal with nonsegmental representations.

II. Feature representations

We assume a feature classification in terms of multivalued features or tiers as defined in the following table.

Figure 4: Default word structure for English and German
Tier        Features
phonation   {voiced, voiceless}
manner      {plosive, fricative, nasal, lateral, vowellike}
place       {labial, alveolar, palato-alveolar, velar, palatal, uvular, glottal}
v-place     {front, back, central}
height      {high, low, mid}
length      {long, short, lax, tense}
roundness   {round, nonround}

The notion of tier will be discussed in more detail in the next section. For now it is sufficient to assume that tier refers to an attribute or type of feature whereas the value of this attribute may be one of a number of features. On the basis of this feature classification, we can define the following general description of vowels:


    <> == Null
    <place>     == "<v-place>"
    <phonation> == [voiced]
    <manner>    == [vowellike]
    <v-place>   == [central]
    <height>    == [mid]
    <roundness> == [nonround]
    <length>    == [lax]
    <featural>  == [ "<phonation>" "<manner>" 
                     "<place>" "<height>" 
                     "<roundness>" "<length>" ].
This node defines a featural template for vowels in terms of attributes (tiers) and feature value specifications for these attributes. Information about the phonemic segment is also provided for the individual vowels. We assume the neutral central vowel schwa ([@]) to be the default case. Other vowels differ from the neutral vowel with respect to vowel place, height, roundness or length.
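    % schwa, the default vowel: every feature is inherited from V
    <> == V
    <segmental> == @.

    % low central vowel
    <> == V
    <height>    == [low]
    <segmental> == 6.

    % front vowel; the tense vowel that follows inherits from it as V_E
    <> == V
    <v-place>   == [front]
    <segmental> == E.

    % tense counterpart of E
    <> == V_E
    <length>    == [tense]
    <segmental> == e:.

    % back rounded vowel
    <> == V
    <v-place>   == [back]
    <height>    == [mid]
    <roundness> == [round]
    <segmental> == O.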
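
The default inheritance used in these vowel definitions can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the dictionary layout, the function names `lookup` and `featural`, and the use of SAMPA symbols as keys are illustrative choices and not part of the DATR theory itself.

```python
# Illustrative model of default inheritance; the keys are SAMPA symbols,
# not the node names of the DATR theory (an assumption of this sketch).
NODES = {
    # name: (parent, locally specified attribute values)
    "V": (None, {"phonation": "voiced", "manner": "vowellike",
                 "v-place": "central", "height": "mid",
                 "roundness": "nonround", "length": "lax"}),
    "@": ("V", {"segmental": "@"}),
    "6": ("V", {"height": "low", "segmental": "6"}),
    "E": ("V", {"v-place": "front", "segmental": "E"}),
    "e:": ("E", {"length": "tense", "segmental": "e:"}),
    "O": ("V", {"v-place": "back", "height": "mid",
                "roundness": "round", "segmental": "O"}),
}

def lookup(node, attr):
    """Return the most specific value of attr, walking up the parent chain."""
    while node is not None:
        parent, local = NODES[node]
        if attr in local:
            return local[attr]
        node = parent
    return None

def featural(node):
    """Collect the feature values in template order.

    For vowels the place slot is filled from v-place, and the lookup always
    starts at the queried node, so overrides in a vowel entry are respected.
    """
    order = ["phonation", "manner", "v-place", "height", "roundness", "length"]
    return [lookup(node, attr) for attr in order]

print(featural("e:"))
```

In this sketch the entry for e: states only its length and its segmental symbol; frontness comes from E, and everything else comes from the vowel template.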
A featural template for consonants may be defined in a similar manner. In this case the default consonant is taken to be [z] since, for German at least, this leads to the most economical representation. Other consonants are defined on the basis of this default. Thus, for example, the definition of the segment [d] only requires an equation specifying the value of manner to be plosive, since all other feature values are inherited from the specification of the default.
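    % featural template for consonants; the values are those of the default [z]
    <> == Null
    <phonation> == [voiced]
    <manner>    == [fricative]
    <place>     == [alveolar]
    <featural>  == [ "<phonation>" "<manner>" "<place>" ].

    % default consonant: nothing overridden
    <> == C
    <segmental> == z.

    % voiced alveolar plosive
    <> == C
    <manner>    == [plosive]
    <segmental> == d.

    % voiceless counterpart of d
    <> == C_d
    <phonation> == [voiceless]
    <segmental> == t.

    % velar counterpart of t
    <> == C_t
    <place>     == [velar]
    <segmental> == k.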
In what follows, we assume the syllable structure in terms of onset, peak and coda which we defined in the section on syllable structure. However, we must now also take the more detailed phonotactic structure into account. In German phonotactics, for example, syllables have the canonical form $C_0^3 V_1^2 C_0^5$. This says that a syllable can have up to three consonants in the onset, one or two vowels in the peak and up to five consonants in the coda. We represent this as follows:
    <> == Null
    <structure> == I[ <onset> II <peak> II <coda> I]
    <onset> == "<phn onset first>"
               "<phn onset second>"
               "<phn onset third>"
    <peak>  == "<phn peak first>"
               "<phn peak second>"
    <coda>  == "<phn coda first>"
               "<phn coda second>"
               "<phn coda third>"
               "<phn coda fourth>"
               "<phn coda fifth>".
If we want a fully structured featural representation then we can obtain it from the value of the <structure featural> path, whereas if we want a traditional unstructured phonemic representation (in SAMPA) then we can obtain it from the value of the <structure segmental> path. For this to work as we intend, we need to define the three utility punctuation nodes invoked in the equation for <structure> above. Dealing with the formal language punctuation in this indirect way may seem unintuitive but it has the advantage that one can make the format of the punctuation sensitive to the type of information requested.
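    % II, the separator between constituents: closes one bracket and opens the next
    <> == Null
    <structure featural>  == I] I[.

    % I[, the opening punctuation: a bracket or a slash, depending on the view
    <> == II
    <structure featural>  == [
    <structure segmental> == /.

    % I], the closing punctuation: inherits the slash from I[
    <> == I[
    <structure featural>  == ].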
Syllable entries in the lexicon inherit default information from the general Syllable node which defines the syllable structure in terms of onset, peak and coda. Specific information as to which segments make up the syllable is defined in the individual entries by rules of referral.
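
The division of labour between the general Syllable node and the individual entries can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the names `SLOTS`, `make_syllable` and `segmental` are hypothetical, the slot inventory simply mirrors the three onset, two peak and five coda positions of the template, and no further phonotactic constraints are checked.

```python
# Slot inventory mirroring the syllable template (3 onset, 2 peak, 5 coda).
# All names in this sketch are illustrative, not taken from the DATR theory.
SLOTS = {
    "onset": ["first", "second", "third"],
    "peak": ["first", "second"],
    "coda": ["first", "second", "third", "fourth", "fifth"],
}

def make_syllable(**filled):
    """Build a full slot table; slots the entry does not mention stay empty."""
    table = {(c, pos): None for c, positions in SLOTS.items() for pos in positions}
    for key, segment in filled.items():
        constituent, position = key.split("_")
        if (constituent, position) not in table:
            raise ValueError(f"no such slot: {key}")
        table[(constituent, position)] = segment
    return table

def segmental(table):
    """Read the filled slots back in onset, peak, coda order."""
    return [table[(c, pos)]
            for c, positions in SLOTS.items()
            for pos in positions
            if table[(c, pos)] is not None]

dok = make_syllable(onset_first="d", peak_first="O", coda_first="k")
print(segmental(dok))
```

An entry names only the slots it fills, which corresponds to the way a lexical entry overrides a handful of paths and inherits the empty default for all the others.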

We can now define syllable entries in our lexicon. The example entries shown here are the syllables /te:/, /E6/ and /dOk/.

    % /te:/
    <> == Syllable
    <phn onset first> == "C_t:<>"
    <phn peak first>  == "V_ee:<>".

    % /E6/
    <> == Syllable
    <phn peak first>  == "V_E:<>"
    <phn peak second> == "V_6:<>".

    % /dOk/
    <> == Syllable
    <phn onset first> == "C_d:<>"
    <phn peak first>  == "V_O:<>"
    <phn coda first>  == "C_k:<>".
From these node definitions, taken together with the axioms for syllable structure, we can now infer the following phonemic and feature representations for the syllable /te:/, for example:
    <structure segmental> = / t e: /
    <structure featural>  = [ [ [voiceless] [plosive]
                                [alveolar] ] ]
                            [ [ [voiced] [vowellike]
                                [front] [mid] [nonround]
                                [tense]  ] ].

We omit the empty segments (i.e., the second and third positions of the onset, the second peak slot, and the complete coda). They can easily be removed using a technique which will be described in more detail in the section on time map domains, below.
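
The idea that the punctuation depends on the kind of information requested can be sketched in Python as follows. This is a hedged illustration, not the DATR mechanism itself: the tables `PUNCT` and `SEGMENTS` and the function names are assumptions of the sketch, and the feature lists are copied from the inferred representation of /te:/ given above.

```python
# Punctuation per view: brackets for the featural view, slashes for the
# segmental view, and nothing between constituents in the segmental view.
PUNCT = {
    "featural": {"open": ["["], "mid": ["]", "["], "close": ["]"]},
    "segmental": {"open": ["/"], "mid": [], "close": ["/"]},
}

# Feature bundles for the two segments of /te:/ (illustrative lookup table).
SEGMENTS = {
    "t": ["voiceless", "plosive", "alveolar"],
    "e:": ["voiced", "vowellike", "front", "mid", "nonround", "tense"],
}

def render_segment(segment, view):
    """Render one segment as a SAMPA symbol or as a bracketed feature bundle."""
    if view == "segmental":
        return [segment]
    return ["["] + [f"[{feature}]" for feature in SEGMENTS[segment]] + ["]"]

def structure(onset, peak, coda, view):
    """Linearise onset, peak and coda with view-dependent punctuation."""
    punct = PUNCT[view]
    out = list(punct["open"])
    for index, constituent in enumerate((onset, peak, coda)):
        if index > 0:
            out += punct["mid"]
        for segment in constituent:
            out += render_segment(segment, view)
    out += punct["close"]
    return " ".join(out)

print(structure(["t"], ["e:"], [], "segmental"))
print(structure(["t"], ["e:"], [], "featural"))
```

In the featural view of this sketch the empty coda still surfaces as a trailing `[ ]`, which is the kind of empty material omitted from the representation shown in the text.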

III. Time map domain

In the autosegmental representation of the syllable /dOk/ above, we indicated that a temporal interpretation is possible. The sequencing of information along the tiers is defined with respect to the temporal relation of precedence, and the association between information on separate tiers is defined with respect to the temporal relation of overlap. Overlapping autosegments are thus not required to have the same inherent duration; their durations can vary according to their individual properties. Although this interpretation incorporates a temporal dimension into the representation by introducing the notion of an event defined with respect to a property and a time interval, it does not go far enough for speech applications where actual temporal annotations are required. Let us assume three temporal domains which refer to different perspectives on spoken language utterances: $T_{\rm cat}$, the category or hierarchical time domain, $T_{\rm rel}$, the relative time domain, and $T_{\rm abs}$, the absolute time domain.

In the relative time domain, we are interested in the temporal relations which exist between features in an autosegmental representation. In the absolute time domain, on the other hand, we are interested in token utterances with actual temporal annotations. What is required in speech recognition, for example, is a mapping from the absolute time annotations to a relative time domain in which the actual time is no longer required. Likewise, speech synthesis requires a mapping from the relative time domain to the absolute time domain, since here too temporal annotations, or at least average durations, are required.
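
One direction of this mapping, from absolute annotations to relative relations, can be sketched in Python. The sketch rests on several assumptions that are not in the text: the `Event` type and the `relation` function are hypothetical, the time stamps are invented purely for illustration, and treating two abutting intervals as a case of precedence is a modelling choice.

```python
from typing import NamedTuple

class Event(NamedTuple):
    """A property holding over an interval (times in seconds, made up here)."""
    prop: str
    start: float
    end: float

def relation(a: Event, b: Event) -> str:
    """Map two absolutely timed events onto a relative-time relation.

    '<' means a precedes b, '>' means b precedes a, 'o' means they overlap.
    Abutting intervals count as precedence (an assumption of this sketch).
    """
    if a.end <= b.start:
        return "<"
    if b.end <= a.start:
        return ">"
    return "o"

voiced = Event("voiced", 0.00, 0.08)
plosive = Event("plosive", 0.01, 0.06)
vowellike = Event("vowellike", 0.06, 0.20)

print(relation(voiced, plosive))     # overlapping intervals
print(relation(plosive, vowellike))  # abutting intervals
```

Once the relations have been computed, the time stamps themselves are no longer needed, which is the sense in which the relative time domain abstracts away from actual time.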

In the rest of this section we will concentrate on the relative time domain. We will return to the absolute time domain in the section on linguistic word recognition. In order to incorporate temporal relations into our description of syllables we must define the symbols for precedence (<) and overlap (o). We can incorporate this information into our representation by invoking an FST in our general Syllable definition:

    <rel-time> == Relations:<<>>.
The first two equations of the Relations transducer are just a version of the IDEM transducer discussed in the section on finite state transducers, above (but where the variable $G ranges over features and square brackets). The next three equations simply delete any empty feature bundles ([ ]). The final pair of equations introduces the temporal relations into the representation (where the variables $F1 and $F2 range over features).
    <>        == Null
    <$G>      == $G <>
    <[ ]>     == <>
    <[ [ ]>   == <[>
    <] [ ]>   == <]>
    <] [>     == ] '<' <[>
    <$F1 $F2> == $F1 o <$F2>.

We can now infer the following feature information for the example syllable /dOk/:

<rel-time structure featural> =
       [ [ [voiced] o [plosive] o [alveolar] ] ] <
       [ [ [voiced] o [vowellike] o [back] o
           [mid] o [round] o [lax] ] ] <
       [ [ [voiceless] o [plosive] o [velar] ] ]
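
The behaviour of the Relations transducer can be approximated by a short Python function that rewrites a token list from left to right. This is a sketch under stated assumptions rather than a DATR implementation: the names `is_feature` and `relations` are hypothetical, features are taken to be tokens such as `[voiced]`, and the longest-match behaviour of DATR is imitated by testing the three-token patterns before the two-token ones.

```python
def is_feature(token):
    """A feature is a bracketed atom such as '[voiced]', not a bare bracket."""
    return len(token) > 2 and token.startswith("[") and token.endswith("]")

def relations(tokens):
    """Delete empty bundles and insert '<' (precedence) and 'o' (overlap)."""
    rest = list(tokens)
    out = []
    while rest:
        if rest[:3] == ["[", "[", "]"]:       # <[ [ ]> == <[>
            rest = ["["] + rest[3:]
        elif rest[:3] == ["]", "[", "]"]:     # <] [ ]> == <]>
            rest = ["]"] + rest[3:]
        elif rest[:2] == ["[", "]"]:          # <[ ]> == <>
            rest = rest[2:]
        elif rest[:2] == ["]", "["]:          # <] [> == ] '<' <[>
            out += ["]", "<"]
            rest = rest[1:]
        elif len(rest) >= 2 and is_feature(rest[0]) and is_feature(rest[1]):
            out += [rest[0], "o"]             # <$F1 $F2> == $F1 o <$F2>
            rest = rest[1:]
        else:                                 # <$G> == $G <>
            out.append(rest[0])
            rest = rest[1:]
    return out

dok = ("[ [ [voiced] [plosive] [alveolar] ] ] "
       "[ [ [voiced] [vowellike] [back] [mid] [round] [lax] ] ] "
       "[ [ [voiceless] [plosive] [velar] ] ]").split()
print(" ".join(relations(dok)))
```

Under these assumptions the precedence symbol appears between the onset, peak and coda groups, and the overlap symbol appears between the features inside each bundle.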

Last updated 06/12/08