I. Introduction

Lexical knowledge representation at the phonological level of description is usually segmental, based on phonemes or feature representations. A standard feature bundle representation is just a simple sequence of segments which contains no explicit temporal information. For particular speech tasks, however, a nonsegmental description based on autosegmental representations, gestural scores or event representations may be more desirable. In the discussion which follows, we will extend our descriptions to deal with nonsegmental representations. We assume a feature classification in terms of multivalued features or tiers, as defined in the node definitions below.
V:
<> == Null
<place> == "<v-place>"
<phonation> == [voiced]
<manner> == [vowellike]
<v-place> == [central]
<height> == [mid]
<roundness> == [nonround]
<length> == [lax]
<featural> == [ "<phonation>" "<manner>"
"<place>" "<height>"
"<roundness>" "<length>" ].
This node defines a featural template for vowels in terms of attributes or tiers, together with feature value specifications for these attributes. Information about the phonemic segment is also provided for the individual vowels. We assume the neutral central vowel schwa ([@]) to be the default case. Other vowels differ from the neutral vowel with respect to vowel place, height, roundness or length.
V_@:
<> == V
<segmental> == @.
V_6:
<> == V
<height> == [low]
<segmental> == 6.
V_E:
<> == V
<v-place> == [front]
<segmental> == E.
V_ee:
<> == V_E
<length> == [tense]
<segmental> == e:.
V_O:
<> == V
<v-place> == [back]
<height> == [mid]
<roundness> == [round]
<segmental> == O.
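By way of illustration, assuming standard DATR inference (with quoted paths such as "<phonation>" evaluated at the global node), the following featural theorems should be derivable from the definitions above; the schwa entry receives all of the defaults, while V_E overrides only the vowel place tier:
V_@:
<featural> = [ [voiced] [vowellike] [central] [mid] [nonround] [lax] ].
V_E:
<featural> = [ [voiced] [vowellike] [front] [mid] [nonround] [lax] ].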
A featural template for consonants may be defined in a similar manner. In this case the default consonant is taken to be [z] since, for German at least, this leads to the most economical representation. Other consonants are defined on the basis of this default. Thus, for example, the definition of the segment [d] requires only a single equation specifying the value of <manner> to be [plosive], since all other feature values are inherited from the specification of the default.
C:
<> == Null
<phonation> == [voiced]
<manner> == [fricative]
<place> == [alveolar]
<featural> == [ "<phonation>" "<manner>" "<place>" ].
C_z:
<> == C
<segmental> == z.
C_d:
<> == C
<manner> == [plosive]
<segmental> == d.
C_t:
<> == C_d
<phonation> == [voiceless]
<segmental> == t.
C_k:
<> == C_t
<place> == [velar]
<segmental> == k.
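Again for illustration, although the definition of C_k contains just two local equations, under the same assumptions about DATR inference it should yield the fully specified theorem below, with [voiceless] inherited from C_t and [plosive] from C_d:
C_k:
<featural> = [ [voiceless] [plosive] [velar] ].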
In what follows, we assume the characterisation of syllable structure in terms of onset, peak and coda which we defined in the section on syllable structure, above. However, we must now also take the more detailed phonotactic structure into account. In German phonotactics, for example,
syllables have the canonical form CCCVVCCCCC, with up to three onset consonants, a peak of one or two vowels, and up to five coda consonants:
Syllable:
<> == Null
<structure> == I[ <onset> II <peak> II <coda> I]
<onset> == "<phn onset first>"
"<phn onset second>"
"<phn onset third>"
<peak> == "<phn peak first>"
"<phn peak second>"
<coda> == "<phn coda first>"
"<phn coda second>"
"<phn coda third>"
"<phn coda fourth>"
"<phn coda fifth>".
If we want a fully structured featural representation, then we can obtain it from the value of the <structure featural> path, whereas if we want a traditional unstructured phonemic representation (in SAMPA), then we can obtain it from the value of the <structure segmental> path. For this to work as we intend, we need to define the three utility punctuation nodes invoked in the equation for <structure> above. Dealing with the formal language punctuation in this indirect way may seem unintuitive, but it has the advantage that one can make the format of the punctuation sensitive to the type of information requested.
II:
<> == Null
<structure featural> == I] I[.
I[:
<> == II
<structure featural> == [
<structure segmental> == /.
I]:
<> == I[
<structure featural> == ].
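Note that I] inherits its segmental realisation from I[, so that a phonemic string is delimited by a slash at both ends, while II defines only a featural value and thus contributes nothing to the segmental representation. We would accordingly expect theorems such as:
I]:
<structure featural> = ]
<structure segmental> = /.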
Syllable entries in the lexicon inherit default information from the general Syllable node which defines the syllable structure in terms of onset, peak and coda. Specific information as to which segments make up the syllable is defined in the individual entries by rules of referral.
We can now define syllable entries in our lexicon. The example entries shown here are the syllables /te:/, /E6/ and /dOk/.
S_tee:
<> == Syllable
<phn onset first> == "C_t:<>"
<phn peak first> == "V_ee:<>".
S_E6:
<> == Syllable
<phn peak first> == "V_E:<>"
<phn peak second> == "V_6:<>".
S_dOk:
<> == Syllable
<phn onset first> == "C_d:<>"
<phn peak first> == "V_O:<>"
<phn coda first> == "C_k:<>".
From these node
definitions, taken together with the axioms for syllable structure, we can now
infer the following phonemic and feature representations for the syllable /te:/,
for example:
S_tee:
<structure segmental> = / t e: /
<structure featural> = [ [ [voiceless] [plosive] [alveolar] ] ]
                       [ [ [voiced] [vowellike] [front] [mid] [nonround] [tense] ] ].
We omit the empty
segments (i.e., the second and third positions of the onset, the second peak
slot, and the complete coda). They can easily be removed using a technique which
will be described in more detail in the section on time map domains, below.
In the relative time domain, we are interested in the temporal relations which exist between features in an autosegmental representation. In the absolute time domain, on the other hand, we are interested in token utterances with actual temporal annotations. What is required in speech recognition, for example, is a mapping from the absolute time annotations to a relative time domain in which the actual time is no longer required. Likewise, speech synthesis requires a mapping from the relative time domain to the absolute time domain, since here, too, temporal annotations, or at least average durations, are required. In the rest of this section we will concentrate on the relative time domain. We will return to the absolute time domain in the section on linguistic word recognition. In order to incorporate temporal relations into our description of syllables, we must define the symbols for precedence (<) and overlap (o). These are introduced by extending the Syllable node with a transduction into the relative time domain:
Syllable:
...
<rel-time> == Relations:<<>>.
The first two equations of the Relations transducer are just a version of the IDEM transducer discussed in the section on finite state transducers, above (but where the variable $G ranges over features and square brackets). The next three equations simply delete any empty feature bundles ([ ]). The final pair of equations introduces the temporal relations into the representation (where the variables $F1 and $F2 range over features).
Relations:
<> == Null
<$G> == $G <>
<[ ]> == <>
<[ [ ]> == <[>
<] [ ]> == <]>
<] [> == ] '<' <[>
<$F1 $F2> == $F1 o <$F2>.
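To see how the transducer operates, consider its effect on a minimal input consisting of two adjacent feature bundles (a hand-worked trace, assuming the standard DATR convention that the longest matching path prefix is chosen):
Relations:<[ [voiceless] [plosive] ] [ [voiced] ]>
= [ [voiceless] o [plosive] ] < [ [voiced] ]
The two features inside the first bundle are linked by overlap via the final equation, and the bundle boundary ] [ is rewritten as ] < [ by the penultimate equation, introducing precedence.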
We can now infer the following feature information for the example syllable /dOk/:
S_dOk:
<rel-time structure featural> =
[ [ [voiced] o [plosive] o [alveolar] ] ]
<
[ [ [voiced] o [vowellike] o [back] o
[mid] o [round] o [lax] ] ]
<
[ [ [voiceless] o [plosive] o [velar] ] ]
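By the same inference, the relative time representation of the syllable /te:/ analysed earlier should come out as:
S_tee:
<rel-time structure featural> =
[ [ [voiceless] o [plosive] o [alveolar] ] ]
<
[ [ [voiced] o [vowellike] o [front] o
[mid] o [nonround] o [tense] ] ]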