Prosodic Phonology
I. Historical Development of Prosodic Analysis
An approach to the analysis of phonological structure which differs in fundamental
ways from virtually every other 20th century view was developed by
John R. Firth (1890-1960) at the School of Oriental and African Studies in
London, beginning in the late 1930s. Though it was never well defined, Firth’s
general position dominated discussion and analytic practice in Britain, at
least with respect to phonology, through the early 1960s. Firth and his associates
presented it largely through detailed accounts of specific languages.
The development of Prosodic Analysis
involved very little interaction with other theoretical frameworks, but it is
possible to see partial similarities with some other views. For instance, the
theory of long components, developed by Zellig Harris, also attributes
important aspects of phonological structure to units similar to prosodies,
whose scope is greater than a single segment. In the 1980s, Generative
Phonology also stressed the analysis and formalization of phonological
properties whose domain of specification is greater (or smaller) than a single
segment, and the analysis of structural units other than segments. (See Autosegmental
Phonology and Metrical Phonology.)
(Source: International Encyclopedia of Linguistics, Vol. 3 & 4, Oxford Univ. Press)
In 1977,
Liberman and Prince proposed that, in addition to the hierarchical structures
embodied in syntactic surface structure trees, an adequate characterization of
sentences required a description in terms of a separate phonological hierarchy
whose constituents were not everywhere identical to those of the surface
syntax.
Liberman and Prince focused on the usefulness of a
hierarchy of phonological constituents for describing prominence relations
among the words and syllables of a sentence. They argued that the differences
between syntactic and phonological trees were confined to the word and lower
structures; above the level of the word, the branching of the two trees was the
same. Other investigators took up the concept of a hierarchy of phonological
constituents separate from (although related to) the syntactic hierarchy and
proposed differences between the two hierarchies above the word level as well.
It was argued that the new constituents made it
easier to state the phonological rules of a language that govern interactions
between sound segments or phonemes (e.g., in English, /t/ flapping is blocked
in certain prosodic contexts), as well as the phonological rules that govern
intonational, rhythmic, and pausing patterns that had proven difficult to state
in terms of syntactic structure (e.g., pauses, pre-boundary lengthening, and
boundary tones do not always occur at the boundaries of major syntactic
constituents in English). Additional evidence for a prosodic structure that
differs from the syntactic surface structure came from Gee and Grosjean (1983)
who presented performance structures that group words according to the length
of pauses inserted between them at slow speaking rates. These structures are
clearly distinct from the groupings suggested by the syntactic bracketing.
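The idea of grouping words by the pauses between them can be illustrated with a
minimal sketch. It is not Gee and Grosjean's actual procedure: it simply assumes
that the adjacent units separated by the shortest pause are merged first, and the
function name, the example words, and the pause durations are all invented for
illustration.

```python
def performance_tree(words, pauses):
    """Build a nested grouping by repeatedly merging the two adjacent
    units separated by the shortest remaining pause (assumed procedure)."""
    if len(pauses) != len(words) - 1:
        raise ValueError("need exactly one pause between each pair of adjacent words")
    units = list(words)
    gaps = list(pauses)
    while gaps:
        # Index of the shortest remaining pause (the first one wins on ties).
        i = min(range(len(gaps)), key=gaps.__getitem__)
        # Replace the two units on either side of that pause with one group.
        units[i:i + 2] = [(units[i], units[i + 1])]
        del gaps[i]
    return units[0]


# Hypothetical pause durations in milliseconds, not measured data.
words = ["the", "old", "man", "sat", "down"]
pauses_ms = [40, 90, 310, 120]
tree = performance_tree(words, pauses_ms)
print(tree)
```

With these invented durations the longest pause falls between "man" and "sat",
so that is the last boundary to be merged and therefore the top-level division
of the tree, whatever the syntactic bracketing of the sentence happens to be.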
The research program suggested by the concept of a
phonological as well as a syntactic hierarchy of constituents, i.e., the
proposal of various hierarchies of constituents and notations for expressing
them and the search for phonological evidence to support them, has occupied a
number of investigators in the intervening decade and a half, among them Liberman
and Prince (1977), Selkirk (1980; 1984), Beckman and Pierrehumbert (1986),
Nespor and Vogel (1983), and Ladd (1986; Ladd and Campbell, 1991). The term
``prosodic constituents'' is now generally accepted for describing the
structures that characterize each proposed level, but consensus on what the
appropriate constituents are has proved difficult to achieve. In the following,
we will briefly summarize some of the hierarchical structures that have been
proposed by these linguistic theorists, before turning to a description of the
hierarchy used in the present study.
Many
recent phonological theories have either been inspired by, or proposed in
reaction to, the work of Chomsky and Halle (1968). Attempting to develop a
general account of English sound structure, they proposed a transformational
approach to grammar in which an abstract representation of a sentence's
meaning, the ``deep structure,'' is transformed into a ``surface structure.''
The surface structure contains a complete syntactic bracketing of the sentence,
and a variety of phonological rules were proposed to describe the process by
which the surface structure is transformed into the phonetic representation,
which actually describes the sounds to be produced. Chomsky had observed
earlier that syntactic phrases did not always correspond to the perceived
phrasing in speech. Consequently, The Sound Pattern of English introduced
``readjustment'' rules, which alter the surface structure to partition it into
``phonological phrases'' that may differ from the phrasing in the syntactic
bracketing. These rules may also modify or delete boundaries
between distinct lexical items. As a result, the perceived phrasing of a spoken
sentence is not necessarily the same as the syntactic structure, or bracketing,
of the surface structure, although the two are certainly related.
As noted
above, Liberman and Prince (1977) formalized the idea of a phonological
hierarchy by proposing a phonological tree that, in its branching below the
level of words (which they refer to as ``mots''), accounted for some of the
prominence relationships between syllables in a sentence. Selkirk, trying to
describe more general prosodic relationships, rejected Liberman and Prince's
claim that the branching of the prosodic tree above the word level was
isomorphic to that of the syntactic tree and presented a phonological tree that
contains intonational phrase, phonological phrase, prosodic word, foot, and syllable
(Selkirk, 1980). In later work, however, Selkirk has argued that use of a
metrical grid obviates the need for separately defined prosodic constituents
between the intonational phrase and the foot (Selkirk, 1984). The idea of a
hierarchy containing intonational phrase, intermediate phrase, and word levels
has also been advanced by Beckman and Pierrehumbert (1986) who suggest the
possibility of an accentual phrase level between the intermediate phrase and
word levels. Similarly, Nespor and Vogel (1983) have proposed a hierarchy
containing intonational phrase, phonological phrase, and phonological word
levels. While the hierarchies put forward in these proposals are quite
distinct, there appears to be general agreement on the need for levels
corresponding to the intonational phrase and the prosodic (or phonological)
word, and possibly an intervening level.
Although
the notion of a prosodic word is generally accepted, there is some debate over
the relationship between prosodic words and elements of the lexicon. Kurath
(1964), for example, pointed out that some rules governing the sequencing of
phonemes applied only within a lexical word. Chomsky and Halle also found that
some rules applied only within words and others applied across word boundaries.
Consequently, some of Chomsky and Halle's ``words'' can, through the action of
the readjustment rules, contain more than one lexical item. This view is
compatible with that of Booij (1983) who observes that a ``phonological word''
may correspond to more than one lexical word in some cases, and less than one
in other cases. In contrast, Nespor and Vogel (1986) argue that, while the
phonological word may be smaller than the morphological constituent, it is not
larger. Liberman and Prince (1977) treat ``mots'' as the unit that delimits the
domain of word-internal phonological rules. This is essentially the same way
that Selkirk (1980) defined the phonological word in her earlier work. The
relationship between prosodic words and elements of the lexicon is decidedly nontrivial:
as Kaisse (1985) has pointed out, there seem to be several different mechanisms
that can cause distinct lexical items to be perceived as a single unit, and
these must be carefully distinguished.
The other
prosodic constituent that appears to be widely accepted is the intonational
phrase. The intonational phrase is a group of words in an utterance, which is
delimited in some way as a larger unit of phrasing. Most phonologists posit
some sort of intonational phrase, although there are differences in precisely
how they define it. Ladd (1986) traces the origins of the intonational phrase
back over a half century, and he identifies three properties common to all of
the various proposals: intonational phrases are the largest phonological entity
with phonetically definable boundaries into which utterances can be divided,
they have a particular intonational structure, and they are assumed to relate
in some way to syntactic or discourse-level structure.
Although
this broad definition is helpful in unifying the works of several researchers,
we need a more specific definition for this study. Consequently, we will adopt
the definition proposed by Pierrehumbert (1980), which says that an
intonational phrase is delimited by high or low boundary tones. Pierrehumbert
proposed two types of boundary tones: a low tone such as occurs at the end of a
declarative sentence, and a high tone such as at the end of a yes/no question.
This definition, part of a much larger phonology of intonation proposed by
Pierrehumbert, has been quite influential, and it has been adopted by a number
of other researchers (e.g., Selkirk, 1984;
Nespor and Vogel, 1986). While Pierrehumbert's definition of an intonational
phrase differs from, for example, Lieberman's (1967) ``breath group,'' its
boundaries seem to coincide with those of Halliday's (1967) ``tone group.''
Although
most researchers agree on at least two levels of a prosodic hierarchy (prosodic
words and intonational phrases), other constituents have been proposed and are
of interest in this study. Consequently, we now consider proposed constituents
both above and below the level of the intonational phrase.
Beckman and Pierrehumbert (1986) argue that there is
at least one level of phrasing, and possibly two, between the prosodic word
and the intonational phrase. Their ``intermediate phrase'' groups words into
phrases having at least one accented syllable. That is, each intermediate
phrase contains at least one ``pitch accent,'' a pitch marker that makes a
syllable more prominent perceptually. Intonational phrases are made up by
grouping together one or more intermediate phrases and marking the end of the
final one with a boundary tone. This intermediate phrase is similar to the unit
Nespor and Vogel (1986) refer to as a phonological phrase.
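The composition just described can be written down as a small data structure.
The following is only a sketch under stated assumptions: the class names and the
"H"/"L" labels for high and low boundary tones are chosen here for illustration,
and pitch accents are marked on whole words rather than on syllables to keep the
example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    pitch_accent: bool = False  # simplification: accent marked per word


@dataclass
class IntermediatePhrase:
    words: List[Word]

    def is_well_formed(self) -> bool:
        # An intermediate phrase contains at least one pitch accent.
        return any(w.pitch_accent for w in self.words)


@dataclass
class IntonationalPhrase:
    phrases: List[IntermediatePhrase]
    boundary_tone: str  # "H" (high) or "L" (low); labels are an assumption

    def is_well_formed(self) -> bool:
        # One or more well-formed intermediate phrases, closed by a boundary tone.
        return (
            len(self.phrases) >= 1
            and self.boundary_tone in ("H", "L")
            and all(p.is_well_formed() for p in self.phrases)
        )


statement = IntonationalPhrase(
    phrases=[
        IntermediatePhrase([Word("when"), Word("Mary", True), Word("left")]),
        IntermediatePhrase([Word("we"), Word("went"), Word("home", True)]),
    ],
    boundary_tone="L",
)
unaccented = IntonationalPhrase(
    phrases=[IntermediatePhrase([Word("of"), Word("the")])],
    boundary_tone="L",
)
print(statement.is_well_formed(), unaccented.is_well_formed())
```

The second example fails the check because its only intermediate phrase has no
accented word.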
The other
possible level of phrasing between the prosodic word and the intermediate
phrase, which has been explored by Beckman and Pierrehumbert (1986), is the
``accentual phrase.'' Here, they find the evidence inconclusive: in Japanese
they find clear evidence for an accentual phrase as a simple grouping of words,
but in English they find the evidence for justifying it much less compelling,
although it is clearly possible to define such a unit.
As for levels
of phrasing above the intonational phrase, Liberman and Pierrehumbert (1984)
have identified phonetic effects that appear to have a domain larger than a
single intonational phrase. Beckman and Pierrehumbert (1986), however, argue
that these effects are related to discourse structure and do not provide
evidence of a higher-level phonological unit.
In
contrast to the relatively sparse hierarchies advocated in the works discussed
above, Ladd (1986) proposes allowing a recursive prosodic structure and sees no
principled reason to restrict the number of levels in the hierarchy. Ladd
argues that the single level of intonational phrasing is inadequate to capture
both the boundary phenomena (i.e., the boundary tones) and the relationship
between the phonological and syntactic units. Recently, Ladd and Campbell
(1991) have begun to look for acoustic evidence supporting this hypothesis and
have shown that four levels of phrasing above the word level account for more
of the observed variation in boundary-related lengthening phenomena than the
two-level intermediate/intonational phrase labeling.
It should be clear from the preceding discussion
that, while many phonologists are in substantial agreement on the need for some
types of prosodic constituents, there are still considerable differences in how
they choose to define those constituents. Moreover, there are several types of
constituents that have been suggested by some, but not widely adopted.
Nonetheless, if we consider the constituents that have been suggested,
eliminating notational variants, we arrive at a superset (a set union of all
the theories) of prosodic constituents. In the next subsection, we examine the
relationship between the levels in this superset and the perceptual labels used
in this study.
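The superset can be pictured as a literal set union. The sketch below uses only
the levels attributed to three of the proposals in the discussion above; the
alias table, which treats "word" and "phonological word" as notational variants
of "prosodic word", is an assumption made for illustration, and the accentual
phrase is included even though Beckman and Pierrehumbert raise it only as a
possibility.

```python
# Levels as listed in the preceding discussion, largest constituent first.
PROPOSALS = {
    "Selkirk (1980)": [
        "intonational phrase", "phonological phrase",
        "prosodic word", "foot", "syllable",
    ],
    "Beckman and Pierrehumbert (1986)": [
        "intonational phrase", "intermediate phrase",
        "accentual phrase",  # proposed only tentatively
        "word",
    ],
    "Nespor and Vogel (1983)": [
        "intonational phrase", "phonological phrase", "phonological word",
    ],
}

# Assumed notational variants; other equivalences are deliberately left out.
ALIASES = {"word": "prosodic word", "phonological word": "prosodic word"}


def normalize(levels):
    """Map each level name to its canonical form and return a set."""
    return {ALIASES.get(level, level) for level in levels}


normalized = {name: normalize(levels) for name, levels in PROPOSALS.items()}
superset = set().union(*normalized.values())      # every level proposed by anyone
shared = set.intersection(*normalized.values())   # levels found in all three

print(sorted(superset))
print(sorted(shared))
```

Under these assumptions the intersection contains just the intonational phrase
and the prosodic word, the two levels on which the proposals agree.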
Phonological utterance U
↓
Phonological phrases Φ
↓
Phonological words ω

( )U
( )( )IP
( )( )( )Φ
( )( )( ) ( )( )( )( )( )ω
No language is produced in a smooth, unvarying stream. Rather, speech has
perceptible breaks and clumps. For example, we can perceive an utterance as
composed of words, and these words can be perceived as composed of syllables,
which are composed of individual sounds. At a higher level, some words seem to
be more closely grouped with adjacent words: we call these groups phrases.
These phrases can be grouped together to form larger phrases, which may be
grouped to form sentences, paragraphs, and complete discourses. These
observations raise the questions of how many such constituents there are and
how they are best defined.
The
domain of linguistic theory most appropriate to address these questions is
phonology, traditionally defined as the study of sound units and their
structural inter-relationships in spoken language. Work in this field has been
reported for seven centuries [cf. Jones' History of English Phonology (Jones,
1989)]. However, only in the last half century have researchers begun to substantially
address the relationships between intonational, rhythmic, and pausing patterns.
Pike (1945) described a hierarchy of rhythmic units, separate from syntactic
structure, and examined their interaction with intonation and pausing.
With the development of syntactic surface structure
trees, the phonological information was thought to be contained within, or
derived from, the single structure (Chomsky and Halle, 1968). More recently,
phonologists have again begun to develop theoretical frameworks that include a
separate hierarchy of prosodic constituents. The next subsection reviews
several of the prosodic hierarchies that have been proposed. Although the
proposals differ in many respects, we try to illuminate the many areas in which
they overlap. In particular, we note that one can extract a superset of
constituent types, which takes account of almost all of the proposals. The
relationship between this superset and the perceptual labeling used in this
study is then explored in the following subsection.
A fundamental
characteristic of spoken language is the relation between the continuous flow
of sounds on the one hand, and the existence of structured patterns within this
continuum on the other hand. In this respect, spoken language is related to
many other natural and man-made phenomena, which are characterized not only by
their typically flowing nature but also by the fact that they are structured
into distinct units such as waves and measures.
Prosodic phonology is
a theory of the way in which the flow of speech is organized into a finite set
of phonological units. It is also, however, a theory of interactions between
phonology and the other components of the grammar. The interactions, in the form of
mapping rules that build phonological structure on the basis of morphological,
syntactic, and semantic notions, provide the set of phonological units
necessary to characterize the domains of application of a large number of
phonological rules. While the division of the speech chain into various
phonological constituents makes reference to structures found in the other components of the
grammar, a fundamental aspect of prosodic theory is that the phonological
constituents themselves are not necessarily isomorphic to any constituents
found elsewhere in the grammar.
Although the specification of the domains to which phonological
rules are bound is the main goal of prosodic phonology, it turns out that the
same units are relevant in other areas of the organization of language as well.
For example, even in the absence of phonological rules, the prosodic units of
grammar are relevant at the first level of speech processing in the
disambiguation of ambiguous sentences. In addition, the constituents provided
by prosodic phonology account for a number of rhythmic patterns and metrical
conventions observed in works of poetry.
The organization of phonology into prosodic units
encompasses seven hierarchically structured levels, going from the syllable to
the phonological utterance. While the two smallest units, the syllable and the
foot, are essentially constructed on the basis of phonological criteria, the
remaining units each represent, in their own way, the interface between phonology
and the other components of the grammar. The pattern that emerges from the various
types of interaction is one in which the degree of abstractness and generality
of the nonphonological information required correlates with the height of a
prosodic constituent in the hierarchy. In particular, the mapping rules that
construct the phonological word and the clitic group must make reference to
such general syntactic notions as the positions in which affixes and clitics
are attached to their host. Phonological phrase construction makes reference to
such general syntactic notions as the head of a phrase and the direction of
embedding. The two highest prosodic constituents, the intonational phrase and
the phonological utterance, make use of even more general notions, such as the
root sentence and the highest node in the syntactic tree, respectively.
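The seven levels and the kind of information each draws on can be summarized in a
short sketch. The enum, the dictionary, and the helper names are illustrative
choices rather than part of the theory, and the descriptions are paraphrased from
the paragraph above.

```python
from enum import IntEnum


class Level(IntEnum):
    """The seven prosodic levels, numbered from lowest to highest."""
    SYLLABLE = 1
    FOOT = 2
    PHONOLOGICAL_WORD = 3
    CLITIC_GROUP = 4
    PHONOLOGICAL_PHRASE = 5
    INTONATIONAL_PHRASE = 6
    PHONOLOGICAL_UTTERANCE = 7


# What the construction of each level refers to, paraphrased from the text.
MAPPING_BASIS = {
    Level.SYLLABLE: "phonological criteria",
    Level.FOOT: "phonological criteria",
    Level.PHONOLOGICAL_WORD: "positions where affixes and clitics attach to their host",
    Level.CLITIC_GROUP: "positions where affixes and clitics attach to their host",
    Level.PHONOLOGICAL_PHRASE: "head of a phrase and direction of embedding",
    Level.INTONATIONAL_PHRASE: "root sentence",
    Level.PHONOLOGICAL_UTTERANCE: "highest node in the syntactic tree",
}


def is_interface_level(level: Level) -> bool:
    """True for levels whose construction needs non-phonological information."""
    return level > Level.FOOT


def dominates(higher: Level, lower: Level) -> bool:
    """True if `higher` sits above `lower` in the hierarchy."""
    return higher > lower


for level in sorted(Level, reverse=True):
    print(f"{level.name:<24} {MAPPING_BASIS[level]}")
```

Printing from the top down makes the pattern visible: the higher the level, the
more general the notion its construction depends on.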
At the three
uppermost levels of the prosodic hierarchy, restructuring rules are needed
under certain circumstances to readjust the prosodic structures built by the
mapping rules. While the mapping rules make reference only to morphological and
syntactic structure, the restructuring rules also make reference to semantic
notions and to factors such as the length of the string in question and the
rate and style of speech. The restructuring rules, like the basic mapping
rules, exhibit a correlation between the height of a constituent in the
hierarchy and the generality and abstractness of the notions involved. For the
phonological phrase, the crucial notions are those of phrasal complement and the
length of the complement as defined by its branchingness. As far as the
intonational phrase is concerned,
however, the relevant criteria for restructuring include the notion of cyclic
node and a somewhat abstract notion of temporal length, as defined on the basis
of a combination of such factors as the length of a given string and the rate
of speech. Finally, in the case of the phonological utterance, restructuring
depends on the existence of certain syntactic relations and implicit semantic
connections between adjacent sentences. The fact that most restructurings are
optional allows for a certain degree of variability at the highest levels of the
prosodic hierarchy. Given that syntax does not permit any variability in its
constituent structure, it is even clearer in the case of restructured strings
that the prosodic and syntactic hierarchies represent independent and often nonisomorphic
structures.
(Source: Segmental Durations in the Vicinity of Prosodic Phrase Boundaries,
C. W. Wightman, S. Shattuck-Hufnagel, M. Ostendorf, and P. J. Price*;
Boston University, MIT, *SRI International)
Nespor, Marina and Irene Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.