Prosodic Phonology

I. Historical Development of Prosodic Analysis

An approach to the analysis of phonological structure which differs in fundamental ways from virtually every other 20th-century view was developed by John R. Firth (1890-1960) at the School of Oriental and African Studies in London, beginning in the late 1930s. Though it was never well defined, Firth’s general position dominated discussion and analytic practice in Britain, at least with respect to phonology, through the early 1960s. Firth and his associates presented it largely through detailed accounts of specific languages.

The development of Prosodic Analysis involved very little interaction with other theoretical frameworks, but it is possible to see partial similarities with some other views. For instance, the theory of long components, developed by Zellig Harris, also attributes important aspects of phonological structure to units similar to prosodies, whose scope is greater than a single segment. In the 1980s, Generative Phonology also stressed the analysis and formalization of phonological properties whose domain of specification is greater (or smaller) than a single segment, and the analysis of structural units other than segments. (See Autosegmental Phonology and Metrical Phonology.)

(Source: International Encyclopedia of Linguistics, Vol.3 & 4, Oxford Univ. Press)

II. Phonological Theories

In 1977, Liberman and Prince proposed that, in addition to the hierarchical structures embodied in syntactic surface structure trees, an adequate characterization of sentences required a description in terms of a separate phonological hierarchy whose constituents were not everywhere identical to those of the surface syntax.

Liberman and Prince focused on the usefulness of a hierarchy of phonological constituents for describing prominence relations among the words and syllables of a sentence. They argued that the differences between syntactic and phonological trees were confined to the word and lower structures; above the level of the word, the branching of the two trees was the same. Other investigators took up the concept of a hierarchy of phonological constituents separate from (although related to) the syntactic hierarchy and proposed differences between the two hierarchies above the word level as well.

It was argued that the new constituents made it easier to state the phonological rules of a language that govern interactions between sound segments or phonemes (e.g., in English, /t/ flapping is blocked in certain prosodic contexts), as well as the phonological rules that govern intonational, rhythmic, and pausing patterns that had proven difficult to state in terms of syntactic structure (e.g., pauses, pre-boundary lengthening, and boundary tones do not always occur at the boundaries of major syntactic constituents in English). Additional evidence for a prosodic structure that differs from the syntactic surface structure came from Gee and Grosjean (1983) who presented performance structures that group words according to the length of pauses inserted between them at slow speaking rates. These structures are clearly distinct from the groupings suggested by the syntactic bracketing.

The research program suggested by the concept of a phonological as well as a syntactic hierarchy of constituents, i.e., the proposal of various hierarchies of constituents and notations for expressing them and the search for phonological evidence to support them, has occupied a number of investigators in the intervening decade and a half, among them Liberman and Prince (1977), Selkirk (1980; 1984), Beckman and Pierrehumbert (1986), Nespor and Vogel (1983), and Ladd (1986; Ladd and Campbell, 1991). The term ``prosodic constituents'' is now generally accepted for describing the structures that characterize each proposed level, but consensus on what the appropriate constituents are has proved difficult to achieve. In the following, we will briefly summarize some of the hierarchical structures that have been proposed by these linguistic theorists, before turning to a description of the hierarchy used in the present study.

Many recent phonological theories have either been inspired by, or proposed in reaction to, the work of Chomsky and Halle (1968). Attempting to develop a general account of English sound structure, they proposed a transformational approach to grammar in which an abstract representation of a sentence's meaning, the ``deep structure,'' is transformed into a ``surface structure.'' The surface structure contains a complete syntactic bracketing of the sentence, and a variety of phonological rules were proposed to describe the process by which the surface structure is transformed into the phonetic representation, which actually describes the sounds to be produced. Chomsky had observed earlier that syntactic phrases did not always correspond to the perceived phrasing in speech. Consequently, The Sound Pattern of English introduced ``readjustment'' rules, which alter the surface structure to partition it into ``phonological phrases'' that may differ from the phrasing in the syntactic bracketing. These rules may also modify or delete boundaries between distinct lexical items. As a result, the perceived phrasing of a spoken sentence is not necessarily the same as the syntactic structure, or bracketing, of the surface structure, although the two are certainly related.

As noted above, Liberman and Prince (1977) formalized the idea of a phonological hierarchy by proposing a phonological tree that, in its branching below the level of words (which they refer to as ``mots''), accounted for some of the prominence relationships between syllables in a sentence. Selkirk, trying to describe more general prosodic relationships, rejected Liberman and Prince's claim that the branching of the prosodic tree above the word level was isomorphic to that of the syntactic tree and presented a phonological tree that contains intonational phrase, phonological phrase, prosodic word, foot, and syllable (Selkirk, 1980). In later work, however, Selkirk has argued that use of a metrical grid obviates the need for separately defined prosodic constituents between the intonational phrase and the foot (Selkirk, 1984). The idea of a hierarchy containing intonational phrase, intermediate phrase, and word levels has also been advanced by Beckman and Pierrehumbert (1986) who suggest the possibility of an accentual phrase level between the intermediate phrase and word levels. Similarly, Nespor and Vogel (1983) have proposed a hierarchy containing intonational phrase, phonological phrase, and phonological word levels. While the hierarchies put forward in these proposals are quite distinct, there appears to be general agreement on the need for levels corresponding to the intonational phrase and the prosodic (or phonological) word, and possibly an intervening level.

Although the notion of a prosodic word is generally accepted, there is some debate over the relationship between prosodic words and elements of the lexicon. Kurath (1964), for example, pointed out that some rules governing the sequencing of phonemes applied only within a lexical word. Chomsky and Halle also found that some rules applied only within words and others applied across word boundaries. Consequently, some of Chomsky and Halle's ``words'' can, through the action of the readjustment rules, contain more than one lexical item. This view is compatible with that of Booij (1983) who observes that a ``phonological word'' may correspond to more than one lexical word in some cases, and less than one in other cases. In contrast, Nespor and Vogel (1986) argue that, while the phonological word may be smaller than the morphological constituent, it is not larger. Liberman and Prince (1977) define ``mots'' as the unit that defines the domain of word-internal phonological rules. This is essentially the same way that Selkirk (1980) defined the phonological word in her earlier work. The relationship between prosodic words and elements of the lexicon is decidedly nontrivial: as Kaisse (1985) has pointed out, there seem to be several different mechanisms that can cause distinct lexical items to be perceived as a single unit, and these must be carefully distinguished.

The other prosodic constituent that appears to be widely accepted is the intonational phrase. The intonational phrase is a group of words in an utterance, which is delimited in some way as a larger unit of phrasing. Most phonologists posit some sort of intonational phrase, although there are differences in precisely how they define it. Ladd (1986) traces the origins of the intonational phrase back over a half century, and he identifies three properties common to all of the various proposals: intonational phrases are the largest phonological entity with phonetically definable boundaries into which utterances can be divided, they have a particular intonational structure, and they are assumed to relate in some way to syntactic or discourse-level structure.

Although this broad definition is helpful in unifying the works of several researchers, we need a more specific definition for this study. Consequently, we will adopt the definition proposed by Pierrehumbert (1980), which says that an intonational phrase is delimited by high or low boundary tones. Pierrehumbert proposed two types of boundary tones: a low tone, such as occurs at the end of a declarative sentence, and a high tone, such as occurs at the end of a yes/no question. This definition, part of a much larger phonology of intonation proposed by Pierrehumbert, has been quite influential and has been adopted by a number of other researchers (e.g., Selkirk, 1984; Nespor and Vogel, 1986). While Pierrehumbert's definition of an intonational phrase differs from, for example, Lieberman's (1967) ``breath group,'' its boundaries seem to coincide with those of Halliday's (1967) ``tone group.''

Although most researchers agree on at least two levels of a prosodic hierarchy (prosodic words and intonational phrases), other constituents have been proposed and are of interest in this study. Consequently, we now consider proposed constituents both above and below the level of the intonational phrase.

Beckman and Pierrehumbert (1986) argue that there are at least one, and possibly two, levels of phrasing between the prosodic word and the intonational phrase. Their ``intermediate phrase'' groups words into phrases having at least one accented syllable. That is, each intermediate phrase contains at least one ``pitch accent,'' a pitch marker that makes a syllable more prominent perceptually. Intonational phrases are made up by grouping together one or more intermediate phrases and marking the end of the final one with a boundary tone. This intermediate phrase is similar to the unit Nespor and Vogel (1986) refer to as a phonological phrase.
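The grouping just described can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not an implementation taken from Beckman and Pierrehumbert: the `Word` record, its field names, the `group_phrases` function, and the example sentence are all hypothetical, and the labels `L%` and `H%` are used here simply to mark low and high boundary tones. It checks the two conditions stated above: every intermediate phrase contains at least one pitch accent, and every intonational phrase ends in a boundary tone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical annotation record for one word of an utterance."""
    text: str
    pitch_accent: bool = False            # word carries a pitch accent
    ip_break: bool = False                # an intermediate phrase ends here
    boundary_tone: Optional[str] = None   # "L%" or "H%" ends an intonational phrase


def group_phrases(words: List[Word]) -> List[Tuple[List[List[str]], str]]:
    """Group words into intonational phrases made of intermediate phrases.

    Returns a list of (intermediate_phrases, boundary_tone) pairs.
    """
    intonational = []   # finished intonational phrases
    intermediates = []  # intermediate phrases of the open intonational phrase
    current = []        # words of the open intermediate phrase
    for word in words:
        current.append(word)
        # A boundary tone also closes the final intermediate phrase.
        if word.ip_break or word.boundary_tone:
            if not any(w.pitch_accent for w in current):
                raise ValueError("intermediate phrase without a pitch accent")
            intermediates.append([w.text for w in current])
            current = []
        if word.boundary_tone:
            intonational.append((intermediates, word.boundary_tone))
            intermediates = []
    if current or intermediates:
        raise ValueError("utterance does not end with a boundary tone")
    return intonational


# Hypothetical declarative: two intermediate phrases, final low boundary tone.
example = group_phrases([
    Word("Anna", pitch_accent=True, ip_break=True),
    Word("came"),
    Word("home", pitch_accent=True, boundary_tone="L%"),
])
```

In this toy annotation, `example` holds a single intonational phrase whose two intermediate phrases are `["Anna"]` and `["came", "home"]`, closed by the low boundary tone.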

The other possible level of phrasing between the prosodic word and the intermediate phrase, which has been explored by Beckman and Pierrehumbert (1986), is the ``accentual phrase.'' Here, they find the evidence inconclusive: in Japanese they find clear evidence for an accentual phrase as a simple grouping of words, but in English they find the evidence for justifying it much less compelling, although it is clearly possible to define such a unit.

As for levels of phrasing above the intonational phrase, Liberman and Pierrehumbert (1984) have identified phonetic effects that appear to have a domain larger than a single intonational phrase. Beckman and Pierrehumbert (1986), however, argue that these effects are related to discourse structure and do not provide evidence of a higher-level phonological unit.

In contrast to the relatively sparse hierarchies advocated in the works discussed above, Ladd (1986) proposes allowing a recursive prosodic structure and sees no principled reason to restrict the number of levels in the hierarchy. Ladd argues that the single level of intonational phrasing is inadequate to capture both the boundary phenomena (i.e., the boundary tones) and the relationship between the phonological and syntactic units. Recently, Ladd and Campbell (1991) have begun to look for acoustic evidence supporting this hypothesis and have shown that four levels of phrasing above the word level account for more of the observed variation in boundary-related lengthening phenomena than the two-level intermediate/intonational phrase labeling.

It should be clear from the preceding discussion that, while many phonologists are in substantial agreement on the need for some types of prosodic constituents, there are still substantial differences in how they choose to define those constituents. Moreover, there are several types of constituents that have been suggested by some, but not widely adopted. Nonetheless, if we consider the constituents that have been suggested, eliminating notational variants, we arrive at a superset (a set union of all the theories) of prosodic constituents. In the next subsection, we examine the relationship between the levels in this superset and the perceptual labels used in this study.
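The idea of a superset can be sketched as a plain set union once notational variants are mapped to a common label. The snippet below is a rough illustration rather than the procedure used in the study: the constituent lists are taken from the summaries given earlier in this section, while the `ALIASES` table (treating ``word'' and ``phonological word'' as variants of ``prosodic word'') and all variable names are assumptions made for the example.

```python
# Assumed notational variants, mapped to a single label.
ALIASES = {
    "phonological word": "prosodic word",
    "word": "prosodic word",
}

# Constituents as summarized in the text above.
PROPOSALS = {
    "Selkirk (1980)": [
        "intonational phrase", "phonological phrase",
        "prosodic word", "foot", "syllable",
    ],
    "Beckman and Pierrehumbert (1986)": [
        "intonational phrase", "intermediate phrase",
        "accentual phrase",  # suggested only as a possibility
        "word",
    ],
    "Nespor and Vogel (1983)": [
        "intonational phrase", "phonological phrase", "phonological word",
    ],
}


def normalize(name: str) -> str:
    """Replace a notational variant by its common label."""
    return ALIASES.get(name, name)


normalized = {
    source: {normalize(name) for name in names}
    for source, names in PROPOSALS.items()
}

# Superset: every constituent proposed by at least one theory.
superset = set().union(*normalized.values())

# Core: constituents on which all three proposals agree.
shared = set.intersection(*normalized.values())
```

With these inputs, `shared` contains only the intonational phrase and the prosodic word, which matches the observation above that these are the two levels with general agreement, while `superset` collects all seven distinct labels.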

III. A Prosodic Hierarchy

Prosodic Hierarchy

Phonological utterance (U)

Intonational phrase (IP)

Phonological phrase (Φ)

Phonological word (ω)

[Figure: schematic bracketing of a single utterance at each level, with one U, two IPs, three Φs, and eight ωs.]
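One way to picture the four levels listed above is as a nested tree in which each constituent contains constituents of the next level down. The sketch below is illustrative only: the placeholder words `w1` to `w8` and the particular way they are grouped into phrases are assumptions, chosen so that the tree has one U, two IPs, three Φs, and eight ωs.

```python
from collections import Counter

# Each node is (label, content); content is a list of child nodes,
# or a placeholder word string for a leaf. The grouping is hypothetical.
tree = ("U", [
    ("IP", [
        ("Φ", [("ω", "w1"), ("ω", "w2"), ("ω", "w3")]),
    ]),
    ("IP", [
        ("Φ", [("ω", "w4"), ("ω", "w5")]),
        ("Φ", [("ω", "w6"), ("ω", "w7"), ("ω", "w8")]),
    ]),
])


def count_levels(node, counts=None):
    """Count how many constituents of each label occur in the tree."""
    if counts is None:
        counts = Counter()
    label, content = node
    counts[label] += 1
    if isinstance(content, list):  # non-leaf: recurse into children
        for child in content:
            count_levels(child, counts)
    return counts


def bracket(node):
    """Render the tree as a labeled bracketing string."""
    label, content = node
    if isinstance(content, list):
        inner = " ".join(bracket(child) for child in content)
    else:
        inner = content
    return f"[{label} {inner}]"


counts = count_levels(tree)
```

Calling `bracket(tree)` gives a labeled bracketing of the whole utterance, and `counts` records how many constituents occur at each level.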

No language is produced in a smooth, unvarying stream. Rather, speech has perceptible breaks and clumps. For example, we can perceive an utterance as composed of words, and these words can be perceived as composed of syllables, which are composed of individual sounds. At a higher level, some words seem to be more closely grouped with adjacent words: we call these groups phrases. These phrases can be grouped together to form larger phrases, which may be grouped to form sentences, paragraphs, and complete discourses. These observations raise the questions of how many such constituents there are and how they are best defined.

The domain of linguistic theory most appropriate to address these questions is phonology, traditionally defined as the study of sound units and their structural inter-relationships in spoken language. Work in this field has been reported for seven centuries [cf. Jones' History of English Phonology (Jones, 1989)]. However, only in the last half century have researchers begun to substantially address the relationships between intonational, rhythmic, and pausing patterns. Pike (1945) described a hierarchy of rhythmic units, separate from syntactic structure, and examined their interaction with intonation and pausing.

With the development of syntactic surface structure trees, the phonological information was thought to be contained within, or derived from, the single structure (Chomsky and Halle, 1968). More recently, phonologists have again begun to develop theoretical frameworks that include a separate hierarchy of prosodic constituents. The next subsection reviews several of the prosodic hierarchies that have been proposed. Although the proposals differ in many respects, we try to illuminate the many areas in which they overlap. In particular, we note that one can extract a superset of constituent types, which takes account of almost all of the proposals. The relationship between this superset and the perceptual labeling used in this study is then explored in the following subsection.

IV. Conclusion

A fundamental characteristic of spoken language is the relation between the continuous flow of sounds on the one hand, and the existence of structured patterns within this continuum on the other hand. In this respect, spoken language is related to many other natural and man-made phenomena, which are characterized not only by their typically flowing nature but also by the fact that they are structured into distinct units such as waves and measures.

Prosodic phonology is a theory of the way in which the flow of speech is organized into a finite set of phonological units. It is also, however, a theory of interactions between phonology and the other components of the grammar. The interactions, in the form of mapping rules that build phonological structure on the basis of morphological, syntactic, and semantic notions, provide the set of phonological units necessary to characterize the domains of application of a large number of phonological rules. While the division of the speech chain into various phonological constituents makes reference to structures found in the other components of the grammar, a fundamental aspect of prosodic theory is that the phonological constituents themselves are not necessarily isomorphic to any constituents found elsewhere in the grammar.

Although the specification of the domains to which phonological rules are bound is the main goal of prosodic phonology, it turns out that the same units are relevant in other areas of the organization of language as well. For example, even in the absence of phonological rules, the prosodic units of grammar are relevant at the first level of speech processing in the disambiguation of ambiguous sentences. In addition, the constituents provided by prosodic phonology account for a number of rhythmic patterns and metrical conventions observed in works of poetry.

The organization of phonology into prosodic units encompasses seven hierarchically structured levels, going from the syllable to the phonological utterance. While the two smallest units, the syllable and the foot, are essentially constructed on the basis of phonological criteria, the remaining units each represent, in their own way, the interface between phonology and the other components of the grammar. The pattern that emerges from the various types of interaction is one in which the degree of abstractness and generality of the nonphonological information required correlates with the height of a prosodic constituent in the hierarchy. In particular, the mapping rules that construct the phonological word and the clitic group must make reference to such general syntactic notions as the positions in which affixes and clitics are attached to their host. Phonological phrase construction makes reference to such general syntactic notions as the head of a phrase and the direction of embedding. The two highest prosodic constituents, the intonational phrase and the phonological utterance, make use of even more general notions, such as the root sentence and the highest node in the syntactic tree, respectively.
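The seven levels and the kind of information each mapping rule consults can be summarized in a small lookup structure. This is a sketch for orientation only: the enum and dictionary names are hypothetical, the numeric values merely encode height in the hierarchy (with the clitic group placed above the phonological word, as in Nespor and Vogel's ordering), and the descriptions paraphrase the paragraph above.

```python
from enum import IntEnum


class Level(IntEnum):
    """Prosodic levels ordered by height (1 = lowest)."""
    SYLLABLE = 1
    FOOT = 2
    PHONOLOGICAL_WORD = 3
    CLITIC_GROUP = 4
    PHONOLOGICAL_PHRASE = 5
    INTONATIONAL_PHRASE = 6
    PHONOLOGICAL_UTTERANCE = 7


# Nonphonological notions referenced when each level is built,
# paraphrased from the text; syllable and foot need none.
MAPPING_NOTIONS = {
    Level.PHONOLOGICAL_WORD: "positions where affixes and clitics attach",
    Level.CLITIC_GROUP: "positions where affixes and clitics attach",
    Level.PHONOLOGICAL_PHRASE: "head of a phrase, direction of embedding",
    Level.INTONATIONAL_PHRASE: "root sentence",
    Level.PHONOLOGICAL_UTTERANCE: "highest node in the syntactic tree",
}


def is_interface_level(level: Level) -> bool:
    """True for levels built with reference to other grammar components."""
    return level > Level.FOOT


purely_phonological = [lv for lv in Level if not is_interface_level(lv)]
interface_levels = [lv for lv in Level if is_interface_level(lv)]
```

Because the members are ordered integers, comparisons such as `Level.INTONATIONAL_PHRASE > Level.PHONOLOGICAL_PHRASE` express relative height directly.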

At the three uppermost levels of the prosodic hierarchy, restructuring rules are needed under certain circumstances to readjust the prosodic structures built by the mapping rules. While the mapping rules make reference only to morphological and syntactic structure, the restructuring rules also make reference to semantic notions and to factors such as the length of the string in question and the rate and style of speech. The restructuring rules, like the basic mapping rules, exhibit a correlation between the height of a constituent in the hierarchy and the generality and abstractness of the notions they refer to. For the phonological phrase, the crucial notions are that of phrasal complement and the length of the complement as defined by its branchingness. As far as the intonational phrase is concerned, however, the relevant criteria for restructuring include the notion of cyclic node and a somewhat abstract notion of temporal length, as defined on the basis of a combination of such factors as the length of a given string and the rate of speech. Finally, in the case of the phonological utterance, restructuring depends on the existence of certain syntactic relations and implicit semantic connections between adjacent sentences. The fact that most restructurings are optional allows for a certain degree of variability at the highest levels of the prosodic hierarchy. Given that syntax does not permit any variability in its constituent structure, it is even clearer in the case of restructured strings that the prosodic and syntactic hierarchies represent independent and often nonisomorphic structures.

(Source: Segmental Durations in the Vicinity of Prosodic Phrase Boundaries, C. W. Wightman, S. Shattuck-Hufnagel, M. Ostendorf, and P. J. Price; Boston University, MIT, and SRI International)

V. Reference

Nespor, Marina, and Irene Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.