Optimality phonology

1. Introduction

Optimality Theory (OT) is a linguistic model originally proposed by the linguists Alan Prince and Paul Smolensky in 1993, and expanded by John J. McCarthy and Alan Prince from 1993 onward. Although much of the interest in OT has been associated with its use in phonology (the area to which OT was first applied), the theory is also applicable to other subfields of linguistics (e.g. syntax, semantics). Optimality Theory is usually considered a development of generative grammar, with which it shares a focus on the investigation of universal principles, linguistic typology and language acquisition.
OT is often called a connectionist theory of language, because it has its roots in neural network research, though the relationship is now largely of historical interest. It arose in part as a successor to the theory of harmonic grammar, developed in 1990 by Géraldine Legendre, Yoshiro Miyata and Paul Smolensky.
The main idea of OT is that the observed forms of language arise from the interaction between conflicting constraints. There are three basic components of the theory: GEN generates the list of possible outputs, or candidates; CON provides the criteria, in the form of violable constraints, used to decide between candidates; and EVAL chooses the optimal candidate. OT assumes that these components are universal. Differences in grammars reflect different rankings of the universal constraint set, CON. Language acquisition can be described as the process of adjusting the ranking of these constraints.

2. Input and GEN: the candidate set

OT supposes that there are no language-specific restrictions on the input. This is called richness of the base. Every grammar can handle every possible input. For example, a language without complex clusters must be able to deal with an input such as /flask/. Languages without complex clusters differ in how they resolve this problem; some will epenthesize (e.g. [falasak], or [falasaka] if all codas are banned) and some will delete (e.g. [fas], [fak], [las], [lak]). Given any input, GEN generates an infinite number of candidates, or possible realizations of that input. A language's grammar (its ranking of constraints) determines which of the infinite candidates will be assessed as optimal by EVAL.

3. CON: the constraint set

In OT, every constraint is universal. CON is the same in every language. There are two basic types of constraints. Faithfulness constraints require that the observed surface form (the output) match the underlying or lexical form (the input) in some particular way; that is, these constraints require identity between input and output forms. Markedness constraints impose requirements on the structural well-formedness of the output. Each plays a crucial role in the theory. Faithfulness constraints prevent every input from being realized as some unmarked form ([ba] for example), and markedness constraints motivate change.
The universal nature of CON makes some immediate predictions about language typology. If grammars differ only by having different rankings of CON, then the set of possible human languages is determined by the constraints that exist. OT predicts that there cannot be more grammars than there are permutations of the ranking of CON. The number of possible rankings is equal to the factorial of the total number of constraints, thus giving rise to the term Factorial Typology. However, it may not be possible to distinguish all of these potential grammars, since not every constraint is guaranteed to have an observable effect in every language. Two languages could generate the same range of input-output mappings, but differ in the relative ranking of two very low-ranked constraints.
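The upper bound described above can be illustrated with a short sketch. The three-constraint set and the function name `all_rankings` are assumptions made only for illustration; the real size of CON is not specified here.

```python
from itertools import permutations
from math import factorial

def all_rankings(constraints):
    """Return every total ranking of the constraints, highest-ranked first."""
    return list(permutations(constraints))

# Hypothetical three-constraint CON, used only for illustration.
toy_con = ["C1", "C2", "C3"]
rankings = all_rankings(toy_con)

print(len(rankings))   # 6, i.e. 3!
print(rankings[0])     # ('C1', 'C2', 'C3')

# The upper bound on the number of grammars grows very quickly with CON.
for n in (5, 10, 20):
    print(n, factorial(n))
```

The count is only an upper bound: as noted above, distinct rankings may yield the same input-output mappings.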

4. EVAL: definition of Optimality

Given two candidates, A and B, A is better than B on a constraint if A incurs fewer violations than B. Candidate A is better than B on an entire constraint hierarchy if A incurs fewer violations of the highest-ranked constraint distinguishing A and B. A is optimal in its candidate set if it is better on the constraint hierarchy than all other candidates. For example, given constraints C1, C2, and C3, where C1 dominates C2, which dominates C3 (C1 >> C2 >> C3), A is optimal if it does better than B on the highest ranking constraint which assigns them a different number of violations. If A and B tie on C1, but A does better than B on C2, A is optimal, even if A has 100 more violations of C3 than B. This comparison is often illustrated with a tableau. The pointing finger marks the optimal candidate, and each cell displays the number of violations for a given candidate and constraint. Once a candidate does worse than another candidate on the highest ranking constraint distinguishing them, it incurs a crucial violation (marked in the tableau by an exclamation mark). Once a candidate incurs a crucial violation, there is no way for it to be optimal, even if it outperforms the other candidates on the rest of CON.
A violation tableau

	C1	C2	C3
☞A	*	*	***
B	*	**!
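
As a minimal sketch of the comparison just described, strict domination can be modelled as a lexicographic comparison of violation counts listed in ranking order. The dictionary encoding and the names `violation_profile` and `eval_optimal` are illustrative assumptions; the violation counts are those of the tableau above.

```python
def violation_profile(violations, ranking):
    """List a candidate's violation counts from highest- to lowest-ranked."""
    return tuple(violations.get(c, 0) for c in ranking)

def eval_optimal(candidates, ranking):
    """Return the candidate whose profile is lexicographically smallest.

    Python compares tuples position by position, so the first (highest-ranked)
    constraint on which two candidates differ decides between them.
    If two candidates tie on every constraint, the first one listed is returned.
    """
    return min(candidates, key=lambda name: violation_profile(candidates[name], ranking))

# Violation counts from the tableau; a missing entry means zero violations.
CANDIDATES = {
    "A": {"C1": 1, "C2": 1, "C3": 3},
    "B": {"C1": 1, "C2": 2},
}

print(eval_optimal(CANDIDATES, ["C1", "C2", "C3"]))  # A

# Extra C3 violations cannot rescue B while C2 outranks C3.
CANDIDATES["A"]["C3"] = 100
print(eval_optimal(CANDIDATES, ["C1", "C2", "C3"]))  # A

# Under the reverse ranking, C3 decides first and B wins.
print(eval_optimal(CANDIDATES, ["C3", "C2", "C1"]))  # B
```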

Constraints are ranked in a hierarchy of strict domination. The strictness of strict domination means that a candidate that violates only a high-ranked constraint does worse on the hierarchy than one that doesn't, even if the second candidate fares worse on every lower-ranked constraint. This also means that constraints are violable; the winning candidate need not satisfy all constraints. Within a language, a constraint may be ranked high enough that it is always obeyed; it may be ranked low enough that it has no observable effects; or it may have some intermediate ranking. The term the emergence of the unmarked describes situations in which a markedness constraint has an intermediate ranking, so that it is violated in some forms, but nonetheless has observable effects when higher-ranked constraints are irrelevant.
An early example proposed by McCarthy & Prince (1994) is the constraint NoCoda, which prohibits syllables from ending in consonants. In Balangao, NoCoda is not ranked high enough to be always obeyed, as witnessed by roots like taynan (faithfulness to the input prevents deletion of the final /n/). But in the reduplicated form ma-tayna-taynan 'repeatedly be left behind', the final /n/ is not copied. Under McCarthy & Prince's analysis, this is because faithfulness to the input does not apply to reduplicated material, and NoCoda is thus free to prefer ma-tayna-taynan over hypothetical ma-taynan-taynan (which has an additional violation of NoCoda). More precisely, the winning candidate need not satisfy all constraints, as long as for any rival candidate that does better than the winner on some constraint, there is a higher-ranked constraint on which the winner does better than that rival.
Some Optimality theorists prefer the use of comparative tableaux, as described in Prince (2002). Comparative tableaux display the same information as the classic or "flyspeck" tableaux, but present it in a way that highlights the most crucial information. Each row represents a winner-loser pair rather than a single candidate: a cell contains W if the constraint prefers the winner, L if it prefers the loser, and e if it does not distinguish between the two. For instance, the tableau above would be rendered in the following way.

	C1	C2	C3
A ~ B	e	W	L

For a ranking to be consistent, every row must have some W that dominates all of its L's. Brasoveanu and Prince (2005) describe a process known as fusion and the various ways of presenting data in a comparative tableau in order to achieve the necessary and sufficient conditions for a given argument.
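The conversion from violation counts to a comparative row can be sketched as follows, using W where a constraint prefers the winner, L where it prefers the loser, and e where the two candidates are equal. The helper names `comparative_row` and `row_is_satisfied` are hypothetical; the counts are those of the A and B tableau above.

```python
def comparative_row(winner, loser, constraints):
    """Return one W/L/e mark per constraint for a winner ~ loser pair."""
    row = []
    for c in constraints:
        w, l = winner.get(c, 0), loser.get(c, 0)
        row.append("W" if w < l else "L" if w > l else "e")
    return row

def row_is_satisfied(row, constraints, ranking):
    """True if the highest-ranked constraint that is not 'e' is a 'W'.

    This is the condition that some W dominates all L's in the row.
    """
    marks = dict(zip(constraints, row))
    for c in ranking:
        if marks[c] != "e":
            return marks[c] == "W"
    return True  # all 'e': these constraints do not distinguish the pair

CONSTRAINTS = ["C1", "C2", "C3"]
A = {"C1": 1, "C2": 1, "C3": 3}
B = {"C1": 1, "C2": 2}

row = comparative_row(A, B, CONSTRAINTS)
print(row)                                                       # ['e', 'W', 'L']
print(row_is_satisfied(row, CONSTRAINTS, ["C1", "C2", "C3"]))    # True
print(row_is_satisfied(row, CONSTRAINTS, ["C3", "C2", "C1"]))    # False
```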

5. Example

As a simplified example, consider the manifestation of the English plural:
/cat + z/ → [cats] (also smirks, hits, crepes)
/dog + z/ → [dogz] (also wugs, clubs, moms)
/fish + z/ → [fishiz] (also classes, glasses, bushes)
Also consider the following constraint set:
M: Agree(Voi) - one violation for every pair of adjacent obstruents in the output which disagree in voicing
M: *SS - one violation for every pair of adjacent sibilants in the output
F: Ident(Voi) - one violation for each segment that differs in voicing between the input and output
F: Max - one violation for each segment in the input that doesn't appear in the output (deletion)
F: Dep - one violation for each segment in the output that doesn't appear in the input (insertion)
(M: markedness, F: faithfulness)
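
The two markedness constraints can be sketched as functions that count violations in an output. This is only an illustration under stated assumptions: outputs are encoded as lists of segments (with "sh" as a single segment), and the small segment classes below are hypothetical and cover only the sounds needed for this example. The faithfulness constraints are omitted because they additionally require a correspondence between input and output segments.

```python
# Hypothetical, deliberately tiny segment classes for this example only.
VOICED_OBSTRUENTS = {"b", "d", "g", "z", "v"}
VOICELESS_OBSTRUENTS = {"p", "t", "k", "s", "f", "sh"}
OBSTRUENTS = VOICED_OBSTRUENTS | VOICELESS_OBSTRUENTS
SIBILANTS = {"s", "z", "sh"}

def agree_voi(output):
    """Agree(Voi): one violation per adjacent obstruent pair that disagrees in voicing."""
    count = 0
    for left, right in zip(output, output[1:]):
        if left in OBSTRUENTS and right in OBSTRUENTS:
            if (left in VOICED_OBSTRUENTS) != (right in VOICED_OBSTRUENTS):
                count += 1
    return count

def star_ss(output):
    """*SS: one violation per pair of adjacent sibilants."""
    return sum(
        1
        for left, right in zip(output, output[1:])
        if left in SIBILANTS and right in SIBILANTS
    )

fishz = ["f", "i", "sh", "z"]
fishiz = ["f", "i", "sh", "i", "z"]
catz = ["k", "a", "t", "z"]
cats = ["k", "a", "t", "s"]

print(star_ss(fishz), agree_voi(fishz))    # 1 1
print(star_ss(fishiz), agree_voi(fishiz))  # 0 0
print(star_ss(catz), agree_voi(catz))      # 0 1
print(star_ss(cats), agree_voi(cats))      # 0 0
```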

fish + z > fishiz
fish + z	*SS	Agree	Max	Dep	Ident
☞ fishiz				*
fishis				*	*!
fishz	*!	*
fish			*!
fishs	*!				*

dog + z > dogz
dog + z	*SS	Agree	Max	Dep	Ident
dogiz				*!
dogis				*!	*
☞ dogz
dog			*!
dogs		*!			*

cat + z > cats
cat + z	*SS	Agree	Max	Dep	Ident
catiz				*!
catis				*!	*
catz		*!
cat			*!
☞ cats					*

No matter how the constraints are re-ordered, the 'is' allomorph will always lose to 'iz'. For example, there is no way to rerank the constraints such that 'dogis' will win. This is called harmonic bounding. The violations incurred by the candidate 'dogiz' are a subset of the violations incurred by 'dogis'; specifically, if you epenthesize a vowel, changing the voicing of the morpheme is a gratuitous violation of constraints. Because 'dogiz' is not itself the winner, this shows that a candidate does not need to be a winner in order to harmonically bound another candidate. In the 'dog + z' tableau, there is also a candidate 'dogz' which incurs no violations whatsoever. Within the constraint set of the problem, 'dogz' harmonically bounds all other possible candidates.
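Harmonic bounding can be checked mechanically. The sketch below uses the violation counts from the dog + z tableau; the encoding and the function names `harmonically_bounds` and `optimal` are assumptions made for illustration. It also confirms by brute force that no permutation of the five constraints selects anything other than dogz.

```python
from itertools import permutations

CONSTRAINTS = ["*SS", "Agree", "Max", "Dep", "Ident"]

# Violation counts copied from the dog + z tableau; absent entries mean zero.
DOG_CANDIDATES = {
    "dogiz": {"Dep": 1},
    "dogis": {"Dep": 1, "Ident": 1},
    "dogz": {},
    "dog": {"Max": 1},
    "dogs": {"Agree": 1, "Ident": 1},
}

def harmonically_bounds(a, b, constraints=CONSTRAINTS):
    """True if a is never worse than b and strictly better on some constraint."""
    never_worse = all(a.get(c, 0) <= b.get(c, 0) for c in constraints)
    sometimes_better = any(a.get(c, 0) < b.get(c, 0) for c in constraints)
    return never_worse and sometimes_better

def optimal(candidates, ranking):
    """Pick the candidate with the lexicographically smallest violation profile."""
    return min(candidates, key=lambda n: tuple(candidates[n].get(c, 0) for c in ranking))

# A non-winner (dogiz) bounds dogis, so dogis can never be optimal.
print(harmonically_bounds(DOG_CANDIDATES["dogiz"], DOG_CANDIDATES["dogis"]))  # True

# Try all 120 rankings and collect every candidate that wins under some ranking.
winners = {optimal(DOG_CANDIDATES, r) for r in permutations(CONSTRAINTS)}
print(winners)  # {'dogz'}
```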
The tableaux from above are repeated below using the comparative tableaux format.

dog + z > dogz
dog + z	*SS	Agree	Max	Dep	Ident
dogz ~ dogiz	e	e	e	W	e
dogz ~ dogis	e	e	e	W	W
dogz ~ dog	e	e	W	e	e
dogz ~ dogs	e	W	e	e	W

From the above tableau for dog + z, it can be observed that any ranking of these constraints will produce the observed output dogz. Because there are no loser-preferring comparisons, dogz wins under any ranking of these constraints; this means that no ranking can be established on the basis of this input.

cat + z > cats
cat + z	*SS	Agree	Max	Dep	Ident
cats ~ catiz	e	e	e	W	L
cats ~ catis	e	e	e	W	e
cats ~ catz	e	W	e	e	L
cats ~ cat	e	e	W	e	L

The tableau for cat + z contains rows with a single W and a single L. This shows that Agree, Max, and Dep must all dominate Ident; however, no ranking can be established between those constraints on the basis of this input. Based on this tableau, the following ranking has been established: Agree, Max, Dep >> Ident

fish + z > fishiz
fish + z	*SS	Agree	Max	Dep	Ident
fishiz ~ fishis	e	e	e	e	W
fishiz ~ fishz	W	W	e	L	e
fishiz ~ fish	e	e	W	L	e
fishiz ~ fishs	W	e	e	L	W

This tableau shows that several more rankings are necessary in order to predict the desired outcome. The first row says nothing; there is no loser-preferring comparison in the first row. The second row reveals that either *SS or Agree must dominate Dep, based on the comparison between fishiz and fishz. The third row shows that Max must dominate Dep. The final row shows that either *SS or Ident must dominate Dep. From the cat + z tableau, it was established that Dep dominates Ident; this means that *SS must dominate Dep.
So far, the following rankings have been shown to be necessary: *SS, Max >> Dep >> Ident
While it is possible that Agree can dominate Dep, it is not necessary; the ranking given above is sufficient for the observed form fishiz to emerge.
When the rankings from the tableaux are combined, the following ranking summary can be given: *SS, Max >> Agree, Dep >> Ident
or
*SS, Max, Agree >> Dep >> Ident
There are two possible places to put Agree when writing out rankings linearly; neither is truly accurate. The first implies that *SS and Max must dominate Agree, and the second implies that Agree must dominate Dep. Neither implication is justified by the data, which is a failing of writing out rankings in a linear fashion like this. For this reason, most linguists use a lattice graph to represent necessary and sufficient rankings. In this example, the lattice has *SS and Max each dominating Dep, and Dep and Agree each dominating Ident.

A diagram that represents necessary rankings of constraints in this style is often casually referred to as a Hasse diagram.
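The necessary rankings can also be recovered by brute force from the comparative tableaux above. The following is a sketch under stated assumptions: each row is encoded as a five-character string in the column order *SS, Agree, Max, Dep, Ident, and the function names are hypothetical. It enumerates all 120 rankings, keeps those in which every row has a W above all of its L's, and reports the dominations shared by every surviving ranking.

```python
from itertools import permutations

CONSTRAINTS = ["*SS", "Agree", "Max", "Dep", "Ident"]

# Winner ~ loser rows copied from the three comparative tableaux.
ROWS = [
    "eeeWe", "eeeWW", "eeWee", "eWeeW",   # dog + z
    "eeeWL", "eeeWe", "eWeeL", "eeWeL",   # cat + z
    "eeeeW", "WWeLe", "eeWLe", "WeeLW",   # fish + z
]

def satisfies(ranking, row):
    """True if the highest-ranked non-'e' mark in the row is a 'W'."""
    marks = dict(zip(CONSTRAINTS, row))
    for c in ranking:
        if marks[c] != "e":
            return marks[c] == "W"
    return True  # a row of all 'e' places no demand on the ranking

consistent = [
    r for r in permutations(CONSTRAINTS)
    if all(satisfies(r, row) for row in ROWS)
]

def always_dominates(a, b):
    """True if a outranks b in every ranking consistent with the data."""
    return all(r.index(a) < r.index(b) for r in consistent)

necessary = [
    (a, b) for a in CONSTRAINTS for b in CONSTRAINTS
    if a != b and always_dominates(a, b)
]

print(len(consistent))  # 8
for a, b in necessary:
    print(a, ">>", b)
print(always_dominates("Agree", "Dep"), always_dominates("Dep", "Agree"))  # False False
```

The printed pairs include those implied by transitivity (for example *SS >> Ident); a lattice diagram would draw only the direct links. The final line shows that the data leave the relative ranking of Agree and Dep open.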

6. Criticism

Optimality Theory has drawn a good deal of criticism, most of which is directed at its application to phonology (rather than syntax or other fields).
Many criticisms of OT are, according to its proponents, based on fundamental misunderstanding of how it works. A well-known example of this is Chomsky's (1995) assertion that OT would predict every lexical input to be reduced to a single optimal syllable (e.g. every word is realized as [ba]). In fact, universal neutralization of this type would only be predicted if there were no faithfulness constraints (see McCarthy 1997). In a sense, the diametrically opposite kind of criticism comes from Halle (1995): “... the existence of phonology in every language shows that Faithfulness is at best an ineffective principle that might well be done without.” By 'phonology', Halle clearly means disparity between inputs and outputs. OT would fail to predict this disparity only if there were no markedness constraints (see Prince 2007). Input-output disparity is normally the result of markedness constraints being ranked over faithfulness constraints (M >> F).
Another objection to OT is the claim that it is not technically a theory, in that it does not make falsifiable predictions. The source of this issue is terminology: the term 'theory' is used differently here than in physics, chemistry, and other sciences. Specific instantiations of OT may make falsifiable predictions, in the same way that specific proposals within other linguistic frameworks can. What predictions are made, and whether they are testable, depends on the specifics of individual proposals (most commonly, this is a matter of the definitions of the constraints used in an analysis). Thus, OT as a framework is best described as a scientific paradigm.
More serious objections to OT are claims that it cannot account for phonological opacity (see, e.g., Idsardi 2000). There have been a number of proposals designed to account for opacity within OT; however, most of these proposals significantly alter OT's basic architecture, and therefore tend to be highly controversial. Frequently, such alterations add new types of constraints (which are not universal faithfulness or markedness constraints), or change the properties of GEN or EVAL. Well-known examples include John J. McCarthy's Sympathy Theory and Candidate Chains theory, and there are many others.
A relevant issue is the existence of circular chain shifts, i.e. cases where input /X/ maps to output [Y], but input /Y/ maps to output [X]. Many versions of OT predict this to be impossible (see Moreton 2004, Prince 2007). It is not certain whether patterns of this sort occur in natural languages.
OT is also criticized as being an impossible model of speech production and perception: computing and comparing an infinite number of possible candidates would take an infinitely long time. The most common rebuttal to this argument is that OT is purely representational. In this view, OT is taken to be a model of linguistic competence, and is not intended to explain the specifics of linguistic performance. Further, work by Heinz, Kobele, and Riggle (forthcoming) shows that OT is in fact computationally tractable under certain reasonable assumptions.

7. Theories within Optimality Theory

In practice, implementations of OT often assume other related theories, such as Syllable theory, Moraic theory, or Feature Geometry. Completely distinct from these, there are sub-theories which have been proposed entirely within OT, such as positional faithfulness theory, Correspondence Theory, Sympathy Theory, and a number of theories of learnability. There is also a range of theories specifically about OT. These are concerned with issues like the possible formulations of constraints, and constraint interactions other than strict domination.

8. References

Brasoveanu, Adrian, and Alan Prince (2005). Ranking & Necessity. ROA-794.
Chomsky, Noam (1995). The Minimalist Program. Cambridge, MA: The MIT Press.
Dresher, Bezalel Elan (1996). The Rise of Optimality Theory in First Century Palestine. GLOT International 2, 1/2, January/February 1996, page 8 (a humorous introduction for novices).
Halle, Morris (1995). Feature Geometry and Feature Spreading. Linguistic Inquiry 26, 1-46.
Heinz, Jeffrey, Greg Kobele, and Jason Riggle (forthcoming). Evaluating the complexity of Optimality Theory. Linguistic Inquiry.
Idsardi, William J. (2000). Clarifying opacity. The Linguistic Review 17:377-50.
Kager, René (1999). Optimality Theory. Cambridge: Cambridge University Press.
Legendre, Géraldine, Jane Grimshaw and Sten Vikner. Optimality-theoretic syntax. MIT Press.
McCarthy, John (2007). Hidden Generalizations: Phonological Opacity in Optimality Theory. London: Equinox.
McCarthy, John (2001). A Thematic Guide to Optimality Theory. Cambridge: Cambridge University Press.
McCarthy, John and Alan Prince (1993). Prosodic Morphology: Constraint Interaction and Satisfaction. Rutgers University Center for Cognitive Science Technical Report 3.
McCarthy, John and Alan Prince (1994). The Emergence of the Unmarked: Optimality in Prosodic Morphology. Proceedings of NELS.
Moreton, Elliott (2004). Non-computable Functions in Optimality Theory. Ms. from 1999, published 2004 in John J. McCarthy (ed.), Optimality Theory in Phonology.
Prince, Alan (2007). The Pursuit of Theory. In Paul de Lacy, ed., Cambridge Handbook of Phonology.
Prince, Alan (2002). Entailed Ranking Arguments. ROA-500.
Prince, Alan (2002). Arguing Optimality. In Coetzee, Andries, Angela Carpenter and Paul de Lacy (eds). Papers in Optimality Theory II. GLSA, UMass. Amherst. ROA-536.
Prince, Alan and Paul Smolensky (1993/2002/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Blackwell Publishers (2004). Earlier versions: 2002; Technical Report, Rutgers University Center for Cognitive Science and Computer Science Department, University of Colorado at Boulder (1993).

Last updated 06/20/08