\documentclass[12pt,a4paper]{article}
\usepackage[pdftex]{graphicx}
\usepackage[english]{babel}
\begin{document}

\begin{center}
{\em 
Andreas van Cranenburgh 0440949 \\
Language \& Optimality \\
December 30, 2009}
\end{center}

\section*{The comprehension-production dilemma}

\begin{abstract}
Smolensky (1996) presents a way of dealing with the
comprehension-production dilemma that avoids positing two grammars or ad-hoc
performance constraints. In contrast, this paper tentatively suggests that the
dilemma is actually an artefact of the competence-performance distinction, and
shows several ways in which it can be explained by non-generative theories of
language acquisition.
\end{abstract}

\subsection*{Introduction}
Smolensky's (1996) famous article proposes an explanation of the gap between
comprehension and production in a simple Optimality Theory (OT) framework. In
this article I shall review his proposal and question both whether the proposal
is adequate and whether the problem is really that much of a problem after
all.

\subsection*{Smolensky's proposal}
Smolensky calls the gap between comprehension and production a dilemma because
of an unwritten assumption that a single grammar, unconstrained in its
production, would have to function symmetrically (i.e., comprehension and
production being equally good). Since this is empirically not the case,
%(though also not in adults, which seems to falsify the assumption altogether), 
a dilemma presents itself: either the linguist has to assume two grammars, or
to posit external constraints on production such as pronunciation difficulties
or limited working memory.

This is a dilemma because both options are undesirable. The latter option, of
external constraints, is undesirable because it fails to explain the empirical
observations relating the marked structures avoided in production by the child
to those in the language around them (comprehension) and to cross-linguistic
patterns (other languages, according to the Jakobson typology). Another
problem is that there has never been adequate empirical evidence for such
processing constraints.

The former option, of assuming two grammars, is also undesirable. It is not
parsimonious and raises more questions than it answers, for example why
production agrees so closely with comprehension if they are separate grammars.

The unwritten assumption leading to the dilemma derives from the generative
tradition. A competence/performance distinction is assumed, and
only the idealised capacity of competence is deemed worthy of study. Optimality
Theory is, like other generative theories, a theory of competence. Even in
Smolensky (1996), which deals with something as obviously and deeply
performance-related as a comprehension-production gap, it is stressed that the
proposal concerns competence only, not performance. Without this assumed
theoretical distinction between competence and performance, there probably
would not be a dilemma in the first place, but simply a reality that theories
should accommodate. The dilemma seems to reduce to the question: given that we
assume an idealised competence, why do language users not perform like
idealised language users?

In Optimality Theory there is a way out of the dilemma. With a simple,
technical modification it is possible to present a single grammar with perfect
comprehension but severely hindered production. This is achieved by separating
the constraints into two categories, markedness and faithfulness. Of the two
sides of a structural description, markedness constraints operate on only one,
whereas faithfulness constraints operate on both. As a consequence, markedness
constraints affect only production, where different realizations compete; in
comprehension, on the other hand, different underlying forms compete, all with
the same overt form (as it was perceived). By letting all faithfulness
constraints be outranked (dominated) by markedness constraints, one can have
unfaithful production, due to the markedness constraints, alongside faithful
comprehension, which is unaffected by them.
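To make the mechanism concrete, here is a minimal sketch of such an evaluation
in Python. The constraint names, forms, and violation counts are illustrative
assumptions of mine, not Smolensky's actual tableaux: a hypothetical markedness
constraint (\texttt{no\_coda}) inspects only the surface form, while a
faithfulness constraint (\texttt{faith}) compares both forms; ranking
markedness above faithfulness yields unfaithful production alongside faithful
comprehension.

```python
def no_coda(underlying, surface):
    """Hypothetical markedness constraint: penalizes a final stop
    consonant; inspects only the surface form."""
    return 1 if surface and surface[-1] in "ptk" else 0

def faith(underlying, surface):
    """Faithfulness constraint: one violation per segment changed or
    deleted between underlying and surface form."""
    changed = sum(a != b for a, b in zip(underlying, surface))
    return changed + abs(len(underlying) - len(surface))

RANKING = [no_coda, faith]        # markedness dominates faithfulness

def optimal(pairs):
    """Return the (underlying, surface) pair whose violation profile is
    lexicographically smallest under the constraint ranking."""
    return min(pairs, key=lambda p: [c(*p) for c in RANKING])

# Production: underlying /pat/ is fixed; surface realizations compete.
# The coda-less candidate wins: production is unfaithful.
production = optimal([("pat", "pat"), ("pat", "pa")])     # ("pat", "pa")

# Comprehension: the overt form [pat] is fixed; underlying forms compete.
# no_coda marks all candidates equally, so faithfulness decides.
comprehension = optimal([("pat", "pat"), ("pa", "pat")])  # ("pat", "pat")
```

In the comprehension case the markedness constraint cannot distinguish the
candidates, since they share one overt form, which is exactly why demoting all
faithfulness constraints leaves comprehension intact.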

\subsection*{OT and language acquisition}
\begin{quote}
OT is unique in that it regards the process of language acquisition as central
to its tenets. -- Gierut (2006)
\end{quote}

While I agree that language acquisition should be a central tenet of any theory
of language, I disagree that OT is an appropriate framework. The reason is
that OT's assumptions about representations are far too strong for it to
explain much of the difficult feat of language acquisition. The only difference
between a child just starting its acquisition and an adult speaker is their
ranking of a universal set of constraints. It is assumed that both operate on
unique and abstract representations capable of accurately describing the
structure of their linguistic input.

This goes against both very natural intuitions about cognitive development
and empirical evidence. The intuition is that acquiring better and more
accurate representations is one of the most important parts of cognitive
development, and thus also of language acquisition. Representations need to
allow hierarchies of concepts (such as tables and chairs being kinds of
furniture), and it is implausible to assume such representations are innate.

There is also empirical evidence that the phonological representations
presented in Smolensky (1996) are inadequate. Beckman (2003) explores
pronunciation errors made by children with a Phonological Disorder, and
argues that they stem from their underdeveloped representations. Of Smolensky's
proposal she writes:

\begin{quote}
 [...] couching this proposed explanation in a
model of phonology such as (8) vitiates its explanatory power because of
other assumptions that are packaged with it. In particular, in the model in
(8), each word is associated with a single unique input representation, and
that representation is a very abstract minimalist one. 

[...] However, a large body of
experimental literature on phonological comprehension shows this standard
assumption to be untenable. This literature, which is reviewed in Johnson
(1997) and Pierrehumbert (2002), supports instead an ``exemplar model'' of
the mental lexicon, as outlined in Pierrehumbert (2003).
\end{quote}

Such an exemplar model contains a whole range of stored instances for each
unit to be recognized. As development proceeds, more and richer exemplars are
acquired, and more generalizations can be made between them. This elegantly
accounts for the piecemeal development of rich representations from simple
beginnings, without assuming the representations to be abstract and minimal
to begin with, and without assuming complex rule-based mechanisms.
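As a toy illustration of how exemplar storage yields gradual generalization,
consider the following sketch (the feature vectors and words are hypothetical
choices of mine, not Pierrehumbert's actual model): each word is associated
with a growing set of stored instances, and a novel token is categorized by its
nearest stored exemplar, so coverage improves as more variants accrue.

```python
def distance(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(token, lexicon):
    """Label a token by its nearest stored exemplar.
    lexicon maps each word to a list of stored instances."""
    pairs = [(word, ex) for word, exemplars in lexicon.items()
             for ex in exemplars]
    return min(pairs, key=lambda p: distance(token, p[1]))[0]

# An early, sparse lexicon versus a later one with more stored variants.
early = {"dog": [(1.0, 0.0)], "duck": [(0.0, 1.0)]}
later = {"dog": [(1.0, 0.0), (0.5, 0.5)], "duck": [(0.0, 1.0)]}

token = (0.4, 0.6)              # a deviant realization of "dog"
classify(token, early)          # misclassified as "duck" early on
classify(token, later)          # recognized as "dog" once variants accrue
```

Nothing here requires abstract minimal representations or rule-based
mechanisms: richer behaviour falls out of simply storing more instances.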

What is arguably the most important step in language acquisition is the
ability to recognize symbolic reference (Deacon 1997). It is the most
compelling difference between humans and other animals that we develop the
capability to operate on triadic Peircean signs (i.e., three levels of
representation: iconic, indexical and symbolic). Nothing of the sort is
explained by Optimality Theory. Beckman notes that her exemplar-based account
of phonological errors by children might be related to this phenomenon:

\begin{quote}
Thus, the emergence of language-specific perceptual categories for stop place
of articulation may constitute the first step of the transition into symbolic
behavior in the phonological grammar. It may be no accident that this happens
at around the age that the infant begins to acquire a comprehension
vocabulary.
\end{quote}

Another problem of Optimality Theory, which it shares with other generative
theories, is the insistence on innate universals. Although in OT the
assumptions are relatively modest, being confined to a set of constraints and
an initial ranking, it is still a claim that must be backed up with empirical
evidence, which has not been forthcoming. In a recent article (Evans \&
Levinson 2009) a wealth of evidence against language universals is reviewed.
Even the famed Jakobson typology, from which the most treasured OT examples
derive, can no longer be maintained:

\begin{quote}
As more such rarities [contrastive labial-alveolar consonants, sounding like
`b' and `d' at the same time] accrue, experts on sound systems are abandoning
the Jakobsonian idea of a fixed set of parameters from which languages draw
their phonological inventories, in favour of a model where languages can
recruit their own sound systems from fine phonetic details that vary in almost
unlimited ways -- Evans \& Levinson (2009)
\end{quote}

Such evidence not only suggests that binary constraints may be inadequate;
perhaps it is the whole idea of domain-specific constraints that has to go. It
is simply not plausible that a constraint such as ``a reflexive element is
preferable to a pronoun in its binding domain'' evolved to be part of the
human mind at birth, rather than being the result of extensive exposure to
pronouns and reflexives in appropriate contexts.

Optimality Theory, along with most other theories of language, acknowledges
that experience plays an important part in language learning. But if experience
plays such an important role in learning, why would there need to be a second,
separate mechanism for grammar? The poverty-of-stimulus argument has by now
lost its appeal; it has become recognized that viewing language as a set of
sentences is too narrow.

\subsection*{Language acquisition as a statistical learning problem}

Let us look at the problem of language acquisition as a statistical
learning problem. We can define comprehension as a suitable reduction of
experience (in different modalities) at the time of an utterance. Certain words
occur in specific contexts and generalizations to their specific word-meanings
can be formed from this; but note that successful comprehension can already
occur before words have been fully generalized into their appropriate
word-meanings, for example using additional cues from the context and 
redundancy in language. This description does not hold for displaced language
usage, but there is ample reason to assume that language acquisition proceeds
from the concrete to the abstract in gradual succession.

Production would require a generative model of the data, to anticipate whether
the result of an utterance would be as desired. It is well known that
generative models require much more data and are more difficult to compute
than discriminative models, just as it is clear from psychology that
recognition is much easier than recall. Production, with its reliance on a
generative model of the data, is more affected by data sparsity than
comprehension. Another aspect of production which makes it more difficult is
that it requires some measure of agency, to decide what to say. Instead of
being given an utterance, as in comprehension, both a message and its
realization as an utterance need to be produced --- something which is flatly
ignored in Smolensky (1996), which conceives of production as simply the
reverse of comprehension.  Already we have an indication that the
comprehension-production dilemma could be a side-effect of the learning
problem.
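The asymmetry in search space can be caricatured in a short sketch (the toy
forms and scoring function are assumptions of mine): comprehension
discriminates among a handful of given candidates, whereas production must
search a space of possible forms that grows exponentially with their length.

```python
from itertools import product

VOCAB = ["pa", "pat", "dog"]

def score(form, target):
    # Toy goodness-of-fit: matching segments in corresponding positions.
    return sum(a == b for a, b in zip(form, target))

def comprehend(signal):
    # Discriminative: choose the best of the |VOCAB| given candidates.
    return max(VOCAB, key=lambda w: score(w, signal))

def produce(intended, alphabet="apdogt", max_len=3):
    # Generative: enumerate every string up to max_len over the alphabet;
    # 6 + 36 + 216 = 258 candidates here, versus 3 for comprehension.
    candidates = ["".join(chars) for n in range(1, max_len + 1)
                  for chars in product(alphabet, repeat=n)]
    return max(candidates, key=lambda f: score(f, intended))
```

Even in this caricature production explores 258 candidate forms to
comprehension's three; with realistic inventories and utterance lengths the
gap explodes, which is one way to see why production could lag behind
comprehension during learning.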

Furthermore, there is a big asymmetry in the two sides of the data to be
learned. On the one hand is the linguistic code, which is highly compressed and
concise (a single phoneme can make a world of difference by turning a positive
sentence into a negative one). On the other hand is the rest of multi-modal
experience, perceptions and intentions, which is a sparse and largely
unstructured mass of data (a blooming, buzzing confusion, quoth William James).
While recognizing ten different realizations of the word ``dog'' as being of
the same word should be difficult enough, it is obvious that between ten breeds
of canines there is much less similarity and consequently recognizing them
should be more difficult. So it should be easier to recognize the word ``dog''
being said than to know when a certain animal can be called a dog. There is
good evidence that the mind operates on such a dual code (Paivio 1971), rather
than on a single, amodal, symbolic representation as is customarily assumed in
generative theories. %Given such asymmetries in the two codes, it is 

Another asymmetry in language and its meaning is that they are a many-to-many
mapping, if they are to be conceived as a mapping at all. Any given utterance
can be highly ambiguous, and any situation can warrant a multitude of
reactions, which can in turn be realized in a variety of utterances. 

The usual argument against such empiricist accounts of language is that it
would immediately trip over examples such as (Chomsky 1975):

\begin{enumerate}
\addtolength{\itemsep}{-.35\baselineskip}
\item John's friends appeared to their wives to hate one another
\item John's friends appealed to their wives to hate one another
\end{enumerate}

The argument being that an empiricist account, based on analogy and
generalization, would fail to recognize the big semantic difference from the
one-letter surface difference. However, this obviously makes a straw man of the
empiricist. Firstly, there is no reason to assume that similarities on surface
forms translate proportionally to semantic similarities. Secondly, and more
generally, there exist much more powerful mathematical models than just
analogy and generalization (Friston 2008). Such models have the big advantage
of being completely domain-general, which offers the potential of completely
eliminating all the speculative talk of abstract mental entities.

\subsection*{Empirical evidence from language acquisition}

The intuitions about the asymmetry of discriminative and generative models are 
confirmed by Naigles (2002). Her research discusses the paradoxical findings 
that pre-verbal infants generalize readily during comprehension while speaking
toddlers demonstrate weak or non-existent generalization in production. Thus,
while comprehension is already on an abstract level, production is still
non-abstract and item-based. The favored resolution to this paradox is to
assume that the abstractions are not yet integrated in meanings. While learning
forms and meanings is easy, linking them appropriately is hard.

Shatz (1978) presents experimental evidence indicating that children's adequate
responses to parent utterances are often fortuitous, relying on a heuristic
bias towards action responses. As long as the majority of parent utterances
allow following such a simple response strategy, it will appear as if the child
has a much more advanced capability of comprehension than it actually has.

In a series of experiments Shatz contrasts children's responses in neutral
contexts with their responses after sequences of either directive or informing
utterances. In the former condition children show a bias for action responses.
The latter condition shows that the more advanced children are more likely to
recognize contexts which call for an informing response, and are thus more
sensitive to the grammatical features of sentences.

The item-based nature of children's linguistic development presents another
problem for OT (and other generative theories). When a constraint is re-ranked,
presumably a general, across-the-board improvement in language use would
be predicted by OT (or a parameter-setting generative theory, for that matter).
In contrast, what happens in practice is that children acquire a select set
of frequently occurring constructions, which they use productively and
generalize selectively in a piecemeal fashion (Tomasello 2000). When children
are taught nonce-verbs, they only produce them in the way they have been
taught, even though they already inflect other verbs correctly. Such findings
clearly suggest that language development pivots around concrete experience
rather than abstract syntactic categories or general constraints.


\subsection*{Conclusion}
The `dilemma' can actually be explained from different angles,
%(perhaps even convergent methodologies?)
without positing two different grammars or ad-hoc performance limitations such
as those of the vocal cords or of cognitive capacity. If we view
grammar not as a mechanism but as a memory-based repository that requires
proper organization and sufficient critical mass, it becomes much less
surprising to see the asymmetries that exist. Optimality Theory is unlikely to
explain or model language acquisition in a useful sense, due to its strong
assumptions, restricting itself to linguistic competence with constraint
ordering as the sole {\em modus operandi}. At best it can model specific parts
of mechanisms in adult language and the order of acquisition of particular
phenomena. It seems to be a useful framework to summarize knowledge in
linguistics parsimoniously, but linguistics should not work in isolation, and
language should not be viewed as an autonomous system in a vacuum. A realistic
theory of language acquisition cannot skip straight to abstractions, but should
account for them and do justice to the broad psychological context of language.


\subsection*{Bibliography}
\begin{description}
\item[Beckman,] Mary E.\ (2003). 
     \emph{Input Representations (Inside the Mind and Out)}, 
     WCCFL 22 Proceedings, ed.\ G.\ Garding and M.\
     Tsujimura, pp.\ 70--94. Somerville, MA: Cascadilla Press.

\item[Chomsky,] Noam (1975). 
	\emph{Reflections on Language}. 
	Pantheon Books, New York.

\item[Deacon,] Terrence W. (1997). 
     \emph{The Symbolic Species}. 
     W. W. Norton, New York.

\item[Evans,] Nicholas \& Levinson, Stephen (2009).
	\emph{The myth of language universals: Language diversity and its importance for cognitive science},
  Behavioral and Brain Sciences, vol.\ 32,
  pp.\ 429--448. Cambridge Univ.\ Press.

\item[Friston,] Karl. (2008). 
	\emph{Hierarchical Models in the Brain},
	PLoS Comput Biol 4(11).

\item[Gierut,] Judith A.\ (2006). 
	  \emph{Experimental Validation of OT Solutions to the 
                Comprehension–Production Dilemma}, 
          Clinical Linguistics \& Phonetics, Sept–Oct 2006; 20(7–8): 485–491

%MacWhinney, B. (1982). Basic syntactic processes. Language Development (Vol
% 1): Syntax and semantics., 1:73–136.

\item[Naigles,] Letitia R. (2002). 
	\emph{Form is easy, meaning is hard: 
	resolving a paradox in early child language}. 
	Cognition 86, pp.\ 157–199

\item[Paivio,] A.\ (1971). \emph{Imagery and verbal processes}.
	New York NY: Holt, Rinehart \& Winston.

\item[Shatz,] Marilyn. (1978).
          \emph{On the development of Communicative Understandings:
            An Early Strategy for Interpreting and Responding to Messages}.
           Cognitive Psychology 10, pp.\ 271--301.

\item[Smolensky,] Paul (1996).
        \emph{On the comprehension-production dilemma in child language},
        Linguistic Inquiry, Vol.\ 27, No.\ 4, pp.\ 720--731.

\item[Tomasello,] Michael. (2000). 
	\emph{The item-based nature of children's early syntactic development}.
	Trends in cognitive sciences, 4(4):156–163.

\end{description}

\end{document}
