Acquiring phonemes - North Carolina State University

Acquiring phonemes: What distributional information do infants
receive in child-directed speech?
Emily Moeng
University of North Carolina, Chapel Hill
Summary
Many acquisitionists (see, e.g., reviews such as Kuhl, 2000) assume that infants form phonemes through
Distributional Learning (see below). Although this hypothesis has been influential, it has been argued that
distributional learning alone cannot arrive at the correct number of phonemes when given data from natural
language, because natural utterances are phonetically highly variable. This project evaluates a proposal aimed
at overcoming this "Overlapping Categories" problem by analyzing both vowels and glides in infant-directed
speech in the Yamaguchi Corpus of French (Yamaguchi, 2007) in CHILDES (MacWhinney, 2000).
The Distributional Hypothesis
According to the Distributional Hypothesis,
infants acquire phonemes by mapping speech
tokens onto an n-dimensional phonetic space
and noting peaks in token frequency within this
space. If there are two peaks in token frequency,
the learner infers that there are two phonemes;
if there is only one peak, the learner infers that
there is only one phoneme (see Maye et al.,
2002).
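The peak-counting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Maye et al. (2002) or of this project: it uses a single acoustic dimension in place of the n-dimensional space, made-up token values, and a Gaussian kernel density estimate with an arbitrarily chosen bandwidth as a stand-in for the learner's frequency tracking. The names `kde` and `count_peaks` are hypothetical.

```python
import math

def kde(tokens, x, bandwidth):
    """Gaussian kernel density estimate of the token distribution at point x."""
    norm = len(tokens) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in tokens) / norm

def count_peaks(tokens, bandwidth=0.4, step=0.01):
    """Count local maxima of the smoothed token-frequency curve on a grid."""
    lo = min(tokens) - 3 * bandwidth
    hi = max(tokens) + 3 * bandwidth
    n = int((hi - lo) / step) + 1
    ys = [kde(tokens, lo + i * step, bandwidth) for i in range(n)]
    # A grid point is a peak if it rises from the left and does not rise to the right.
    return sum(1 for i in range(1, n - 1) if ys[i - 1] < ys[i] >= ys[i + 1])

# Hypothetical tokens on one acoustic dimension (arbitrary units).
two_clusters = [c + 0.2 * k for c in (1.0, 4.0) for k in range(-3, 4)]
one_cluster = [2.5 + 0.2 * k for k in range(-6, 7)]

print(count_peaks(two_clusters))  # 2 -> learner infers two phonemes
print(count_peaks(one_cluster))   # 1 -> learner infers one phoneme
```

The bandwidth is a real design choice: a much smaller value would turn individual tokens into separate peaks, and a much larger one would merge the two clusters into one.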
The Overlapping Categories problem and a possible solution
Although the Distributional Hypothesis has been influential among acquisitionists, a number of researchers
have noted that the clearly separated frequency distributions used in the experiments supporting it are not
found in natural speech (the "Overlapping Categories" problem), especially in the case of vowel phonemes
(Swingley, 2009; Bion et al., 2013). Adriaans and Swingley (2012) have suggested that infants can overcome
this problem by attending to phones that are prosodically emphasized, and that these "high quality" tokens
yield distributions closer to those predicted by the Distributional Hypothesis.
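A toy version of the prosodic filtering step is sketched below. Everything in it is an assumption for illustration: the two categories `A` and `B`, the token values, the Boolean prominence flag (a stand-in for whatever prosodic annotation is used), and the premise that non-prominent tokens are centralized. A d'-style separation index (difference between category means divided by the pooled standard deviation) serves as a simple proxy for how distinct the categories are; it is not claimed to be the measure used by Adriaans and Swingley (2012) or by this project.

```python
import math
from statistics import mean, pvariance

# Hypothetical tokens: (category, acoustic value, prosodically prominent?).
# Non-prominent tokens are assumed to drift toward the center of the space.
tokens = [
    ("A", -1.4, True), ("A", -1.0, True), ("A", -0.6, True),
    ("A", -0.6, False), ("A", -0.2, False), ("A", 0.2, False),
    ("B", 0.6, True), ("B", 1.0, True), ("B", 1.4, True),
    ("B", 0.6, False), ("B", 0.2, False), ("B", -0.2, False),
]

def separation(tokens):
    """d'-style index: distance between category means over the pooled SD."""
    a = [v for cat, v, _ in tokens if cat == "A"]
    b = [v for cat, v, _ in tokens if cat == "B"]
    pooled_sd = math.sqrt((pvariance(a) + pvariance(b)) / 2)
    return (mean(b) - mean(a)) / pooled_sd

# Keep only the "high quality" (prosodically prominent) tokens.
high_quality = [t for t in tokens if t[2]]

print(round(separation(tokens), 2))        # 2.32
print(round(separation(high_quality), 2))  # 6.12
```

In this toy data, restricting attention to prominent tokens sharpens the categories considerably; whether real infant-directed speech behaves this way is the empirical question the proposal raises.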
Current project
The current project compares the distributions of stops, fricatives, glides, and vowels, and seeks to
evaluate Adriaans and Swingley's (2012) proposal for vowels and glides. Results for vowels are shown below.
Stops and fricatives are found to follow the predictions of the Distributional Hypothesis. For glides,
however, even when only the subset of "high quality" tokens (as indicated by the adult speaker's prosody)
is analyzed, the greater number of /j/ tokens relative to /w/ and /ɥ/ tokens yields a distribution that is
not trimodal. This suggests that prosodic marking of "high quality" tokens is not sufficient for a purely
distributional learner, and that learners may need further cues, such as the infant's developing lexicon
(Feldman et al., 2013). A quantitative measure of "category overlap" is currently being explored.
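Why an unbalanced token count can hide a category is easy to see with a one-dimensional Gaussian mixture. The sketch below is purely illustrative: the three category means, the shared standard deviation, and the proportions (equal versus 80/10/10) are hypothetical values, not measurements from the Yamaguchi corpus, and the labels j, ɥ and w are placed on an arbitrary axis. For these parameters, equal proportions give three modes, while a dominant category turns its minority neighbor into a shoulder on its slope, leaving only two modes. The functions `mixture_density` and `count_modes` are hypothetical names.

```python
import math

def mixture_density(x, components, sigma=1.0):
    """Density of a 1-D Gaussian mixture; components maps label -> (mean, weight)."""
    return sum(
        w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for mu, w in components.values()
    )

def count_modes(components, lo=-4.0, hi=10.0, step=0.01):
    """Count local maxima of the mixture density on an evenly spaced grid."""
    n = int(round((hi - lo) / step)) + 1
    ys = [mixture_density(lo + i * step, components) for i in range(n)]
    return sum(1 for i in range(1, n - 1) if ys[i - 1] < ys[i] >= ys[i + 1])

# Hypothetical category means (arbitrary units) and token proportions.
balanced = {"j": (0.0, 1 / 3), "ɥ": (3.0, 1 / 3), "w": (6.0, 1 / 3)}
skewed = {"j": (0.0, 0.8), "ɥ": (3.0, 0.1), "w": (6.0, 0.1)}

print(count_modes(balanced))  # 3
print(count_modes(skewed))    # 2
```

A learner that simply counts modes would recover three categories from the balanced input but only two from the skewed one, even though the category means are identical in both cases.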
References
Adriaans, F., & Swingley, D. 2012. Distributional learning of vowel categories is supported by prosody
in infant-directed speech. In Miyake, Peebles, & Cooper (Eds.), Proceedings of the 34th Annual
Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society.
Bion, R. A., Miyazawa, K., Kikuchi, H., & Mazuka, R. 2013. Learning phonemic vowel length from naturalistic recordings of Japanese infant-directed speech. PLoS ONE, 8(2), e51594.
Feldman, N., Myers, E., White, K., Griffiths, T., & Morgan, J. 2013. Word-level information influences
phonetic learning in adults and infants. Cognition, 127(3), 427-438.
Kuhl, P. K. 2000. A new view of language acquisition. Proceedings of the National Academy of Sciences,
97(22), 11850-11857.
MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Part 1.
Maye, J., Werker, J. F., & Gerken, L. 2002. Infant sensitivity to distributional information can affect
phonetic discrimination. Cognition, 82, B101-B111.
Swingley, D. 2009. Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B: Biological Sciences, 364, 3617-3632.
Yamaguchi, N. 2007. Markedness, Frequency: Can We Predict the Order of Acquisition of Consonants?
Second Oxford Postgraduate Conference in Linguistics, 236-243.