
This package provides dataframes of information about the consonants and vowels in American English. The following datasets collect acquisition (acq) features which (try to) characterize the expected acquisition or speech-motor difficulty of speech sounds. See also data_features_consonants.

Usage

data_acq_consonants

data_acq_vowels

Format

An object of class tbl_df (inherits from tbl, data.frame) with 24 rows and 16 columns.

An object of class tbl_df (inherits from tbl, data.frame) with 17 rows and 8 columns.

Details

Consonant acquisition features

data_acq_consonants provides the following features:

knitr::kable(data_acq_consonants)

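| phone | cmubet | wiscbet | cm2020_90_age_mean | cm2020_90_age_sd | cm2020_90_age_min | cm2020_90_age_max | cm2020_90_num_studies | cm2020_90_stage | s93_eights | k1992_set | kd2018_complexity | hml84_frequency | hml84_log10fpm | mhr82_frequency | mhr82_log10fpm |
|:--|:--|:--|--:|--:|--:|--:|--:|:--|:--|--:|--:|--:|--:|--:|--:|
| p | P | p | 33.2 | 6.9 | 24 | 48 | 12 | early | early | 1 | 3 | 50694 | 4.276690 | 14851 | 4.230197 |
| b | B | b | 31.4 | 7.8 | 24 | 48 | 13 | early | early | 2 | 4 | 51831 | 4.286323 | 18027 | 4.314365 |
| t | T | t | 38.5 | 9.2 | 24 | 60 | 13 | early | middle | 3 | 5 | 188536 | 4.847127 | 69108 | 4.897970 |
| d | D | d | 35.7 | 6.7 | 24 | 48 | 13 | early | early | 2 | 4 | 102205 | 4.581205 | 47838 | 4.738214 |
| k | K | k | 37.7 | 7.3 | 24 | 48 | 13 | early | middle | 2 | 4 | 73250 | 4.436541 | 25462 | 4.464334 |
| g | G | g | 36.8 | 6.6 | 24 | 48 | 13 | early | middle | 2 | 4 | 19422 | 3.860027 | 15789 | 4.256796 |
| tʃ | CH | tsh | 53.5 | 10.7 | 36 | 72 | 12 | middle | middle | 4 | 6 | 17147 | 3.805921 | 3149 | 3.556614 |
| dʒ | JH | dzh | 51.0 | 11.8 | 36 | 72 | 13 | middle | middle | 4 | 6 | 13220 | 3.692964 | 3015 | 3.537729 |
| m | M | m | 33.2 | 6.7 | 24 | 48 | 13 | early | early | 1 | 3 | 75850 | 4.451689 | 25799 | 4.470044 |
| n | N | n | 33.1 | 7.4 | 24 | 48 | 13 | early | early | 1 | 3 | 204939 | 4.883358 | 68331 | 4.893059 |
| ŋ | NG | ng | 40.3 | 10.8 | 24 | 55 | 10 | early | middle | 3 | 5 | 13692 | 3.708200 | 9102 | 4.017578 |
| f | F | f | 38.3 | 6.3 | 24 | 48 | 13 | early | middle | 2 | 4 | 51526 | 4.283759 | 10731 | 4.089082 |
| v | V | v | 50.8 | 10.8 | 36 | 66 | 12 | middle | middle | 4 | 6 | 66490 | 4.394489 | 8753 | 4.000598 |
| θ | TH | th | 77.0 | 7.4 | 72 | 96 | 10 | late | late | 4 | 6 | 18300 | 3.834184 | 4337 | 3.695631 |
| ð | DH | dh | 69.0 | 11.3 | 54 | 96 | 12 | late | late | 4 | 6 | 108602 | 4.607571 | 35995 | 4.614683 |
| s | S | s | 51.3 | 16.3 | 24 | 84 | 12 | middle | late | 4 | 6 | 114733 | 4.631421 | 33752 | 4.586741 |
| z | Z | z | 56.8 | 14.3 | 30 | 84 | 11 | middle | late | 4 | 6 | 54454 | 4.307763 | 24027 | 4.439141 |
| ʃ | SH | sh | 55.0 | 10.5 | 36 | 72 | 12 | middle | late | 4 | 6 | 21756 | 3.909312 | 4844 | 3.743645 |
| ʒ | ZH | zh | 70.7 | 12.2 | 60 | 84 | 3 | late | late | 4 | 6 | 1488 | 2.744336 | 100 | 2.058441 |
| h | HH | h | 35.0 | 7.0 | 24 | 48 | 13 | early | early | 1 | 3 | 51235 | 4.281300 | 18896 | 4.334811 |
| l | L | l | 53.8 | 10.4 | 24 | 60 | 12 | middle | late | 3 | 5 | 98287 | 4.564229 | 29343 | 4.525946 |
| r | R | r | 66.6 | 18.6 | 30 | 96 | 12 | late | late | 3 | 5 | 121548 | 4.656481 | 29534 | 4.528764 |
| w | W | w | 35.2 | 6.8 | 24 | 48 | 13 | early | early | 1 | 3 | 60251 | 4.351697 | 25171 | 4.459342 |
| j | Y | j | 45.8 | 11.0 | 30 | 60 | 13 | early | early | 2 | 4 | 16510 | 3.789480 | 8544 | 3.990103 |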

Description of each column:

phone

phone in IPA

cmubet

phone in the CMU alphabet

wiscbet

phone in an older system used by our lab

cm2020_90_age_mean, cm2020_90_age_sd, cm2020_90_age_min, cm2020_90_age_max

Age of acquisition statistics reported by Crowe & McLeod (2020). Statistics are the mean, SD, min and max age (in months) when children reached 90% accuracy on a consonant.

cm2020_90_num_studies

Number of studies used by Crowe & McLeod (2020) to compute the corresponding statistics.

cm2020_90_stage

Developmental stage assigned to the consonant by Crowe & McLeod (2020). Sounds with a mean age of acquisition (cm2020_90_age_mean) below 48 months are early, those from 48 up to 60 months are middle, and those at 60 months or older are late.
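The stage rule amounts to binning the mean age into intervals that are closed on the left. Below is a minimal base-R sketch; `cm2020_stage()` is a hypothetical helper, not a function provided by this package, and it assumes the cutoffs are applied to the rounded means shown in the table.

```r
# Hypothetical helper: bin Crowe & McLeod (2020) mean ages (in months) into
# early (< 48), middle (48 to < 60) and late (>= 60) stages
cm2020_stage <- function(age_mean) {
  cut(
    age_mean,
    breaks = c(-Inf, 48, 60, Inf),
    labels = c("early", "middle", "late"),
    right = FALSE  # intervals are [lower, upper), so exactly 60 counts as late
  )
}

# Mean ages for /p/, /j/, /l/, /z/ and /r/ copied from the table above
cm2020_stage(c(33.2, 45.8, 53.8, 56.8, 66.6))
```

For these five consonants the result is early, early, middle, middle and late, which matches the cm2020_90_stage column. Setting `right = FALSE` is what places a mean of exactly 48 or 60 months into the later stage.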

s93_eights

Developmental stage of Shriberg (1993)—that is, the early 8, middle 8 and late 8 consonants.

k1992_set

Developmental set from Kent (1992). The sets correspond to the age of 90% mastery in Sander (1972): Set 1 is mastered at age 3, Set 2 at age 4, Set 3 at age 6, and Set 4 at a later age.

kd2018_complexity

Phonetic complexity scores from Kuruvilla-Dugdale et al. (2018). This scoring system is based on the developmental description of vowels and consonants in Kent (1992). The scores for individual segments range from 1 for the earliest vowels to 6 for the last-acquired consonants. Under this system, each part of a syllable (onset, nucleus, coda) receives a score: the segment's own score when that part is a single segment, and 7 or 8 when it is a 2-consonant or 3-consonant cluster, respectively.

hml84_frequency, hml84_log10fpm

Raw frequency and log10 frequency per million of the phoneme in the Hoosier Mental Lexicon (Nusbaum, Pisoni, & Davis, 1984) word-frequency dictionary.

mhr82_frequency, mhr82_log10fpm

Raw frequency and log10 frequency per million of the phoneme in the Moe, Hopkins, and Rush (1982) word frequency dictionary of first-graders.

Vowel acquisition features

data_acq_vowels provides the following features:

knitr::kable(data_acq_vowels)

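| phone | cmubet | wiscbet | kd2018_complexity | hml84_frequency | hml84_log10fpm | mhr82_frequency | mhr82_log10fpm |
|:--|:--|:--|--:|--:|--:|--:|--:|
| i | IY | i | 2 | 82430 | 4.487818 | 28537 | 4.513850 |
| ɪ | IH | I | 4 | 195348 | 4.862542 | 43999 | 4.701884 |
| eɪ | EY | eI | 4 | 38966 | 4.162419 | 16679 | 4.280611 |
| ɛ | EH | E | 3 | 73107 | 4.435692 | 31322 | 4.554291 |
| æ | AE | ae | 4 | 106838 | 4.600459 | 40290 | 4.663639 |
| ʌ | AH | ^ | 1 | 44555 | 4.220629 | 22826 | 4.416871 |
| ə | AH | 4 | 1 | 231568 | 4.936412 | 38977 | 4.649250 |
| u | UW | u | 2 | 57320 | 4.330039 | 19188 | 4.341471 |
| ʊ | UH | U | 4 | 13557 | 3.703897 | 4288 | 3.690696 |
| oʊ | OW | oU | 2 | 61954 | 4.363802 | 17330 | 4.297240 |
| ɔ | AO | c | 3 | 26110 | 3.988540 | 12004 | 4.137767 |
| ɑ | AA | @ | 1 | 36589 | 4.135084 | 17326 | 4.297140 |
| aʊ | AW | @U | 3 | 16079 | 3.777992 | 7377 | 3.926321 |
| aɪ | AY | @I | 3 | 39275 | 4.165849 | 23807 | 4.435146 |
| ɔɪ | OY | cI | 3 | 2140 | 2.902147 | 744 | 2.930014 |
| ɝ | ER | 3^ | 5 | 17048 | 3.803406 | 4227 | 3.684474 |
| ɚ | ER | 4^ | 5 | 41966 | 4.194631 | 10676 | 4.086850 |

Description of each column:
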
phone

phone in IPA

cmubet

phone in the CMU alphabet

wiscbet

phone in an older system used by our lab

kd2018_complexity

Phonetic complexity scores from Kuruvilla-Dugdale et al. (2018). This scoring system is based on the developmental description of vowels and consonants in Kent (1992). The scores for individual segments range from 1 for the earliest vowels to 6 for the last-acquired consonants. Under this system, each part of a syllable (onset, nucleus, coda) receives a score: the segment's own score when that part is a single segment, and 7 or 8 when it is a 2-consonant or 3-consonant cluster, respectively.

hml84_frequency, hml84_log10fpm

Raw frequency and log10 frequency per million of the phoneme in the Hoosier Mental Lexicon (Nusbaum, Pisoni, & Davis, 1984) word-frequency dictionary.

mhr82_frequency, mhr82_log10fpm

Raw frequency and log10 frequency per million of the phoneme in the Moe, Hopkins, and Rush (1982) word frequency dictionary of first-graders.

Crowe and McLeod (2020) norms for English consonant acquisition

Crowe and McLeod (2020; the cm2020_ variables) provide a systematic review and summary statistics of age-of-acquisition norms for English consonants. They scoured the literature for acquisition ages of individual consonants and computed summary statistics across studies. They considered only the accuracy of sounds produced in single words. Their sources include a mix of journal articles and norms for articulation assessments. They do not weight statistics from individual studies by sample size or sampling procedure.
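To see why the lack of weighting matters, here is a toy comparison of an unweighted and a sample-size-weighted mean. The study ages and sample sizes are invented for illustration; they are not from Crowe and McLeod (2020).

```r
# Invented ages (months) at which children reached 90% accuracy on one
# consonant in three studies, and the number of children in each study
study_age <- c(36, 48, 60)
study_n <- c(100, 20, 20)

# Unweighted: every study counts equally
mean(study_age)

# Weighted by sample size: the large study pulls the estimate toward 36 months
weighted.mean(study_age, w = study_n)
```

The unweighted mean is 48 months while the weighted mean is about 41.1 months, so without weighting one large study can be outvoted by two small ones.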

I prepared the Crowe and McLeod (2020) data by copying the relevant numbers from their Table 2 with the following changes: 1) rounding mean and SD values to 1 decimal place (0.1 month is about 3 days), 2) dropping /ʍ/, and 3) using /r/, /g/, /tʃ/, /dʒ/ as IPA characters instead of the specialized characters used in the article.

English language phoneme frequencies

The hml84_frequency column provides the frequency count of each phoneme in the Hoosier Mental Lexicon (Nusbaum, Pisoni, & Davis, 1984). That is, we count how many times each phoneme appears in each word in the word list and weight those counts by the word's frequency. For example, "ad" has two phonemes and a corpus frequency of 99, so it contributes 99 /æ/ tokens and 99 /d/ tokens.
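The weighting step can be sketched in base R. This is a minimal illustration, not the code used to build hml84_frequency: the lexicon below is a hypothetical three-word list with CMU-style pronunciations, and only the frequency of 99 for "ad" comes from the example above.

```r
# Hypothetical phonological lexicon: pronunciations are space-separated
# CMU symbols; only the frequency for "ad" comes from the HML example
lexicon <- data.frame(
  word = c("ad", "dad", "mad"),
  pron = c("AE D", "D AE D", "M AE D"),
  freq = c(99, 10, 5),
  stringsAsFactors = FALSE
)

# Split each pronunciation into phoneme tokens
phones <- strsplit(lexicon$pron, " ", fixed = TRUE)

# Give every token the frequency of the word it came from
tokens <- data.frame(
  phone = unlist(phones),
  freq = rep(lexicon$freq, lengths(phones)),
  stringsAsFactors = FALSE
)

# Sum the frequency-weighted tokens for each phoneme
phone_counts <- sapply(split(tokens$freq, tokens$phone), sum)
phone_counts
```

Here "ad" alone contributes 99 tokens each to AE and D; the invented entries bring the totals to 114 for AE and 124 for D, since "dad" contains D twice.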

The HML frequency counts derive from the Brown Corpus, about one million words of English text printed in 1961. The HML provides frequencies of phonological words, so homophones are combined into a single entry. For example, the word "ad" has a frequency of 99 (11 ad tokens plus 88 add tokens). That's why, I suppose, it's a mental lexicon. Approximately 8,000 words in the HML were not in the Kučera and Francis (K&F) frequency list, and these words are apparently assigned a frequency of 1.
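Merging homophones amounts to summing orthographic word counts over shared pronunciations. A sketch with the "ad"/"add" counts from the text; the pronunciation strings are an assumption, and this is not necessarily how the HML itself was assembled.

```r
# Orthographic counts from the Brown Corpus example: 11 "ad" + 88 "add"
orthographic <- data.frame(
  word = c("ad", "add"),
  pron = c("AE D", "AE D"),
  freq = c(11, 88),
  stringsAsFactors = FALSE
)

# One row per pronunciation (phonological word), summing over spellings
phonological <- aggregate(freq ~ pron, data = orthographic, FUN = sum)
phonological
```

The two spellings collapse into a single phonological word with a frequency of 99.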

The mhr82_frequency column was constructed in a similar way, but the frequencies are based on a corpus of words used by first-graders (Moe, Hopkins, & Rush, 1982).

The hml84_log10fpm and mhr82_log10fpm columns provide the frequencies as log10 frequency per million, which is more appropriate for analyses. Converting counts to frequency per million normalizes them across corpora of different sizes, and log frequency is better suited to analysis than raw or normalized counts.
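As a formula, log10 frequency per million is `log10(count / total * 1e6)`. A small base-R sketch follows; `log10_fpm()` is a hypothetical helper. The total the package divides by is not stated here, but the tabled values appear consistent with dividing by the sum of the frequencies over all 41 phonemes in both datasets: 2,680,816 for the HML and 874,095 for the MHR.

```r
# Log10 frequency per million: scale each count to a corpus of one million
# tokens, then take the base-10 logarithm
log10_fpm <- function(freq, total = sum(freq)) {
  log10(freq / total * 1e6)
}

# P in the HML and ZH in the MHR, using the assumed phoneme totals
log10_fpm(50694, total = 2680816)  # compare hml84_log10fpm for /p/
log10_fpm(100, total = 874095)     # compare mhr82_log10fpm for /ʒ/
```

Both calls round to the tabled values (4.276690 and 2.058441).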

I computed these phoneme frequencies independently, but retrieved my copies of the HML and MHR frequency-pronunciation tables from a course by Smith, Beckman and Foltz (2016).

The early 8, middle 8 and late 8 (Shriberg, 1993)

The English consonants are often broken down into three developmental classes, based on Shriberg (1993):

  • Early 8: m b j n w d p h

  • Middle 8: t ŋ k g f v tʃ dʒ

  • Late 8: ʃ θ s z ð l r ʒ

This classification is included as the s93_eights column.

From these names alone, we might interpret these classes such that sounds in the Early 8 would be acquired before the ones in the Middle 8, and likewise that the Middle 8 would be acquired before the Late 8. But these classes were not created by examining patterns of typical consonant acquisition.

For some context, Shriberg (1993) introduces the Early 8, Middle 8, and Late 8 while describing a panel of the article's Figure 7 (not reproduced here):

"The values for this trend, which is a profile of consonant mastery, were taken from a group of 64 3- to 6-year-old speech-delayed children (Shriberg, Kwiatkowski, & Gruber, 1992). Severity of involvement of the 24 English consonants is represented as the percentage correct for each consonant sorted in decreasing order from left to right. Notice that the most obvious breaks in this function allow for a division of the 24 consonants into three groups of eight sounds termed the Early-8, averaging over 75% correct, the Middle-8, averaging 25%-75% correct, and the Late-8, including consonants averaging less than 25% correct in continuous conversational speech (/ʒ/ is infrequently represented in young, speech-delayed children's spontaneous conversational speech)."

So, there were 64 3–6-year-old children with speech delays, and consonant sounds were divided into three classes based on how often these children produced the sounds correctly on average in a conversational speech sample. This classification is not so much a measure of the relative ordering of speech sound development as it is the relative difficulty of these sounds for children with a speech delay of unknown origin. It would be more appropriate to replace the levels of Early/Middle/Late with Easy/Medium/Hard.
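If you adopt that relabeling, it is a one-step recode of the factor levels. A base-R sketch on a hypothetical vector of s93_eights values (lowercase, as in the table); the easy/medium/hard labels are not part of the package data.

```r
# Hypothetical s93_eights values for four consonants
s93_eights <- c("early", "middle", "late", "early")

# Keep the original ordering of the classes but rename them by difficulty
s93_difficulty <- factor(
  s93_eights,
  levels = c("early", "middle", "late"),
  labels = c("easy", "medium", "hard")
)
s93_difficulty
```

Passing `levels` explicitly keeps the classes in their Early/Middle/Late order rather than alphabetical order.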

Phonetic complexity (Kent, 1992; Kuruvilla-Dugdale et al. 2018)

Phonetic complexity measures (k1992_set and kd2018_complexity) assign speech sounds different complexity levels based on biological principles outlined in Kent (1992). Because Kent (1992) is a book chapter that is not floating around online, it's worthwhile to review the provenance of these complexity measures. In short, Kent (1992) interpreted consonant and vowel development data in terms of their motor demands.

Sander (1972) set out to construct a set of developmental norms for typical consonant acquisition in English. His big idea was to include the median age of acquisition as well as the 90th percentile age of acquisition. The median can tell us something about the average acquisition of the speech sounds, and the 90th percentile can set a benchmark for delayed acquisition. There are some quirks of the methodology. First, Sander (1972) was targeting "customary articulation", which was defined using production accuracy averaged across word positions. So, the age of 50% customary articulation for /t/ is the earliest age when the average of word-initial accuracy, word-medial accuracy and word-final accuracy is greater than 50%. Second, the norms for this study were created by augmenting data from 3–8-year-olds (Templin, 1957; n = 480) with some earlier data for 2-year-olds (Wellman et al. 1931; n = 15).
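The averaging criterion can be written as a small function. The accuracies below are invented for one hypothetical consonant, and `customary_age()` is a hypothetical helper: it returns the youngest age group whose accuracy, averaged over word-initial, word-medial and word-final positions, exceeds a threshold.

```r
# Invented accuracy (proportion correct) by age group and word position
accuracy <- data.frame(
  age = c(2, 3, 4, 5),
  initial = c(0.60, 0.80, 0.95, 1.00),
  medial = c(0.30, 0.60, 0.90, 0.95),
  final = c(0.20, 0.50, 0.80, 0.90)
)

# Earliest age at which the position-averaged accuracy exceeds the threshold;
# NA if no age group gets there
customary_age <- function(data, threshold) {
  avg <- rowMeans(data[, c("initial", "medial", "final")])
  passing <- data$age[avg > threshold]
  if (length(passing) == 0) NA else min(passing)
}

customary_age(accuracy, 0.50)
customary_age(accuracy, 0.90)
```

With these numbers the averages are about 0.37, 0.63, 0.88 and 0.95, so the 50% age is 3 and the 90% age is 5. A weak final position (0.80 at age 4) delays the 90% age even though initial accuracy is already 0.95.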

Sander (1972) presented these acquisition norms in a figure (not reproduced here) with one bar per consonant.

Kent (1992) aimed to explain the course of English sound development in terms of biological and motoric principles. He examined the ages of 90% acquisition from Sander (1972)—that is, the right edges of the bars in the previous figure—and observed that /p m n w h/ are mastered at age 3, /b d k g j f/ at age 4, /t ŋ r l/ at age 6 and /s z ʃ ʒ v θ ð tʃ dʒ/ after age 7. He then described motoric demands in each of these sets of sounds. I'll paraphrase:

  • Set 1 requires fast "ballistic" movements for the stops /p m n/, slow "ramp" movements for /w h/, velopharyngeal control for the oral-nasal contrast, and laryngeal control for the voicing contrast.

  • Set 2 adds more stops /b d k g/ and another ramp /j/ and a new place of articulation (velars), but also requires "fine force regulation for frication" for /f/.

  • Set 3 adds more stops /t ŋ/, but also requires tongue "bending" for /r/ and /l/.

  • Set 4 adds the lingual fricatives /s z ʃ ʒ θ ð/, which require tongue bending and fine force control, along with /v tʃ dʒ/. Kent does not characterize the motor demands for the affricates /tʃ dʒ/.

Let's pause for a moment and observe that this breakdown is just an attempt to describe the Sander (1972) norms, and it is somewhat underdeveloped. For example, why is /t/ in Set 3 but /d/ in Set 2? It is not answered here, but I think this late mastery is an artefact of Sander's requirement of 90% accuracy averaging over the three word-positions. The medial and final productions of /t/ might require allophonic variation in /t/ (e.g., flapping or glottalization), so mastery of /t/ would require different motor gestures and some phonological knowledge on the part of the child. But in Kent's description /t/ is a later-mastered ballistic movement.

Still, the main point of Kent's description, I think, is that lingual (tongue) consonants are more difficult. Elsewhere in the chapter, Kent (1992) describes how the tongue is a "muscular hydrostat" like an elephant trunk, and bending a hydrostat requires coordination of different muscle directions:

"Gaining motor control over a hydrostat presents some special problems to the young child learning speech. For one, bending the hydrostat is unlike bending a jointed structure such as a finger. The tongue has no joints per se; it flexes by appropriate contraction of its three-dimensional network of intrinsic longitudinal, vertical, and transverse fibers. Bending a hydrostat requires that muscle fibers be shortened on one aspect simultaneously with a resistance to a change in diameter (Smith and Kier 1989). If the diameter change is not resisted, then the hydrostat will shorten on one side but will not bend. To use the tongue in speech, the child must learn to control the tongue to meet skeletal, movement, and shaping requirements, often simultaneously. These special characteristics of the tongue may well play a role in vowel and consonant mastery."

Kim and colleagues (2010) applied these developmental sets (k1992_set) as articulatory complexity levels while examining consonant errors in dysarthric speech. They then asked questions such as whether more complex consonants had more errors than less complex ones (yes) or whether speakers with lower intelligibility made more complexity-reducing consonant substitutions than speakers with higher intelligibility (apparently so). Examining the speech of 5-year-olds, Allison and Hustad (2014) later used these complexity levels to score the phonetic complexity of sentences. They scored each consonant from 1 to 5 (the four complexity levels, plus a score of 5 for consonant clusters) and summed the scores to get a complexity score for each sentence. Three of the eight 5-year-olds with dysarthria showed a negative effect of sentence complexity on intelligibility.
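The sentence score can be sketched as follows, assuming each consonant cluster counts once as a single unit scored 5, vowels are not scored, and consonants are written with the CMU symbols from the cmubet column. The set assignments follow Kent's four sets; the helper names are hypothetical, and this is not Allison and Hustad's (2014) code.

```r
# Kent (1992) sets for singleton consonants, keyed by CMU symbol
kent_set <- c(
  P = 1, M = 1, N = 1, W = 1, HH = 1,
  B = 2, D = 2, K = 2, G = 2, Y = 2, F = 2,
  T = 3, NG = 3, R = 3, L = 3,
  S = 4, Z = 4, SH = 4, ZH = 4, V = 4, TH = 4, DH = 4, CH = 4, JH = 4
)

# Score one consonant unit: its Kent set if a singleton, 5 if a cluster
score_unit <- function(unit) {
  if (length(unit) > 1) 5 else kent_set[[unit]]
}

# Sum the unit scores over a sentence (a list of consonant units)
sentence_complexity <- function(units) {
  sum(vapply(units, score_unit, numeric(1)))
}

# "Stop the bus": consonant units /st/, /p/, /ð/, /b/, /s/
units <- list(c("S", "T"), "P", "DH", "B", "S")
sentence_complexity(units)  # 5 + 1 + 4 + 2 + 4 = 16
```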

Kent (1992) also described the yearly developmental progression of vowels. I'll paraphrase again:

  • By age 1: Infants produce vocants (vowel precursors) which correspond to the low-front, central and low-back vowels /æ ɛ ʌ ə ɑ/. Thus, the tongue only moves in the anterior-posterior direction (i.e., there is limited up-down movement).

  • By age 2: Toddlers produce the "maximally dissimilar" corner vowels /i u ɑ/ and produce /o/ and the central vowels /ʌ ə/.

  • By age 3: Children incorporate two lower vowels /ɛ ɔ/ and the diphthongs /aɪ aʊ ɔɪ/ which require gliding movements.

  • By age 4: Children incorporate the remaining non-rhotic vowels /ʊ ɪ e æ/. The appearance of the front vowels suggests that tongue-jaw coordination is a relatively late motor achievement. (/i/ appears earlier because its extreme height is easy.)

  • Lastly: Children incorporate /ɚ ɝ/ last because these r-colored vowels require tongue bending.

Kuruvilla-Dugdale and colleagues (2018) used this description to incorporate vowels into the phonetic complexity scale (kd2018_complexity). The /ʌ ə ɑ/ vocants from age 1 and the vowels from age 2 mark the bottom of the complexity scale. The vowels that are acquired at ages 3, 4 and afterwards are assigned to the consonant complexity levels with the same age of mastery. Finally, consonant clusters serve as the ceiling for the scale:

Table: Phonetic complexity scores from Kuruvilla-Dugdale et al. (2018).

| kd2018_complexity | consonants | vowels |
|--:|:--|:--|
| 1 | | ʌ ə ɑ |
| 2 | | i u oʊ |
| 3 | p m n h w | ɛ ɔ aʊ aɪ ɔɪ |
| 4 | b d k g f j | ɪ eɪ æ ʊ |
| 5 | t ŋ l r | ɝ ɚ |
| 6 | tʃ dʒ v θ ð s z ʃ ʒ | |
| 7 | 2-consonant clusters | |
| 8 | 3-consonant clusters | |
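On the consonant side, the scale is Kent's sets shifted up by two. In the consonant table, kd2018_complexity equals k1992_set + 2 for every row, which leaves levels 1 and 2 for the earliest vowels. A quick check on a few rows copied from that table:

```r
# A few rows copied from the consonant table (CMU symbols)
consonants <- data.frame(
  cmubet = c("P", "B", "T", "NG", "CH", "ZH"),
  k1992_set = c(1, 2, 3, 3, 4, 4),
  kd2018_complexity = c(3, 4, 5, 5, 6, 6),
  stringsAsFactors = FALSE
)

# Consonant complexity levels are Kent's sets plus two
all(consonants$kd2018_complexity == consonants$k1992_set + 2)
```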

It is not clear how to apply this scale, so my approach has been to break words into subsyllabic units and assign scores to the syllable onsets, nuclei and codas in each word. For example, "jump" is /dʒ/ + /ʌ/ + /mp/, so it would have a complexity of 6 + 1 + 7 = 14, and "jumper" has a syllable break inside the cluster (/dʒ/ + /ʌ/ + /m/, then /p/ + /ɚ/), so it would have a score of 6 + 1 + 3 + 3 + 5 = 18.
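The scoring procedure can be sketched in base R. This sketch assumes words are already broken into syllables with an onset, nucleus and coda; it uses CMU symbols (AH covers both ʌ and ə, and ER covers both ɝ and ɚ, which share a level in the table anyway); and it scores an empty onset or coda as 0, which is an assumption. The helper names are hypothetical.

```r
# kd2018 complexity for single segments (CMU symbols), from the table above
kd2018_segment <- c(
  AH = 1, AA = 1,
  IY = 2, UW = 2, OW = 2,
  EH = 3, AO = 3, AW = 3, AY = 3, OY = 3, P = 3, M = 3, N = 3, HH = 3, W = 3,
  IH = 4, EY = 4, AE = 4, UH = 4, B = 4, D = 4, K = 4, G = 4, F = 4, Y = 4,
  ER = 5, T = 5, NG = 5, L = 5, R = 5,
  CH = 6, JH = 6, V = 6, TH = 6, DH = 6, S = 6, Z = 6, SH = 6, ZH = 6
)

# Score one syllable part: segment score, or 7/8 for 2-/3-consonant clusters
score_part <- function(segments) {
  n <- length(segments)
  if (n == 0) return(0)  # empty onset or coda (assumption)
  if (n == 1) return(kd2018_segment[[segments]])
  if (n == 2) return(7)
  if (n == 3) return(8)
  stop("clusters longer than 3 consonants are not covered by the scale")
}

# Sum onset, nucleus and coda scores over every syllable in a word
score_word <- function(syllables) {
  sum(vapply(syllables, function(syl) {
    score_part(syl$onset) + score_part(syl$nucleus) + score_part(syl$coda)
  }, numeric(1)))
}

jump <- list(
  list(onset = "JH", nucleus = "AH", coda = c("M", "P"))
)
jumper <- list(
  list(onset = "JH", nucleus = "AH", coda = "M"),
  list(onset = "P", nucleus = "ER", coda = character(0))
)

score_word(jump)    # 6 + 1 + 7 = 14
score_word(jumper)  # 6 + 1 + 3 + 3 + 5 = 18
```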

Kuruvilla-Dugdale and colleagues (2018) used this scoring system to compare intelligibility for low-complexity versus high-complexity words. For example, for speakers with ALS and mild dysarthria, there was a statistically clear reduction in intelligibility for high-complexity words but not for low-complexity words. I applied this scoring system to single-word intelligibility in children's speech (Mahr & Hustad, 2023). There was a probable but not statistically clear negative effect of complexity on intelligibility over and above the effects of age, word frequency and word neighborhood competition. (Regrettably, I coded the one consonant cluster in the word list with a complexity of 8 instead of 7, but otherwise this is the approach.)

References

Allison, K. M., & Hustad, K. C. (2014). Impact of sentence length and phonetic complexity on intelligibility of 5-year-old children with cerebral palsy. International Journal of Speech-Language Pathology, 16(4), 396–407. https://doi.org/10.3109/17549507.2013.876667

Crowe, K., & McLeod, S. (2020). Children’s English Consonant Acquisition in the United States: A Review. American Journal of Speech-Language Pathology, 29(4), 2155–2169. https://doi.org/10.1044/2020_AJSLP-19-00168

Kent, R. D. (1992). The Biology of Phonological Development. In C. A. Ferguson, L. Menn, & C. Stoel-Gammon (Eds.), Phonological development: Models, research, implications (pp. 65–90). York Press.

Kim, H., Martin, K., Hasegawa-Johnson, M., & Perlman, A. (2010). Frequency of consonant articulation errors in dysarthric speech. Clinical Linguistics and Phonetics, 24(10), 759–770. https://doi.org/10.3109/02699206.2010.497238

Kuruvilla-Dugdale, M., Custer, C., Heidrick, L., Barohn, R., & Govindarajan, R. (2018). A Phonetic Complexity-Based Approach for Intelligibility and Articulatory Precision Testing: A Preliminary Study on Talkers With Amyotrophic Lateral Sclerosis. Journal of Speech, Language, and Hearing Research, 61(9), 2205–2214. https://doi.org/10.1044/2018_JSLHR-S-17-0462

Mahr, T. J., & Hustad, K. C. (2023). Lexical Predictors of Intelligibility in Young Children’s Speech. Journal of Speech, Language, and Hearing Research, 66(8S), 3013–3025. https://doi.org/10.1044/2022_JSLHR-22-00294

Sander, E. K. (1972). When are Speech Sounds Learned? Journal of Speech and Hearing Disorders, 37(1), 55–63. https://doi.org/10.1044/jshd.3701.55

Shriberg, L. D. (1993). Four New Speech and Prosody-Voice Measures for Genetics Research and Other Studies in Developmental Phonological Disorders. Journal of Speech, Language, and Hearing Research, 36(1), 105–140. https://doi.org/10.1044/jshr.3601.105