Chapter 14 Morphological Overgeneralization

Although Chapter 13 examined broad patterns of morphosyntactic development in relation to vocabulary size, we did not conduct specific analyses of the development of morphology per se. In this brief chapter, we rectify that omission by examining patterns of morphology in two cases, plural noun morphology and past tense verb morphology. These two morphological systems are well-studied in the literature because plural (e.g., cats) and past tense forms (e.g., walked) are some of the earliest to be produced by young children (Brown 1973). Moreover, plural and past tense forms are viewed as a window into the mechanisms underlying productive language use because children will sometimes make overgeneralization errors, such as tooths or goed. The developmental time course of children’s overgeneralization errors has been an influential case study for language learning – and mental representation more broadly (Rumelhart and McClelland 1986; Pinker and Prince 1988; Pinker 1991; Elman et al. 1996). We begin by describing cross-sectional patterns of overgeneralization across languages and then turn to characterizing longitudinal change in overgeneralization in two languages which have different morphological systems, English and Norwegian.

14.1 Introduction and Methods

When a child gleefully says Mommy, I brushed my tooths!, a proud parent might rejoice in their child’s accomplishment. At the same time, they might worry because the child has overgeneralized the regular plural -s inflection to an irregular form (teeth). Overgeneralization errors are not uncommon in child speech, although like all aspects of early language development, there is considerable variation across children in how many errors children make (Marcus et al. 1992; Maratsos 1993). Importantly, overgeneralization errors have been viewed as a positive sign that the child has abstracted the regularities of their language (e.g., add -s to indicate there is more than one thing) and applied that regularity in a productive way (i.e., the child has not likely heard the adult use that form).

The developmental timeline of generalization has been investigated extensively for the English plural and past tense. One classic study in child language (Berko 1958) explored children’s ability to be productive with plural and past tense forms in an elicitation task using novel forms (e.g., Here is a wug! Now there are two of them – there are two wugs!). Other classic studies using naturalistic designs (Cazden 1968; Brown 1973) noted that correct irregular forms often appear early in development, alongside correctly inflected regular forms. Only later in development are overgeneralizations produced, after children have been successful in producing the correct irregular form, apparently unlearning what they already knew. With further development, overgeneralizations are less likely to occur – although irregular forms remain more difficult for children with language disorders (Marchman, Wulfeck, and Weismer 1999) and overgeneralizations are sometimes produced by adults (especially under stressed conditions; McDonald and Roussel 2010).

The particular mechanisms underlying this developmental timeline have been subject to considerable debate in the literature (Marcus et al. 1992; Elman et al. 1996). A particular focus of interest has been the so-called “U-shaped” developmental pattern (correct irregular – overgeneralization – correct irregular). On one view, the onset of overgeneralizations after correct productions may signify a grammatical rule coming “online,” after a period in which correctly inflected forms were learned by rote (Pinker 1998). Alternatively, the onset of overgeneralization errors may be the consequence of an accumulation of lexical exemplars from which children extract the general patterns (Marchman and Bates 1994; Plunkett and Marchman 1989). Each of these views, as well as those in between, makes different predictions about the universal nature of the U-shaped pattern, both across children and across individual noun or verb forms, as well as about the sources of variability that might influence the developmental time course of this pattern.

This debate has become a case study for understanding the representational formats which form the foundation for human learning and generalization (Pinker 1998). Yet few studies have had the opportunity to explore patterns of the onset of overgeneralization errors in large datasets like those available in Wordbank. Moreover, most of the published work in this debate came from English. For a broader view, it is critical to begin to evaluate patterns of overgeneralization errors crosslinguistically, given that morphological systems vary widely across languages (Marcus 1995; Clahsen et al. 1992).

Here we explore the consistency and variability in the timecourse of children’s use of plural and past tense overgeneralizations using CDI data from the English and Norwegian samples. Our approach here depends on the fact that the original developers of the CDI were interested in this debate and included overregularizations as options in the morphological sections of the form. For example, an item like foot includes foots and feet as possible options to be checked. To take advantage of this, we coded overgeneralizations on each of the 5 WS-type forms for which we had access to data from these items (two from English dialects). We then used this coding to compute the proportion of overregularizations for each child, both overall and within noun and verb categories. We first show analyses of these overregularizations for cross-sectional data, and then present more in-depth analyses of longitudinal data in English and Norwegian.

14.2 Cross-Sectional Data

We begin by describing overregularization within the cross-sectional data. For any dataset with longitudinal administrations, we include only the earliest administration for each child. We then compute the proportion of nouns and verbs that each child is reported to overregularize. The mean proportion of overregularizations reported (Table 14.1) is relatively low overall, both for nouns and verbs, ranging from 2% to 12%. For children who overregularize at least one item, the overall overregularization proportion is of course higher, ranging from 9% to 30%, but even these rates are low – suggesting that those children who overregularize are not overregularizing all or even most forms at the same time.
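The computation described above can be sketched roughly as follows. This is a minimal illustration, not the analysis code behind Table 14.1: the record layout (`child_id`, `age`, `flags`) and the toy values are hypothetical, with each flag marking whether an overregularized form was reported for one item.

```python
from statistics import mean

def earliest_administrations(records):
    """Keep only the youngest administration for each child."""
    earliest = {}
    for rec in records:
        cid = rec["child_id"]
        if cid not in earliest or rec["age"] < earliest[cid]["age"]:
            earliest[cid] = rec
    return list(earliest.values())

def overregularization_summary(records):
    """Return (mean proportion over all children,
    mean proportion over children with at least one overregularization)."""
    proportions = [sum(r["flags"]) / len(r["flags"]) for r in records]
    overregularizers = [p for p in proportions if p > 0]
    overall = mean(proportions)
    among = mean(overregularizers) if overregularizers else float("nan")
    return overall, among

# Toy data, invented for illustration: five noun items per administration.
records = [
    {"child_id": "c1", "age": 20, "flags": [0, 0, 0, 0, 0]},
    {"child_id": "c2", "age": 24, "flags": [0, 0, 0, 0, 0]},
    {"child_id": "c3", "age": 26, "flags": [1, 0, 0, 0, 0]},
    {"child_id": "c4", "age": 28, "flags": [1, 1, 0, 0, 0]},
    {"child_id": "c4", "age": 32, "flags": [1, 1, 1, 1, 0]},  # later visit, dropped
]
cross_section = earliest_administrations(records)
overall, among = overregularization_summary(cross_section)
print(round(overall, 2), round(among, 2))
```

The second mean is necessarily at least as large as the first, because it drops the children whose proportion is zero.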

Table 14.1: Proportion of noun and verb overregularization across languages.

                               All children      Children who overregularize
Language               N       Nouns    Verbs    Nouns    Verbs
Danish                 3714    0.05     0.03     0.12     0.09
English (American)     3721    0.04     0.02     0.20     0.13
English (Australian)   1497    0.04     0.03     0.20     0.14
Norwegian              5131    0.07     0.05     0.19     0.12
Slovak                 1058    0.11     0.12     0.30     0.21

Individual items vary in how often they are overregularized (Tables 14.2 and 14.3): in English, overregularized forms range from childs and satted, which are virtually never reported, to feets and blowed, which are reported for 12% and 9% of children, respectively.

Table 14.2: Least overregularized items in each language.

                       Nouns                      Verbs
Language               Item          Proportion   Item        Proportion
Danish                 mænde         0.01         såddede     0.00
English (American)     childs        0.01         satted      0.00
English (Australian)   childs        0.01         wented      0.01
Norwegian              barner        0.02         lyvet       0.00
Slovak                 k jazerovi    0.04         zjednem     0.01

Table 14.3: Most overregularized items in each language.

                       Nouns                      Verbs
Language               Item          Proportion   Item        Proportion
Danish                 fodder        0.11         drikkede    0.14
English (American)     feets         0.12         blowed      0.09
English (Australian)   teeths        0.08         falled      0.07
Norwegian              boker         0.16         hjelpte     0.14
Slovak                 bábätkovi     0.22         česaj       0.29

Figure 14.1 shows each child’s proportion of overregularized items plotted by age (for children who overregularize at least one item overall). The major generalization from these data is that curves tend to be surprisingly flat – there are not large, consistent developmental increases for any language. The inset r² values in each panel show the variance explained by quadratic models of overregularization by age, quantitatively confirming the visual impression of limited developmental change – the largest of these r² values is only 0.05.

Figure 14.1: Each child’s proportion of items overregularized as a function of their age in each language (curves show model fits – overregularization proportion from quadratic and linear terms for age).

Overregularization is more tightly correlated with vocabulary size than with age (Figure 14.2), with r² values ranging from 0.45 to 0.70. However, the relationship between overregularization and vocabulary tends to be far weaker than the relationship between correct morphological inflection and vocabulary (as described in Chapter 13), for which r² is around 0.9.

Figure 14.2: Each child’s proportion of overregularization as a function of their vocabulary size in each language (curves show model fits – overregularization proportion from quadratic and linear terms for vocabulary size).

Another perspective on the structure of these data comes from comparing overregularization across lexical categories, as shown in Figure 14.3. While a substantial proportion of children (0.03–0.19) do not overregularize any items, those who do tend to do so quite consistently across nouns and verbs, for all of the languages in our sample. The r² values for the relationship between noun and verb overregularizations range from 0.36 to 0.59.

Figure 14.3: Each child’s proportion of verbs overregularized against proportion of nouns overregularized (lines show linear model fits – verb proportion from noun proportion).

In summary, our data suggest that when aggregated across children, overregularization does not appear to have a set relationship with age, and does relate to vocabulary size, though not nearly as strongly as does correct morphological inflection. Additionally, overregularization is highly variable across children, but relatively stable within children across lexical categories.

14.3 Longitudinal Data

Next, we ask whether individual children’s overregularization trajectories can paint a clearer picture of the overall developmental time course of overregularization. As in other chapters, we make use of the longitudinal data from the American English and Norwegian datasets. These data allow us to examine changes in generalization within individual children. Since the English data are sparser than the Norwegian data, we pursue slightly different approaches in our analysis.

14.3.1 English

In the American English dataset, out of children who overregularize at least one item, there are 85 with three longitudinal administrations, 2 with four administrations, and none with more than that. For each child, we compare their three overregularization values – youngest, middle, and oldest (with the two middle values averaged for the children with four administrations) – to categorize their overregularization trajectory as:

  • increase (/) if youngest to middle stays the same or increases and middle to oldest stays the same or increases
  • decrease (\) if youngest to middle stays the same or decreases and middle to oldest stays the same or decreases
  • recovery (Λ) if youngest to middle increases and middle to oldest decreases
  • retreat (V) if youngest to middle decreases and middle to oldest increases
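The four rules above can be sketched as a small function. This is an illustrative reading of the rules, not the original analysis code; the function names are hypothetical. One assumption is labeled in the comments: a completely flat trajectory satisfies both the increase and the decrease rule as written, and this sketch assigns it to increase.

```python
def three_point_values(values):
    """Reduce 3 or 4 age-ordered values to (youngest, middle, oldest).
    With four administrations, the two middle values are averaged."""
    if len(values) == 3:
        return tuple(values)
    if len(values) == 4:
        return (values[0], (values[1] + values[2]) / 2, values[3])
    raise ValueError("expected three or four administrations")

def classify_three_point(values):
    """Classify a child's overregularization trajectory by its overall shape."""
    youngest, middle, oldest = three_point_values(values)
    first = middle - youngest   # change from youngest to middle
    second = oldest - middle    # change from middle to oldest
    if first >= 0 and second >= 0:
        return "increase"       # assumption: a flat trajectory lands here
    if first <= 0 and second <= 0:
        return "decrease"
    if first > 0 and second < 0:
        return "recovery"
    return "retreat"            # only first < 0 and second > 0 remains

print(classify_three_point([0.0, 0.3, 0.1]))       # recovery
print(classify_three_point([0.0, 0.2, 0.4, 0.1]))  # recovery (middle = 0.3)
```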

Figure 14.4 shows the trajectories of each child, grouped by this classification. For both nouns and verbs, the vast majority of children (72% and 84%, respectively) increase in overregularization over this age range, a substantial minority (17% and 16%) recover from overregularization, and very few decrease or retreat. So we see that by 30 months, most children who have shown any sign of overregularization to date are continuing to overregularize more and more, while some are recovering from an earlier peak rate.

Figure 14.4: Empirical overregularization trajectories for American English children with at least three observations, categorized by overall shape.

14.3.2 Norwegian

Because longitudinal data are so much more plentiful in the Norwegian dataset, we can quantify children’s trajectories more rigorously. For the 449 children who overregularize at least one item and have at least 4 administrations, we fit a logistic regression for each child predicting how many items they overregularize from their age. For each child, we select either a model with a linear effect of age or a model with both linear and quadratic effects of age, based on which one fits the data better (i.e., has the lower AIC). We then classify each child’s trajectory as:

  • increase (/) if the best fit model is linear and the effect of age is positive
  • decrease (\) if the best fit model is linear and the effect of age is negative
  • recovery (Λ) if the best fit model is quadratic and the quadratic effect of age is negative
  • retreat (V) if the best fit model is quadratic and the quadratic effect of age is positive
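The model-based classification above can be sketched for a single child as follows. This is a hedged illustration rather than the original analysis code: it assumes a binomial logistic regression on the count of overregularized items out of a fixed number of items, fits it with a hand-written Newton-Raphson routine in plain Python, and centers and scales age for numerical stability (which changes neither the model fit nor the signs used for classification). All function names are hypothetical.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def predict(beta, row):
    """Inverse-logit of the linear predictor for one design row."""
    eta = sum(b * v for b, v in zip(beta, row))
    return 1.0 / (1.0 + math.exp(-eta))

def fit_binomial_logit(X, k, n_items, iters=50):
    """Newton-Raphson fit of logit(p) = X beta to k successes out of n_items.
    Returns (beta, aic). The binomial coefficient is omitted from the
    log-likelihood; it is constant across the models being compared."""
    dim = len(X[0])
    beta = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        info = [[0.0] * dim for _ in range(dim)]
        for row, ki in zip(X, k):
            p = predict(beta, row)
            w = n_items * p * (1 - p)
            for a in range(dim):
                grad[a] += (ki - n_items * p) * row[a]
                for c in range(dim):
                    info[a][c] += w * row[a] * row[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    loglik = 0.0
    for row, ki in zip(X, k):
        p = predict(beta, row)
        loglik += ki * math.log(p) + (n_items - ki) * math.log(1 - p)
    return beta, 2 * dim - 2 * loglik

def classify_model_based(ages, k, n_items):
    """Pick the linear or quadratic model by AIC, then read off the shape."""
    center = sum(ages) / len(ages)
    scale = max(abs(a - center) for a in ages) or 1.0
    z = [(a - center) / scale for a in ages]
    lin_beta, lin_aic = fit_binomial_logit([[1.0, v] for v in z], k, n_items)
    quad_beta, quad_aic = fit_binomial_logit([[1.0, v, v * v] for v in z], k, n_items)
    if lin_aic <= quad_aic:
        return "increase" if lin_beta[1] > 0 else "decrease"
    return "recovery" if quad_beta[2] < 0 else "retreat"

# Toy child, invented for illustration: 20 items, nine administrations.
ages = [20, 22, 24, 26, 28, 30, 32, 34, 36]
print(classify_model_based(ages, [1, 3, 6, 9, 10, 9, 6, 3, 1], 20))
```

In practice a statistics package would normally be used for the fitting step; the hand-written solver here only keeps the sketch self-contained.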

These classifications are analogous to the three-datapoint ones from the English data, but take advantage of the higher density in Norwegian to yield smoothed trajectories and model-based estimates. Figure 14.5 shows the grouped trajectories of each child. In this dataset, which extends to 36 months, many more children exhibit recovery trajectories for both nouns and verbs (47% and 49%) and fewer exhibit increase trajectories (44% and 46%), for an approximately even split between the two types. While the English and Norwegian data may differ in a variety of ways, it’s plausible that the difference reflects that many children are recovering from overregularizing between 30 and 36 months (an age range that is not represented in the English dataset).

Figure 14.5: Model fit overregularization trajectories for Norwegian children with at least 4 observations, categorized by overall shape.

Additionally, we see evidence that trajectory shape is correlated between nouns and verbs: out of the children who have recovery noun trajectories, 62% also have recovery verb trajectories; conversely, out of the children who have increase noun trajectories, 59% also have increase verb trajectories (see Figure 14.6). This echoes the observation from the cross-sectional data that while there is substantial variation between children, there is some degree of consistency within child across lexical categories.
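Percentages of this kind are conditional proportions: the share of each verb trajectory type among children with a given noun trajectory type. A minimal sketch of that tabulation follows; the function name and the toy labels are invented for illustration and are not the Norwegian data.

```python
from collections import Counter, defaultdict

def verb_given_noun(pairs):
    """pairs: one (noun_type, verb_type) tuple per child.
    Returns {noun_type: {verb_type: proportion of children}}."""
    counts = defaultdict(Counter)
    for noun_type, verb_type in pairs:
        counts[noun_type][verb_type] += 1
    return {
        noun: {verb: c / sum(verbs.values()) for verb, c in verbs.items()}
        for noun, verbs in counts.items()
    }

# Toy labels, invented for illustration (nine children).
pairs = (
    [("recovery", "recovery")] * 3
    + [("recovery", "increase")] * 2
    + [("increase", "increase")] * 3
    + [("increase", "recovery")] * 1
)
table = verb_given_noun(pairs)
print(table["recovery"]["recovery"], table["increase"]["increase"])
```

Each row of the resulting table sums to 1, which is what makes the noun types directly comparable even when they contain different numbers of children.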

Figure 14.6: Proportion of children exhibiting each verb overregularization trajectory type as a function of noun overregularization trajectory type.

Lastly, for a more detailed look at recovery from overregularization, we examine the trajectories of the 30 Norwegian children with the most observations (at least 9) who have recovery-type trajectories for both nouns and verbs (Figure 14.7). We observe that for many of these children, the trajectories for nouns and verbs do not just have the same overall type, but also hang together over the course of development. The median age when these children are most likely to overregularize is 29 months for nouns and 30 months for verbs, consistent with the speculation above that 30 months marks the beginning of more widespread recovery.
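For a recovery-type quadratic model, the age of peak overregularization can be read off the fitted coefficients. The sketch below assumes a model of the form logit(p) = b0 + b1·age + b2·age² with age in months and uncentered; the parameterization and the coefficient values are hypothetical, not estimates from the chapter. Because the logistic function is monotone, the peak of the predicted probability falls at the vertex of the quadratic, age = −b1 / (2·b2).

```python
def peak_age(b1, b2):
    """Age at which a quadratic-in-age logit model peaks.
    Only defined for recovery-type fits, where b2 is negative."""
    if b2 >= 0:
        raise ValueError("no peak: the quadratic coefficient is not negative")
    return -b1 / (2 * b2)

# Hypothetical coefficients, chosen only so that the peak lands near 30 months.
print(round(peak_age(1.2, -0.02), 1))
```

If age was centered or rescaled before fitting, the same transformation has to be undone on the result to express the peak in months.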

Figure 14.7: Model fit overregularization trajectories for Norwegian children with recovery-type trajectories and at least 9 observations. Marks in each line show the inflection point of the recovery.

14.4 Conclusions

This chapter examined overregularization data from the noun and verb morphology items of the Words and Sentences form in English and Norwegian. A central takeaway from our analyses is that there is tremendous heterogeneity in these data – some children are never reported to overgeneralize within the age range of the sample, while others do substantially more, without a set developmental timecourse across children. Children with larger vocabularies tend to overregularize more. Overregularization also tends to proceed in tandem for nouns and verbs, in both cross-sectional and longitudinal data. Furthermore, individual children’s trajectories can be fruitfully categorized by overall shape, leading to the observation that for children who overregularize, approximately half begin recovery towards correct inflection at around 30 months and half continue to overregularize more and more through 36 months.

One important caveat to these findings is the possibility that parents are less keen observers of morphological generalization than they are of vocabulary growth more broadly. It is possible that the lack of systematicity we observed is due to inconsistency in which parents report overgeneralization – perhaps only some parents are sensitized to the somewhat meta-linguistic observation that their child is using a frequent ending incorrectly (e.g., foots). This kind of bias would be consistent with the noun-verb overregularization correlation we observed – only systematic validation outside of the CDI would truly dispel this worry.

On the other hand, the kind of variability we observed is not unlike the variability observed in naturalistic studies of overgeneralization (e.g., Marcus et al. 1992). Thus our work here suggests caution in the received narrative of morphological generalization as just another stage in the predictable progression of language learning. Unlike, say, word combination (cf. Chapter 13), we do not see as clear and consistent a developmental progression in this more elusive behavior.