Research on grammar induction goes back at least to the work of Harris (1951) and Solomonoff (1960) and several of their contempories. Much of the initial enthusiasm for this line of research disappeared, due to the negative learnability results of Gold (1967) as well as developments in linguistic theory (e.g. Chomsky, 1965). For many years, unsupervised induction of grammar was widely viewed as both impossible and irrelevant. Further developments of grammar induction heuristics, such as by Olivier (1968) and Wolff (1982), occurred therefore largely in isolation and received little attention in either computational linguistics or general linguistics. Instead, emphasis in grammar learning research shifted to parameter setting algorithms (Carroll & Charniak, 1992; Wexler & Culicover, 1980).
The supposed irrelevance of grammar induction algorithms is based on the fact that the dominant linguistic theories of the last decades assume extensive innate knowledge. If children don't do grammar induction, why design computer programs that do? Evidence for this view comes - in addition to formal learnability results - from empirical observations in child language acquisition. Typically, such arguments have the form (see e.g. Crain, 1991): the child correctly uses construction X very early in life, even though the primary linguistic data it has received up to that point does not provide enough evidence to choose between X and several logical alternatives. Thus, it is concluded, the child must have prior (innate) knowledge of X. It is now recognized that this ``knowledge of X'' is an emergent result of the interaction between cognitive abilities, learning abilities, and the structure, meaning and pragmatics of the linguistic data the child receives (MacWhinney, 1999; Jackendoff, 2002). This interaction is precisely what grammar induction algorithms model. Current theories of language (e.g. Steedman, 2000; Tomasello, 2000; Hauser et al., 2002; Jackendoff, 2002), emphasize that language acquisition is based on a complex induction process.
The supposed impossibility of grammar induction is based on a widespread misinterpretation of negative learnability results. Gold (1967) showed that e.g. the class of context-sensitive languages is not identifiable in the limit. Even we if accept identification in the limit as the appropriate criterion for learnability, Gold's results - albeit a major achievement - mean nothing more than, in his own words: ``The class of possible natural languages is much smaller than one would expect from our present models of syntax. That is, even if English is context-sensitive, it is not true that any context-sensitive language can occur naturally'' (Gold, 1967).
In no way does Gold's proof show - as has often been claimed - that all non-trivial classes of languages (or the class of natural languages, for that matter) are unlearnable. In fact, Angluin (1980) has shown that non-trivial classes of formal languages are identifiable in the limit, and in the statistical learning tradition many other positive learnability results have been obtained (e.g. Horning, 1969). Nevertheless, restrictions on the set of possible target grammars for learning are necessary (see Angluin & Smith, 1983, for a thorough review). Zuidema (2003) argues that such restrictions are inherent in the system of iterated learning of language (Brighton, 2002; Kirby & Hurford, 2002). The consequences of this view will be explored in some detail below.