Chomsky's poverty of stimulus argument is, if anything, strengthened by LLMs. You need to read the entire internet to make statistical methods work at producing grammatical texts. Children don't read the entire internet but do produce grammatical texts. Therefore &c. QED.
I think this is greatly complicated by the fact that the human brain has been "pre-trained" (in the deep learning sense) by hundreds of millions of years of evolution.
A pre-trained LLM can also learn new concepts from extremely few examples. Humans may still be much smarter, but I think there's a lot of reason to believe that the mechanics are similar.
The poverty of the stimulus (POS) argument is that "evolutionary pre-training" in the form of (recursive) grammar is fundamentally required and cannot be inferred from the stimulus.
The argument is based on multiple questionable assumptions of Chomskian linguistics:
- Humans actually learn grammar in the Chomskian way
- Syntax is separate from semantics, so only language (utterances) can be learned from utterances, and not e.g. from what is seen in the environment
- At least in Gold's formalization of the argument, language is learned only from "positive examples", so e.g. the learner can't observe that someone does not understand some utterance
One could argue for a (very) weak form of POS: that there has to be some kind of "inductive bias" in the learning system. But this applies to all learning, as shown by Kant, and the inductive bias can be very generic.
It seems to be a persistent myth (possibly revived more recently due to Norvig?) that Chomsky's POS argument has some interesting connection to Gold's theorem. The two things have only a very loose logical connection (Gold's theorem is in no sense a formalization of any claim of Chomsky's), and Chomsky himself never based any of his arguments for innateness on Gold's theorem. Here is a secondary source making the same point (search for 'Gold'): https://stevenpinker.com/files/pinker/files/jcl_macwhinney_c...
The assumption that syntax is 'separate from semantics' also does not figure in any of Chomsky's POS arguments. Chomsky argued that syntax was separate from semantics only in the fairly uncontroversial sense that there are properly syntactic primitives (e.g. 'noun', 'chain', 'c-command') that do not reduce entirely to semantic or phonological notions. But even if that were untrue, it would not undermine POS arguments, which for the most part can be run without any specific assumptions about the syntax/semantics boundary. Indeed, semantic and conceptual knowledge provides an equally fertile source of POS problems.
Yeah, I don't necessarily buy the whole Chomskian program. I'm willing to be persuaded that the reason kids learn to speak despite their individual poverty of stimulus is that there was sufficient empirically experienced stimulus over evolutionary time. The Chomskian grammar stuff seems way too Platonic to be a description of human neuroanatomy. But be that as it may, it's clear the stimulus it takes to train an LLM is orders of magnitude greater than the stimulus necessary to train an individual child, so children must have a different process for language acquisition.
Children do get ~6000 hours a year of stimulus. Spoken, unspoken, written, and body language. Even then they aren't able to form language proficiently until 5 or 6 years old. Does the internet contain 30,000 hours of stimulus?
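A quick back-of-envelope check of the figures above, as a minimal sketch. It assumes the ~6000 hours/year estimate from the comment and counts up to age 5; the 365-day year is an added assumption, used only to show what daily exposure that yearly figure implies.

```python
# Back-of-envelope check of the stimulus figures quoted above.
# Assumptions: ~6000 hours of stimulus per year (from the comment),
# counted up to age 5, with a 365-day year.
HOURS_PER_YEAR = 6000
YEARS = 5
DAYS_PER_YEAR = 365

total_hours = HOURS_PER_YEAR * YEARS            # cumulative stimulus by age 5
hours_per_day = HOURS_PER_YEAR / DAYS_PER_YEAR  # implied daily exposure

print(f"total: {total_hours} h")          # total: 30000 h
print(f"per day: {hours_per_day:.1f} h")  # per day: 16.4 h
```

So the 30,000-hour figure amounts to roughly every waking hour of the first five years, at about 16 hours a day.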
That's astonishing. If you watched all of them, how much new information would you learn? I suspect a large portion of them are the same information presented differently; for example a news story duplicated by hundreds of different channels.
Yeah, I imagine every moment of communication a child receives is new information, not just baby talk about getting the spoon in their mouth and asking them if they have pooped.
I'm sure someone else could calculate the informational density of all of the text on the internet vs. the density of 30,000 hours of sight, smell, touch, sound, etc. My intuition tells me it's not even close.
Does the information contained in smell and touch contribute to the acquisition of language? Keep in mind you'd be arguing that people born without a sense of smell take longer to develop language, or are otherwise deficient in it in some way. I'm doubtful. It's certainly tricky to measure full sight / sound vs. text, but luckily we don't have to, because we also have video online, which, surprise surprise, utterly dwarfs 30,000 hours of sight and sound in terms of total information.
One qualitative difference is that the child's 30,000 hours is realtime, interactive, and often bespoke to the individual and context. All the videos on YouTube are static and impersonal.
I think what he's saying is that "real world" interaction is so high-bandwidth it dwarfs internet (screen-based) stimulation. Not saying I agree, just that he's not comparing hours being alive to hours of YouTube.