What Children Can Do That Large Language Models Cannot (Yet)
Paper by Yiu et al. (2023).
They argue that LLMs and vision models should not be thought of as individual agents, but rather as new cultural technologies, similar to writing, print, libraries, the Internet, or language. LLMs offer a new means for cultural production and evolution. They aggregate large amounts of information previously generated by humans and extract patterns from that information.
The authors claim that this is very different from the truth-seeking epistemic processes that underlie perception and action systems, which intervene on the external world and generate new information about it. These truth-seeking epistemic processes are found in some AI systems (model-based RL systems, robotics).
Instead, LLMs allow for the faithful transmission of representations from one agent to another (as cultural learning and imitation do), regardless of the accuracy of those representations.
Not sure I buy that these two processes are so fundamentally different, but I find the shift in perspective interesting and potentially fruitful. Of course, even if they are right about current frontier models, the big question is how long we should expect that to remain true.
They also draw some interesting further parallels with cultural evolution:
This contrast between transmission and truth is in turn closely related to the imitation/innovation contrast in discussions of cultural evolution in humans. Cultural evolution depends on the balance between these two different kinds of cognitive mechanisms. Imitation allows the transmission of knowledge or skill from one person to another. Innovation produces novel knowledge or skill through contact with a changing world. Imitation means that each individual agent does not have to innovate—they can take advantage of the cognitive discoveries of others. But imitation by itself would be useless if some agents did not also have the capacity to innovate. It is the combination of the two that allows cultural and technological progress.
They connect it to the debate over embodiment:
large language and vision models provide us with an opportunity to discover which representations and cognitive capacities, in general, human or artificial, can be acquired purely through cultural transmission itself and which require independent contact with the external world—a long-standing question in cognitive science.
I’ve always been a bit skeptical of views that emphasize embodiment, though I’m not quite sure why.
Deep learning models trained on large data sets today excel at imitation in a way that far outstrips earlier technologies and so represent a new phase in the history of cultural technologies. Large language models such as Anthropic’s Claude and OpenAI’s ChatGPT can use the statistical patterns in the text in their training sets to generate a variety of new text, from emails and essays to computer programs and songs. GPT-3 can imitate both natural human language patterns and particular styles of writing close to perfectly. It arguably does this better than many people.
Although the imitative behavior of large language and vision models can be viewed as the abstract mapping of one pattern to another, human imitation appears to be mediated by goal representation and the understanding of causal structure from a young age. It would be interesting to see whether large models also replicate these features of human imitation.
They then examine whether LLMs can innovate in various contexts. The first concerns novel tool use:
So far, we have found that both children aged 3 to 7 years old presented with animations of the scenario and adults can recognize common superficial relationships between objects when they are asked which objects should go together. But they can also discover new functions in everyday objects to solve novel physical problems and so select the superficially unrelated but functionally relevant object. In ongoing work, we have found that children demonstrate these capacities even when they receive only a text description of the objects, with no images.
Using exactly the same text input that we used to test our human participants, we queried OpenAI’s GPT-4, gpt-3.5-turbo, and text-davinci-003 models; Anthropic’s Claude; and Google’s FLAN-T5 (XXL). As we predicted, we found that these large language models are almost as capable of identifying superficial commonalities between objects as humans are. They are sensitive to the superficial associations between the objects, and they excel at our imitation tasks—they generally respond that the ruler goes with the compass. However, they are less capable than humans when they are asked to select a novel functional tool to solve a problem.
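To make the method concrete: the key design point is that the models receive the identical text description that the children did, so any gap reflects the models, not the stimuli. A minimal sketch of that kind of prompt-matched query might look like the following (this is my illustration, not the authors' code; the goal "draw a circle" and the distractor object are assumptions, while the ruler/compass pairing comes from the imitation condition described above):

```python
# Hypothetical sketch of a prompt-matched LLM query, in the spirit of the
# study: the same text shown to children is sent verbatim to each model.

def build_tool_prompt(goal: str, options: list[str]) -> str:
    """Assemble one text description, reused for every participant and model."""
    listed = ", ".join(options)
    return (
        f"You want to {goal}, but the usual tool is missing. "
        f"Available objects: {listed}. Which object would you use?"
    )

# "draw a circle" and "teapot" are illustrative assumptions.
prompt = build_tool_prompt("draw a circle", ["ruler", "teapot"])
print(prompt)

# Each model would then be sent this identical prompt, e.g. via its chat
# API, and its chosen object compared with the children's selections.
```

The interesting comparison is then between the superficially associated option (ruler, which "goes with" a compass) and the functionally relevant one, across models and age groups.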
From this they conclude:
This suggests that simply learning from large amounts of existing language may not be sufficient to achieve tool innovation.
Maybe, but couldn’t it also turn out that the problem goes away with further scaling? They do address this later in the paper:
But a child does not interact with the world better by increasing their brain capacity. Is building the tallest tower the ultimate way to reach the moon? Putting scale aside, what are the mechanisms that allow humans to be effective and creative learners? What in a child’s “training data” and learning capacities is critically effective and different from that of LLMs? Can we design new AI systems that use active, self-motivated exploration of the real external world as children do? And what might we expect the capacities of such systems to be? Comparing these systems in a detailed and rigorous way can provide important new insights about both natural intelligence and AI.
But this doesn’t really answer the objection. It could still be that scaling up will allow models to recognize new patterns in a way that solves these issues.
In another study, they found that children (including 4-year-olds) were better than LLMs at discovering causal relationships.
Overall, I found this paper interesting, even if I remain unconvinced about many things.