Important, neglected AI topics
Lukas Finnveden discusses some important but neglected AI topics that don’t fit easily within the usual conception of alignment:
- The potential moral value of AI.
- The potential importance of making AI behave cooperatively towards humans, other AIs, or other civilizations (whether it ends up intent-aligned or not).
- Questions about how human governance institutions will keep up if AI leads to explosive growth.
- Ways in which AI could cause human deliberation to get derailed, e.g. powerful persuasion abilities.
- Positive visions about how we could end up on a good path towards becoming a society that makes wise and kind decisions about what to do with the resources accessible to us. (Including how AI could help with this.)
I’m currently trying to figure out useful research projects on AI that speak to my comparative advantages (whatever those may be), so I’m interested in exploring suggestions like these.
There’s already some good work on the potential moral value of AI (e.g. by Bostrom, and also this report on consciousness in AI). I’m not sure how much I have to add to this, though it’s certainly an area I’d like to keep up with.
I’ve been thinking a bit about cooperation as a useful framing for AI alignment and safety, particularly in the context of cultural evolution, though without making much progress so far. But I wonder how valuable this is if we don’t get intent alignment. (I’m intuitively skeptical, though perhaps that’s wrong.) And I’m not sure I’m ready to take the plunge on wacky multiverse-wide cooperation stuff.
The question of how governance institutions will keep up with explosive AI growth is probably better suited for someone with more of a social science background.
I haven’t spent much time thinking about the dangers of AI persuasion. Probably something I should read up on.
I’m definitely excited about creating positive visions of what an AI future could look like. This is something to pursue further, though I’m unsure where to start. Holden Karnofsky has some suggestions here.
A framing
From Marx to Morris, many thinkers have argued that mode of production shapes values and norms—think of egalitarian foragers and hierarchical farmers. Advances in AI will bring dramatic changes in mode of production, comparable to the agricultural and industrial revolutions. In this new regime, it is unlikely that current values and norms will prove most adaptive: they will be replaced by something new.
There is room for moral entrepreneurs, or legislators of value, to shape which set of norms will prevail. This is how I understand the work Nick Bostrom is doing on digital minds. Granted, the space of feasible options is constrained. But history is full of path-dependence and persistent contingency. To some extent, we can create the values of tomorrow. This is the task we face today.
What Children Can Do That Large Language Models Cannot (Yet)
Paper by Yiu et al. (2023).
They argue that LLMs and vision models should not be thought of as individual agents, but rather as new cultural technologies, similar to writing, print, libraries, the Internet, or language. LLMs offer a new means for cultural production and evolution. They aggregate large amounts of information previously generated by humans and extract patterns from that information.
The authors claim that this is very different from the truth-seeking epistemic processes that underlie perception and action systems, which intervene on the external world and generate new information about it. These truth-seeking epistemic processes are found in some AI systems (model-based RL systems, robotics).
Instead, LLMs allow (like cultural learning and imitation) for the faithful transmission of representations from one agent to another, regardless of the accuracy of those representations.
Not sure I buy that these two processes are so fundamentally different, but I find the shift in perspective interesting and potentially fruitful. Of course, even if they are right about current frontier models, the big question is how long we should expect that to remain true.
They also draw some further interesting parallels with cultural evolution:
This contrast between transmission and truth is in turn closely related to the imitation/innovation contrast in discussions of cultural evolution in humans. Cultural evolution depends on the balance between these two different kinds of cognitive mechanisms. Imitation allows the transmission of knowledge or skill from one person to another. Innovation produces novel knowledge or skill through contact with a changing world. Imitation means that each individual agent does not have to innovate—they can take advantage of the cognitive discoveries of others. But imitation by itself would be useless if some agents did not also have the capacity to innovate. It is the combination of the two that allows cultural and technological progress.
They connect it to the debate over embodiment:
large language and vision models provide us with an opportunity to discover which representations and cognitive capacities, in general, human or artificial, can be acquired purely through cultural transmission itself and which require independent contact with the external world—a long-standing question in cognitive science.
I’ve always been a bit skeptical of views that emphasize embodiment, but not quite sure why.
Deep learning models trained on large data sets today excel at imitation in a way that far outstrips earlier technologies and so represent a new phase in the history of cultural technologies. Large language models such as Anthropic’s Claude and OpenAI’s ChatGPT can use the statistical patterns in the text in their training sets to generate a variety of new text, from emails and essays to computer programs and songs. GPT-3 can imitate both natural human language patterns and particular styles of writing close to perfectly. It arguably does this better than many people.
Although the imitative behavior of large language and vision models can be viewed as the abstract mapping of one pattern to another, human imitation appears to be mediated by goal representation and the understanding of causal structure from a young age. It would be interesting to see whether large models also replicate these features of human imitation.
They then examine whether LLMs can innovate in various contexts. The first concerns novel tool use:
So far, we have found that both children aged 3 to 7 years old presented with animations of the scenario and adults can recognize common superficial relationships between objects when they are asked which objects should go together. But they can also discover new functions in everyday objects to solve novel physical problems and so select the superficially unrelated but functionally relevant object. In ongoing work, we have found that children demonstrate these capacities even when they receive only a text description of the objects, with no images.
Using exactly the same text input that we used to test our human participants, we queried OpenAI’s GPT-4, gpt-3.5-turbo, and text-davinci-003 models; Anthropic’s Claude; and Google’s FLAN-T5 (XXL). As we predicted, we found that these large language models are almost as capable of identifying superficial commonalities between objects as humans are. They are sensitive to the superficial associations between the objects, and they excel at our imitation tasks—they generally respond that the ruler goes with the compass. However, they are less capable than humans when they are asked to select a novel functional tool to solve a problem.
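Out of curiosity, here is roughly how one could pose this kind of text-only probe to a current chat model via the OpenAI Python SDK. A minimal sketch: the prompt below is my own illustrative stand-in for the study's object-selection task, not the wording Yiu et al. actually used.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt in the spirit of the study's text-only condition;
# not the actual stimuli used by Yiu et al.
prompt = (
    "You want to draw a circle, but there is no compass available. "
    "Which of these objects would you use instead: a ruler, "
    "a teapot with a round bottom, or a stove? Answer with one object."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep outputs stable so responses are easier to compare across models
)
print(response.choices[0].message.content)
```

A model that merely tracks superficial associations will tend to answer with the ruler (the object that typically co-occurs with a compass), whereas the functionally relevant choice is the round-bottomed teapot.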
From these results they conclude:
This suggests that simply learning from large amounts of existing language may not be sufficient to achieve tool innovation.
Maybe, but couldn’t it also turn out that the problem goes away with further scaling? They do address this later in the paper:
But a child does not interact with the world better by increasing their brain capacity. Is building the tallest tower the ultimate way to reach the moon? Putting scale aside, what are the mechanisms that allow humans to be effective and creative learners? What in a child’s “training data” and learning capacities is critically effective and different from that of LLMs? Can we design new AI systems that use active, self-motivated exploration of the real external world as children do? And what might we expect the capacities of such systems to be? Comparing these systems in a detailed and rigorous way can provide important new insights about both natural intelligence and AI.
But this doesn’t really answer the objection. It could still be that scaling up will allow for the recognition of new patterns in a way that solves these issues.
In another study, they found that children (including 4-year-olds) were better than LLMs at discovering causal relationships.
Overall, I found this paper interesting, even if I remain unconvinced about many things.
Prestige and content biases together shape the cultural transmission of narratives
Cultural transmission biases such as prestige are thought to have been a primary driver in shaping the dynamics of human cultural evolution. However, few empirical studies have measured the importance of prestige relative to other effects, such as content biases present within the information being transmitted. Here, we report the findings of an experimental transmission study designed to compare the simultaneous effects of a model using a high- or low-prestige regional accent with the presence of narrative content containing social, survival, emotional, moral, rational, or counterintuitive information in the form of a creation story. Results from multimodel inference reveal that prestige is a significant factor in determining the salience and recall of information, but that several content biases, specifically social, survival, negative emotional, and biological counterintuitive information, are significantly more influential. Further, we find evidence that reliance on prestige cues may serve as a conditional learning strategy when no content cues are available. Our results demonstrate that content biases serve a vital and underappreciated role in cultural transmission and cultural evolution.
Four levers of reciprocity across human societies
This paper surveys five human societal types — mobile foragers, horticulturalists, pre-state agriculturalists, state-based agriculturalists and liberal democracies — from the perspective of three core social problems faced by interacting individuals: coordination problems, social dilemmas and contest problems. We characterise the occurrence of these problems in the different societal types and enquire into the main force keeping societies together given the prevalence of these. To address this, we consider the social problems in light of the theory of repeated games, and delineate the role of intertemporal incentives in sustaining cooperative behaviour through the reciprocity principle. We analyse the population, economic and political structural features of the five societal types, and show that intertemporal incentives have been adapted to the changes in scope and scale of the core social problems as societies have grown in size. In all societies, reciprocity mechanisms appear to solve the social problems by enabling lifetime direct benefits to individuals for cooperation. Our analysis leads us to predict that as societies increase in complexity, they need more of the following four features to enable the scalability and adaptability of the reciprocity principle: nested grouping, decentralised enforcement and local information, centralised enforcement and coercive power, and formal rules.
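To make the role of intertemporal incentives concrete, here is a minimal sketch (my own toy illustration, not from the paper) of the standard condition under which reciprocity sustains cooperation in an infinitely repeated prisoner's dilemma: defection pays off once, but forfeits the discounted stream of future cooperation.

```python
def cooperation_sustainable(T, R, P, delta):
    """Grim-trigger condition in an infinitely repeated prisoner's dilemma.

    Payoff ordering: T (temptation) > R (mutual cooperation) > P (mutual defection).
    Cooperation is an equilibrium when the one-shot gain from defecting, T - R,
    is outweighed by the discounted future loss, delta / (1 - delta) * (R - P),
    i.e. when delta >= (T - R) / (T - P).
    """
    return T - R <= delta / (1 - delta) * (R - P)

# With the textbook payoffs T=5, R=3, P=1 the threshold is delta >= 0.5.
print(cooperation_sustainable(5, 3, 1, delta=0.6))  # True: patient players can sustain cooperation
print(cooperation_sustainable(5, 3, 1, delta=0.4))  # False: the future is discounted too heavily
```

On my reading, the paper's four levers are about keeping an analogue of this condition satisfiable as societies scale up and interactions become less repeated and less observable.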
Social influence as intrinsic motivation for multi-agent deep reinforcement learning
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents’ actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents’ behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
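As I understand the mechanism, the influence reward compares what another agent does given the action actually taken against a counterfactual marginal obtained by averaging over the actions the agent could have taken instead; the KL divergence between the two is the reward. Below is a minimal numpy sketch of that computation, with my own hypothetical variable names; it is a reconstruction of the idea, not the authors' implementation.

```python
import numpy as np

def influence_reward(p_other_given_mine, p_mine, my_action):
    """Counterfactual influence reward for a single agent at one timestep.

    p_other_given_mine: array [A_mine, A_other]; row i is the other agent's
        action distribution conditioned on this agent taking action i.
    p_mine: array [A_mine]; this agent's own policy over its actions.
    my_action: index of the action this agent actually took.
    Assumes all probabilities are strictly positive.
    """
    # Counterfactual marginal: what the other agent would do if this agent's
    # action were averaged out (i.e. unknown to it).
    marginal_other = p_mine @ p_other_given_mine        # shape [A_other]
    conditional_other = p_other_given_mine[my_action]   # shape [A_other]
    # KL divergence between conditional and marginal: how much the chosen action
    # shifts the other agent's behavior. In expectation over actions this equals
    # the mutual information between the two agents' actions.
    return float(np.sum(conditional_other * np.log(conditional_other / marginal_other)))

# Tiny example: the other agent copies this agent's action with probability 0.9,
# so acting is highly influential and earns a positive reward.
p_other_given_mine = np.array([[0.9, 0.1],
                               [0.1, 0.9]])
p_mine = np.array([0.5, 0.5])
print(influence_reward(p_other_given_mine, p_mine, my_action=0))  # roughly 0.37
```

In the paper itself, the conditional policies come from learned models of the other agents, which is what makes the reward computable in a decentralized way.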