Quantifying the effects of environment and population diversity in multi-agent reinforcement learning

Generalization is a major challenge for multi-agent reinforcement learning. How well does an agent perform when placed in novel environments and in interactions with new co-players? In this paper, we investigate and quantify the relationship between generalization and diversity in the multi-agent domain. Across the range of multi-agent environments considered here, procedurally generating training levels significantly improves agent performance on held-out levels. However, agent performance on the specific levels used in training sometimes declines as a result. To better understand the effects of co-player variation, our experiments introduce a new environment-agnostic measure of behavioral diversity. Results demonstrate that population size and intrinsic motivation are both effective methods of generating greater population diversity. In turn, training with a diverse set of co-players strengthens agent performance in some (but not all) cases.

Kevin R. McKee, Joel Z. Leibo, Charlie Beattie & Richard Everett

September 12, 2023

Large language models show human-like content biases in transmission chain experiments

As the use of Large Language Models (LLMs) grows, it is important to examine whether they exhibit biases in their output. Research in Cultural Evolution, using transmission chain experiments, demonstrates that humans have biases to attend to, remember, and transmit some types of content over others. Here, in five pre-registered experiments with the same methodology, we find that the LLM ChatGPT-3 shows biases analogous to humans for content that is gender-stereotype consistent, social, negative, threat-related, and biologically counterintuitive, over other content. The presence of these biases in LLM output suggests that such content is widespread in its training data, and could have consequential downstream effects, by magnifying pre-existing human tendencies for cognitively appealing, and not necessarily informative or valuable, content.

Alberto Acerbi & Joseph Stubbersfield

September 11, 2023

Two Types of Aggression in Human Evolution

Two major types of aggression, proactive and reactive, are associated with contrasting expression, eliciting factors, neural pathways, development, and function. The distinction is useful for understanding the nature and evolution of human aggression. Compared with many primates, humans have a high propensity for proactive aggression, a trait shared with chimpanzees but not bonobos. By contrast, humans have a low propensity for reactive aggression compared with chimpanzees, and in this respect humans are more bonobo-like. The bimodal classification of human aggression helps solve two important puzzles. First, a long-standing debate about the significance of aggression in human nature is misconceived, because both positions are partly correct. The Hobbes–Huxley position rightly recognizes the high potential for proactive violence, while the Rousseau–Kropotkin position correctly notes the low frequency of reactive aggression. Second, the occurrence of two major types of human aggression solves the execution paradox, concerned with the hypothesized effects of capital punishment on self-domestication in the Pleistocene. The puzzle is that the propensity for aggressive behavior was supposedly reduced as a result of being selected against by capital punishment, but capital punishment is itself an aggressive behavior. Since the aggression used by executioners is proactive, the execution paradox is solved to the extent that the aggressive behavior of which victims were accused was frequently reactive, as has been reported. Both types of killing are important in humans, although proactive killing appears to be typically more frequent in war. The biology of proactive aggression is less well known and merits increased attention.

Richard Wrangham

September 8, 2023

Property rights and long-run economic growth

Quantitative analysis of the model assigns a major role to changes in property rights in explaining growth over the very long run. As one example, the number of new ideas produced in a year rises by a factor of 110,000 in the simulated economy between 25,000 B.C. and the 20th century. A factor of 108 of this increase is due to the fact that the 20th century has a larger population base from which inventors are drawn; a factor of 4 of this increase is attributed to knowledge spillovers, i.e. to the notion that it is easier to produce ideas today because of discoveries made in the past. The remaining factor of 245 is assigned to an increase in the property rights variable, the fraction of resources used to compensate inventive effort.
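The decomposition quoted above can be sanity-checked with a few lines of arithmetic. The sketch below assumes the three factors combine multiplicatively, which is how the passage reads; the variable names and the log-share calculation are illustrative additions, not part of the paper. Because the published factors are rounded, their product (105,840) only approximately matches the quoted total of 110,000.

```python
import math

# Factors as quoted in the passage (rounded as published).
factors = {
    "population": 108,
    "knowledge_spillovers": 4,
    "property_rights": 245,
}

# Assumption: the factors multiply to give the total increase in ideas per year.
total = math.prod(factors.values())

# One common way to apportion a multiplicative change: each factor's share of
# the total log increase. This is an illustrative choice, not the paper's method.
log_shares = {name: math.log(f) / math.log(total) for name, f in factors.items()}

print(f"product of quoted factors: {total:,}")
for name, share in log_shares.items():
    print(f"{name}: {share:.1%} of the log increase")
```

On this accounting, property rights contribute the largest share of the log increase, which is consistent with the passage's claim that they play a major role.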

Charles Jones

September 8, 2023

Why arms control is so rare

Arming is puzzling for the same reason war is: it produces outcomes that could instead be realized through negotiation, without the costly diversion of resources arming entails. Despite this, arms control is exceedingly rare historically, so that arming is ubiquitous and its costs to humanity are large. We develop and test a theory that explains why arming is so common and its control so rare. The main impediment to arms control is the need for monitoring that renders a state’s arming transparent enough to assure its compliance but not so much as to threaten its security. We present evidence that this trade-off has undermined arms control in three diverse contexts: Iraq’s weapons programs after the Gulf War, great power competition in arms in the interwar period, and superpower military rivalry during the Cold War. These arms races account for almost 40% of all global arming in the past two centuries.

Coe and Vaynman (2019)

September 7, 2023

Preparing for the (non-existent?) future of work

We analyze how to set up institutions that future-proof our society for a scenario of ever-more-intelligent autonomous machines that substitute for human labor and drive down wages. We lay out three concerns arising from such a scenario, culminating in the economic redundancy of labor, and evaluate recent predictions and objections to these concerns. Then we analyze how to allocate work and income if these concerns start to materialize. As the income produced by autonomous machines rises and the value of labor declines, we find that it is optimal to phase out work, beginning with workers who have low labor productivity and job satisfaction, since they have comparative advantage in enjoying leisure. This is in stark contrast to welfare systems that force individuals with low labor productivity to work. If there are significant wage declines, avoiding mass misery will require other ways of distributing income than labor markets, whether via sufficiently well-distributed capital ownership or via benefits. Recipients could still engage in work for its own sake if they enjoy work amenities such as structure, purpose, and meaning. If work gives rise to positive externalities such as social connections or political stability, or if individuals undervalue the benefits of work because of internalities, then there is a role for public policy to encourage work. However, we conjecture that in the long run, it would be more desirable for society to develop alternative ways of providing these benefits.

Anton Korinek and Megan Juelfs

September 7, 2023