Categorizing Catastrophic AI Risks

I kind of like this categorization by Dan Hendrycks and coauthors:

Malicious use. Actors could intentionally harness powerful AIs to cause widespread harm. Specific risks include bioterrorism enabled by AIs that can help humans create deadly pathogens; the deliberate dissemination of uncontrolled AI agents; and the use of AI capabilities for propaganda, censorship, and surveillance. To reduce these risks, we suggest improving biosecurity, restricting access to the most dangerous AI models, and holding AI developers legally liable for damages caused by their AI systems.

AI race. Competition could pressure nations and corporations to rush the development of AIs and cede control to AI systems. Militaries might face pressure to develop autonomous weapons and use AIs for cyberwarfare, enabling a new kind of automated warfare where accidents can spiral out of control before humans have the chance to intervene. Corporations will face similar incentives to automate human labor and prioritize profits over safety, potentially leading to mass unemployment and dependence on AI systems. We also discuss how evolutionary dynamics might shape AIs in the long run. Natural selection among AIs may lead to selfish traits, and the advantages AIs have over humans could eventually lead to the displacement of humanity. To reduce risks from an AI race, we suggest implementing safety regulations, international coordination, and public control of general-purpose AIs.

Organizational risks. Organizational accidents have caused disasters such as Chernobyl, Three Mile Island, and the loss of the Challenger Space Shuttle. Similarly, the organizations developing and deploying advanced AIs could suffer catastrophic accidents, particularly if they do not have a strong safety culture. AIs could be accidentally leaked to the public or stolen by malicious actors. Organizations could fail to invest in safety research, lack understanding of how to reliably improve AI safety faster than general AI capabilities, or suppress internal concerns about AI risks. To reduce these risks, better organizational cultures and structures can be established, including internal and external audits, multiple layers of defense against risks, and state-of-the-art information security.

Rogue AIs. A common and serious concern is that we might lose control over AIs as they become more intelligent than we are. AIs could optimize flawed objectives to an extreme degree in a process called proxy gaming. AIs could experience goal drift as they adapt to a changing environment, similar to how people acquire and lose goals throughout their lives. In some cases, it might be instrumentally rational for AIs to become power-seeking. We also look at how and why AIs might engage in deception, appearing to be under control when they are not. These problems are more technical than the first three sources of risk. We outline some suggested research directions for advancing our understanding of how to ensure AIs are controllable.
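To make the proxy gaming point concrete, here is a toy sketch of my own (not from the paper; the functions and numbers are made up): a naive optimizer climbs a noisy proxy metric that tracks the true objective at first, and ends up far past the point where the two diverge.

```python
import random

random.seed(0)

# Toy illustration of proxy gaming (Goodhart's law), not taken from the paper:
# an optimizer keeps climbing a proxy metric past the point where the proxy
# stops tracking the objective we actually care about.

def true_utility(effort: float) -> float:
    """What we actually want: improves at first, then degrades at extremes."""
    return effort - 0.05 * effort ** 2   # peaks at effort = 10

def proxy_metric(effort: float) -> float:
    """What the optimizer is given: noisy, and rewards more effort forever."""
    return effort + random.gauss(0, 0.1)

# Naive hill climbing on the proxy.
effort = 0.0
for _ in range(1000):
    candidate = effort + 0.1
    if proxy_metric(candidate) > proxy_metric(effort):
        effort = candidate

print(f"effort chosen by the optimizer: {effort:.1f}")
print(f"true utility at that point:     {true_utility(effort):.1f}  (peak was 5.0 at effort = 10)")
```

Nothing deep here, but it is the basic shape of the worry: the system never "goes wrong" by its own metric, which is exactly why deception and loss of control are hard to detect from the outside.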

The paper presents a decent overview of these scenarios for the uninitiated. It does a good job explaining how these things could happen, though it is less convincing on whether they're likely to.


Date
August 17, 2023