What is an “AGI fire alarm”?

An “AGI fire alarm” is an event signaling that AGI may be imminent. The concept is related to that of a “warning shot”, but distinct in that “fire alarms”:

  • Relate to AI attaining high levels of capability, rather than directly to AI making dangerous decisions. “Fire alarms”, unlike “warning shots”, don’t typically involve damage or AI misbehavior.

  • Are about social psychology, at least as the term was introduced in Eliezer Yudkowsky’s 2017 article that coined it. The article, which cites Latané and Darley’s 1968 experiment on how people in groups react to smoke, argues that in the presence of metaphorical “smoke”, people sometimes fail to react for fear of looking foolish, telling themselves that the evidence isn’t strong enough, until a “fire alarm” creates common knowledge of a metaphorical “fire”. Later uses of the term have sometimes left out the social psychology aspect.

Whereas a typical example of a “warning shot” might be an AI trying to break out of a lab, a typical example of a “fire alarm” might be an AI winning a gold medal in the International Math Olympiad, if it were widely agreed that, once AI systems achieved IMO gold[1], one could reasonably expect them to do general human-level reasoning soon after. In the past, many have proposed that AI’s ability to outperform humans at particular complex tasks, such as playing chess[2], playing Go, passing a Turing test, creating art, or driving cars, would indicate that AGI is imminent. However, there was never widespread agreement on any one indicator, and these benchmarks have since been achieved without being widely perceived as signaling imminent AGI.[3] The thesis of Yudkowsky’s article is that we don’t agree on any such symbolic threshold, and probably won’t until it’s too late.

Some people saw the release of ChatGPT as having some of the characteristics of an “AGI fire alarm”: it raised public awareness of AI and its capabilities, and it was followed by a significant increase in focus on AI safety.

Katja Grace presents an alternative view, noting that literal fire alarms are often ignored, arguing that their main function is not to create common knowledge, and exploring other mechanisms by which they might work. She concludes that metaphorical “fire alarms” aren’t very common or important, but that the problem of people becoming “groupstruck” (e.g., because they’re ashamed of being seen as fearful) is one we have many ways of addressing.


  1. Google DeepMind created an AI that came within one point of an IMO gold medal in 2024. There is no consensus that AGI will be imminent once AI gains that one additional point. ↩︎

  2. Edgar Allan Poe wrote about The Turk and posited that no pure machine could adapt to the unpredictable complexities of chess. He was correct that The Turk was not a pure machine, but eventually his thesis that no machine could play chess was proven wrong. ↩︎

  3. Problems considered difficult enough to require AGI are informally known as AI-complete or AI-hard. ↩︎