Getting Started with AI-Alignment
Or at least what I know about it
While I’m not personally working on AI alignment, I think it’s likely pretty important, and I want to help grow the knowledge base around it and make information about it easier to find. This post is a short resource list based on my personal reading and on conversations I’ve had with friends interested in AI alignment.
At a high level, what is AI alignment?
One day we will likely transfer control of important applications to AI systems. If you imagine that these AIs could be as intelligent as human beings but lack the cultural, ethical, moral, or biological constraints that humans have, it’s fairly easy to imagine that their goals may be drastically different from our own. AI alignment is the challenge of aligning the goals of AI agents with the goals of humans. In particular, it focuses on doing so while we still have a moderate understanding of how these systems work and good control over what they work on.
A quick example of an AI-alignment challenge
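The most common introduction to problems in AI alignment is the paperclip maximizer: a thought experiment about an AI whose only goal is to make as many paperclips as possible, and which pursues that goal by converting every resource it can reach into paperclips. It illustrates instrumental convergence, the idea that an intelligent agent given a straightforward goal may still act in surprising or harmful ways.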
“Instrumental convergence posits that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving an incredibly difficult mathematics problem like the Riemann hypothesis could attempt to turn the entire Earth into one giant computer in an effort to increase its computational power so that it can succeed in its calculations.” (Wikipedia)
Hopefully this simple example sparks your interest a bit. From here on, all I can do is point you to other resources on AI alignment that do a much better job of discussing it.
The most accessible resource:
The biggest collection of other resources:
Resources if you are starting from scratch:
AI Alignment: Why It’s Hard, and Where to Start (by MIRI), and the corresponding YouTube talk.
Concrete Problems in AI Safety (an academic paper from circa 2016, so it may not be the best place to start, and it is likely out of date).
Links about Dr. Paul Christiano (an influential researcher in the field):
Interview with Dr. Paul Christiano about AI alignment work at OpenAI (circa 2018). Interview on the 80,000 Hours podcast.