Getting Started with AI-Alignment

Or at least what I know about it

Aug 19, 2021

While I’m not personally working on AI-alignment, I think it’s likely pretty important, and I want to help make information about it easier to find. This post is a short resource list based on my own reading and on conversations I’ve had with friends interested in AI alignment.

Image generated by Replicate’s Stable Diffusion model with the prompt “artificial intelligence - blue computers”.

At a high level, what is AI-alignment?

One day we will likely transfer control of important applications to AI agents. If you imagine that these AIs could be as intelligent as human beings but lack the cultural, ethical, moral, or biological constraints that humans have, it’s fairly easy to see that their goals may be drastically different from our own. AI-alignment is the challenge of aligning the goals of AI agents with the goals of humans. In particular, the challenge focuses on doing so while we still have a moderate understanding of how these agents work and good control over what they work on.

A good place to learn more about AI alignment is Stuart Russell’s book Human Compatible. Rohin Shah summarized it in the Alignment Newsletter #69.

A quick example of an AI-alignment challenge

The most common introduction to problems in AI alignment is the paperclip maximizer thought experiment. It illustrates Instrumental Convergence, where an intelligent actor given a straightforward goal may still act in surprising or harmful ways.

Instrumental convergence posits that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving an incredibly difficult mathematics problem like the Riemann hypothesis could attempt to turn the entire Earth into one giant computer in an effort to increase its computational power so that it can succeed in its calculations - Wikipedia

Hopefully this simple example of a challenge in AI alignment sparks your interest a bit. From here on, all I can do is point you to a bunch of other resources on AI-alignment which do a much better job discussing it.

Educational Resources

The most accessible resource:

Robert Miles’ YouTube Channel

The biggest collection of other resources:

Vika’s AI safety resources

Resources if you are starting from scratch:

AI alignment, why it’s hard and where to start (by MIRI), and the corresponding YouTube link.
Concrete Problems in AI Safety. (This is an academic paper from 2016, so it may not be the best place to start and is likely out of date.)

Forums:

Links to Dr. Paul Christiano (an important person in the field):

Interview with Dr. Paul Christiano about AI alignment work at OpenAI (circa 2018). Interview on the 80,000 Hours podcast.
Dr. Paul Christiano’s website

Some organizations working on this

Anthropic
OpenAI
Machine Intelligence Research Institute (MIRI)
Center for Human-Compatible Artificial Intelligence (CHAI)
AI Objectives
Effective Altruism AGI Educational Fellowship. (AGI is Artificial General Intelligence).
CHAI internship

Technically Private
