AI Safety Research
This wiki is in very early stages and is a work in progress.
This page contains all notes on AI safety research over and above what we put in our career profile. Read the profile first, here.
- 1 Profile type
- 2 What is this career path?
- 3 Personal fit
- 4 Career capital
- 5 Exploration value
- 6 Role impact
- 7 Alternatives
- 8 Past experience
- 9 Take action
- 10 Best resources
- 11 Research process
What is this career path?
Many experts believe that there’s a significant chance we’ll create superintelligence - artificially intelligent machines with abilities surpassing those of humans across the board - sometime in the next century. Massive improvements in artificial intelligence have the potential to provide huge benefits for humanity - but also have huge risks. AI safety research is a growing paradigm within the field of AI that focuses on increasing the chance that, if and when superintelligence arrives, it won’t pose a threat to humanity’s existence. AI safety research is a broad, interdisciplinary field - covering technical aspects of how to actually create safe AI systems, as well as broader strategic, ethical and policy issues (more on the different types of AI safety research below.)
There’s increasing consensus from AI experts and notable figures - including Elon Musk, Stuart Russell, Stephen Hawking, and Bill Gates - that AI may pose one of the most serious threats to humanity. But at present, there are very few researchers working explicitly on AI safety, compared to tens of thousands of computer science researchers working on making machines more powerful and autonomous. This means that an additional researcher with the right skills and motivation right now has the potential to make a huge difference. It also seems like a particularly good time to get involved, as there seems to be growing concern about and support for AI safety research - partly owing to the success of Nick Bostrom’s book Superintelligence. In particular, funding for research seems to be growing at the moment - Elon Musk recently made a $10 million grant to fund research projects on AI safety, and the Open Philanthropy Project has expressed interest in it as a cause area.
We won’t argue in depth for the importance of the AI cause here, so this profile will be most relevant to people who are already convinced of the importance of this cause but aren’t sure whether they can contribute, or how to do so. If you want to find out more about the risks of artificial intelligence, we recommend reading Superintelligence and/or this popular introduction to concerns about AI.
What are the major sub-options within this path?
Different types of AI safety research
People in the field often mean different things by “AI safety research” but here are some of the key areas.
- Strategic research: Includes figuring out what questions other AI safety researchers should be focusing on, learning from historical precedents such as threats from nuclear weapons and other dangerous technologies, and thinking about possible policy responses. This is likely to be the most accessible area for someone with a less technical background, but you still need a decent understanding of the technical issues involved. Strategy research might be well-suited to someone with expertise in policy or economics who can also understand the technical issues.
- Superintelligence theory/Forecasting: Understanding how superintelligence might come about, what kinds of scenarios are possible, and what we should expect when superintelligence arrives. This consists largely of theoretical computer science work.
- Technical work and AI safety engineering: Asking the question, “If we were going to build an AI, how would we make it safe?”, and then figuring out how we might implement different solutions. Work here is also highly technical, involving computer science, machine learning, and some logic. For both this and superintelligence theory, some philosophical competence is also useful.
- Within technical work, it may also be useful to distinguish two further classes of work. "Class 1" technical research focuses on trying to find practical ways to implement things we already know how to do at least in principle. "Class 2" technical research focuses on trying to figure out how to do, even in principle, things we have no idea how to do - things like giving an AI goals that won't result in perverse instantiation.
Some specific questions and topics that experts have suggested might be promising to work on:
- Nick Bostrom (from conversation notes with GiveWell):
- Research outlined in MIRI’s technical research agenda
- Paul Christiano’s recent work on the structure of approval-directed agents
- Topics within mainstream computer science, including inverse reinforcement learning, studying how concepts generalize, and studying how different algorithms would behave if they were run on systems with no constraints on computing power
- Some questions that may be easier to work on in academia, but that seem more promising for safety than for advancing AI more broadly:
- Improving our ability to understand the behavior of complex machine learning systems.
- Developing general principles of rational behavior (self-referential reasoning, decision theory, logically bounded probabilistic reasoning...) so that we can build AI systems that operate according to principles we understand rather than principles discovered by brute force search.
- Improving our ability to infer the preferences of users and obtain a useful representation.
- Facilitating formal verification of software in general, or machine learning software in particular.
The broad consensus from experts we spoke to is that all different types of research are important, and we need people doing all of them - as well as looking for new avenues we might not have even thought about yet. So if you’re interested in doing AI safety research, it’s probably best to choose between these areas based on pragmatic considerations like your personal fit, interest, and what seems like an available option for you.
Different career paths in AI safety research
There are three broad categories of specific career paths:
- Working in an AI lab in academia
- Good for keeping other options open and for general career capital - allows you to develop prestige and a good network
- The main downside is that you may be more limited in what you can work on, unless you’re able to find a lab with a lot of flexibility and interest in AI safety. That said, as funding for AI safety research increases, more opportunities to do AI safety research in academic settings may arise.
- Whether academia is a good fit may depend a lot on your career goals and preferred working style - if you like making incremental progress on tractable problems rather than trying to approach huge issues from the ‘top down’, then academia might be a good fit.
- Working at a research nonprofit such as MIRI or FHI
- Gives you a lot more flexibility on what you work on, and the ability to work on problems from the ‘top down’ - i.e. trying to generate solutions to the largest, most pressing problems.
- A number of people also believe that this is where the most pressing talent bottleneck is.
- The main downside of these options is that they may provide less flexible career capital, due to being less widely recognised.
- Working at an AI company in industry
- Some people we spoke to raised the concern that going into industry might not be a great idea if you end up simply adding to the pool of researchers working on making machines more powerful, rather than on making them safer. How much of a concern this is will depend partly on how safety-conscious the company you’re working at is - DeepMind, for example, seems to have a large number of people significantly concerned about AI safety. DeepMind is also a leader in AI research, so we especially recommend working there.
- Since industry is where a lot of the AI developments will come from, it seems especially important to have people who are concerned about safety working there. It also seems valuable to have strong connections and lines of communication between those working on increasing the capabilities of AI and those working on safety.
Again, the general consensus seems to be that these options are on a fairly level playing field and so you are probably best off choosing based on what you’re personally best suited for and most excited about.
What is it like day-to-day?
What are the people like?
Do you need a PhD?
There was some disagreement among people we spoke to about this question.
Many people we spoke to said that getting a PhD in a relevant field - probably computer science - is generally a good idea. Getting a PhD has a lot of benefits, including allowing you to develop an academic network, learn generally useful skills in computer science, and get experience doing research. If you’re not totally sure whether you want to do AI safety research or something else, a PhD in computer science also allows you to keep other options open in academia and industry. See our career profile on computer science PhDs for more. There are also some cognitive science labs in the US that have some focus on AI research, which might be a good option if you don’t have a CS PhD - though you’d probably still need some sort of math or CS background. (Tenenbaum at MIT, Goodman at Stanford, Griffiths at Berkeley)
That said, most of the people we spoke to also agreed that a PhD isn’t totally necessary for this kind of research - and the main downside of doing a PhD is that it can take a long time (3-4 years in the UK, 5-7 in the US.) A few people were concerned that for someone who can potentially contribute to AI safety research, a PhD may not be worth the cost - they would be better off learning the most directly relevant skills some other way. This seems to apply most to organisations like MIRI where the skills required to do their research aren't necessarily those you would learn in a CS PhD program.
Circumstances under which you might not want to get a PhD:
- You’re already in a position to contribute directly to AI safety research - especially if an organisation working on AI safety is interested in hiring you. There seems to be a talent bottleneck in AI safety right now, and since it’s such a pressing issue, early efforts could be disproportionately valuable.
- You’re not particularly intrinsically motivated by the idea of doing a computer science PhD (though this might mean you should check whether you’re going to be motivated by AI safety research, too!)
- You can’t find an advisor who will support you in either developing general CS skills, or working directly on something relevant to AI safety.
- You think you’re in a particularly good position to learn and do research that’s directly relevant to AI safety on your own. We’d be wary of this, though: only do this if you have a community you can stay connected with, collaborate with, and get feedback from, and if you know you can be self-motivated.
Additional thoughts on PhDs:
- A PhD seems pretty useful for credentials, but isn’t entirely necessary if you’re doing relevant research and publishing papers anyway.
- Having strong research output but no PhD might only be a modest disadvantage.
- That said, without a PhD you would probably need to be exceptional to be hired.
- DeepMind, and perhaps AI startups, may be the employers most accepting of researchers without PhDs.
Who should especially consider this option?
There’s a common belief we’ve come across that you need to be some kind of super-genius to even consider doing AI safety research. Our recent conversations with experts suggest this is misleading - you may need exceptional technical ability for some parts of AI safety research, but the range of people for whom AI safety research is worth exploring is much broader.
It’s probably worth at least considering AI safety research if you fit most of the following:
- You’re highly interested in and motivated by the issues. Even if you’re a math prodigy, if you can’t bring yourself to read Superintelligence, it’s unlikely to be a good fit. A good way to test this is simply to try reading some of the relevant books and papers (more on this below.)
- It’s also worth being aware that this kind of research has less clear feedback than more applied work, and less of an established community to judge your progress than other academic work. This means you’re likely to face more uncertainty about whether you’re making progress, and you may face scepticism from people outside the community about the value of your work. It’s worth bearing this in mind when thinking about whether this is something you’ll be able to work on productively for an extended period.
One message we got from a number of people is that if you’re very interested in AI safety research and not sure if you’d be able to contribute, the best thing to do is just to dive in and explore the area more. The best way to find out if you’re going to be able to contribute is just to try. More on how to do this precisely below.
See our career profiles for more information on these options.
- Academia (if you get a PhD)
- Software engineering
- Tech startups
- Quantitative finance for earning to give (if you’ve developed relevant technical skills)
Direct impact potential
If you’re interested but not sure how you can contribute, the best way to start is just to begin exploring.
- Read lots
- Bostrom’s Superintelligence
- Relevant research papers: those on Bostrom’s website, those put out by FHI and MIRI, and perhaps also academic papers in AI and machine learning
- MIRI’s research guide and associated reading list
- Blogs: MIRI’s agentfoundations forum, Paul Christiano’s blog, some parts of LessWrong, and others
- Papers and books in broader fields that are likely to be relevant: especially philosophy, cognitive science and economics
- Email the authors of papers with questions - most researchers are very willing to engage with well thought-out questions!
- Comment on blogs online and engage in online discussions
- Reach out to anyone in your network/community who you might be able to discuss these ideas with.
- Consider going to a MIRIx workshop if there’s one nearby
- Look for questions that capture you, things you disagree with, or places you think something is missing
- Pick an open problem and see if you can make any progress on it
- Google’s DeepMind sometimes offers internships.
- Organisations like MIRI and FHI tend not to offer internships yet, but are often happy to have talented researchers interested in AI safety visit their offices and spend time talking to them.
For this research, we spoke to a number of experts in AI safety research, including Daniel Dewey, Andrew Snyder-Beattie and Owain Evans of the Future of Humanity Institute; Nate Soares, Executive Director of the Machine Intelligence Research Institute; and a source at an AI research company.