Human Compatible: Artificial Intelligence and the Problem of Control
by Stuart Russell
Recommended by
"I’ve been thinking a lot about AI and working with computer science colleagues to try and understand the implications for public policy, so I’ve read a lot of AI books. Actually, just as we were saying about economics books, there’s a whole clutch of AI books because, again, the interest is there. People want to understand what’s going on. Human Compatible is a really clearly written one. It explains enough about how AI works, but also what some of the challenges are.

The particular challenge the book focuses on is how to program AI systems so they do what we really want them to do, rather than just what we write down in the code that they then implement. It’s actually very difficult. Think about the effects of setting targets for public services, and how easily they get gamed. When you give a hospital a maximum waiting time within which patients must be admitted, they will do things like park patients in the waiting room or on trolleys so that the clock doesn’t start ticking too quickly. Or if you give an ambulance service a certain amount of time to get to patients, they will game it so they get there just in time to meet their targets. It’s the same with AI. If you set them an explicit objective—which you have to do, because how else are you going to get them to do something?—how do you stop them gaming things in that way and delivering something that you don’t really want?

One example is image recognition. AIs are potentially really good at recognizing tumours and rogue cells, but they’re often trained on images of tumours where doctors have put little rulers in to show how big the tumour is. Some developers found that they had trained ruler recognition systems rather than tumour recognition systems. You’ve just got to think very carefully about what objective you’re coding into the AI. So this book is about that problem, as well as a potential solution, which is making the system uncertain about what it is the humans really want.
This is at a really early stage of research in the AI community, but it’s a very important question for policymakers who are thinking about using AI in public policy.

Yes, he doesn’t believe they’re going to kill all the humans to produce the maximum number of paperclips, to use the famous Nick Bostrom example. He’s not a pessimist of that kind. He’s more saying that if we’re not careful, we’ll get some adverse outcomes that we don’t really like. Because some of the areas where AI systems are being used are things like criminal justice, policing and making decisions about social care. The consequences of getting it wrong on people’s lives are just huge, so we shouldn’t be making mistakes."
The Best Economics Books of 2019 · fivebooks.com
"Stuart Russell is in a different grand tradition of worrying about AI, where a superintelligence takes us over and turns us all into fodder or whatever it wants. Elsewhere on Five Books, someone recommended Nick Bostrom’s Superintelligence. Bostrom was concerned with what’s become known as ‘perverse instantiation.’ It’s the idea that we get a machine to do something and it does something that we didn’t intend. It’s like The Sorcerer’s Apprentice, with a machine following instructions to the letter, and you end up in a terrible situation. Bostrom famously envisaged a superintelligence that we gave instructions to make paperclips, and that would just carry on trying to make as many as possible and eventually turn us all into paperclips.

Stuart Russell is a leading figure in AI and he really knows what he’s talking about. He’s turning his attention to how we might engineer AI so that it doesn’t end up turning us all into paperclips. He’s building on Bostrom and giving a different answer. What I like about Stuart Russell’s book is that it’s terribly clear. I actually disagree with many of the philosophical suppositions that he makes, but he spells them out incredibly clearly. There’s also a really good reason for reading books that you might strongly disagree with, because it can help to further debate. You can work out why you disagree.

One of the big dangers of AI—and you can see this with things like ChatGPT—is that it makes us lazy or controls us in too many ways. We really need to stress our human ability to think for ourselves. Disagreement is really important in ethics. It’s unlikely we’re ever going to completely agree, but if we can understand the basis of disagreement, then that’s the best way forward, really, for maintaining our humanity. There are some people working in AI who are not too bothered by the prospect that AI might eradicate humans, because they don’t think that humans are the be-all and end-all of value.
They think that it might be better to be transhumanist or post-humanist, or that there could be more value in the world if we were somehow all replaced by machines. Stuart Russell is on the side of humans. He’s trying to work out how we can develop AI to make certain that it’s always going to be at least roughly aligned with human values, so that humans are going to be kept at least reasonably happy with whatever it is that the AI is doing. I’m with Stuart Russell there, I’m afraid: I’m on the side of humans.

His solution is to try to engineer AI so that it’s always like a tool, it’s always a machine for us. It’s always got to keep an eye on human preferences and also not assume it knows what human preferences are. We have to make it always doubt that it’s got it right, to keep checking in with humans about what it is they actually want and whether it’s really understood what it is they’re trying to do. In The Sorcerer’s Apprentice, it would be as if the broom stopped and said, ‘Are you sure you want me to do this?’ And we would say, ‘No! Can you reprogram?’ That’s generally a good approach.

I disagree with him for a number of reasons. One is the problem of how the AI is going to be checking back on what our preferences are. He suggests that the AI could observe human behavior and extrapolate from it what our preferences are and what we’re trying to do. But it’s very difficult to extrapolate motivation and intention. Human beings often do things that are irrational. We’re not quite doing what it is we really want to do. Philosophers have grappled with this. You could, perhaps, look at second-order preferences. For example, you might reach for another doughnut, but your meta-preference might actually be not to eat it and still fit into your dress. Nonetheless, there’s a huge amount of complexity in how you would count a preference as a higher-order preference, and in how we would extrapolate real human motivation and value from our behavior.
Stuart Russell also feels that value is subjective, that it comes entirely from the preferences of humans. I’m not really sure that that’s correct, either. He’s following a very dominant strain of thought within philosophy, the idea that value stems from wishes and desires and preferences. That’s a really common stream of thought that I happen to disagree with. In terms of the solution he offers, that AI will be able to extrapolate human preferences from behavior, that’s a real problem. Maybe that’s somewhere we could bring in wisdom from social scientists, for example, or novelists or psychologists. That’s where they can say, ‘Well, actually, this is a bit simplistic.’ So there are some problems with this book. I’m glad he’s trying, though. The book is very, very readable and well worth reading.

The textbook grew out of some teaching I’ve been doing on a Master’s course, on philosophy and AI. The audience I’d envisaged was students who were maybe working in computing or engineering, but I’ve tried to make it as broad as possible. There are now lots of arts and social science courses where people are studying ethical issues in AI. As I’ve mentioned, one of the things we need to encourage is interdisciplinary conversations. I also tried to write it clearly and accessibly so that any interested members of the public could just read it on their own. The emphasis really is on trying to open up dialogues, to show a range of different views and to encourage readers to see that they’ve got things they can bring to the debate. There is a huge range of questions to look at that I try to cover in the book. They’re interlinked with each other, so I try to cross-reference. For example, in the section where I’m talking about Stuart Russell’s book, I look back at how philosophy of mind and psychology have understood the human mind, and at the problems in philosophical and psychological behaviorism that we just discussed.
I’ve tried to be as open-minded as possible, but obviously my own views are going to be in there. One of the things that’s so interesting about what we’re facing now, with AI and related technologies, is that it makes us ask deep questions about human nature. So I’ve included different ways of approaching what it is to be a human being, of understanding what intelligence is, why we value it, and how that relates to the questions we’re looking at.

Yes, the book tries to be an introduction to ethics. I’ve tried to make it so that you can skim through summaries of each section and navigate around different parts of the book as easily as possible. I included exercises. Some of the exercises are imaginative things, just to get people thinking, and can be done on your own. Some can be done as class exercises. It’s about getting people to realize that everybody’s got something they can contribute to the debate, because it affects everybody.

[End of our 2023 update. The original December 2017 interview appears below]

___________________________

Well, that’s a good starting point, because there are different sorts of questions being asked in the books I’m looking at. One of the kinds of questions we’re asking is: what sorts of ethical issues is artificial intelligence (AI) going to bring? AI encompasses so many different applications that it could raise a really wide variety of questions. For instance, what’s going to happen to the workforce if AI makes lots of people redundant? That raises ethical issues because it affects people’s well-being and employment. There are also questions about whether we somehow need to build ethics into the sorts of decisions that AI devices are making on our behalf, especially as AI becomes more autonomous and more powerful. For example, one question that is debated a lot at the moment is: what sorts of decisions should be programmed into autonomous vehicles?
If they come to a decision where they’re going to have to crash one way or the other, or kill somebody, or kill the driver, what sort of ethics might go into that? But there are also ethical questions about AI in medicine. For instance, there’s already work developing virtual psychological therapies using AI, such as cognitive behavioural therapy. This might be useful, since it seems people may sometimes open up more freely online. But, obviously, there are going to be ethical issues in how you respond to someone saying that they’re going to kill themselves, or something along those lines. There are various ethical issues about how you program that in.

“AI is pushing us to the limits of various questions about what it is to be a human in the world”

I suppose work in AI can be divided according to whether you’re talking about the sorts of issues we’re facing now or in the very near future. The issues we are facing now concern ‘narrow’ AI, which is focused on particular tasks. But there is also speculative work about whether we might develop an artificial general intelligence or, going on from that, a superintelligence. If we’re looking at an artificial general intelligence that would mimic human intelligence in general, then whether or not we retain control of it, lots of people are arguing that we need to build in some kind of ethics, or some way to make certain that the AI isn’t going to do something like turn back and decide to rebel – the sort of thing that happens in many of the Isaac Asimov robot stories. So there are many ethical questions that arise from AI, both as it is now and as it will be in the future. But there is also a different range of questions raised by AI because AI is, in many ways, pushing us to the limits of various questions about what it is to be a human in the world.
Some of the ethical questions in AI are precisely about how we think of ourselves in the world. To give an example: imagine some science fiction future where you’ve got robots doing everything for us. People are also talking about how you can make robots not just to do mundane jobs and some quite complex jobs, but also creative tasks. If you live in a world where the robots are doing all the creative tasks – for instance, if you have robots writing music that is better than a human could write, or at least as good – it raises fundamental questions of why on earth we are here. Why are all those youngsters in garage bands thrashing out their not tremendously good riffs? Why are they doing that if a robot or a machine could do it better? Working in this area is really interesting because it pushes us to ask those kinds of questions. Questions like: what is the nature of human agency? How do we relate to other people? How do we even think of ourselves? So, there are lots of deep ethical issues that come up in this area.

In the last two or three years, a number of prominent individuals have voiced concerns about the need to ensure that AI develops in ways that are beneficial. The Future of Life Institute, based in the USA, has a programme of grants, funded by Elon Musk and the Open Philanthropy Project, given to 35 projects working on different questions related to developing beneficial AI. I’ve been working on a project examining the groundwork for how we might develop codes of ethics for AI, and what role such codes might have. I’ve got a book on this topic due out soon.

First of all, I’ll say that I’ve chosen five books which I thought, as a package, would give quite a good introduction to the range of issues in AI, as there’s a big range of issues and quite a wide range of approaches. Heartificial Intelligence, my first choice, gives a general overview of the issues which we’re presented with.
Havens is a really interesting writer. He was formerly an actor and he’s worked a lot in tech journalism, so he knows a lot about tech. He’s also one of the people currently leading the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems. So, he’s really got his finger on the pulse of what the technological developments are and how people are thinking about them in ethics. He wrote this book a couple of years ago, developing it from an article he’d written for Mashable about ethics in AI.

There are several things I really like about it; one of them is that it covers a broad range of issues. He’s also focussing very much on things which are happening now, or else are not in the very distant future but that you could envisage being realised perhaps within our lifetimes. For instance, he asks how we might relate to robots in the house. Technicians are already developing robots that can keep an eye on people or do various domestic tasks, so it’s just extrapolating from that. What he does in the book is start off each chapter with a fictional scenario which is quite well extrapolated, and then discuss it. I think one of the things that we do need, in looking at AI, is precisely science fiction to engage our imaginations. Havens does that in quite a lot of detail. He doesn’t then just leave it as fiction but goes on to discuss its implications. For one thing, we don’t know what the facts are. For another, we need to think about how we might feel about them, because one of the things that AI is doing is changing how we interact with the world and with other people. So, there are as yet no facts out there for most of the scenarios he discusses. It is a question of how we feel running through these different situations. His first scenario is quite a good illustration of that.
He imagines himself as a father a little way into the future, and he’s a bit nervous because his daughter is going out on her first date. And it turns out that her first date is with a robot. So, instead of being a bit nervous and not quite sure that you like the boy with a reputation from the wrong side of the tracks, his daughter is on a date with a robot. He’s looking at how we might feel about our prejudices about going out with a robot, and what the advantages and disadvantages might be. You can imagine yourself in that situation. It does seem a bit far-fetched that somebody would want to do that, but then, on the other hand, thinking about it now and knowing how many teenage boys behave, you might think that a lot of teenage girls might prefer to go out with a robot. He might be programmed not to date rape them, for instance…

Yes. He’s got an example about a home robot vacuum cleaner. The example works nicely because it’s about how different forces can work towards a bad outcome. It’s not just the AI itself, but poor communication, pressures from online reviews of products, making decisions too quickly, and also how AI is coming into a world where we’re already steeped in technology. Commercial pressure from bad Amazon reviews leads to a change in a robot vacuum cleaner’s algorithm, which leads to one such cleaner unplugging the power source for a baby monitor in order to retain its own power – and a baby chokes alone and unheard. The baby monitor is not AI, but this shows we need to think about how AI is nested within our dependence on technology in general. In the olden days, like when I had my babies, I was a complete nervous wreck of a mother and wouldn’t go more than two feet away from them. But if you were relying on technology, you might become more relaxed. The book is really good at exploring the minutiae of our interactions with technology.
I think it’s so important to think about how we might go, step by step, into situations that we wouldn’t have wanted to enter when viewed from the outside. It’s really good at looking at how AI might affect how we view ourselves and our notions of responsibility.

“If you’re interacting with a robot who is programmed to be nice to you, it might stop us maturing”

Havens looks really closely at how we might become reliant on robots, especially if they’re programmed to interact with us in a certain way. If you interact with another human being, they might or might not be nice back to you. It’s a bit hit and miss, as you may have found. There’s always that element of uncertainty. But if you’re interacting with a robot which is programmed in a certain way, it might stop us maturing if we’re over-dependent on it – we assume it’s more or less infallible. So, he’s considering that. He’s also considering what he calls the possible loss of alterity if we rely too much on machines. If you’re interacting with another human being, there’s the idea that the other human being is always to some extent an unknown – they’ve always got their own consciousness and motivations. Maybe far into the future AI might also be similarly developed, but we’re nowhere near that kind of situation yet. In between getting there, though, we might have robots that are very human-like performing particular tasks, and we really need to look at how that might significantly affect how we interact with the world.

Yes. That’s precisely the sort of thing that Havens and other people are concerned about, because we need to look at whether or not we’re losing skills.

Yes. So, another thing that I like about the book is that Havens is not at all anti-tech. He’s actually quite keen on it. Among the things he wants to do is avoid polarisation. That’s something else that stories can help with, because they can be pretty nuanced.
Rather than saying ‘this is terrible,’ which a lot of people are doing, or ‘this is going to be brilliant,’ which a lot of other people are doing as well, it’s a matter of looking at the subtleties of how we might be able to use tech to our own advantage, so that it is truly beneficial. Of course, you can have pluses and minuses. Another way of putting that is to say that you might have a gain from tech, but if what you’ve lost is of equal value, then you haven’t really moved anywhere. You have just made something different, without being in a better position.

There’s something else quite important about this book. John Havens is also very interested in psychology. His father was a psychiatrist, and that has always influenced him. The book is influenced by positive psychology, and he’s interested in how we might use AI in line with our values, and how we might check whether the AI we’re using is working against our own values or not. Not everybody will necessarily buy the particular details of the positive psychology that underpins his stance, but I think it’s a really interesting way of looking at the issues. He’s looking at the grounded details of how we might judge whether AI is beneficial or not, and he’s linking that to how we might measure and think about human happiness, welfare and wellbeing. I think that is incredibly important.

Thinking in ethical terms, one of the attractions of something like utilitarianism or consequentialism, in looking at something that’s new and that we don’t really understand very much, is that you can look at the harms and the benefits and add them up, rather than having preconceptions about whether doing something one way is right or wrong. But if AI is changing how we think about ourselves in relation to other people, how on earth do we even count or measure what the harms and benefits are? So, you’ve got to go and look at the detail, at what effects and impacts it’s having on us.
The psychological level is a very good place to start."
Ethics for Artificial Intelligence Books · fivebooks.com
"Yes, this is a book by a computer scientist who has thought deeply about the history and future trajectory of artificial intelligence, and who is very literate in the cutting-edge discourse around it. He is no Luddite, in the pejorative sense, and in fact embraces many consequentialist ideas because they are extremely fit for some purposes: if you are talking about a system, you do need to take a close pragmatic interest in its inputs and outputs. One of the reasons I find this book important is that it’s very clearly written. It’s not a long or a difficult book.

Russell zeroes in on the significance of doubt, and this is a fundamental point that I think a lot of people in the field of computer science miss, which is simply that any kind of real-world problem (what should we do? what might we do? what would be the best outcome?) is computationally intractable. In other words, there will always be some uncertainty. There is such a thing as a perfect game of chess. It’s very difficult to compute, but there is such a thing. A rules-based, deterministic game can be optimised. But human life can’t be optimised. It can be improved, but it can’t be optimised. ‘What should I have for lunch?’ is not a question that has an optimal answer. Given all the calculation time in the universe, you still couldn’t calculate the perfect, super-optimal lunch.

Of course, we all know the world is probabilistic. Yet, when it comes to computation, a lot of people seem to forget this and start looking for ‘the answer’ to complex social challenges, the future of AI, and so on. Russell emphasises that there will always be a plurality of different priorities, and you’ll never be able to come up with the priority. Then, crucially, he talks in a practical way about the importance of building doubt into machines. AIs, he argues, should be doubtful to the extent that we are doubtful about what goal is worth pursuing.
And they should also be willing to be switched off. The people who construct them should prioritise the existence of an off switch, and a process of constructive doubt, over trying to optimise technology towards a supreme, transcendent final goal for humanity.

Human Compatible is a humane book about the challenges and opportunities of AI. It’s also a very non-hype book. By which I mean, it talks about the trajectory of artificial intelligence, and the enormous potential of the vast-scale pattern recognition it offers, without indulging fantasies. AI has great potential to do good and to help us solve problems, but the point is not that it or anybody will ever know best; rather, we should ensure that the values and tendencies encoded into powerful systems are compatible with human thriving, and with the thriving of life on this planet. And that compatibility must in turn entail doubt, plurality, and an open interrogation of aims and purposes – not optimisation.

The thing I worry about, perhaps more than Russell does, is that some technologists seem determined to reason backwards from a hypothetical, imagined future. There’s the so-called ‘singularity,’ the point beyond which computers become self-improving and potentially solve all human problems. On this view, the only thing that matters is getting to the singularity and having good rather than bad super-machines. Then, at the point of singularity or beyond it, death will be solved, the losses of our ecosystem will be redressed, there can be an infinite number of people living infinitely magnificent lives, and so on. Or, if we do things wrong, everyone will be in computer hell. The problem with all this is that you’re reasoning backwards from a fixed conclusion you’ve arrived at through hand-waving and unacknowledged emotion. You’re engaging in metaphysical, or even eschatological, speculation, while insisting that it’s all perfectly logical and evidence-based.
It’s really important, I think, to resist focusing on imaginary problems, or treating hypotheticals with high degrees of certainty. That is precisely the opposite of what we need to do, which is to focus on real problems and opportunities while preserving uncertainty and constructive doubt: putting actual ideas and theories to meaningful, scientific tests."
The Ethics of Technology · fivebooks.com