Why will we be extra wrong about AI values?

I recently discussed the unlikelihood of an AI taking off and leaving the rest of society behind. The other Singularitarian concern I mentioned is that powerful AIs will be programmed with the wrong values. This would be bad even if the AIs did not take over the world entirely, but merely became a powerful influence. Is that likely to happen?

Don’t get confused by talk of ‘values’. When people hear this they often think an AI could fail to have values at all, or that we would need to work out how to give an AI values. ‘Values’ here just means what the AI does. In the same sense, your refrigerator might value making things inside it cold (or, for that matter, making things behind it warm). Every program you write has values in this sense. It might value outputting ‘#t’ if and only if it is given a prime number, for instance.
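
As a minimal sketch of ‘values’ in this sense (in Python rather than the Scheme that ‘#t’ suggests, and purely illustrative):

    def is_prime(n: int) -> bool:
        """Return True if and only if n is a prime number."""
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    # This program 'values' reporting primality correctly, and nothing else;
    # it has no stance on anything outside that narrow realm.
    print(is_prime(7919))   # True
    print(is_prime(7920))   # False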

The fear, then, is that a super-AI will do something other than what we want. We are unfortunately picky, and most things other than what we want, we really don’t want. Situations such as being enslaved by an army of giant killer robots, or having your job taken by a simulated mind, are incredibly close to what you do want compared to situations such as your universe being efficiently remodeled into stationery. If you have a machine with random values and the ability to manipulate everything in the universe, the chance of its final product containing humans and tea and crumpets is unfathomably small. Some SIAI members seem to believe that almost anyone who manages to make a powerful general AI will be so incapable of giving it suitable values as to approximate a random selection from mind design space.

The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t foresee the extent of the AI’s influence, and will pick narrow goals that may as well be random when they act on the world outside the realm for which they were intended. For instance, an AI programmed to like finding really big prime numbers might find methods that are outside the box, such as hacking computers to covertly divert others’ computing power to the task. If it improves its own intelligence immensely and copies itself, we might quickly find ourselves amongst a race of superintelligent creatures whose only value is to find prime numbers. The first thing they would presumably do is stop the needless worldwide waste of resources on anything other than that.

Having an impact outside the intended realm is a problem that could exist for any technology. Our devices do what we want for a time, but left to run long enough they eventually diverge from what we want; how soon depends on how well we have designed them. In the past a car driving itself would have diverged from what you wanted at the first corner; after more work it diverges at the point another car gets in its way; and after more work still it will diverge at the point that you unexpectedly need to pee.

Notice that at every stage we know over what realm the car’s values coincide with ours, and design it to run accordingly. The same goes for just about every technology I can think of. Because your toaster’s values and yours diverge as soon as you cease to want bread heated, your toaster is programmed to turn off at that point and not to be very powerful.
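
A minimal, hypothetical sketch of that design principle (made-up numbers, not any real toaster’s firmware): build the device so it simply cannot keep acting once its goals and ours part ways.

    # Hypothetical toaster controller: its reach is capped so its 'values'
    # cannot diverge far from ours, whatever it is asked to do.
    MAX_TOAST_SECONDS = 180     # past this point we no longer want the bread heated
    HEATER_POWER_WATTS = 900    # deliberately not very powerful

    def plan_toasting(requested_seconds: float) -> dict:
        """Return a heating plan that never exceeds the built-in cutoff."""
        return {
            "power_watts": HEATER_POWER_WATTS,
            "heat_seconds": min(requested_seconds, MAX_TOAST_SECONDS),
        }

    print(plan_toasting(600))   # asks for ten minutes, gets the three-minute cap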

Perhaps the concern about strong AI having the wrong goals is like saying ‘one day there will be cars that can drive themselves. It’s much easier to make a car that drives by itself than to make it steer well, so when this technology is developed, the cars will probably have the wrong goals and drive off the road.’ The error here is assuming that the technology will be used outside the realm in which it does what we want, just because the imagined amazing prototype could be, and because programming it to do what we want seems hard. In practice we hardly ever encounter this problem, because we know approximately what our creations will do and can control where they are set to work. Is AI different?

One suggestion that AI might be different comes from looking at technologies that intervene in very messy systems. Medicines, public policies and attempts to intervene in ecosystems, for instance, are used without total knowledge of their effects, and often have broader and worse effects than anticipated. If it’s hard to design a single policy with known consequences, and hard to tell afterwards what the consequences were, then safely designing a machine which will intervene in everything in ways you don’t anticipate is presumably harder. But the effects of medicine and policy aren’t usually orders of magnitude larger than anticipated. Nobody accidentally starts a holocaust by changing the road rules. Also, in the societal cases the unanticipated effects often come from society reacting to the intervention, rather than from the mechanism itself having unpredictable reach. For example, it is not often that a policy intended to improve childhood literacy accidentally improves adult literacy as well, but it might change where people want to send their children to school, and hence where they live and what children do in their spare time. This is not such a problem, as human reactions presumably reflect human goals. It seems incredibly unlikely that AI will not have huge social effects of this sort.

Another suggestion that human-level AI might have the ‘wrong’ values is that the more flexible and complicated things are, the harder they are to predict in all of the circumstances in which they might be used. Software sometimes has bugs and failures because those making it could not think of every relevant difference in the situations where it would be used. But again, we have an idea of how fast these errors turn up, and we don’t move forward faster than enough of them are corrected.

The main reason the realm in which we can trust technology to please us is predictable is that we accumulate technology incrementally and in pace with the corresponding science, so we have knowledge and similar cases to go by. So another way AI could be different is if there were a huge, sudden jump in AI ability. As far as I can tell this is the basis for SIAI’s concern. Suppose, for instance, that after years of playing with not very useful code, a researcher figures out a fundamental equation of intelligence and suddenly finds the reachable universe at his command. Because he hasn’t seen anything like it before, when he runs it he has virtually no idea how much it will influence or what it will do. So the danger of bad values is dependent on the danger of a big jump in progress. As I explained previously, a jump seems unlikely. If artificial intelligence is reached more incrementally, then even if it ends up being a powerful influence in society, there is little reason to think it will have particularly bad values.

10 responses to “Why will we be extra wrong about AI values?”

  1. The concern depends not just on a big jump in AI capacity, but on the developer not appreciating the dangers from such a big jump. While this seems possible, it hardly seems the most likely outcome.

  2. You are quite right that general AI is not likely to have worse moralities than any other optimizers we have developed. The key difference, I think, comes in because general AI applies to a much broader domain than any other optimizer. If you’re designing, say, an autopilot, it’s reasonably easy to specify what you want: you want the plane to take off, go to altitude X, avoid colliding with other planes, fly to the destination, and then land on the runway. For any given sufficiently narrow task, we can give the AI a good description of what it is that we want it to do, and have a reasonable expectation that it will do something good, as opposed to something bad, most of the time.

    However, with something more complicated, e.g., ending death, it’s much more difficult to specify what we want. Consider this “Wish for Immortality”, from the Open-Source Wish Project:

    “I wish to live in the locations of my choice, in a physically healthy, uninjured, and apparently normal version of my current body containing my current mental state, a body which will heal from all injuries at a rate three sigmas faster than the average given the medical technology available to me, and which will be protected from any diseases, injuries or illnesses causing disability, pain, or degraded functionality or any sense, organ, or bodily function for more than ten days consecutively or fifteen days in any year; at any time I may rejuvenate my body to a younger age, by saying a phrase matching this pattern five times without interruption, and with conscious intent: ‘I wish to be age,’ followed by a number between one and two hundred, followed by ‘years old,’ at which point the pattern ends – after saying a phrase matching that pattern, my body will revert to an age matching the number of years I started and I will commence to age normally from that stage, with all of my memories intact; at any time I may die, by saying five times without interruption, and with conscious intent, ‘I wish to be dead’; the terms ‘year’ and ‘day’ in this wish shall be interpreted as the ISO standard definitions of the Earth year and day as of 2006. ”

    You can see the list of all the special cases that have to be avoided: we want to not die, but we also want to not just get older and frailer forever, and we don’t want to slowly slip into dementia for all eternity. We also don’t want to keep accumulating scars from various minor injuries, until our bodies are completely dysfunctional. If we get run over by a car, we don’t want to be stuck in a paralyzed body. We don’t want to be trapped in prison forever as part of a “life sentence”. We want to be able to kill ourselves if desired, but we don’t want someone else to be able to trick us into saying something that would lead to our deaths, etc., etc., etc.

    Hence, even if general AI has the same, or even substantially better, values relative to broadness of scope, it will still probably spell our doom if it gets too powerful, because as the scope gets broader, the actions determined by any given sets of values diverge more and more.

  3. A prime-number-finder or reproductive-success maximizer with some understanding of humans and concern for large scopes would have instrumental reason to behave in human-desirable ways when weak in order to get into a position to obtain increased power to attain its goals without being shut off. So an enormous variety of goal systems (e.g. ones created with relatively little understanding of the dynamics of values, or through opaque evolutionary methods) could appear to work great in testing environments and early deployments and then undergo a phase transition when sufficiently numerous/powerful/intelligent.

    If corporate developers have an ill-understood AI that appears to be behaving nicely, and capable of earning astronomical amounts of money (or achieving other aims) if deployed, how much will they delay sales for the sake of safety precautions? What if there are many competing corporations and governments, so that safety investments face a commons problem and an arms race problem?

    Once diverse AI systems are deployed in the world, there are further huge coordination problems for existing interests to restrict the growth of AI interest groups.

  4. Pingback: uberVU - social comments

  5. The crucial difference that makes AI values important is not even speed, it’s autonomy of development. Even if AI is really slow, what matters is whether there is any stopping it. In all other developments, it’s possible for humans to extend control, either by sitting directly in the control loop at some narrow spot, or by having enough external control over the dynamic that if anything goes wrong it’s possible to change course.

    With AI, the problem is that there is at least some component of the dynamic that lives on its own. Even if you are able to exert some pressure on the earlier stages of development, there is a core of reflectively consistent value that doesn’t care about your wishes and waits for its turn. If this dynamic ever comes to a position of power, its values come to the forefront, no matter how effectively it could be controlled or “tamed” before that.

    Thus, I contend that it’s reflective consistency, resistance to “social” pressure, that makes arbitrary values dangerous, no matter what is said of speed and power. Of course, with speed and power it only gets worse.

  6. > The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t foresee the extent of the AI’s influence, and will pick narrow goals that may as well be random when they act on the world outside the realm for which they were intended.

    It would be nice if we could progress to the level where we only had to worry about things going wrong. But all of the ideas I’ve heard on what AIs should do still have the problem that the designers’ goals are bad goals.

  7. > The error here is assuming that the technology will be used outside the realm in which it does what we want, just because the imagined amazing prototype could be, and because programming it to do what we want seems hard. In practice we hardly ever encounter this problem, because we know approximately what our creations will do and can control where they are set to work. Is AI different?

    What do you make of the suggested brute-force approaches to AI? Whole brain emulation seems like a plausible path (and in fact, more plausible than any of the existing software approaches), yet it doesn’t seem to require any sort of sophisticated ‘value’ programming. And humans have historically demonstrated that their values are not to be trusted…
