Tag Archives: AI

More convergent instrumental goals

Nick Bostrom follows Steve Omohundro in exploring the types of instrumental goals that intelligences with arbitrary ultimate goals might converge on. This is important for two reasons. First, it means predicting the behavior of arbitrary intelligences might be a tiny bit easier than you’d think. Second, it draws attention to the difficulty of creating a creature that doesn’t want to get mixed up in taking resources, seeking longevity, and that sort of thing.

Between Nick and Steve we have these convergent instrumental goals:

  1. Self-preservation
  2. Preservation of your values
  3. Self-improvement
  4. Rationality
  5. Other cognitive enhancement
  6. Technological perfection
  7. Resource acquisition
  8. Avoidance of counterfeit utility

I think acquiring information is included in cognitive enhancement here, though to me it seems big and different enough that I’d put it by itself.

I’d like to add three more, incidentally all to do with interacting with other intelligences. So they are not relevant if you are the one all-powerful singleton, but handy if there are a number of AIs of a similar enough level that they might be useful to one another (as sheep and humans are, or humans and humans are). Here’s my list:

Spread your values
This can be achieved by making other entities which have at least some of the same values, or more traditionally by buying advertising space, exercising social influence, or having children. This makes sense for the same reason that ensuring that you maintain your values in the future does. If you think of yourself as just your values (as Bostrom mentions might be appropriate for some kinds of intelligences), spreading values can come under the heading of self-preservation perhaps (or reproduction). But usually we don’t think like this. If I convince you that Bach is pretty good, we don’t think of this as me replacing part of you with part of me. So I think it’s worth having a different category.

Social influence
If you can’t get others to have your values, the next best thing is to get them to do what you want anyway. This can be done in many ways, e.g. deceit, deals, forming relationships with useful people, sending good signals, being likable. Perhaps manipulating others comes under perfection of technology: making friends is in a way like programming your computer better to do what you type in. Both can be seen as investing some effort in arranging a complex system so that in future you will get what you want from it with less input. But the analogy is not that close, so I give these things their own category. The extent to which it is worth influencing other intelligences depends on how useful they are. If the world only had me and sheep and stupider things in it, influencing sheep would be handy, but not nearly as handy as social skills are to me in this world. The next can be thought of as a subcategory of this one.

Acquire commonly valued goods
Nick and Steve mention acquiring ‘resources’, but from their elaboration it seems like they mean goods that are direct inputs to the intelligence’s goals. An instrumental route to acquiring directly useful resources is acquiring stuff that is valuable to someone else. This apparently roundabout route is common because of comparative advantage. So while an AI would presumably like to have energy and fast hardware, it would also like a yacht or intellectual property rights for a new kind of frozen dinner, assuming anyone exists who will trade. With few assumptions, acquiring something like the latter and trading will often be easier than acquiring the former directly. This is pretty much what humans spend their working lives on.
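To make the comparative advantage point concrete, here is a minimal numerical sketch. Every number in it is invented purely for illustration; the only claim is structural: an agent that ultimately wants compute can end up with more of it by producing the good it is relatively best at making and trading, than by producing compute directly.

```python
# Toy illustration of comparative advantage (all numbers invented).
# An agent that ultimately wants 'compute' may get more of it by producing a
# commonly valued good and trading than by building compute directly.

HOURS = 40                # hours of effort available
COMPUTE_PER_HOUR = 10     # units of compute the agent can build directly per hour
IP_PER_HOUR = 100         # units of sellable 'frozen dinner IP' it can produce per hour
IP_PRICE_OF_COMPUTE = 5   # market rate: 1 unit of compute trades for 5 units of IP

def compute_from_direct_production(hours: float) -> float:
    """Spend all effort building compute yourself."""
    return hours * COMPUTE_PER_HOUR

def compute_from_trade(hours: float) -> float:
    """Spend all effort on the commonly valued good, then trade it for compute."""
    ip_produced = hours * IP_PER_HOUR
    return ip_produced / IP_PRICE_OF_COMPUTE

if __name__ == "__main__":
    print(f"Direct production:      {compute_from_direct_production(HOURS):.0f} units of compute")
    print(f"Produce IP, then trade: {compute_from_trade(HOURS):.0f} units of compute")
    # With these made-up rates, trading wins (800 vs 400), which is the sense in which
    # acquiring commonly valued goods is an instrumental route to directly useful ones.
```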

When to explain

It is commonly claimed that humans’ explicit conscious faculties arose for explaining themselves and their intentions to others. Similarly, when people talk about designing robots that interact with people, they often mention the usefulness of designing such robots to be able to explain to you why they changed your investments or rearranged your kitchen.

Perhaps this is a generally useful principle for internally complex units dealing with each other: have some part that keeps an overview of what’s going on inside and can discuss it with others.

If so, the same seems like it should be true of companies. However my experience with companies is that they are often designed specifically to prevent you from being able to get any explanations out of them. Anyone who actually makes decisions regarding you seems to be guarded by layers of people who can’t be held accountable for anything. They can sweetly lament your frustrations, agree that the policies seem unreasonable, sincerely wish you a nice day, and most importantly, have nothing to do with the policies in question and so can’t be expected to justify them or change them based on any arguments or threats you might make.

I wondered why this strategy should be different for companies, and a friend pointed out that companies do often make an effort at higher-level explanations of what they are doing, though not necessarily accurate ones: vision statements, advertisements, etc. PR is often the metaphor for how the conscious mind works, after all.

So it seems the company strategy is more complex: general explanations coupled with avoidance of being required to make more detailed ones of specific cases and policies. So, is this strategy generally useful? Is it how humans behave? Is it how successful robots will behave?*

Inspired by an interaction with ETS, evidenced lately by PNC and Verizon

*assuming there is more than one

Why focus on making robots nice?

From Michael Anderson and Susan Leigh Anderson in Scientific American:

Today’s robots…face a host of ethical quandaries that push the boundaries of artificial intelligence, or AI, even in quite ordinary situations.

Imagine being a resident in an assisted-living facility…you ask the robot assistant in the dayroom for the remote …But another resident also wants the remote …The robot decides to hand the remote to her. …This anecdote is an example of an ordinary act of ethical decision making, but for a machine, it is a surprisingly tough feat to pull off.

We believe that the solution is to design robots able to apply ethical principles to new and unanticipated situations… for them to be welcome among us their actions should be perceived as fair, correct or simply kind. Their inventors, then, had better take the ethical ramifications of their programming into account…

It seems there are a lot of articles focussing on the problem that some of the small  decisions robots will make will be ‘ethical’. There are also many fearing that robots may want to do particularly unethical things, such as shoot people.

Working out how to make a robot behave ‘ethically’ in this narrow sense (arguably all behaviour has an ethical dimension) is an odd problem to set apart from the myriad other problems of making a robot behave usefully. Ethics doesn’t appear to pose unique technical problems. The aforementioned scenario is similar to ‘non-ethical’ problems of making a robot prioritise its behaviour. On the other hand, teaching a robot when to give a remote control to a certain woman is not especially generalisable to other ethical issues such as teaching it which sexual connotations it may use in front of children, except in sharing methods so broad as to also include many more non-ethical behaviours.

The authors suggest that robots will follow a few simple absolute ethical rules like Asimov’s. Perhaps this could unite ethical problems as worth considering together. However if robots are given such rules, they will presumably also be following big absolute rules for other things. For instance if ‘ethics’ is so narrowly defined as to include only choices such as when to kill people and how to be fair, there will presumably be other rules about the overall goals when not contemplating murder. These would matter much more than the ‘ethics’. So how to pick big rules and guess their far-reaching effects would again not be an ethics-specific issue. On top of that, until anyone is close to a situation where they could be giving a robot such an abstract rule to work from, the design of said robots is so open as to make the question pretty pointless except as a novel way of saying ‘what ethics do I approve of?’.

I agree that it is useful to work out what you value (to some extent) before you program a robot to do it, particularly including overall aims. Similarly I think it’s a good idea to work out where you want to go before you program your driverless car to drive you there. This doesn’t mean there is any eerie issue of getting a car to appreciate highways when it can’t truly experience them. It also doesn’t present you with any problem you didn’t have when you had to drive your own car – it has just become a bit more pressing.

Making rainbows has much in common with other manipulations of water vapor. Image (‘Rainbow Robot’) by Jenn and Tony Bot via Flickr.

Perhaps, on the contrary, ethical problems are similar in that humans have very nuanced ideas about them and can’t really specify satisfactory general principles to account for them. If the aim is for robots to learn how to behave just from seeing a lot of cases, without being told a rule, perhaps this is a useful category of problems to set apart? No – there are very few things humans deal with that they can specify directly. If a robot wanted to know the complete meaning of almost any word it would have to deal with a similarly complicated mess.

Neither are problems of teaching (narrow) ethics to robots united in being especially important, or important in similar ways, as far as I can tell. If the aim is something like treating people well, people will be much happier if the robot gives the remote control to anyone at all, rather than ignoring everyone until it has finished sweeping the floors, than they will be if it gets the question of exactly who to give it to right. Yet how to get a robot to prioritise floor cleaning below remote allocating at the right times seems an uninteresting technicality, both to me and seemingly to authors of popular articles. It doesn’t excite any ‘ethics’ alarms. It’s like wondering how the control panel will be designed in our teleportation chamber: while the rest of the design is unclear, it’s a pretty uninteresting question. When the design is more clear, to most it will be an uninteresting technical matter. How robots will be ethical or kind is similar, yet it gets a lot of attention.

Why is it so exciting to talk about teaching robots narrow ethics? I have two guesses. One, ethics seems such a deep and human thing that it is engaging to frighten ourselves by associating it with robots. Two, we vastly overestimate the extent to which the value of outcomes reflects the virtue of motives, so we hope robots will be virtuous, whatever their day jobs are.

SIA says AI is no big threat

Artificial Intelligence could explode in power and leave the direct control of humans in the next century or so. It may then move on to optimize the reachable universe to its goals. Some think this sequence of events likely.

If this occurred, it would constitute an instance of our star passing the entire Great Filter. If we do go on to cause such an intelligence explosion, then we are the first civilization in roughly our past light cone to be in such a position. If anyone else had been in this position, our part of the universe would already be optimized, which it arguably doesn’t appear to be. This means that if there is a big (optimizing much of the reachable universe) AI explosion in our future, the entire strength of the Great Filter is in steps before us.

This means a big AI explosion is less likely after considering the strength of the Great Filter, and much less likely if one uses the Self Indication Assumption (SIA).

The large minimum total filter strength contained in the Great Filter is evidence for larger filters in the past and in the future. This is evidence against the big AI explosion scenario, which requires that the future filter be tiny.

SIA implies that we are unlikely to give rise to an intelligence explosion for similar reasons, but probably much more strongly. As I pointed out before, SIA says that future filters are much more likely to be large than small. This is easy to see in the case of AI explosions. Recall that SIA increases the chances  of hypotheses where there are more people in our present situation. If we precede an AI explosion, there is only one civilization in our situation, rather than potentially many if we do not. Thus the AI hypothesis is disfavored (by a factor the size of the extra filter it requires before us).
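To see the direction and rough size of this update concretely, here is a toy Bayesian sketch. All the numbers (stars, total filter strength, the future-filter values, the priors) are invented; the only point is that SIA weights each hypothesis by how many civilizations in our situation it implies, and the ‘AI explosion ahead’ hypothesis implies few, because it forces the filter into our past.

```python
# Toy SIA update (all numbers invented for illustration).
# Hold the total Great Filter strength fixed, split it between a past filter and a
# future filter, and weight each hypothesis by the number of civilizations at our
# stage that it implies (the SIA weighting).

N_STARS = 1e22               # stars in the relevant region (illustrative)
TOTAL_SUCCESS_RATE = 1e-20   # chance a star yields a visible colonizer (illustrative)

hypotheses = {
    # name: (prior, chance a civilization at our stage goes on to colonize)
    "AI explosion ahead (tiny future filter)": (0.5, 0.5),
    "Large future filter ahead":               (0.5, 1e-15),
}

posterior = {}
for name, (prior, future_pass_rate) in hypotheses.items():
    # The past filter is whatever is left once the future filter is fixed.
    past_pass_rate = TOTAL_SUCCESS_RATE / future_pass_rate
    civs_at_our_stage = N_STARS * past_pass_rate
    # SIA: weight the prior by the number of observers in our situation.
    posterior[name] = prior * civs_at_our_stage

total = sum(posterior.values())
for name, weight in posterior.items():
    print(f"{name}: {weight / total:.2e}")
# The AI-explosion hypothesis ends up disfavored by roughly the factor by which
# it requires a larger filter before us.
```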

What the Self Sampling Assumption (SSA), an alternative principle to SIA, says depends on the reference class. If the reference class includes AIs, then we should strongly not anticipate such an AI explosion; if it does not, then we strongly should. Both results are basically due to the Doomsday Argument.

In summary, if you begin with some uncertainty about whether we precede an AI explosion, then updating on the observed large total filter and accepting SIA should make you much less confident in that outcome. The Great Filter and SIA don’t just mean that we are less likely to peacefully colonize space than we thought, they also mean we are less likely to horribly colonize it, via an unfriendly AI explosion.

Light cone eating AI explosions are not filters

Some existential risks can’t account for any of the Great Filter. Here are two categories of existential risks that are not filters:

Too big: any disaster that would destroy everyone in the observable universe at once, or destroy space itself, is out. If others had been filtered by such a disaster in the past, we wouldn’t be here either. This excludes events such as simulation shutdown and breakdown of a metastable vacuum state we are in.

Not the end: Humans could be destroyed without the causal path to space colonization being destroyed. Also much of human value could be destroyed without humans being destroyed. For example, super-intelligent AI would presumably be better at colonizing the stars than humans are. The same goes for transcending uploads. Repressive totalitarian states and long-term erosion of value could destroy a lot of human value and still lead to interstellar colonization.

Since these risks are not filters, neither the knowledge that there is a large minimum total filter nor the use of SIA increases their likelihood.  SSA still increases their likelihood for the usual Doomsday Argument reasons. I think the rest of the risks listed in Nick Bostrom’s paper can be filters. According to SIA averting these filter existential risks should be prioritized more highly relative to averting non-filter existential risks such as those in this post. So for instance AI is less of a concern relative to other existential risks than otherwise estimated. SSA’s implications are less clear – the destruction of everything in the future is a pretty favorable inclusion in a hypothesis under SSA with a broad reference class, but as always everything depends on the reference class.

Might law save us from uncaring AI?

Robin has claimed a few times that law is humans’ best bet for protecting ourselves from super-intelligent robots. This seemed unlikely to me, and he didn’t offer much explanation. I figured laws would protect us while AI was about as intellectually weak as us, but not once it was far more powerful. I’ve changed my mind somewhat though, so let me explain.

When is it efficient to kill humans?

At first glance, it looks like creatures with the power to take humans’ property would do so whenever the value of the property minus the cost of stealing it was greater than the value of anything the human might produce with it. When AI is so cheap and efficient that the human will be replaced immediately, and the replacement will use the resources enough better to make up for the costs of stealing and replacement, the human is, from the AI’s perspective, better dead. This might be soon after humans are overtaken. However such reasoning really imagines one powerful AI’s dealings with one person, then assumes that this generalizes to many of each. Does it?
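Spelled out, the first-glance criterion above is just an inequality. The numbers below are invented; the point of the sketch is only that the same rule flips once replacements get cheap and productive enough, and that it ignores everything the rest of this post is about (law, many agents, reputations).

```python
# The naive one-AI, one-human calculation from the paragraph above, with invented numbers.

def taking_is_worthwhile(property_value: float,
                         cost_of_taking: float,
                         value_human_would_produce: float) -> bool:
    """Naive criterion: take the property if the value gained, net of the cost of
    taking it, exceeds the value of what the human would have produced with it."""
    return property_value - cost_of_taking > value_human_would_produce

# While humans still produce a lot with their property, taking it doesn't pay:
print(taking_is_worthwhile(property_value=100, cost_of_taking=30, value_human_would_produce=90))   # False
# Once replacements are cheap and far more productive, the same criterion flips:
print(taking_is_worthwhile(property_value=100, cost_of_taking=5, value_human_would_produce=20))    # True
```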

What does law do?

In a group of agents where none is more powerful than the rest combined, and there is no law, basically the strongest coalition of agents gets to do what they want, including stealing others’ property. There is an ongoing cost of conflict, so overall the group would do better if they could avoid this situation, but those with power at a given time benefit from stealing, so it goes on. Law basically lets everyone escape the dynamic of groups dominating one another (or some of it) by everyone in a very large group pre-committing to take the side of whoever is being dominated in smaller conflicts. Now wherever the strong try to dominate the weak, the super-strong await to crush the strong.

‘Cheap’ goals won’t explode intelligence

An intelligence explosion is what hypothetically happens when a clever creature finds that the best way to achieve its goals is to make itself even cleverer first, and then to do so again and again as its heightened intelligence makes further investment cheaper and cheaper. Eventually the creature becomes uberclever and can magically (from humans’ perspective) do most things, such as end humanity in pursuit of stuff it likes more. This is predicted by some to be the likely outcome for artificial intelligence, probably as an accidental result of a smart enough AI going too far with any goal other than forwarding everything that humans care about.

In trying to reach most goals, people don’t invest and invest until they explode with investment. Why is this? Because it quickly becomes cheaper to actually fulfil a goal than it is to invest more and then fulfil it. This happens earlier the cheaper the initial goal. Years of engineering education prior to building a rocket will speed up the project, but would slow down the building of a sandwich.

A creature should only invest in many levels of intelligence improvement when it is pursuing goals significantly more resource-intensive than creating many levels of intelligence improvement. It doesn’t matter that inventing new improvements to artificial intelligence gets easier as you are smarter, because everything else does too. If intelligence makes other goals easier at the same rate as it makes building more intelligence easier, no goal which is cheaper than building a given amount of intelligence improvement with your current intelligence could cause an intelligence explosion of that size.
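Here is a minimal model of that comparison. The numbers and the functional form are made up: I assume the k-th round of self-improvement gets harder in absolute terms while each round speeds up everything, improvement and goal alike, by the same factor, and the agent simply minimizes total time.

```python
# Toy model of 'invest in more intelligence vs just do the task' (all numbers invented).
# Assumptions: the k-th improvement costs BASE_COST * DIFFICULTY**k units of work measured
# at the original capability, each improvement multiplies speed at everything (goals and
# further improvement alike) by SPEEDUP, and the agent minimizes total time to the goal.

BASE_COST = 100.0   # work units for the first improvement (illustrative)
DIFFICULTY = 3.0    # how much harder each successive improvement is (illustrative)
SPEEDUP = 2.0       # speed multiplier gained per improvement (illustrative)

def total_time(goal_cost: float, rounds: int) -> float:
    """Time to do `rounds` improvements in order and then fulfil the goal."""
    time, speed = 0.0, 1.0
    for k in range(rounds):
        time += (BASE_COST * DIFFICULTY**k) / speed   # improvement work done at current speed
        speed *= SPEEDUP
    return time + goal_cost / speed

def best_rounds(goal_cost: float, max_rounds: int = 60) -> int:
    """Number of improvement rounds that minimizes total time for this goal."""
    return min(range(max_rounds + 1), key=lambda k: total_time(goal_cost, k))

for goal_cost in [10, 1_000, 1_000_000, 1e12]:
    print(f"goal costing {goal_cost:>16,.0f} -> optimal rounds of self-improvement: {best_rounds(goal_cost)}")
# Cheap goals are best done straight away; only goals far more expensive than the
# improvements themselves justify a long chain of self-improvement.
```

With these particular numbers the optimum is zero rounds for the cheapest goal and grows only slowly (roughly logarithmically) with goal cost, which is the sense in which cheap goals don’t pay for an explosion.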

Plenty of the questions people are currently looking for answers to, such as ‘how do we make super duper nanotechnology?’, ‘how do we cure AIDS?’, ‘how do I get really really rich?’, and even a whole bunch of math questions, are likely easier than inventing multiple big advances in AI. The main dangerous goals are infinitely expensive ones such as ‘how many digits of pi can we work out?’ and ‘please manifest our values maximally throughout as much of the universe as possible’. If someone were to build a smart AI and set it to solve any of those relatively cheap goals, it would not accidentally lead to an intelligence explosion. The risk is only with the very expensive goals.

The relative safety of smaller goals here could be confused with the relative safety of goals that comprise a small part of human values. A big fear with an intelligence explosion is that the AI will only know about a few of human goals, so will destroy everything else humans care about in pursuit of them. Notice that these are two different parameters: the proportion of the set of important goals the intelligence knows about and the expense of carrying out the task. Safest are cheap tasks where the AI knows about many of our values it may influence. Worst are potentially infinitely expensive goals with a tiny set of relevant values, such as any variation on ‘do as much of x as you can’.

Everyone else prefers laws to values

How do you tell what a superhuman AI's values are? ( picture: ittybittiesforyou - see bottom)

Robin Hanson says that it is more important to have laws than shared values. I agree with him when ‘shared values’ means indexical values that remain indexed to different people: e.g. if you and I share a high value on orgasms, you value you having orgasms and I value me having orgasms. Unless we are dating it’s all the same to me if you prefer croquet to orgasms. I think the singularitarians aren’t talking about this though. They want to share values in such a way that the AI wants them to have orgasms. In principle this would be far better than having different values and trading. Compare gains from trading with the world economy to gains from the world economy’s most heartfelt wish being to please you. However I think that laws will get far more attention than values overall in arranging for an agreeable robot transition, and rightly so. Let me explain, then show you how this is similar to some more familiar situations.

Greater intelligences are unpredictable

If you know exactly what a creature will do in any given situation before it does it, you are at least as smart as it (if we don’t count its physical power as intelligence). Greater intelligences are inherently unpredictable. If you know what the intelligence is trying to do, then you know what kind of outcome to expect, but guessing how it will get there is harder. This should be less so for lesser intelligences, and more so for more different intelligences. I will have less trouble guessing what a ten year old will do in chess against me than a grand master, though I can guess the outcome in both cases. If I play someone with a significantly different way of thinking about the game they may also be hard to guess.

Unpredictability is dangerous

This unpredictability is a big part of the fear of a superhuman AI. If you don’t know what path an intelligence will take to the goal you set it, you don’t know whether it will affect other things that you care about. This problem is most vividly illustrated by the much discussed case where the AI in question is suddenly very many orders of magnitude smarter than a human. Imagine we initially gave it only a subset of our values, such as our yearning to figure out whether P = NP, and we assume that it won’t influence anything outside its box. It might determine that the easiest way to do this is to contact outside help, build powerful weapons, take more resources by force, and put them toward more computing power. Because we weren’t expecting it to consider this option, we haven’t told it about our other values that are relevant to this strategy, such as the popular penchant for being alive.

I don’t find this type of scenario likely, but others do, and the problem could arise at a lesser scale with weaker AI. It’s a bit like the problem that every genie owner in fiction has faced. There are two solutions. One is to inform the AI about all of human values, so it doesn’t matter how wide its influence is. The other is to restrict its actions. SIAI’s interest seems to be in giving the AI human values (whatever that means), then inevitably surrendering control to it. If the AI will likely be so much smarter than humans that it will control everything forever almost immediately, I agree that values are probably the thing to focus on. But consider the case where AI improves fast but by increments, and no single agent becomes more powerful than all of human society for a long time.

Unpredictability also makes it hard to use values to protect from unpredictability

When trying to avoid the dangers of unpredictability, the same unpredictability causes another problem for using values as a means of control. If you don’t know what an entity will do with given values, it is hard to assess whether it actually has those values. It is much easier to assess whether it is following simpler rules. This seems likely to be the basis for human love of deontological ethics and laws. Utilitarians may get better results in principle, but from the perspective of anyone else it’s not obvious whether they are pushing you in front of a train for the greater good or specifically for the personal bad. You would have to do all the calculations yourself and trust their information. You also can’t rely on them to behave in any particular way so that you can plan around them, unless you make deals with them, which is basically paying them to follow rules, so is more evidence for my point.

‘We’ cannot make the AI’s values safe.

I expect the first of these things to be a particular problem with greater than human intelligences. It might be better in principle if an AI follows your values, but you have little way to tell whether it is. Nearly everyone must trust the judgement, goodness and competency of whoever created a given AI, be it a person or another AI. I suspect this gets overlooked somewhat because safety is thought of in terms of what to do when *we* are building the AI. This is the same problem people often have thinking about government. They underestimate the usefulness of transparency there because they think of the government as ‘we’. ‘We should redistribute wealth’ may seem unproblematic, whereas ‘I should allow an organization I barely know anything about to take my money on the vague understanding that they will do something good with it’ does not. For people to trust AIs the AIs should have simple enough promised behavior that people using them can verify that they are likely doing what they are meant to.

This problem gets worse the less predictable the agents are to you. Humans seem to naturally find rules more important for more powerful people and consequences more important for less powerful people. Our world also contains some greater than human intelligences already: organizations. They have similar problems to powerful AI. We ask them to do something like ‘cheaply make red paint’ and often eventually realize their clever ways to do this harm other values, such as our regard for clean water. The organization doesn’t care much about this because we’ve only paid it to follow one of our values while letting it go to work on bits of the world where we have other values. Organizations claim to have values, but who can tell if they follow them?

To control organizations we restrict them with laws. It’s hard enough to figure out whether a given company did or didn’t give proper toilet breaks to its employees. It’s virtually impossible to work out whether their decisions on toilet breaks are anywhere near optimal according to some popularly agreed set of values.

It may seem this is because values are just harder to influence, but this is not obvious. Entities follow rules because of the incentives in place rather than because they are naturally inclined to respect simple constraints. We could similarly incentivise organizations to be utilitarian if we wanted. We just couldn’t assess whether they were doing it. Here we find rules more useful, and values less so, for these greater than human intelligences than we do for humans.

We judge and trust friends and associates according to what we perceive to be their values. We drop a romantic partner because they don’t seem to love us enough even if they have fulfilled their romantic duties. But most of us will not be put off using a product because we think the company doesn’t have the right attitude, though we support harsh legal punishments for breaking rules. Entities just a bit superhuman are too hard to control with values.

You might point out here that values are not usually programmed specifically in organizations, whereas in AI they are. However this is not a huge difference from the perspective of everyone who didn’t program the AI. To the programmer, giving an AI all of human values may be the best method of avoiding assault on them. So if the first AI is tremendously powerful, so nobody but the programmer gets a look in, values may matter most. If the rest of humanity still has a say, as I think they will, rules will be more important.

Why will we be extra wrong about AI values?

I recently discussed the unlikelihood of an AI taking off and leaving the rest of society behind. The other part I mentioned of Singularitarian concern is that powerful AIs will be programmed with the wrong values. This would be bad even if the AIs did not take over the world entirely, but just became a powerful influence. Is that likely to happen?

Don’t get confused by talk of ‘values’. When people hear this they often think an AI could fail to have values at all, or that we would need to work out how to give an AI values. ‘Values’ just means what the AI does. In the same sense your refrigerator might value making things inside it cold (or for that matter making things behind it warm). Every program you write has values in this sense. It might value outputting ‘#t’ if and only if it’s given a prime number for instance.
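For concreteness, here is about the most trivial possible program with ‘values’ in this thin sense, a Python stand-in for the ‘#t’ example above:

```python
# This program 'values' announcing primality: its values, in the thin sense above,
# are nothing more than what it in fact does.

def values_primality(n: int) -> bool:
    """Return True exactly when n is prime; that behavior is the whole of its 'values'."""
    if n < 2:
        return False
    for d in range(2, int(n**0.5) + 1):
        if n % d == 0:
            return False
    return True

print(values_primality(7), values_primality(9))   # True False
```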

The fear then is that a super-AI will do something other than what we want. We are unfortunately picky, and most things other than what we want, we really don’t want. Situations such as being enslaved by an army of giant killer robots, or having your job taken by a simulated mind, are really incredibly close to what you do want compared to situations such as your universe being efficiently remodeled into stationery. If you have a machine with random values and the ability to manipulate everything in the universe, the chance of its final product having humans and tea and crumpets in it is unfathomably small. Some SIAI members seem to believe that almost anyone who manages to make a powerful general AI will be so incapable of giving it suitable values as to approximate a random selection from mind design space.

The fear is not that whoever picks the AI’s goals will do so at random, but rather that they won’t foresee the extent of the AI’s influence, and will pick narrow goals that may as well be random when they act on the world outside the realm they were intended for. For instance an AI programmed to like finding really big prime numbers might find methods that are outside the box, such as hacking computers to covertly divert others’ computing power to the task. If it improves its own intelligence immensely and copies itself we might quickly find ourselves amongst a race of superintelligent creatures whose only value is to find prime numbers. The first thing they would presumably do is stop this needless waste of resources worldwide on everything other than doing that.

Having an impact outside the intended realm is a problem that could exist for any technology. For a certain time our devices do what we want, but at some point they diverge if left long enough, depending on how well we have designed them to do what we want. In the past a car driving itself would diverge from what you wanted at the first corner, whereas after more work they diverge at the point another car gets in their way, and after more work they will diverge at the point that you unexpectedly need to pee.

Notice that at all stages we know over what realm the car’s values coincide with ours, and design it to run accordingly. The same goes with just about all the technology I can think of. Because your toaster’s values and yours diverge as soon as you cease to want bread heated, your toaster is programmed to turn off at that point and not to be very powerful.

Perhaps the concern about strong AI having the wrong goals is like saying ‘one day there will be cars that can drive themselves. It’s much easier to make a car that drives by itself than to make it steer well, so when this technology is developed, the cars will probably have the wrong goals and drive off the road.’ The error here is assuming that the technology will be used outside the realm where it does what we want, because the imagined amazing prototype could be, and because programming it to do what we actually want seems hard. In practice we hardly ever encounter this problem, because we know approximately what our creations will do, and can control where they are set to do something. Is AI different?

One suggestion that it might be different comes from looking at technologies that intervene in very messy systems. Medicines, public policies and attempts to intervene in ecosystems, for instance, are used without total knowledge of their effects, and often to broader and worse effects than anticipated. If it’s hard to design a single policy with known consequences, and hard to tell what the consequences are, safely designing a machine which will intervene in everything in ways you don’t anticipate is presumably harder. But it seems effects of medicine and policy aren’t usually orders of magnitude larger than anticipated. Nobody accidentally starts a holocaust by changing the road rules. Also in the societal cases, the unanticipated effects are often from society reacting to the intervention, rather than from the mechanism used having unpredictable reach. e.g. it is not often that a policy which intends to improve childhood literacy accidentally improves adult literacy as well, but it might change where people want to send their children to school and hence where they live and what children do in their spare time. This is not such a problem, as human reactions presumably reflect human goals. It seems almost certain that AI will have huge social effects of this sort.

Another suggestion that human level AI might have the ‘wrong’ values is that the more flexible and complicated things are, the harder it is to predict them in all of the circumstances they might be used in. Software has bugs and failures sometimes because those making it could not think of every relevant difference in the situations it will be used in. But again, we have an idea of how fast these errors turn up, and don’t move forward faster than enough are corrected.

The main reason that the space in which to trust technology to please us is predictable is that we accumulate technology incrementally and in pace with the corresponding science, so we have knowledge and similar cases to go by. So another reason AI could be different is that there is a huge jump in AI ability suddenly. As far as I can tell this is the basis for SIAI concern. For instance, after years of playing with not very useful code, a researcher might suddenly figure out a fundamental equation of intelligence and find the reachable universe at his command. Because he hasn’t seen anything like it, when he runs it he has virtually no idea how much it will influence or what it will do. So the danger of bad values is dependent on the danger of a big jump in progress. As I explained previously, a jump seems unlikely. If artificial intelligence is reached more incrementally, even if it ends up being a powerful influence in society, there is little reason to think it will have particularly bad values.

How far can AI jump?

I went to the Singularity Summit recently, organized by the Singularity Institute for Artificial Intelligence (SIAI). SIAI’s main interest is in the prospect of a superintelligence quickly emerging and destroying everything we care about in the reachable universe. This concern has two components. One is that any AI above ‘human level’ will improve its intelligence further until it takes over the world from all other entities. The other is that when the intelligence that takes off is created it will accidentally have the wrong values, and because it is smart and thus very good at bringing about what it wants, it will destroy all that humans value. I disagree that either part is likely. Here I’ll summarize why I find the first part implausible, and there I discuss the second part.

The reason that an AI – or a group of them – is a contender for gaining existentially risky amounts of power is that it could trigger an intelligence explosion which happens so fast that everyone else is left behind. An intelligence explosion is a positive feedback where more intelligent creatures are better at improving their intelligence further.

Such a feedback seems likely. Even now, as we gain more concepts and tools that allow us to think well, we use them to make more such understanding. AIs fiddling with their architecture don’t seem fundamentally different. But feedback effects are easy to come by. The question is how big this feedback effect will become. Will it be big enough for one machine to permanently overtake the rest of the world economy in accumulating capability?
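One schematic way to see that ‘there is a positive feedback’ doesn’t by itself answer the size question: suppose capability grows each step by an amount proportional to capability raised to some exponent. The exponent and parameters below are invented, and this is not anyone’s model of AI; it just shows that the same qualitative feedback can give a crawl, ordinary exponential growth, or a runaway, depending on how strong the returns are.

```python
# Schematic feedback model (parameters invented). Capability grows each step by
# RATE * capability**ALPHA; the exponent, not the mere existence of feedback,
# determines whether growth crawls, tracks ordinary exponential growth, or runs away.

def trajectory(alpha: float, rate: float = 0.1, start: float = 1.0,
               steps: int = 100, cap: float = 1e30) -> float:
    capability = start
    for _ in range(steps):
        capability += rate * capability**alpha   # returns to capability in making more capability
        if capability > cap:                     # stop once growth has clearly run away
            return float("inf")
    return capability

for alpha in [0.5, 1.0, 1.2]:
    print(f"alpha = {alpha}: capability after 100 steps = {trajectory(alpha):.3g}")
# alpha < 1: diminishing returns, modest growth.
# alpha = 1: ordinary exponential growth, of the kind the economy already exhibits.
# alpha > 1: super-exponential growth, the kind needed for one project to pull away.
```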

In order to grow more powerful than everyone else you need to get significantly ahead at some point. You can imagine this could happen either by having one big jump in progress or by having slightly more growth over a long period of time. Having slightly more growth over a long period is staggeringly unlikely to happen by chance, so it too needs some single underlying cause. Anything that will give you higher growth for long enough to take over the world is a pretty neat innovation, and for you to take over the world everyone else has to not have anything close. So again, this is a big jump in progress. So for AI to help a small group take over the world, it needs to be a big jump.

Notice that no jumps have been big enough before in human invention. Some species, such as humans, have mostly taken over the worlds of other species. The seeming reason for this is that there was virtually no sharing of the relevant information between species. In human society there is a lot of information sharing. This makes it hard for anyone to get far ahead of everyone else. While you can see there are barriers to insights passing between groups, such as incompatible approaches to a kind of technology by different people working on it, these have not so far caused anything like a gap allowing permanent separation of one group.

Another barrier to a big enough jump is that much human progress comes from the extra use of ideas that sharing information brings. You can imagine that if someone predicted writing they might think ‘whoever creates this will be able to have a superhuman memory and accumulate all the knowledge in the world and use it to make more knowledge until they are so knowledgeable they take over everything.’ If somebody created writing and kept it to themselves they would not accumulate nearly as much recorded knowledge as another person who shared a writing system. The same goes for most technology. At the extreme, if nobody shared information, each person would start out with less knowledge than a cave man, and would presumably end up with about that much still. Nothing invented would be improved on. Systems which are used tend to be improved on more. This means if a group hides their innovations and tries to use them alone to create more innovation, the project will probably not grow as fast as the rest of the economy together. Even if they still listen to what’s going on outside, and just keep their own innovations secret, a lot of improvement in technologies like software comes from use. Forgoing information sharing to protect your advantage will tend to slow down your growth.

Those were some barriers to an AI project causing a big enough jump. Are the reasons for it good enough to make up for them?

The main argument for an AI jump seems to be that human level AI is a powerful and amazing innovation that will cause a high growth rate. But this means it is a leap from what we have currently, not that it is especially likely to be arrived at in one leap. If we invented it tomorrow it would be a jump, but that’s just evidence that we won’t invent it tomorrow. You might argue here that however gradually it arrives, the AI will be around human level one day, and then the next it will suddenly be a superpower. There’s a jump from the growth after human level AI is reached, not before. But if it is arrived at incrementally then others are likely to be close in developing similar technology, unless it is a secret military project or something. Also an AI which recursively improves itself forever will probably be preceded by AIs which self improve to a lesser extent, so the field will be moving fast already. Why would the first try at an AI which can improve itself have infinite success? It’s true that if it were powerful enough it wouldn’t matter if others were close behind or if it took the first group a few goes to make it work. For instance if it only took a few days to become as productive as the rest of the world added together, the AI could probably prevent other research if it wanted. However I haven’t heard any good evidence it’s likely to happen that fast.

Another argument made for an AI project causing a big jump is that intelligence might be the sort of thing for which there is a single principle. Until you discover it you have nothing, and afterwards you can build the smartest thing ever in an afternoon and just extend it indefinitely. Why would intelligence have such a principle? I haven’t heard any good reason. That we can imagine a simple, all-powerful principle of controlling everything in the world isn’t evidence for it existing.

I agree human level AI will be a darn useful achievement and will probably change things a lot, but I’m not convinced that one AI or one group using it will take over the world, because there is no reason it will be a never-before-seen size of jump from the technology available before it.