More convergent instrumental goals

Nick Bostrom follows Steve Omohundro in exploring the types of instrumental goals that intelligences with arbitrary ultimate goals might converge on. This is important for two reasons. First, it means predicting the behavior of arbitrary intelligences might be a tiny bit easier than you’d think. Second, it draws attention to the difficulty of creating a creature that doesn’t want to get mixed up in taking resources and seeking longevity and that sort of thing.

Between Nick and Steve we have these convergent instrumental goals:

  1. Self-preservation
  2. Preserve your values
  3. Self-improvement
  4. Rationality
  5. Other cognitive enhancement
  6. Technological perfection
  7. Get resources
  8. Avoid counterfeit utility

I think acquiring information is included in cognitive enhancement here, though to me it seems big and different enough that I’d put it by itself.

I’d like to add three more, incidentally all to do with interacting with other intelligences. So they are not relevant if you are the one all-powerful singleton, but handy if there are a number of AIs which are of a similar enough level that they might be useful to one another (like sheep and humans are, or humans and humans are). Here’s my list:

Spread your values
This can be achieved by making other entities which have at least some of the same values, or more traditionally by buying advertising space, exercising social influence, or having children. This makes sense for the same reason that ensuring that you maintain your values in the future does. If you think of yourself as just your values (as Bostrom mentions might be appropriate for some kinds of intelligences), spreading values can come under the heading of self-preservation perhaps (or reproduction). But usually we don’t think like this. If I convince you that Bach is pretty good, we don’t think of this as me replacing part of you with part of me. So I think it’s worth having a different category.

Social influence
If you can’t get others to have your values, the next best thing is to get them to do what you want anyway. This can be done in many ways, e.g. deceit, deals, forming relationships with useful people, sending good signals, being likable. Perhaps manipulating others comes under perfection of technology: making friends is in a way like better programming your computer to do what you type in. Both can be seen as investing some effort in arranging a complex system such that in future you will get what you want from it with less input. But the two are not that close, so I give these things a new category. The extent to which it is worth influencing other intelligences depends on how useful they are. If the world only had me and sheep and stupider things in it, influencing sheep would be handy, but not nearly as handy as social skills are to me in this world. The next can be thought of as a subcategory of this one.

Acquire commonly valued goods
Nick and Steve mention acquiring ‘resources’, but from their elaboration it seems like they mean goods that are direct inputs to the intelligence’s goals. An instrumental goal to acquiring directly useful resources is acquiring stuff that is valuable to someone else. This apparently roundabout route is common because of comparative advantage. So while an AI would presumably like to have energy and fast hardware, it would also like a yacht or intellectual property rights for a new kind of frozen dinner, assuming anyone exists who will trade. With few assumptions, acquiring something like the latter will be easier than acquiring the former directly. This is pretty much what humans spend their working lives on.

When to explain

It is commonly claimed that humans’ explicit conscious faculties arose for explaining to others about themselves and their intentions. Similarly when people talk about designing robots that interact with people, they often mention the usefulness of designing such robots to be able to explain to you why it is they changed your investments or rearranged your kitchen.

Perhaps this is a generally useful principle for internally complex units dealing with each other: have some part that keeps an overview of what’s going on inside and can discuss it with others.

If so, the same seems like it should be true of companies. However my experience with companies is that they are often designed specifically to prevent you from being able to get any explanations out of them. Anyone who actually makes decisions regarding you seems to be guarded by layers of people who can’t be held accountable for anything. They can sweetly lament your frustrations, agree that the policies seem unreasonable, sincerely wish you a nice day, and most importantly, have nothing to do with the policies in question and so can’t be expected to justify them or change them based on any arguments or threats you might make.

I wondered why this strategy should be different for companies, and a friend pointed out that companies do often make an effort at more high level explanations of what they are doing, though not necessarily accurate: vision statements, advertisements etc. PR is often the metaphor for how the conscious mind works after all.

So it seems the company strategy is more complex: general explanations coupled with avoidance of being required to make more detailed ones of specific cases and policies. So, is this strategy generally useful? Is it how humans behave? Is it how successful robots will behave?*

Inspired by an interaction with ETS, evidenced lately by PNC and Verizon

*assuming there is more than one

Why focus on making robots nice?

From Michael Anderson and Susan Leigh Anderson in Scientific American:

Today’s robots…face a host of ethical quandaries that push the boundaries of artificial intelligence, or AI, even in quite ordinary situations.

Imagine being a resident in an assisted-living facility…you ask the robot assistant in the dayroom for the remote …But another resident also wants the remote …The robot decides to hand the remote to her. …This anecdote is an example of an ordinary act of ethical decision making, but for a machine, it is a surprisingly tough feat to pull off.

We believe that the solution is to design robots able to apply ethical principles to new and unanticipated situations… for them to be welcome among us their actions should be perceived as fair, correct or simply kind. Their inventors, then, had better take the ethical ramifications of their programming into account…

It seems there are a lot of articles focussing on the problem that some of the small decisions robots will make will be ‘ethical’. There are also many fearing that robots may want to do particularly unethical things, such as shoot people.

Working out how to make a robot behave ‘ethically’ in this narrow sense (arguably all behaviour has an ethical dimension) is an odd problem to set apart from the myriad other problems of making a robot behave usefully. Ethics doesn’t appear to pose unique technical problems. The aforementioned scenario is similar to ‘non-ethical’ problems of making a robot prioritise its behaviour. On the other hand, teaching a robot when to give a remote control to a certain woman is not especially generalisable to other ethical issues such as teaching it which sexual connotations it may use in front of children, except in sharing methods so broad as to also include many more non-ethical behaviours.

The authors suggest that robots will follow a few simple absolute ethical rules like Asimov’s. Perhaps this could unite ethical problems as worth considering together. However if robots are given such rules, they will presumably also be following big absolute rules for other things. For instance if ‘ethics’ is so narrowly defined as to include only choices such as when to kill people and how to be fair, there will presumably be other rules about the overall goals when not contemplating murder. These would matter much more than the ‘ethics’. So how to pick big rules and guess their far reaching effects would again not be an ethics-specific issue. On top of that, until anyone is close to a situation where they could be giving a robot such an abstract rule to work from, the design of said robots is so open as to make the question pretty pointless except as a novel way of saying ‘what ethics do I approve of?’.

I agree that it is useful to work out what you value (to some extent) before you program a robot to do it, particularly including overall aims. Similarly I think it’s a good idea to work out where you want to go before you program your driverless car to drive you there. This doesn’t mean there is any eerie issue of getting a car to appreciate highways when it can’t truly experience them. It also doesn’t present you with any problem you didn’t have when you had to drive your own car – it has just become a bit more pressing.

[Image: Rainbow Robot. Making rainbows has much in common with other manipulations of water vapor. Image by Jenn and Tony Bot via Flickr]

Perhaps, on the contrary, ethical problems are similar in that humans have very nuanced ideas about them and can’t really specify satisfactory general principles to account for them. If the aim is for robots to learn how to behave just from seeing a lot of cases, without being told a rule, perhaps this is a useful category of problems to set apart? No – there are very few things humans deal with that they can specify directly. If a robot wanted to know the complete meaning of almost any word it would have to deal with a similarly complicated mess.

Neither are problems of teaching (narrow) ethics to robots united in being especially important, or important in similar ways, as far as I can tell. If the aim is about something like treating people well, it matters far more to people that the robot gives the remote control to someone, rather than ignoring them all until it has finished sweeping the floors, than that it gets the question of who to give it to correct. Yet how to get a robot to prioritise floor cleaning below remote allocating at the right times seems an uninteresting technicality, both to me and seemingly to authors of popular articles. It doesn’t excite any ‘ethics’ alarms. It’s like wondering how the control panel will be designed in our teleportation chamber: while the rest of the design is unclear, it’s a pretty uninteresting question. When the design is more clear, to most it will be an uninteresting technical matter. How robots will be ethical or kind is similar, yet it gets a lot of attention.

Why is it so exciting to talk about teaching robots narrow ethics? I have two guesses. One, ethics seems such a deep and human thing, it is engaging to frighten ourselves by associating it with robots. Two, we vastly overestimate the extent to which the value of outcomes reflects the virtue of motives, so we hope robots will be virtuous, whatever their day jobs are.

SIA says AI is no big threat

Artificial Intelligence could explode in power and leave the direct control of humans in the next century or so. It may then move on to optimize the reachable universe to its goals. Some think this sequence of events likely.

If this occurred, it would constitute an instance of our star passing the entire Great Filter. If we should cause such an intelligence explosion, then we are the first civilization in roughly our past light cone to be in such a position. If anyone else had been in this position, our part of the universe would already be optimized, which it arguably doesn’t appear to be. This means that if there is a big (optimizing much of the reachable universe) AI explosion in our future, the entire strength of the Great Filter is in steps before us.

This means a big AI explosion is less likely after considering the strength of the Great Filter, and much less likely if one uses the Self Indication Assumption (SIA).

The large minimum total filter strength contained in the Great Filter is evidence for larger filters in the past and in the future. This means evidence against the big AI explosion scenario, which requires that the future filter is tiny.
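
To make that concrete, here is a toy calculation of my own, not anything taken from the Great Filter literature. It assumes, purely for illustration, that the past and future filters each have a strength of 0, 5, 10, 15 or 20 orders of magnitude, independently and with equal prior probability, and that the empty sky tells us the total is at least 20. The names `STRENGTHS`, `MIN_TOTAL` and `prob_tiny_future_filter` and all the numbers are made up.

```python
from fractions import Fraction
from itertools import product

# Hypothetical filter strengths, in orders of magnitude (an assumption for illustration).
STRENGTHS = [0, 5, 10, 15, 20]
MIN_TOTAL = 20  # assumed minimum total filter implied by an empty sky


def prob_tiny_future_filter(min_total):
    """P(future filter == 0 | past + future >= min_total), uniform independent prior."""
    consistent = [
        (past, future)
        for past, future in product(STRENGTHS, repeat=2)
        if past + future >= min_total
    ]
    tiny = [pair for pair in consistent if pair[1] == 0]
    return Fraction(len(tiny), len(consistent))


prior = prob_tiny_future_filter(0)  # no observation: every pair is consistent
posterior = prob_tiny_future_filter(MIN_TOTAL)
print(prior, posterior)  # 1/5 1/15
```

Under these made-up assumptions, learning that the total filter is large cuts the probability of a tiny future filter from 1/5 to 1/15. The exact numbers depend entirely on the prior; only the direction of the shift is the point.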

SIA implies that we are unlikely to give rise to an intelligence explosion for similar reasons, but probably much more strongly. As I pointed out before, SIA says that future filters are much more likely to be large than small. This is easy to see in the case of AI explosions. Recall that SIA increases the chances of hypotheses where there are more people in our present situation. If we precede an AI explosion, there is only one civilization in our situation, rather than potentially many if we do not. Thus the AI hypothesis is disfavored (by a factor the size of the extra filter it requires before us).
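
Here is a rough sketch of that update with invented numbers. I assume two hypotheses with equal priors: under `ai_explosion` the whole filter lies before us, so a star reaches our stage with probability 10^-20; under `late_filter` half of it (in orders of magnitude) is still ahead, so a star reaches our stage with probability 10^-10. SIA weights each hypothesis by its prior times the expected number of civilizations in our situation, which is proportional to that probability. The function name and the figures are hypothetical.

```python
from fractions import Fraction


def sia_posterior(priors, pass_before_us):
    """Weight each hypothesis by prior * chance a star reaches our stage, then normalise."""
    weights = {h: priors[h] * pass_before_us[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Assumed equal priors and made-up filter strengths, for illustration only.
priors = {"ai_explosion": Fraction(1, 2), "late_filter": Fraction(1, 2)}
pass_before_us = {
    "ai_explosion": Fraction(1, 10**20),  # entire filter is behind us
    "late_filter": Fraction(1, 10**10),   # ten orders of magnitude still ahead
}

posterior = sia_posterior(priors, pass_before_us)
odds = posterior["ai_explosion"] / posterior["late_filter"]
print(odds)  # 1/10000000000
```

The posterior odds against the AI hypothesis come out at 10^10 to 1, which is exactly the size of the extra filter it needs before us in this toy setup.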

What the Self Sampling Assumption (SSA), an alternative principle to SIA, says depends on the reference class. If the reference class includes AIs, then we should strongly not anticipate such an AI explosion. If it does not, then we strongly should. Both conclusions are basically due to the Doomsday Argument.

In summary, if you begin with some uncertainty about whether we precede an AI explosion, then updating on the observed large total filter and accepting SIA should make you much less confident in that outcome. The Great Filter and SIA don’t just mean that we are less likely to peacefully colonize space than we thought, they also mean we are less likely to horribly colonize it, via an unfriendly AI explosion.

Light cone eating AI explosions are not filters

Some existential risks can’t account for any of the Great Filter. Here are two categories of existential risks that are not filters:

Too big: any disaster that would destroy everyone in the observable universe at once, or destroy space itself, is out. If others had been filtered by such a disaster in the past, we wouldn’t be here either. This excludes events such as simulation shutdown and breakdown of a metastable vacuum state we are in.

Not the end: Humans could be destroyed without the causal path to space colonization being destroyed. Also much of human value could be destroyed without humans being destroyed. e.g. Super-intelligent AI would presumably be better at colonizing the stars than humans are. The same goes for transcending uploads. Repressive totalitarian states and long term erosion of value could destroy a lot of human value and still lead to interstellar colonization.

Since these risks are not filters, neither the knowledge that there is a large minimum total filter nor the use of SIA increases their likelihood. SSA still increases their likelihood for the usual Doomsday Argument reasons. I think the rest of the risks listed in Nick Bostrom’s paper can be filters. According to SIA, averting these filter existential risks should be prioritized more highly relative to averting non-filter existential risks such as those in this post. So for instance AI is less of a concern relative to other existential risks than otherwise estimated. SSA’s implications are less clear – the destruction of everything in the future is a pretty favorable inclusion in a hypothesis under SSA with a broad reference class, but as always everything depends on the reference class.