More convergent instrumental goals

Nick Bostrom follows Steve Omohundro in exploring the types of instrumental goals that intelligences with arbitrary ultimate goals might converge on. This is important for two reasons. First, it means predicting the behavior of arbitrary intelligences might be a tiny bit easier than you’d think. Second, it draws attention to the difficulty of creating a creature that doesn’t want to get mixed up in taking resources, seeking longevity, and that sort of thing.

Between Nick and Steve we have these convergent instrumental goals:

  1. Self-preservation
  2. Preserve your values
  3. Self-improvement
  4. Rationality
  5. Other cognitive enhancement
  6. Technological perfection
  7. Get resources
  8. Avoid counterfeit utility

I think acquiring information is included in cognitive enhancement here, though to me it seems big and different enough that I’d put it by itself.

I’d like to add three more, incidentally all to do with interacting with other intelligences. They are not relevant if you are the one all-powerful singleton, but handy if there are a number of AIs at a similar enough level that they might be useful to one another (as sheep and humans are, or humans and humans are). Here’s my list:

Spread your values
This can be achieved by making other entities which have at least some of the same values, or more traditionally by buying advertising space, exercising social influence, or having children. This makes sense for the same reason that ensuring that you maintain your values in the future does. If you think of yourself as just your values (as Bostrom mentions might be appropriate for some kinds of intelligences), spreading values can come under the heading of self-preservation perhaps (or reproduction). But usually we don’t think like this. If I convince you that Bach is pretty good, we don’t think of this as me replacing part of you with part of me. So I think it’s worth having a different category.

Social influence
If you can’t get others to have your values, the next best thing is to get them to do what you want anyway. This can be done in many ways, e.g. deceit, deals, forming relationships with useful people, sending good signals, being likable. Perhaps manipulating others comes under perfection of technology: making friends is in a way like programming your computer better, so that it does what you type in. Both can be seen as investing some effort in arranging a complex system so that in the future you will get what you want from it with less input. But the analogy is not that close, so I give these things their own category. The extent to which it is worth influencing other intelligences depends on how useful they are. If the world contained only me, sheep, and stupider things, influencing sheep would be handy, but not nearly as handy as social skills are to me in this world. The next goal can be thought of as a subcategory of this one.

Acquire commonly valued goods
Nick and Steve mention acquiring ‘resources’, but from their elaboration it seems they mean goods that are direct inputs to the intelligence’s goals. An instrumental route to acquiring directly useful resources is acquiring stuff that is valuable to someone else. This apparently roundabout route is common because of comparative advantage. So while an AI would presumably like energy and fast hardware, it would also like a yacht or the intellectual property rights to a new kind of frozen dinner, assuming anyone exists who will trade. Under fairly weak assumptions, acquiring something like the latter and trading it will be easier than acquiring the former directly. This is pretty much what humans spend their working lives on.
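To make the comparative-advantage point concrete, here is a toy numerical sketch (the goods, production rates, and exchange rate are all invented for illustration, not taken from Nick or Steve): even when the AI is better than its trading partner at producing everything, producing the good it doesn’t value and trading it away can still leave it with more of the good it does value.

```python
# Toy illustration of comparative advantage (all numbers are invented).
# The AI directly wants hardware; the human wants frozen dinners.
HOURS = 10

# Units producible per hour of work. The AI is absolutely better at both
# goods, but its comparative advantage is in dinners: a unit of hardware
# costs it 2.5 forgone dinners, versus 1/3 of a dinner for the human.
ai_rate = {"hardware": 4, "dinners": 10}
human_rate = {"hardware": 3, "dinners": 1}

# Direct route: the AI spends every hour making the thing it wants.
direct = ai_rate["hardware"] * HOURS  # 40 hardware

# Roundabout route: the AI makes dinners (worthless to it) and trades them
# at 1 dinner per unit of hardware, a price both sides prefer to autarky.
price = 1                                                         # dinners per unit of hardware
hardware_bought = human_rate["hardware"] * HOURS                  # 30, all the human can make
dinners_owed = hardware_bought * price                            # 30 dinners
hours_on_dinners = dinners_owed / ai_rate["dinners"]              # 3 hours
hardware_made = ai_rate["hardware"] * (HOURS - hours_on_dinners)  # 28 hardware

print(direct, hardware_made + hardware_bought)  # 40 vs. 58
# The human gains too: 30 dinners via trade, versus the 10 it could make alone.
```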

5 responses to “More convergent instrumental goals”

  1. Have you read David Schmidtz? Arizona philosopher with extensive work on instrumental meta-ethics. Rational Choice and Moral Agency is his main work on this topic.

  2. re: “Avoid counterfeit utility”. I’m confused on this point. I can see that one wouldn’t want to accidentally develop new desires, particularly if they’re skew to the things one really cares about. But it seems like there are things an arbitrary intelligence might desire (think sex and chocolate, or whatever floats *your* boat) that can’t easily be changed. So living with and satisfying those desires seems a reasonable pursuit, as long as you keep it in bounds. Artificial intelligences might have similar drives based on their fundamental architecture (or on their tastes and emotions, if they have them, which there are some arguments for).

    It would be valuable to them to be able to distinguish the sources of their desires, and make trade-offs about how much to spend on satisfying them. But it won’t always be the case that unjustifiable drives can be ignored or subverted costlessly.

    • I think avoiding counterfeit utility means avoiding wireheading. If you have certain goals, you might cause yourself to think or feel like they were satisfied by messing around with your head instead of manipulating the world.

  3. I couldn’t find a direct mention of ‘avoiding counterfeit utility’ or anything related in the cited papers.
