The Multiverse According to Ben: Quasi-Mathematical Speculations on Contraction Maps and Hypothetical Friendly Super-AIs

While eating ramen soup with Ruiting in the Tai Po MegaMall tonight, I found myself musing about the possible use of the contraction mapping theorem to understand the properties of AGI systems that create other AGI systems that create other AGI systems that … etc. ….

It's a totally speculative line of thinking, that may be opaque to anyone without a certain degree of math background.

But if it pans out, it ultimately could provide an answer to the question: When can an AGI system, creating new AGI systems or modifying itself in pursuit of certain goals, be reasonably confident that its new creations are going to continue respecting the goals for which they were created?

This question is especially interesting when the goals in question are things like "Create amazing new things and don't harm anybody in the process." If we create an AGI with laudable goals like this, and then it creates a new AGI with the same goals, etc. -- when can we feel reasonably sure the sequence of AGIs won't diverge dramatically from the original goals?

Anyway, here goes…

Suppose that one has two goals, G and H

Given a goal G, let us use the notation " agi(G, C) " to denote the goal of creating an AGI system, operating within resources C, that will adequately figure out how to achieve goal G

Let d(,) denote a distance measure on the space of goals. One reasonable hypothesis is that, if

d(G,H) = D

then generally speaking,

d( agi(G,c), agi(H,c) ) < k D

for some k …. That is: because AGI systems are general in capability and good at generalization, if you change the goal of an AGI system by a moderate amount, you have to change the AGI system itself by less than that amount…

If this is true, then we have an interesting consequence…. We have the consequence that

F(X) = agi(X,C)

is a contraction mapping on the space of goals. This means that, if we are working with a goal space that is a complete metric space, we have a fixed point G* so that

F(G*) = G*

i.e. so that

G* = agi(G*,C)

The fixed point G* is the goal of the following form:

G* = the goal of finding an AGI that can adequately figure out how to achieve G*

Of course, to make goal space a complete metric space one probably needs to admit some uncomputable goals, i.e. goals only computable using infinitely long computer programs. So a goal like G* can never quite be achieved using ordinary computers, but only approximated.

Anyway, G* probably doesn't seem like a very interesting goal… apart from a certain novelty value….

However, one can vary on the above argument in a way that makes it possibly more useful.

Suppose we look at

agi(G,I,C)

-- i.e., the goal of creating an AGI that can adequately figure out how to achieve goals G and I within resources C.

Then it may also be the case that

d( agi(G,I,C), agi(H, I, C) ) < k d(G,H)

If so, then we can show the existence of a fixed point goal G* so that

G* = agi(G*, I, C)

or in words,

G* = the goal of finding an AGI that can adequately figure out how to achieve both goal G* and goal I

The contraction mapping theorem shows that if we start with a goal G close enough to G*, we can converge toward G* via an iteration such as

G, I
agi(G, I, C)
agi( agi(G,I,C), I, C)
agi( agi( agi(G,I,C), I, C) , I, C)
…
etc.

At each stage of the iteration, the AGI becomes more and more intelligent, as it's dealing with more and more abstract learning problems. But according to the contraction mapping theorem, the AGI systems in the series are getting closer and closer to each other -- the process is converging.

So then we have the conclusion: If one starts with a system smart enough to solve the problem agi(G,I, C) reasonably well for the given I and C -- then ongoing goal-directed creation of new AGI systems will lead to new systems that respect the goals for which they were created.

Which may seem a bit tautologous! But the devil actually lies in the details -- which I have omitted here, because I haven't figured them out! The devil lies in the little qualifiers "acceptably" and "reasonably well" that I've used above. Exactly how well does the problem agi(G,I,C) need to be solved for the contraction mapping property to hold?

And of course, it may be that the contraction mapping property doesn't actually hold in the simple form given above -- rather, some more complex property similar in spirit may hold, meaning that one has to use some generalization of the contraction mapping theorem, and everything becomes more of a mess, or at least subtler.

So, all this is not very rigorous -- at this stage, it's more like philosophy/poetry using the language of math, rather than real math. But I think it points in an interesting direction. It suggests to me that, if we want to create a useful mathematics of AGIs that try to achieve their goals by self-modifying or creating new AGIs, maybe we should be looking at the properties of mappings like agi() on the metric space of goals. This is a different sort of direction than standard theoretical computer science -- it's an odd sort of discrete dynamical systems theory dealing with computational iterations that converge to infinite computer programs compactly describable as hypersets.

Anyway this line of thought will give me interesting dreams tonight ... I hope it does the same for you ;-) ...

The Multiverse According to Ben

Wednesday, May 15, 2013

Quasi-Mathematical Speculations on Contraction Maps and Hypothetical Friendly Super-AIs

9 comments: