weight to a situation S based on the ease with which one agent in a society can communicate
S to another agent in that society, using multimodal communication (including verbalization,
demonstration, dramatic and pictorial depiction, etc.).
Finally, we present a formal measure of the “generality” of an intelligence, which precisiates
the informal distinction between “general AI” and “narrow AI.”
7.3.1 Biased Universal Intelligence
To define universal intelligence, Legg and Hutter consider the class of environments that are
reward-summable, meaning that the total amount of reward they return to any agent is bounded
by 1. Where $r_i$ denotes the reward experienced by the agent from the environment at time $i$, the expected total reward for the agent $\pi$ from the environment $\mu$ is defined as

$$V^\pi_\mu = \mathbf{E}\left( \sum_{i=1}^{\infty} r_i \right) \le 1$$
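To make this concrete, here is a minimal Monte Carlo sketch of estimating $V^\pi_\mu$. The `Agent` and `Environment` interfaces (`act`, `reset`, `step`) are illustrative assumptions, not from the text; reward-summability is modeled as a guarantee by the environment that cumulative reward never exceeds 1.

```python
def estimate_value(agent_factory, env_factory, episodes=1000, horizon=10_000):
    """Monte Carlo estimate of V = E(sum_i r_i), truncated at `horizon` steps.

    `agent_factory` and `env_factory` build fresh agent/environment instances
    (hypothetical interfaces). The environment is assumed reward-summable:
    cumulative reward along any interaction history is bounded by 1, so the
    returned estimate is bounded by 1 as well.
    """
    total = 0.0
    for _ in range(episodes):
        agent, env = agent_factory(), env_factory()
        observation = env.reset()
        cumulative = 0.0
        for _ in range(horizon):
            action = agent.act(observation)         # agent emits an action
            observation, reward = env.step(action)  # environment responds
            cumulative += reward
        total += cumulative
    return total / episodes
```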
To extend their definition in the direction of greater realism, we first introduce a second-order probability distribution $\nu$, which is a probability distribution over the space of environments $\mu$. The distribution $\nu$ assigns each environment a probability. One such distribution $\nu$ is the Solomonoff-Levin universal distribution, in which one sets $\nu(\mu) = 2^{-K(\mu)}$, where $K(\mu)$ is the Kolmogorov complexity of $\mu$; but this is not the only distribution $\nu$ of interest. In fact a great deal of real-world general intelligence consists of the adaptation of intelligent systems to particular distributions $\nu$ over environment-space, differing from the universal distribution.
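As a sketch of constructing such a distribution (our illustration, not from the text), one can weight each environment in a finite set by $2^{-\ell(\mu)}$ and normalize, where $\ell$ is a computable proxy for the Kolmogorov complexity $K$, which is itself uncomputable:

```python
def biased_distribution(envs, complexity):
    """Assign each environment weight 2**(-complexity(env)), then normalize.

    `complexity` is a computable stand-in for Kolmogorov complexity K (which
    is uncomputable), e.g. the length of a program generating the environment.
    With complexity = K and no normalization, this would be the
    Solomonoff-Levin universal distribution restricted to `envs`.
    """
    weights = {env: 2.0 ** (-complexity(env)) for env in envs}
    z = sum(weights.values())
    return {env: w / z for env, w in weights.items()}
```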
We then define

Definition 4 The biased universal intelligence of an agent $\pi$ is its expected performance with respect to the distribution $\nu$ over the space of all computable reward-summable environments, $E$; that is,

$$\Upsilon(\pi) = \sum_{\mu \in E} \nu(\mu) \, V^\pi_\mu$$
Legg and Hutter’s universal intelligence is obtained by setting v equal to the universal
distribution.
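A sketch of Definition 4 restricted to a finite sample of environments (the full sum over all computable environments is of course not directly computable; `estimate_value` and the dictionary of named environment factories are the hypothetical helpers sketched above):

```python
def biased_universal_intelligence(agent_factory, env_factories, nu):
    """Y(pi) = sum over mu in E of nu(mu) * V_mu^pi, restricted to a finite
    dictionary of named environment factories with weights `nu`."""
    return sum(nu[name] * estimate_value(agent_factory, factory)
               for name, factory in env_factories.items())
```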
This framework is more flexible than it might seem. E.g., suppose one wants to incorporate agents that die. Then one may create a special action, say $a_{\mathrm{die}}$, corresponding to the state of death, to create agents that

• in certain circumstances output action $a_{\mathrm{die}}$
• have the property that if their previous action was $a_{\mathrm{die}}$, then all of their subsequent actions must be $a_{\mathrm{die}}$

and to define a reward structure so that actions $a_{\mathrm{die}}$ always bring zero reward. It then follows
that death is generally a bad thing if one wants to maximize intelligence. Agents that die will
not get rewarded after they’re dead; and agents that live only 70 years, say, will be restricted
from getting rewards involving long-term patterns and will hence have specific limits on their
intelligence.
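A minimal sketch of this death construction, using the same hypothetical interfaces as above: a wrapper forces an agent's actions to remain $a_{\mathrm{die}}$ once it has been output, while the reward structure is assumed to assign $a_{\mathrm{die}}$ zero reward.

```python
A_DIE = "a_die"  # the special death action (name is illustrative)

class MortalAgent:
    """Wraps an agent so that once it outputs a_die, every subsequent action
    is a_die. The reward structure is assumed to give a_die zero reward, so a
    dead agent accumulates no further reward."""

    def __init__(self, agent):
        self.agent = agent
        self.dead = False

    def act(self, observation):
        if self.dead:
            return A_DIE
        action = self.agent.act(observation)
        if action == A_DIE:
            self.dead = True
        return action
```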