learning system can be trained to follow strategies that produce those outcomes. Wiener
hinted at this idea in the 1950s, but the intervening decades have developed it into a fine
art. Modern machine-learning systems can find extremely effective strategies for playing
computer games—from simple arcade games to complex real-time strategy games—by
applying reinforcement-learning algorithms. Inverse reinforcement learning turns this
approach around: By observing the actions of an intelligent agent that has already
learned effective strategies, we can infer the rewards that led to the development of those
strategies.
In its simplest form, inverse reinforcement learning is something people do all the
time. It’s so common that we even do it unconsciously. When you see a co-worker go to
a vending machine filled with potato chips and candy and buy a packet of unsalted nuts,
you infer that your co-worker (1) was hungry and (2) prefers healthy food. When an
acquaintance clearly sees you and then tries to avoid encountering you, you infer that
there’s some reason they don’t want to talk to you. When an adult spends a lot of time
and money on learning to play the cello, you infer that they must really like classical
music—whereas inferring the motives of a teenage boy learning to play an electric guitar
might be more of a challenge.
Inverse reinforcement learning is a statistical problem: We have some data—the
behavior of an intelligent agent—and we want to evaluate various hypotheses about the
rewards underlying that behavior. When faced with this question, a statistician thinks
about the generative model behind the data: What data would we expect to be generated
if the intelligent agent were motivated by a particular set of rewards? Equipped with the
generative model, the statistician can then work backward: What rewards would likely
have caused the agent to behave in that particular way?
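The statistician's two steps can be sketched as Bayesian inference over a few candidate reward functions for the vending-machine example. This is an illustrative sketch, not a method given in the text. The snack options, the numeric reward values, and the softmax ("Boltzmann-rational") choice model with its rationality parameter `beta` are all assumptions. The sketch models only food preference, not hunger. `choice_likelihood` plays the role of the generative model, and `posterior` works backward from an observed choice to the rewards that probably produced it.

```python
import math

# Hypothetical vending-machine options (assumption for illustration).
OPTIONS = ["chips", "candy", "unsalted nuts"]

# Hypothetical reward hypotheses; the numbers are illustrative, not from the text.
HYPOTHESES = {
    "prefers healthy food": {"chips": 0.0, "candy": 0.0, "unsalted nuts": 2.0},
    "prefers tasty food": {"chips": 2.0, "candy": 2.0, "unsalted nuts": 0.0},
    "no preference": {"chips": 1.0, "candy": 1.0, "unsalted nuts": 1.0},
}


def choice_likelihood(rewards, choice, beta=1.0):
    """Generative model (assumed softmax agent): P(choice | rewards).

    Higher-reward options are chosen more often; beta controls how
    consistently the agent picks the best option.
    """
    weights = {option: math.exp(beta * rewards[option]) for option in OPTIONS}
    return weights[choice] / sum(weights.values())


def posterior(observed_choices, prior=None, beta=1.0):
    """Work backward with Bayes' rule: P(rewards | choices) is proportional to
    P(choices | rewards) * P(rewards), normalized over the hypotheses."""
    if prior is None:
        # Uniform prior over the candidate reward functions.
        prior = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    unnormalized = {}
    for hypothesis, rewards in HYPOTHESES.items():
        likelihood = 1.0
        for choice in observed_choices:  # choices treated as independent
            likelihood *= choice_likelihood(rewards, choice, beta)
        unnormalized[hypothesis] = prior[hypothesis] * likelihood
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


if __name__ == "__main__":
    # The co-worker is observed buying unsalted nuts once.
    for hypothesis, prob in sorted(posterior(["unsalted nuts"]).items(),
                                   key=lambda kv: -kv[1]):
        print(f"{hypothesis}: {prob:.3f}")
```

A single purchase of nuts already makes "prefers healthy food" the most probable hypothesis. Repeated purchases concentrate the posterior further. A smaller `beta` describes a noisier agent, whose choices are weaker evidence about its rewards.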
If you’re trying to make inferences about the rewards that motivate human
behavior, the generative model is really a theory of how people behave—how human
minds work. Inferences about the hidden causes behind the behavior of other people
reflect a sophisticated model of human nature that we all carry around in our heads.
When that model is accurate, we make good inferences. When it’s not, we make
mistakes. For example, a student might infer that his professor is indifferent to him if the
professor doesn’t immediately respond to his email—a consequence of the student’s
failure to realize just how many emails that professor receives.
Automated intelligent systems that will make good inferences about what people
want must have good generative models for human behavior: that is, good models of
human cognition expressed in terms that can be implemented on a computer.
Historically, the search for computational models of human cognition is intimately
intertwined with the history of artificial intelligence itself. Only a few years after Norbert
Wiener published The Human Use of Human Beings, Logic Theorist, the first
computational model of human cognition and also the first artificial-intelligence system,
was developed by Herbert Simon, of Carnegie Tech, and Allen Newell, of the RAND
Corporation. Logic Theorist automatically produced mathematical proofs by emulating
the strategies used by human mathematicians.
The challenge in developing computational models of human cognition is making
models that are both accurate and generalizable. An accurate model, of course, predicts
human behavior with a minimum of errors. A generalizable model can make predictions
across a wide range of circumstances, including circumstances unanticipated by its