Case File
d-15465 · House Oversight · Other

Academic essay on AI value alignment by Tom Griffiths

Date
November 11, 2025
Source
House Oversight
Reference
House Oversight #016312
Pages
1
Persons
0
Integrity
No Hash Available

Summary

The passage is a scholarly discussion of artificial intelligence and value alignment with no specific allegations, names, transactions, or actionable leads involving powerful actors. It discusses the need for AI systems to understand human preferences, mentions value alignment and inverse-reinforcement learning, and uses hypothetical examples (dessert-only meals, dog meat) to illustrate how inferring human preferences can go wrong.

This document is from the House Oversight Committee Releases.


Tags

academic-discourse, machine-learning, house-oversight, artificial-intelligence, value-alignment

Extracted Text (OCR)

EFTA Disclosure
Text extracted via OCR from the original document. May contain errors from the scanning process.
THE ARTIFICIAL USE OF HUMAN BEINGS

Tom Griffiths

Tom Griffiths is Henry R. Luce Professor of Information Technology, Consciousness, and Culture at Princeton University. He is co-author (with Brian Christian) of Algorithms to Live By.

When you ask people to imagine a world that has successfully, beneficially incorporated advances in artificial intelligence, everybody probably comes up with a slightly different picture. Our idiosyncratic visions of the future might differ in the presence or absence of spaceships, flying cars, or humanoid robots. But one thing doesn’t vary: the presence of human beings. That’s certainly what Norbert Wiener imagined when he wrote about the potential of machines to improve human society by interacting with humans and helping to mediate their interactions with one another.

Getting to that point doesn’t just require coming up with ways to make machines smarter. It also requires a better understanding of how human minds work. Recent advances in artificial intelligence and machine learning have resulted in systems that can meet or exceed human abilities in playing games, classifying images, or processing text. But if you want to know why the driver in front of you cut you off, why people vote against their interests, or what birthday present you should get for your partner, you’re still better off asking a human than a machine. Solving those problems requires building models of human minds that can be implemented inside a computer—something that’s essential not just to better integrate machines into human societies but to make sure that human societies can continue to exist.

Consider the fantasy of having an automated intelligent assistant that can take on such basic tasks as planning meals and ordering groceries. To succeed in these tasks, it needs to be able to make inferences about what you want, based on the way you behave. Although this seems simple, making inferences about the preferences of human beings can be a tricky matter. For example, having observed that the part of the meal you most enjoy is dessert, your assistant might start to plan meals consisting entirely of desserts. Or perhaps it has heard your complaints about never having enough free time and observed that looking after your dog takes up a considerable amount of that free time. Following the dessert debacle, it has also understood that you prefer meals that incorporate protein, so it might begin to research recipes that call for dog meat. It’s not a long journey from examples like this to situations that begin to sound like problems for the future of humanity (all of whom are good protein sources).

Making inferences about what humans want is a prerequisite for solving the AI problem of value alignment—aligning the values of an automated intelligent system with those of a human being. Value alignment is important if we want to ensure that those automated intelligent systems have our best interests at heart. If they can’t infer what we value, there’s no way for them to act in support of those values—and they may well act in ways that contravene them.

Value alignment is the subject of a small but growing literature in artificial-intelligence research. One of the tools used for solving this problem is inverse-reinforcement learning. Reinforcement learning is a standard method for training intelligent machines. By associating particular outcomes with rewards, a machine-
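
The inverse-reinforcement-learning idea the essay introduces can be made concrete with a small toy model. The Python sketch below is not from the essay: the meal features, the observed choices, and the softmax (Boltzmann) model of approximately rational choice are all illustrative assumptions. It fits a set of feature weights to observed meal choices by maximizing their likelihood, which is the basic move of inferring what a person values from how they behave.

# A minimal sketch of preference inference in the spirit of inverse
# reinforcement learning. All names, features, and data are hypothetical.
import numpy as np

# Each candidate meal is described by features: [sweetness, protein, prep_time]
MEALS = {
    "dessert_platter": np.array([1.0, 0.1, 0.2]),
    "steak_dinner":    np.array([0.1, 0.9, 0.6]),
    "tofu_stir_fry":   np.array([0.2, 0.7, 0.4]),
    "fruit_salad":     np.array([0.8, 0.2, 0.1]),
}

# Hypothetical observations: the meals the person actually chose.
OBSERVED_CHOICES = ["steak_dinner", "tofu_stir_fry", "steak_dinner", "fruit_salad"]


def choice_log_likelihood(weights, observed, meals):
    """Log-probability of the observed choices under a softmax choice model:
    P(choice) is proportional to exp(weights . features(choice))."""
    names = list(meals)
    features = np.stack([meals[n] for n in names])
    utilities = features @ weights
    m = utilities.max()
    log_probs = utilities - (m + np.log(np.sum(np.exp(utilities - m))))
    return sum(log_probs[names.index(c)] for c in observed)


def infer_weights(observed, meals, steps=2000, lr=0.1):
    """Fit feature weights by gradient ascent on the choice log-likelihood,
    recovering which features the observed behaviour implicitly rewards."""
    names = list(meals)
    features = np.stack([meals[n] for n in names])
    counts = np.array([observed.count(n) for n in names], dtype=float)
    weights = np.zeros(features.shape[1])
    for _ in range(steps):
        utilities = features @ weights
        probs = np.exp(utilities - utilities.max())
        probs /= probs.sum()
        # Gradient of the log-likelihood: observed feature totals minus
        # the feature totals expected under the current model.
        grad = counts @ features - len(observed) * (probs @ features)
        weights += lr * grad / len(observed)
    return weights


if __name__ == "__main__":
    w = infer_weights(OBSERVED_CHOICES, MEALS)
    print("Inferred weights [sweetness, protein, prep_time]:", np.round(w, 2))
    print("Log-likelihood:", round(choice_log_likelihood(w, OBSERVED_CHOICES, MEALS), 3))

For these hypothetical observations the fitted weights favour protein over sweetness. The essay’s dessert-only and dog-meat failure modes correspond, in this framing, to fitting the wrong features or too shallow a model of what the person actually values.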
