Case File
d-26186 · House Oversight · Other

Academic discussion of AGI value alignment frameworks

Date: November 11, 2025
Source: House Oversight
Reference: House Oversight #013123
Pages: 2
Persons: 0
Integrity: No Hash Available

Summary

The passage is a scholarly overview of AI alignment concepts with no specific allegations, names, transactions, or actionable leads involving powerful individuals or institutions. It mentions various AI alignment frameworks (CEV, CAV, CBV), references academic authors and a US-military-funded robotics book, and discusses the philosophical challenges of formalizing human values.

This document is from the House Oversight Committee Releases.


Tags

value-alignment, agi, ai-safety, house-oversight, philosophy, military-robotics

Extracted Text (OCR)

EFTA Disclosure
Text extracted via OCR from the original document. May contain errors from the scanning process.
12.2 Review of Current Thinking on the Risks of AGI

Formalization of human morality has vexed moral philosophers for quite some time. Finally, it is unclear to what extent such a proof could be created in a generic, environment-independent way; if the proof depends on properties of the physical environment, then it would require a formalization of the environment itself, which runs up against various problems such as the complexity of the physical world and the fact that we currently have no complete, consistent theory of physics.

Kaj Sotala has provided a list of 14 objections to the Friendly AI concept, and suggested answers to each of them [Sot11]. Stephen Omohundro [Omo08] has argued that any advanced AI system will very likely demonstrate certain "basic AI drives", such as desiring to be rational, to self-protect, to acquire resources, and to preserve and protect its utility function and avoid counterfeit utility; these drives, he suggests, must be taken carefully into account in formulating approaches to Friendly AI.

The problem of formally, or at least very carefully, defining the goal of Friendliness has been considered from a variety of perspectives, none showing dramatic success. Yudkowsky [Yud04] has suggested the concept of "Coherent Extrapolated Volition" (CEV), which roughly refers to the extrapolation of the common values of the human race. Many subtleties arise in specifying this concept: e.g. if Bob Jones is often possessed by a strong desire to kill all Martians, but he deeply aspires to be a nonviolent person, then the CEV approach would not rate "killing Martians" as part of Bob's contribution to the CEV of humanity.

Goertzel [Goe10a] has proposed a related notion of Coherent Aggregated Volition (CAV), which eschews the subtleties of extrapolation and simply seeks a reasonably compact, coherent, consistent set of values that is fairly close to the collective value-set of humanity. In the CAV approach, "killing Martians" would be removed from humanity's collective value-set because it is uncommon and not part of the most compact/coherent/consistent overall model of human values, rather than because of Bob Jones's aspiration to nonviolence.

One thought we have recently entertained is that the core concept underlying CAV might be better thought of as CBV, or "Coherent Blended Volition." CAV seems to be easily misinterpreted as meaning the average of different views, which was not the original intention. The CBV terminology clarifies that the CBV of a diverse group of people should not be thought of as an average of their perspectives, but as something more analogous to a "conceptual blend" [FT02], incorporating the most essential elements of their divergent views into a whole that is overall compact, elegant and harmonious. The subtlety here (to which we shall return below) is that for a CBV blend to be broadly acceptable, the different parties whose views are being blended must agree to some extent that enough of the essential elements of their own views have been included. The process of arriving at this sort of consensus may involve extrapolation of a roughly similar sort to that considered in CEV.

Multiple axiomatizations of human values have also been attempted, e.g. with a view toward providing near-term guidance to military robots (see e.g. Arkin's excellent though chillingly-titled book Governing Lethal Behavior in Autonomous Robots [Ark09b], the result of US-military-funded research).
However, there are reasonably strong arguments that human values (similarly to e.g. human language or human perceptual classification rules) are too complex and multifaceted to be captured in any compact set of formal logic rules. Wallach [WA10] has made this point eloquently, and argued the necessity of fusing top-down (e.g. formal-logic-based) and bottom-up (e.g. self-organizing, learning-based) approaches to machine ethics.

A number of more sociological considerations also arise. It is sometimes argued that the risk from highly-advanced AGI going morally awry on its own may be less than that of moderately-advanced AGI being used by human beings to advocate immoral ends. This possibility gives
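To make the CAV idea discussed above slightly more concrete, the following is a minimal, purely illustrative sketch, not drawn from the document itself: it treats CAV as scoring candidate value-sets on agreement with a population's values and on compactness, then picking the best-scoring set. All names, data, weights, and scoring functions are invented assumptions for illustration only.

# Toy illustration of the CAV idea: pick a compact value-set that agrees well
# with a population's values. All data and weights here are invented.
from itertools import combinations

# Each person's values: name -> endorsement strength in (0, 1].
population = [
    {"nonviolence": 0.9, "honesty": 0.8, "kill_martians": 0.7},  # the "Bob Jones" case
    {"nonviolence": 0.8, "honesty": 0.9},
    {"nonviolence": 0.7, "honesty": 0.6, "curiosity": 0.5},
]

# Pool of candidate values, each at its strongest observed endorsement.
all_values = {}
for person in population:
    for v, s in person.items():
        all_values[v] = max(all_values.get(v, 0.0), s)

def agreement(candidate, population):
    """Average per-person agreement: reward shared values, penalize
    candidate values a person does not hold at all."""
    total = 0.0
    for person in population:
        for v, s in candidate.items():
            total += min(s, person[v]) if v in person else -s
    return total / len(population)

def compactness(candidate):
    """Smaller value-sets score higher (crude proxy for compact/coherent)."""
    return 1.0 / (1.0 + len(candidate))

def cav_score(candidate, population, w_agree=1.0, w_compact=0.5):
    return w_agree * agreement(candidate, population) + w_compact * compactness(candidate)

# Brute-force search over all non-empty subsets (toy scale only).
candidates = [
    {v: all_values[v] for v in subset}
    for r in range(1, len(all_values) + 1)
    for subset in combinations(all_values, r)
]
best = max(candidates, key=lambda c: cav_score(c, population))
print(sorted(best))  # idiosyncratic values such as "kill_martians" drop out

In this toy framing the uncommon value is excluded because it fits poorly into a compact, widely-shared model, not because of any one person's aspirations, which mirrors the distinction the passage draws between CAV/CBV and CEV-style extrapolation.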
