Ben Goertzel with Cassio Pennachin & Nil Geisweiller &
the OpenCog Team
Engineering General Intelligence, Part 1:
A Path to Advanced AGI via Embodied Learning and
Cognitive Synergy
September 19, 2013
This book is dedicated by Ben Goertzel to his beloved,
departed grandfather, Leo Zwell - an amazingly
warm-hearted, giving human being who was also a deep
thinker and excellent scientist, who got Ben started on the
path of science. As a careful experimentalist, Leo would
have been properly skeptical of the big hypotheses made
here - but he would have been eager to see them put to the
test!
Preface
This is a large, two-part book with an even larger goal: To outline a practical approach to
engineering software systems with general intelligence at the human level and ultimately beyond.
Machines with flexible problem-solving ability, open-ended learning capability, creativity and,
eventually, their own kind of genius.
Part 1, this volume, reviews various critical conceptual issues related to the nature of intel-
ligence and mind. It then sketches the broad outlines of a novel, integrative architecture for
Artificial General Intelligence (AGI) called CogPrime ... and describes an approach for giving a
young AGI system (CogPrime or otherwise) appropriate experience, so that it can develop its
own smarts, creativity and wisdom through its own experience. Along the way a formal theory
of general intelligence is sketched, and a broad roadmap leading from here to human-level arti-
ficial intelligence. Hints are also given regarding how to eventually, potentially create machines
advancing beyond human level - including some frankly futuristic speculations about strongly
self-modifying AGI architectures with flexibility far exceeding that of the human brain.
Part 2 then digs far deeper into the details of CogPrime's multiple structures, processes and
functions, culminating in a general argument as to why we believe CogPrime will be able to
achieve general intelligence at the level of the smartest humans (and potentially greater), and
a detailed discussion of how a CogPrime-powered virtual agent or robot would handle some
simple practical tasks such as social play with blocks in a preschool context. It first describes
the CogPrime software architecture and knowledge representation in detail; then reviews the
cognitive cycle via which CogPrime perceives and acts in the world and reflects on itself; and
next turns to various forms of learning: procedural, declarative (e.g. inference), simulative and
integrative. Methods of enabling natural language functionality in CogPrime are then discussed;
and then the volume concludes with a chapter summarizing the argument that CogPrime can
lead to human-level (and eventually perhaps greater) AGI, and a chapter giving a thought
experiment describing the internal dynamics via which a completed CogPrime system might
solve the problem of obeying the request "Build me something with blocks that I haven't seen
before."
The chapters here are written to be read in linear order - and if consumed thus, they tell
a coherent story about how to get from here to advanced AGI. However, the impatient reader
may be forgiven for proceeding a bit nonlinearly. An alternate reading path for the impatient
reader would be to start with the first few chapters of Part 1, then skim the final two chapters of
Part 2, and then return to reading in linear order. The final two chapters of Part 2 give a broad
overview of why we think the CogPrime design will work, in a way that depends on the technical
details of the previous chapters, but (we believe) not so sensitively as to be incomprehensible
without them.
This is admittedly an unusual sort of book, mixing demonstrated conclusions with unproved
conjectures in a complex way, all oriented toward an extraordinarily ambitious goal. Further,
the chapters vary somewhat in their level of detail - some very nitty-gritty, some more
high level, with much of the variation due to how much concrete work has been done on the
topic of the chapter at time of writing. However, it is important to understand that the ideas
presented here are not mere armchair speculation - they are currently being used as the basis
for an open-source software project called OpenCog, which is being worked on by software
developers around the world. Right now OpenCog embodies only a fraction of the overall
CogPrime design as described here. But if OpenCog continues to attract sufficient funding
or volunteer interest, then the ideas presented in these volumes will be validated or refuted
via practice. (As a related note: here and there in this book, we will refer to the "current"
CogPrime implementation (in the OpenCog framework); in all cases this refers to OpenCog as
of late 2013.)
To state one believes one knows a workable path to creating a human-level (and potentially
greater) general intelligence is to make a dramatic statement, given the conventional way of
thinking about the topic in the contemporary scientific community. However, we feel that once
a little more time has passed, the topic will lose its drama (if not its interest and importance),
and it will be widely accepted that there are many ways to create intelligent machines - some
simpler and some more complicated; some more brain-like or human-like and some less so; some
more efficient and some more wasteful of resources; etc. We have little doubt that, from the
perspective of AGI science 50 or 100 years hence (and probably even 10-20 years hence), the
specific designs presented here will seem awkward, messy, inefficient and circuitous in various
respects. But that is how science and engineering progress. Given the current state of knowledge
and understanding, having any concrete, comprehensive design and plan for creating AGI is
a significant step forward; and it is in this spirit that we present here our thinking about the
CogPrime architecture and the nature of general intelligence.
In the words of Sir Edmund Hillary, the first to scale Everest: "Nothing Venture, Nothing
Win."
Prehistory of the Book
The writing of this book began in earnest in 2001, at which point it was informally referred to
as "The Novamente Book." The original "Novamente Book" manuscript ultimately got too big
for its own britches, and subdivided into a number of different works - The Hidden Pattern
[Goe06a], a philosophy of mind book published in 2006; Probabilistic Logic Networks [GIGH08],
a more technical work published in 2008; Real World Reasoning [GGC+11], a sequel to Probabilistic
Logic Networks published in 2011; and the two parts of this book.
The ideas described in this book have been the collaborative creation of multiple overlapping
communities of people over a long period of time. The vast bulk of the writing here was done by
Ben Goertzel; but Cassio Pennachin and Nil Geisweiller made sufficient writing, thinking and
editing contributions over the years to more than merit their inclusion as co-authors. Further,
many of the chapters here have co-authors beyond the three main co-authors of the book; and
the set of chapter co-authors does not exhaust the set of significant contributors to the ideas
presented.
The core concepts of the CogPrime design and the underlying theory were conceived by Ben
Goertzel in the period 1995-1996 when he was a Research Fellow at the University of Western
Australia; but those early ideas have been elaborated and improved by many more people than
can be listed here (as well as by Ben's ongoing thinking and research). The collaborative design
process ultimately resulting in CogPrime started in 1997 when Intelligenesis Corp. was formed
- the Webmind AI Engine created in Intelligenesis's research group during 1997-2001 was the
predecessor to the Novamente Cognition Engine created at Novamente LLC during 2001-2008,
which was the predecessor to CogPrime.
Acknowledgements
For sake of simplicity, this acknowledgements section is presented from the perspective of the
primary author, Ben Goertzel. Ben will thus begin by expressing his thanks to his primary
co-authors, Cassio Pennachin (collaborator since 1998) and Nil Geisweiller (collaborator since
2005). Without outstandingly insightful, deep-thinking colleagues like you, the ideas presented
here - let alone the book itself - would not have developed nearly as effectively as they
have. Similar thanks also go to the other OpenCog collaborators who have co-authored
various chapters of the book.
Beyond the co-authors, huge gratitude must also be extended to everyone who has been
involved with the OpenCog project, and/or was involved in Novamente LLC and Webmind Inc.
before that. We are grateful to all of you for your collaboration and intellectual companionship!
Building a thinking machine is a huge project, too big for any one human; it will take a team
and I'm happy to be part of a great one. It is through the genius of human collectives, going
beyond any individual human mind, that genius machines are going to be created.
A tiny, incomplete sample from the long list of those others deserving thanks is:
• Ken Silverman and Gwendalin Qi Aranya (formerly Gwen Goertzel), both of whom listened
to me talk at inordinate length about many of the ideas presented here a long, long time
before anyone else was interested in listening. Ken and I schemed some AGI designs at
Simon's Rock College in 1983, years before we worked together on the Webmind AI Engine.
• Allan Combs, who got me thinking about consciousness in various different ways, at a very
early point in my career. I'm very pleased to still count Allan as a friend and sometime
collaborator! Fred Abraham as well, for introducing me to the intersection of chaos theory
and cognition, with a wonderful flair. George Christos, a deep AI/math/physics thinker from
Perth, for re-awakening my interest in attractor neural nets and their cognitive implications,
in the mid-1990s.
• All of the 130 staff of Webmind Inc. during 1998-2001 while that remarkable, ambitious,
peculiar AGI-oriented firm existed. Special shout-outs to the "Voice of Reason" Pei Wang
and the "Siberian Madmind" Anton Kolonin, Mike Ross, Cate Hartley, Karin Verspoor and
the tragically prematurely deceased Jeff Pressing (compared to whom we are all mental
midgets), who all made serious conceptual contributions to my thinking about AGI. Lisa
Pazer and Andy Sicilian who made Webmind happen on the business side. And of course
Cassio Pennachin, a co-author of this book; and Ken Silverman, who co-architected the
whole Webmind system and vision with me from the start.
• The Webmind Diehards, who helped begin the Novamente project that succeeded Webmind
beginning in 2001: Cassio Pennachin, Stephan Vladimir Bugaj, Takuo Henmi, Matthew
Iklé, Thiago Maia, Andre Senna, Guilherme Lamacie and Saulo Pinto
• Those who helped get the Novamente project off the ground and keep it progressing over the
years, including some of the Webmind Diehards and also Moshe Looks, Bruce Klein, Izabela
Lyon Freire, Chris Poulin, Murilo Queiroz, Predrag Janicic, David Hart, Ari Heljakka, Hugo
Pinto, Deborah Duong, Paul Prueitt, Glenn Tarbox, Nil Geisweiller and Cassio Pennachin
(the co-authors of this book), Sibley Verbeck, Jeff Reed, Pejman Makhfi, Welter Silva,
Lukasz Kaiser and more
• All those who have helped with the OpenCog system, including Linas Vepstas, Joel Pitt,
Jared Wigmore / Jade O'Neill, Zhenhua Cai, Deheng Huang, Shujing Ke, Lake Watkins,
Alex van der Peet, Samir Araujo, Fabricio Silva, Yang Ye, Shuo Chen, Michel Drenthe, Ted
Sanders, Gustavo Gama and of course Nil and Cassio again. Tyler Emerson and Eliezer
Yudkowsky, for choosing to have the Singularity Institute for AI (now MIRI) provide seed
funding for OpenCog.
• The numerous members of the AGI community who have tossed around AGI ideas with me
since the first AGI conference in 2006, including but definitely not limited to: Stan Franklin,
Juergen Schmidhuber, Marcus Hutter, Kai-Uwe Kuehnberger, Stephen Reed, Blerim Emruli,
Kristinn Thorisson, Joscha Bach, Abram Demski, Itamar Arel, Mark Waser, Randal Koene,
Paul Rosenbloom, Zhongzhi Shi, Steve Omohundro, Bill Hibbard, Eray Ozkural, Brandon
Rohrer, Ben Johnston, John Laird, Shane Legg, Selmer Bringsjord, Anders Sandberg, Alexei
Samsonovich, Wlodek Duch, and more
• The inimitable "Artilect Warrior" Hugo de Garis, who (when he was working at Xiamen
University) got me started working on AGI in the Orient (and introduced me to my wife
Ruiting in the process). And Changle Zhou, who brought Hugo to Xiamen and generously
shared his brilliant research students with Hugo and me. And Min Jiang, collaborator of
Hugo and Changle, a deep AGI thinker who is helping with OpenCog theory and practice
at time of writing.
• Gino Yu, who got me started working on AGI here in Hong Kong, where I am living at time
of writing. As of 2013 the bulk of OpenCog work is occurring in Hong Kong via a research
grant that Gino and I obtained together
• Dan Stoicescu, whose funding helped Novamente through some tough times.
• Jeffrey Epstein, whose visionary funding of my AGI research has helped me through a
number of tight spots over the years. At time of writing, Jeffrey is helping support the
OpenCog Hong Kong project.
• Zeger Karssen, founder of Atlantis Press, who conceived the Thinking Machines book series
in which this book appears, and who has been a strong supporter of the AGI conference
series from the beginning
• My wonderful wife Ruiting Lian, a source of fantastic amounts of positive energy for me
since we became involved four years ago. Ruiting has listened to me discuss the ideas
contained here time and time again, often with judicious and insightful feedback (as she
is an excellent AI researcher in her own right); and has been wonderfully tolerant of me
diverting numerous evenings and weekends to getting this book finished (as well as to other
AGI-related pursuits). And my parents Ted and Carol and kids Zar, Zeb and Zade, who
have also indulged me in discussions on many of the themes discussed here on countless
occasions! And my dear, departed grandfather Leo Zwell, for getting me started in science.
• Crunchkin and Pumpkin, for regularly getting me away from the desk to stroll around the
village where we live; many of my best ideas about AGI and other topics have emerged
while walking with my furry, four-legged family members
September 2013
Ben Goertzel
Contents
1 Introduction
1.1 AI Returns to Its Roots
1.2 AGI versus Narrow AI
1.3 CogPrime
1.4 The Secret Sauce
1.5 Extraordinary Proof?
1.6 Potential Approaches to AGI
1.6.1 Build AGI from Narrow AI
1.6.2 Enhancing Chatbots
1.6.3 Emulating the Brain
1.6.4 Evolve an AGI
1.6.5 Derive an AGI design mathematically
1.6.6 Use heuristic computer science methods
1.6.7 Integrative Cognitive Architecture
1.6.8 Can Digital Computers Really Be Intelligent?
1.7 Five Key Words
1.7.1 Memory and Cognition in CogPrime
1.8 Virtually and Robotically Embodied AI
1.9 Language Learning
1.10 AGI Ethics
1.11 Structure of the Book
1.12 Key Claims of the Book
Section I Artificial and Natural General Intelligence
2 What Is Human-Like General Intelligence?
2.1 Introduction
2.1.1 What Is General Intelligence?
2.1.2 What Is Human-like General Intelligence?
2.2 Commonly Recognized Aspects of Human-like Intelligence
2.3 Further Characterizations of Humanlike Intelligence
2.3.1 Competencies Characterizing Human-like Intelligence
2.3.2 Gardner's Theory of Multiple Intelligences
2.3.3 Newell's Criteria for a Human Cognitive Architecture
2.3.4 Intelligence and Creativity
2.4 Preschool as a View into Human-like General Intelligence
2.4.1 Design for an AGI Preschool
2.5 Integrative and Synergetic Approaches to Artificial General Intelligence
2.5.1 Achieving Humanlike Intelligence via Cognitive Synergy
3 A Patternist Philosophy of Mind
3.1 Introduction
3.2 Some Patternist Principles
3.3 Cognitive Synergy
3.4 The General Structure of Cognitive Dynamics: Analysis and Synthesis
3.4.1 Component-Systems and Self-Generating Systems
3.4.2 Analysis and Synthesis
3.4.3 The Dynamic of Iterative Analysis and Synthesis
3.4.4 Self and Focused Attention as Approximate Attractors of the Dynamic of Iterated Forward-Analysis
3.4.5 Conclusion
3.5 Perspectives on Machine Consciousness
3.6 Postscript: Formalizing Pattern
4 Brief Survey of Cognitive Architectures
4.1 Introduction
4.2 Symbolic Cognitive Architectures
4.2.1 SOAR
4.2.2 ACT-R
4.2.3 Cyc and Texai
4.2.4 NARS
4.2.5 GLAIR and SNePS
4.3 Emergentist Cognitive Architectures
4.3.1 DeSTIN: A Deep Reinforcement Learning Approach to AGI
4.3.2 Developmental Robotics Architectures
4.4 Hybrid Cognitive Architectures
4.4.1 Neural versus Symbolic; Global versus Local
4.5 Globalist versus Localist Representations
4.5.1 CLARION
4.5.2 The Society of Mind and the Emotion Machine
4.5.3 DUAL
4.5.4 4D/RCS
4.5.5 PolyScheme
4.5.6 Joshua Blue
4.5.7 LIDA
4.5.8 The Global Workspace
4.5.9 The LIDA Cognitive Cycle
4.5.10 Psi and MicroPsi
4.5.11 The Emergence of Emotion in the Psi Model
4.5.12 Knowledge Representation, Action Selection and Planning in Psi
4.5.13 Psi versus CogPrime
5 A Generic Architecture of Human-Like Cognition
5.1 Introduction
5.2 Key Ingredients of the Integrative Human-Like Cognitive Architecture Diagram
5.3 An Architecture Diagram for Human-Like General Intelligence
5.4 Interpretation and Application of the Integrative Diagram
6 A Brief Overview of CogPrime
6.1 Introduction
6.2 High-Level Architecture of CogPrime
6.3 Current and Prior Applications of OpenCog
6.3.1 Transitioning from Virtual Agents to a Physical Robot
6.4 Memory Types and Associated Cognitive Processes in CogPrime
6.4.1 Cognitive Synergy in PLN
6.5 Goal-Oriented Dynamics in CogPrime
6.6 Analysis and Synthesis Processes in CogPrime
6.7 Conclusion
Section II Toward a General Theory of General Intelligence
7 A Formal Model of Intelligent Agents
7.1 Introduction
7.2 A Simple Formal Agents Model (SRAM)
7.2.1 Goals
7.2.2 Memory Stores
7.2.3 The Cognitive Schematic
7.3 Toward a Formal Characterization of Real-World General Intelligence
7.3.1 Biased Universal Intelligence
7.3.2 Connecting Legg and Hutter's Model of Intelligent Agents to the Real World
7.3.3 Pragmatic General Intelligence
7.3.4 Incorporating Computational Cost
7.3.5 Assessing the Intelligence of Real-World Agents
7.4 Intellectual Breadth: Quantifying the Generality of an Agent's Intelligence
7.5 Conclusion
8 Cognitive Synergy
8.1 Cognitive Synergy
8.2 Cognitive Synergy
8.3 Cognitive Synergy in CogPrime
8.3.1 Cognitive Processes in CogPrime
8.4 Some Critical Synergies
8.5 The Cognitive Schematic
8.6 Cognitive Synergy for Procedural and Declarative Learning
8.6.1 Cognitive Synergy in MOSES
8.6.2 Cognitive Synergy in PLN
8.7 Is Cognitive Synergy Tricky?
8.7.1 The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?
8.7.2 A Possible Answer: Cognitive Synergy is Tricky
8.7.3 Conclusion
9 General Intelligence in the Everyday Human World
9.1 Introduction
9.2 Some Broad Properties of the Everyday World That Help Structure Intelligence
9.3 Embodied Communication
9.3.1 Generalizing the Embodied Communication Prior
9.4 Naive Physics
9.4.1 Objects, Natural Units and Natural Kinds
9.4.2 Events, Processes and Causality
9.4.3 Stuffs, States of Matter, Qualities
9.4.4 Surfaces, Limits, Boundaries, Media
9.4.5 What Kind of Physics Is Needed to Foster Human-like Intelligence?
9.5 Folk Psychology
9.5.1 Motivation, Requiredness, Value
9.6 Body and Mind
9.6.1 The Human Sensorium
9.6.2 The Human Body's Multiple Intelligences
9.7 The Extended Mind and Body
9.8 Conclusion
10 A Mind-World Correspondence Principle
10.1 Introduction
10.2 What Might a General Theory of General Intelligence Look Like?
10.3 Steps Toward A (Formal) General Theory of General Intelligence
10.4 The Mind-World Correspondence Principle
10.5 How Might the Mind-World Correspondence Principle Be Useful?
10.6 Conclusion
Section III Cognitive and Ethical Development
11 Stages of Cognitive Development
11.1 Introduction
11.2 Piagetan Stages in the Context of a General Systems Theory of Development
11.3 Piaget's Theory of Cognitive Development
11.3.1 Perry's Stages
11.3.2 Keeping Continuity in Mind
11.4 Piaget's Stages in the Context of Uncertain Inference
11.4.1 The Infantile Stage
11.4.2 The Concrete Stage
11.4.3 The Formal Stage
11.4.4 The Reflexive Stage
12 The Engineering and Development of Ethics
12.1 Introduction
12.2 Review of Current Thinking on the Risks of AGI
12.3 The Value of an Explicit Goal System
12.4 Ethical Synergy
12.4.1 Stages of Development of Declarative Ethics
12.4.2 Stages of Development of Empathic Ethics
12.4.3 An Integrative Approach to Ethical Development
12.4.4 Integrative Ethics and Integrative AGI
12.5 Clarifying the Ethics of Justice: Extending the Golden Rule into a Multifactorial Ethical Model
12.5.1 The Golden Rule and the Stages of Ethical Development
12.5.2 The Need for Context-Sensitivity and Adaptiveness in Deploying Ethical Principles
12.6 The Ethical Treatment of AGIs
12.6.1 Possible Consequences of Depriving AGIs of Freedom
12.6.2 AGI Ethics as Boundaries Between Humans and AGIs Become Blurred
12.7 Possible Benefits of Closely Linking AGIs to the Global Brain
12.7.1 The Importance of Fostering Deep, Consensus-Building Interactions Between People with Divergent Views
12.8 Possible Benefits of Creating Societies of AGIs
12.9 AGI Ethics As Related to Various Future Scenarios
12.9.1 Capped Intelligence Scenarios
12.9.2 Superintelligent AI: Soft-Takeoff Scenarios
12.9.3 Superintelligent AI: Hard-Takeoff Scenarios
12.9.4 Global Brain Mindplex Scenarios
12.10 Conclusion: Eight Ways to Bias AGI Toward Friendliness
12.10.1 Encourage Measured Co-Advancement of AGI Software and AGI Ethics Theory
12.10.2 Develop Advanced AGI Sooner Not Later
Section IV Networks for Explicit and Implicit Knowledge Representation
13 Local, Global and Glocal Knowledge Representation
13.1 Introduction
13.2 Localized Knowledge Representation using Weighted, Labeled Hypergraphs
13.2.1 Weighted, Labeled Hypergraphs
13.3 Atoms: Their Types and Weights
13.3.1 Some Basic Atom Types
13.3.2 Variable Atoms
13.3.3 Logical Links
13.3.4 Temporal Links
13.3.5 Associative Links
13.3.6 Procedure Nodes
13.3.7 Links for Special External Data Types
13.3.8 Truth Values and Attention Values
13.4 Knowledge Representation via Attractor Neural Networks
13.4.1 The Hopfield neural net model
13.4.2 Knowledge Representation via Cell Assemblies
13.5 Neural Foundations of Learning
13.5.1 Hebbian Learning
13.5.2 Virtual Synapses and Hebbian Learning Between Assemblies
13.5.3 Neural Darwinism
13.6 Glocal Memory
13.6.1 A Semi-Formal Model of Glocal Memory
13.6.2 Glocal Memory in the Brain
13.6.3 Glocal Hopfield Networks
13.6.4 Neural-Symbolic Glocality in CogPrime
14 Representing Implicit Knowledge via Hypergraphs
14.1 Introduction
14.2 Key Vertex and Edge Types
14.3 Derived Hypergraphs
14.3.1 SMEPH Vertices
14.3.2 SMEPH Edges
14.4 Implications of Patternist Philosophy for Derived Hypergraphs of Intelligent Systems
14.4.1 SMEPH Principles in CogPrime
15 Emergent Networks of Intelligence
15.1 Introduction
15.2 Small World Networks
15.3 Dual Network Structure
15.3.1 Hierarchical Networks
15.3.2 Associative, Heterarchical Networks
15.3.3 Dual Networks
Section V A Path to Human-Level AGI
16 AGI Preschool
16.1 Introduction
16.1.1 Contrast to Standard AI Evaluation Methodologies
16.2 Elements of Preschool Design
16.3 Elements of Preschool Curriculum
16.3.1 Preschool in the Light of Intelligence Theory
16.4 Task-Based Assessment in AGI Preschool
16.5 Beyond Preschool
16.6 Issues with Virtual Preschool Engineering
16.6.1 Integrating Virtual Worlds with Robot Simulators
16.6.2 BlocksNBeads World
17 A Preschool-Based Roadmap to Advanced AGI
17.1 Introduction
17.2 Measuring Incremental Progress Toward Human-Level AGI
17.3 Conclusion
18 Advanced Self-Modification: A Possible Path to Superhuman AGI
18.1 Introduction
18.2 Cognitive Schema Learning
18.3 Self-Modification via Supercompilation
18.3.1 Three Aspects of Supercompilation
18.3.2 Supercompilation for Goal-Directed Program Modification
18.4 Self-Modification via Theorem-Proving
A Glossary
A.1 List of Specialized Acronyms
A.2 Glossary of Specialized Terms
References
Chapter 1
Introduction
1.1 AI Returns to Its Roots
Our goal in this book is straightforward, albeit ambitious: to present a conceptual and technical
design for a thinking machine, a software program capable of the same qualitative sort of general
intelligence as human beings. It's not certain exactly how far the design outlined here will be
able to take us, but it seems plausible that once fully implemented, tuned and tested, it will be
able to achieve general intelligence at the human level and in some respects beyond.
Our ultimate aim is Artificial General Intelligence construed in the broadest sense, including
artificial creativity and artificial genius. We feel it is important to emphasize the extremely
broad potential of Artificial General Intelligence systems. The human brain is not built to be
modified, except via the slow process of evolution. Engineered AGI systems, built according to
designs like the one outlined here, will be much more susceptible to rapid improvement from
their initial state. It seems reasonable to us to expect that, relatively shortly after achieving the
first roughly human-level AGI system, AGI systems with various sorts of beyond-human-level
capabilities will be achieved.
Though these long-term goals are core to our motivations, we will spend much of our time here
explaining how we think we can make AGI systems do relatively simple things, like the things
human children do in preschool. The penultimate chapter of (Part 2 of) the book describes a
thought-experiment involving a robot playing with blocks, responding to the request "Build me
something I haven't seen before." We believe that preschool creativity contains the seeds of,
and the core structures and dynamics underlying, adult human level genius ... and new, as yet
unforeseen forms of artificial innovation.
Much of the book focuses on a specific AGI architecture, which we call CogPrime, and which
is currently in the midst of implementation using the OpenCog software framework. CogPrime
is large and complex and embodies a host of specific decisions regarding the various aspects of
intelligence. We don't view CogPrime as the unique path to advanced AGI, nor as the ultimate
end-all of AGI research. We feel confident there are multiple possible paths to advanced AGI,
and that in following any of these paths, multiple theoretical and practical lessons will be
learned, leading to modifications of the ideas held during the early stages of the path.
But our goal here is to articulate one path that we believe makes sense to follow, one overall
design that we believe can work.
1.2 AGI versus Narrow AI
An outsider to the AI field might think this sort of book commonplace in the research literature,
but insiders know that's far from the truth. The field of Artificial Intelligence (AI) was founded
in the mid 1950s with the aim of constructing "thinking machines" - that is, computer systems
with human-like general intelligence, including humanoid robots that not only look like humans but act
and think with intelligence equal to, and ultimately greater than, that of human beings. But in the
intervening years, the field has drifted far from its ambitious roots, and this book represents
part of a movement aimed at restoring the initial goals of the AI field, but in a manner powered
by new tools and new ideas far beyond those available half a century ago.
After the first generation of AI researchers found the task of creating human-level AGI very
difficult given the technology of their time, the AI field shifted focus toward what Ray Kurzweil
has called "narrow AI" - the understanding of particular specialized aspects of intelligence; and
the creation of AI systems displaying intelligence regarding specific tasks in relatively narrow
domains. In recent years, however, the situation has been changing. More and more researchers
have recognized the necessity - and feasibility - of returning to the original goals of the field.
In the decades since the 1950s, cognitive science and neuroscience have taught us a lot about
what a cognitive architecture needs to look like to support roughly human-like general intelli-
gence. Computer hardware has advanced to the point where we can build distributed systems
containing large amounts of RAM and large numbers of processors, carrying out complex tasks
in real time. The AI field has spawned a host of ingenious algorithms and data structures, which
have been successfully deployed for a huge variety of purposes.
Due to all this progress, increasingly, there has been a call for a transition from the current
focus on highly specialized "narrow AI" problem solving systems, back to confronting the more
difficult issues of "human level intelligence" and more broadly "artificial general intelligence
(AGI)." Recent years have seen a growing number of special sessions, workshops and confer-
ences devoted specifically to AGI, including the annual BICA (Biologically Inspired Cognitive
Architectures) AAAI Symposium, and the international AGI conference series (one in 2006,
and annual since 2008). And, even more exciting, as reviewed in Chapter 4, there are a number
of contemporary projects focused directly and explicitly on AGI (sometimes under the name
"AGI", sometimes using related terms such as "Human Level Intelligence").
In spite of all this progress, however, we feel that no one has yet clearly articulated a detailed,
systematic design for an AGI, with potential to yield general intelligence at the human level
and ultimately beyond. In this spirit, our main goal in this lengthy two-part book is to outline
a novel design for a thinking machine - an AGI design which we believe has the capability to
produce software systems with intelligence at the human adult level and ultimately beyond.
Many of the technical details of this design have been previously presented online in a wikibook
[Goe10b]; and the basic ideas of the design have been presented briefly in a series of conference
papers [GPSL03, GPPG06, Goe09c]. But the overall design has not been presented in a coherent
and systematic way before this book. In order to frame this design properly, we also present
a considerable number of broader theoretical and conceptual ideas here, some more and some
less technical in nature.
1.3 CogPrime
The AGI design presented here has not previously been granted a name independently of its
particular software implementations, but for the purposes of this book it needs one, so we've
christened it CogPrime. This fits with the name "OpenCogPrime" that has already been
used to describe the software implementation of CogPrime within the open-source OpenCog
AGI software framework. The OpenCogPrime software, right now, implements only a small
fraction of the CogPrime design as described here. However, OpenCog was designed specifically
to enable efficient, scalable implementation of the full CogPrime design (as well as to serve as a
more general framework for AGI R&D); and work currently proceeds in this direction, though
there is a lot of work still to be done and many challenges remain.
The CogPrime design is more comprehensive and thorough than anything that has been
presented in the literature previously, including the work of others reviewed in Chapter 4. It
covers all the key aspects of human intelligence, and explains how they interoperate and how
they can be implemented in digital computer software. Part 1 of this work outlines CogPrime at
a high level, and makes a number of more general points about artificial general intelligence and
the path thereto; then Part 2 digs deeply into the technical particulars of CogPrime. Even Part
2, however, doesn't explain all the details of CogPrime that have been worked out so far, and
it definitely doesn't explain all the implementation details that have gone into designing and
building OpenCogPrime. Creating a thinking machine is a large task, and even the intermediate
level of detail takes up a lot of pages.
1.4 The Secret Sauce
There is no consensus on why all the related technological and scientific progress mentioned
above has not yet yielded AI software systems with human-like general intelligence (or even
greater levels of brilliance!). However, we hypothesize that the core reason boils down to the
following three points:
• Intelligence depends on the emergence of certain high-level structures and dynamics across
a system's whole knowledge base;
• We have not discovered any one algorithm or approach capable of yielding the emergence
of these structures;
• Achieving the emergence of these structures within a system formed by integrating a number
of different AI algorithms and structures requires careful attention to the manner in which
these algorithms and structures are integrated; and so far the integration has not been done
in the correct way.
1 This brings up a terminological note: At several places in this Volume and the next we will refer to the current
CogPrime or OpenCog implementation; in all cases this refers to OpenCog as of late 2013. We realize the risk
of mentioning the state of our software system at time of writing: for future readers this may give the wrong
impression, because if our project goes well, more and more of CogPrime will get implemented and tested as
time goes on (e.g. within the OpenCog framework, under active development at time of writing). However, not
mentioning the current implementation at all seems an even worse course to us, since we feel readers will be
interested to know which of our ideas - at time of writing - have been honed via practice and which have not.
Online resources such as http://opencog.org may be consulted by readers curious about the current state
of the main OpenCog implementation; though in the future forks of the code may be created, or other systems may
be built using some or all of the ideas in this book, etc.
The human brain appears to be an integration of an assemblage of diverse structures and
dynamics, built using common components and arranged according to a sensible cognitive archi-
tecture. However, its algorithms and structures have been honed by evolution to work closely
together - they are very tightly inter-adapted, in the same way that the different organs of
the body are adapted to work together. Due to their close interoperation they give rise to the
overall systemic behaviors that characterize human-like general intelligence. We believe that
the main missing ingredient in AI so far is cognitive synergy: the fitting-together of differ-
ent intelligent components into an appropriate cognitive architecture, in such a way that the
components richly and dynamically support and assist each other, interrelating very closely in
a similar manner to the components of the brain or body and thus giving rise to appropriate
emergent structures and dynamics. This leads us to one of the central hypotheses underlying
the CogPrime approach to AGI: that the cognitive synergy ensuing from integrating
multiple symbolic and subsymbolic learning and memory components in an appro-
priate cognitive architecture and environment, can yield robust intelligence at the
human level and ultimately beyond.
The reason this sort of intimate integration has not yet been explored much is that it's difficult
on multiple levels, requiring the design of an architecture and its component algorithms with
a view toward the structures and dynamics that will arise in the system once it is coupled
with an appropriate environment. Typically, the AI algorithms and structures corresponding
to different cognitive functions have been developed based on divergent theoretical principles,
by disparate communities of researchers, and have been tuned for effective performance on
different tasks in different environments. Making such diverse components work together in a
truly synergetic and cooperative way is a tall order, yet we believe that this - rather than some
particular algorithm, structure or architectural principle - is the "secret sauce" needed to create
human-level AGI based on technologies available today.
1.5 Extraordinary Proof?
There is a saying that "extraordinary claims require extraordinary proof" and by that standard,
if one believes that having a design for an advanced AGI is an extraordinary claim, this
book must be rated a failure. We don't offer extraordinary proof that CogPrime, once fully
implemented and educated, will be capable of human-level general intelligence and more.
It would be nice if we could offer mathematical proof that CogPrime has the potential we
think it does, but at the current time mathematics is simply not up to the job. We'll pursue
this direction briefly in Chapter 7 and other chapters, where we'll clarify exactly what kind
of mathematical claim "CogPrime has the potential for human-level intelligence" turns out to
be. Once this has been clarified, it will be clear that current mathematical knowledge does not
yet let us evaluate, or even fully formalize, this kind of claim. Perhaps one day rigorous and
detailed analyses of practical AGI designs will be feasible - and we look forward to that day -
but it's not here yet.
Also, it would of course be profoundly exciting if we could offer dramatic practical demon-
strations of CogPrime's capabilities. We do have a partial software implementation, in the
OpenCogPrime system, but currently the things OpenCogPrime does are too simple to really
serve as proofs of CogPrime's power for advanced AGI. We have used some CogPrime ideas in
the OpenCog framework to do things like natural language understanding and data mining, and
to control virtual dogs in online virtual worlds; and this has been very useful work in multiple
senses. It has taught us more about the CogPrime design; it has produced some useful software
systems; and it constitutes fractional work building toward a full OpenCog based implemen-
tation of CogPrime. However, to date, the things OpenCogPrime has done are all things that
could have been done in different ways without the CogPrime architecture (though perhaps not
as elegantly nor with as much room for interesting expansion).
The bottom line is that building an AGI is a big job. Software companies like Microsoft spend
dozens to hundreds of man-years building software products like word processors and operating
systems, so it should be no surprise that creating a digital intelligence is also a relatively large-
scale software engineering project. As time advances and software tools improve, the number of
man-hours required to develop advanced AGI gradually decreases - but right now, as we write
these words, it's still a rather big job. In the OpenCogPrime project we are making a serious
attempt to create a CogPrime based AGI using an open-source development methodology,
with the open-source Linux operating system as one of our inspirations. But the open-source
methodology doesn't work magic either, and it remains a large project, currently at an early
stage. I emphasize this point so that readers lacking software engineering expertise don't take
the currently fairly limited capabilities of OpenCogPrime as somehow a damning indictment of
the potential of the CogPrime design. The design is one thing, the implementation another -
and the OpenCogPrime implementation currently encompasses perhaps one third to one half
of the key ideas in this book.
So we don't have extraordinary proof to offer. What we aim to offer instead are clearly-constructed
conceptual and technical arguments as to why we think the CogPrime design has
dramatic AGI potential.
It is also possible to push back a bit on the common intuition that having a design for human-
level AGI is such an "extraordinary claim." It may be extraordinary relative to contemporary
science and culture, but we have a strong feeling that the AGI problem is not difficult in the
same ways that most people (including most AI researchers) think it is. We suspect that in
hindsight, after human-level AGI has been achieved, people will look back in shock that it took
humanity so long to come up with a workable AGI design. As you'll understand once you've
finished Part 1 of the book, we don't think general intelligence is nearly as "extraordinary"
and mysterious as it's commonly made out to be. Yes, building a thinking machine is hard -
but humanity has done a lot of other hard things before. It may seem difficult to believe that
human-level general intelligence could be achieved by something as simple as a collection of
algorithms linked together in an appropriate way and used to control an agent. But we suggest
that, once the first powerful AGI systems are produced, it will become apparent that engineering
human-level minds is not so profoundly different from engineering other complex systems.
All in all, we'll consider the book successful if a significant percentage of open-minded,
appropriately-educated readers come away from it scratching their chins and pondering: "Hmm.
You know, that just might work." and a small percentage come away thinking "Now that's an
initiative I'd really like to help with!"
1.6 Potential Approaches to AGI
In principle, there is a large number of approaches one might take to building an AGI, starting
from the knowledge, software and machinery now available. This is not the place to review
them in detail, but a brief list seems apropos, including commentary on why these are not the
approaches we have chosen for our own research. Our intent here is not to insult or dismiss
these other potential approaches, but merely to indicate why, as researchers with limited time
and resources, we have made a different choice regarding where to focus our own energies.
1.6.1 Build AGI from Narrow AI
Most of the AI programs around today are "narrow AI" programs - they carry out one particular
kind of task intelligently. One could try to make an advanced AGI by combining a bunch of
enhanced narrow AI programs inside some kind of overall framework.
However, we're rather skeptical of this approach because none of these narrow AI programs
have the ability to generalize across domains - and we don't see how combining them or ex-
tending them is going to cause this to magically emerge.
1.6.2 Enhancing Chatbots
One could seek to make an advanced AGI by taking a chatbot, and trying to improve its code
to make it actually understand what it's talking about. We have some direct experience with
this route, as in 2010 our AI consulting firm was contracted to improve Ray Kurzweil's online
chatbot "Ramona". Our new Ramona understands a lot more than the previous Ramona version
or a typical chatbot, due to using Wikipedia and other online resources, but still it's far from
an AGI.
A more ambitious attempt in this direction was Jason Hutchens' a-i.com project, which
sought to create a human child level AGI via development and teaching of a statistical learning
based chatbot (rather than the typical rule-based kind). The difficulty with this approach,
however, is that the architecture of a chatbot is fundamentally different from the architecture
of a generally intelligent mind. Much of what's important about the human mind is not directly
observable in conversations, so if you start from conversation and try to work toward an AGI
architecture from there, you're likely to miss many critical aspects.
1.6.3 Emulating the Brain
One can approach AGI by trying to figure out how the brain works, using brain imaging and
other tools from neuroscience, and then emulating the brain in hardware or software.
One rather substantial problem with this approach is that we don't really understand how
the brain works yet, because our software for measuring the brain is still relatively crude. There
is no brain scanning method that combines high spatial and temporal accuracy, and none is
likely to come about for a decade or two. So to do brain-emulation AGI seriously, one needs to
wait a while until brain scanning technology improves.
Current AI methods like neural nets that are loosely based on the brain, are really not brain-
like enough to make a serious claim at emulating the brain's approach to general intelligence.
We don't yet have any real understanding of how the brain represents abstract knowledge, for
example, or how it does reasoning (though the authors, like many others, have made some
speculations in this regard IGNIIII08I).
Another problem with this approach is that once you're done, what you get is something
with a very humanlike mind, and we already have enough of those! However, this is perhaps
not such a serious objection, because a digital-computer-based version of a human mind could
be studied much more thoroughly than a biology-based human mind. We could observe its
dynamics in real-time in perfect precision, and could then learn things that would allow us to
build other sorts of digital minds.
1.6.4 Evolve an AGI
Another approach is to try to run an evolutionary process inside the computer, and wait for
advanced AGI to evolve.
One problem with this is that we don't know how evolution works all that well. There's a
field of artificial life, but so far its results have been fairly disappointing. It's not yet clear how
much one can vary on the chemical structures that underlie real biology, and still get powerful
evolution like we see in real biology. If we need good artificial chemistry to get good artificial
biology, then do we need good artificial physics to get good artificial chemistry?
Another problem with this approach, of course, is that it might take a really long time.
Evolution took billions of years on Earth, using a massive amount of computational power. To
make the evolutionary approach to AGI effective, one would need some radical innovations to
the evolutionary process (such as, perhaps, using probabilistic methods like BOA [Pel05] or
MOSES [Loo06] in place of traditional evolution).
1.6.5 Derive an AGI design mathematically
One can try to use the mathematical theory of intelligence to figure out how to make advanced
AGI.
This interests us greatly, but there's a huge gap between the rigorous math of intelligence
as it exists today and anything of practical value. As we'll discuss in Chapter 7, most of the
rigorous math of intelligence right now is about how to make AI on computers with dramati-
cally unrealistic amounts of memory or processing power. When one tries to create a theoretical
understanding of real-world general intelligence, one arrives at quite different sorts of consider-
ations, as we will roughly outline in Chapter 10. Ideally we would like to be able to study the
CogPrime design using a rigorous mathematical theory of real-world general intelligence, but at
the moment that's not realistic. The best we can do is to conceptually analyze CogPrime and
its various components in terms of relevant mathematical and theoretical ideas; and perform
analysis of CogPrime's individual structures and components at varying levels of rigor.
1.6.6 Use heuristic computer science methods
The computer science field contains a number of abstract formalisms, algorithms and structures
that have relevance beyond specific narrow AI applications, yet aren't necessarily understood
as thoroughly as would be required to integrate them into the rigorous mathematical theory of
intelligence. Based on these formalisms, algorithms and structures, a number of "single formal-
ism/algorithm focused" AGI approaches have been outlined, some of which will be reviewed in
Chapter 4. For example Pei Wang's NARS ("Non-Axiomatic Reasoning System") approach is
based on a specific logic which he argues to be the "logic of general intelligence" - so, while his
system contains many other aspects than this logic, he considers this logic to be the crux of the
system and the source of its potential power as an AGI system.
The basic intuition on the part of these "single formalism/algorithm focused" researchers
seems to be that there is one key formalism or algorithm underlying intelligence, and if you
achieve this key aspect in your AGI program, you're going to get something that fundamentally
thinks like a person, even if it has some differences due to its different implementation and
embodiment. On the other hand, it's also possible that this idea is philosophically incorrect:
that there is no one key formalism, algorithm, structure or idea underlying general intelligence.
The CogPrime approach is based on the intuition that to achieve human-level, roughly human-
like general intelligence based on feasible computational resources, one needs an appropriate
heterogeneous combination of algorithms and structures, each coping with different types of
knowledge and different aspects of the problem of achieving goals in complex environments.
1.6.7 Integrative Cognitive Architecture
Finally, to create advanced AGI one can try to build some sort of integrative cognitive architec-
ture: a software system with multiple components that each carry out some cognitive function,
and that connect together in a specific way to try to yield overall intelligence.
Cognitive science gives us some guidance about the overall architecture, and computer science
and neuroscience give us a lot of ideas about what to put in the different components. But still
this approach is very complex and there is a lot of need for creative invention.
This is the approach we consider most "serious" at present (at least until neuroscience ad-
vances further). And, as will be discussed in depth in these pages, this is the approach we've
chosen: CogPrime is an integrative AGI architecture.
1.6.8 Can Digital Computers Really Be Intelligent?
All the AGI approaches we've just mentioned assume that it's possible to make AGI on digital
computers. While we suspect this is correct, we must note that it isn't proven.
It might be that - as Penrose 11'mM', Hameroff [Ham87] and others have argued - we need
quantum computers or quantum gravity computers to make AGI. However, there is no evidence
of this at this stage. Of course the brain like all matter is described by quantum mechanics,
but this doesn't imply that the brain is a "macroscopic quantum system" in a strong sense
(like, say, a Bose-Einstein condensate). And even if the brain does use quantum phenomena in
a dramatic way to carry out some of its cognitive processes (a hypothesis for which there is no
current evidence), this doesn't imply that these quantum phenomena are necessary in order to
carry out the given cognitive processes. For example there is evidence that birds use quantum
nonlocal phenomena to carry out navigation based on the Earth's magnetic fields IGRM± Ill;
yet scientists have built instruments that carry out the same functions without using any special
quantum effects. The importance of quantum phenomena in biology (except via their obvious
role in giving rise to biological phenomena describable via classical physics) remains a subject
of debate IAG B
IS
Quantum "magic" aside, it is also conceivable that building AGI is fundamentally impossible
for some other reason we don't understand. Without getting religious about it, it is rationally
quite possible that some aspects of the universe are beyond the scope of scientific methods.
Science is fundamentally about recognizing patterns in finite sets of bits (e.g. finite sets of
finite-precision observations), whereas mathematics recognizes many sets much larger than this.
Selmer Bringsjord IBM], and other advocates of "hypercomputing" approaches to intelligence,
argue that the human mind depends on massively large infinite sets and therefore can never be
simulated on digital computers nor understood via finite sets of finite-precision measurements
such as science deals with.
But again, while this sort of possibility is interesting to speculate about, there's no real reason
to believe it at this time. Brain science and AI are both very young sciences and the "working
hypothesis" that digital computers can manifest advanced AGI has hardly been explored at
all yet, relative to what will be possible in the next decades as computers get more and more
powerful and our understanding of neuroscience and cognitive science gets more and more
complete. The CogPrime AGI design presented here is based on this working hypothesis.
Many of the ideas in the book are actually independent of the "mind can be implemented
digitally" working hypothesis, and could apply to AGI systems built on analog, quantum or
other non-digital frameworks - but we will not pursue these possibilities here. For the moment,
outlining an AGI design for digital computers is hard enough! Regardless of speculations about
quantum computing in the brain, it seems clear that AGI on quantum computers is part of our
future and will be a powerful thing; but the description of a CogPrime analogue for quantum
computers will be left for a later work.
1.7 Five Key Words
As noted, the CogPrime approach lies squarely in the integrative cognitive architecture camp.
But it is not a haphazard or opportunistic combination of algorithms and data structures. At
bottom it is motivated by the patternist philosophy of mind laid out in Ben Goertzel's book
The Hidden Pattern [Goe06a], which was in large part a summary and reformulation of ideas
presented in a series of books published earlier by the same author [GociMI, roc93al, roc93H,
roe97], IGor011. A few of the core ideas of this philosophy are laid out in Chapter 3, though
that chapter is by no means a thorough summary.
One way to summarize some of the most important yet commonsensical parts of the patternist
philosophy of mind, in an AGI context, is to list five words: perception, memory, prediction,
action, goals.
In a phrase: "A mind uses perception and memory to make predictions about
which actions will help it achieve its goals."
This ties in with the ideas of many other thinkers, including Jeff Hawkins' "memory/prediction"
theory [HB06], and it also speaks directly to the formal characterization of intelligence
presented in Chapter 7: general intelligence as "the ability to achieve complex goals in complex
environments."
Naturally the goals involved in the above phrase may be explicit or implicit to the intelligent
agent, and they may shift over time as the agent develops.
Perception is taken to mean pattern recognition: the recognition of (novel or familiar) pat-
terns in the environment or in the system itself. Memory is the storage of already-recognized
patterns, enabling recollection or regeneration of these patterns as needed. Action is the for-
mation of patterns in the body and world. Prediction is the utilization of temporal patterns to
guess what perceptions will be seen in the future, and what actions will achieve what effects in
the future - in essence, prediction consists of temporal pattern recognition, plus the (implicit
or explicit) assumption that the universe possesses a "habitual tendency" according to which
previously observed patterns continue to apply.
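The five-word summary above can be sketched as a tiny agent loop. This is only an illustration of how the concepts relate, not part of the CogPrime design: the class PatternAgent, the corridor environment in step, and the goal function are all hypothetical names introduced for this example.

```python
from collections import defaultdict


class PatternAgent:
    """Toy agent built around perception, memory, prediction, action and goals."""

    def __init__(self, actions, goal):
        self.actions = list(actions)
        self.goal = goal  # goal(observation) -> score; higher is better
        # Memory: (observation, action) -> {next observation: times seen}
        self.memory = defaultdict(lambda: defaultdict(int))

    def perceive(self, raw):
        # Perception as pattern recognition; trivially the identity here.
        return raw

    def remember(self, obs, action, next_obs):
        self.memory[(obs, action)][next_obs] += 1

    def predict(self, obs, action):
        # Assume previously observed patterns continue to apply:
        # return the outcome seen most often, or None if never tried.
        outcomes = self.memory.get((obs, action))
        if not outcomes:
            return None
        return max(outcomes, key=outcomes.get)

    def act(self, obs):
        # Explore actions never tried in this situation first.
        for action in self.actions:
            if (obs, action) not in self.memory:
                return action
        # Otherwise choose the action whose predicted outcome best serves the goal.
        return max(self.actions, key=lambda a: self.goal(self.predict(obs, a)))


def step(position, action):
    """Hypothetical environment: a corridor with cells 0..4."""
    delta = 1 if action == "right" else -1
    return min(4, max(0, position + delta))


def run(steps=20):
    agent = PatternAgent(["left", "right"], goal=lambda pos: -abs(4 - pos))
    position = 0
    for _ in range(steps):
        obs = agent.perceive(position)
        action = agent.act(obs)
        position = step(position, action)
        agent.remember(obs, action, agent.perceive(position))
    return position, agent
```

Calling run() lets the agent build up its memory by trial, after which its predictions steer it toward the goal cell. A real system would of course need far richer perception and memory than a lookup table.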
1.7.1 Memory and Cognition in CogPrime
Each of these five concepts has a lot of depth to it, and we won't say too much about them in
this brief introductory overview; but we will take a little time to say something about memory
in particular.
As we'll see in Chapter 7, one of the things that the mathematical theory of general intelli-
gence makes clear is that, if you assume your AI system has a huge amount of computational
resources, then creating general intelligence is not a big trick. Given enough computing power,
a very brief and simple program can achieve any computable goal in any computable environ-
ment, quite effectively. Marcus Hutter's AIXItl design [Hut05] gives one way of doing this,
backed up by rigorous mathematics. Put informally, what this means is: the problem of AGI is
really a problem of coping with inadequate compute resources, just as the problem of natural
intelligence is really a problem of coping with inadequate energetic resources.
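To make the resource point concrete, here is a minimal sketch of planning by exhaustive search. It is not Hutter's construction, only a toy with the same flavor: the program is a few lines long, but the number of candidate plans grows exponentially with the horizon. The function names and the integer-line environment are assumptions made for this example.

```python
from itertools import product


def brute_force_plan(start, step, reward, actions, horizon):
    """Try every action sequence of length `horizon`; keep the best one."""
    best_plan, best_total = (), float("-inf")
    for plan in product(actions, repeat=horizon):
        state, total = start, 0.0
        for action in plan:
            state = step(state, action)
            total += reward(state)
        if total > best_total:
            best_plan, best_total = plan, total
    return best_plan, best_total


def search_space_size(n_actions, horizon):
    """Number of plans the exhaustive search has to evaluate."""
    return n_actions ** horizon


# Hypothetical environment: walk on the integers, rewarded for standing on 3.
plan, total = brute_force_plan(
    start=0,
    step=lambda state, action: state + action,
    reward=lambda state: 1.0 if state == 3 else 0.0,
    actions=(-1, 1),
    horizon=5,
)
```

With two actions and a horizon of 5 the search covers only 32 plans, but search_space_size(2, 40) already exceeds a trillion. Coping with that kind of blow-up under realistic compute budgets is the problem the text is pointing at.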
One of the key ideas underlying CogPrime is a principle called cognitive synergy, which
explains how real-world minds achieve general intelligence using limited resources, by appropri-
ately organizing and utilizing their memories.
This principle says that there are many different kinds of memory in the mind: sensory,
episodic, procedural, declarative, attentional, intentional. Each of them has certain learning
processes associated with it; for example, reasoning is associated with declarative memory.
Synergy arises here in the way the learning processes associated with each kind of memory have
got to help each other out when they get stuck, rather than working at cross-purposes.
Cognitive synergy is a fundamental principle of general intelligence - it doesn't tend to play
a central role when you're building narrow-AI systems.
In the CogPrime approach all the different kinds of memory are linked together in a single
meta-representation, a sort of combined semantic/neural network called the AtomSpace. It
represents everything from perceptions and actions to abstract relationships and concepts and
even a system's model of itself and others. When specialized representations are used for other
types of knowledge (e.g. program trees for procedural knowledge, spatiotemporal hierarchies
for perceptual knowledge) then the knowledge stored outside the AtomSpace is represented via
tokens (Atoms) in the AtomSpace, allowing it to be located by various cognitive processes, and
associated with other memory items of any type.
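As a rough illustration of the idea of a single store of nodes and links, with tokens standing in for knowledge kept elsewhere, consider the following toy. It is a sketch under stated assumptions and does not use the real OpenCog AtomSpace API: ToyAtomSpace, attach_external and the "ProcedureToken" kind are hypothetical, and the other type labels are used only as illustrative strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    kind: str             # illustrative type label, e.g. "ConceptNode"
    name: str = ""        # used by nodes
    outgoing: tuple = ()  # used by links: the atoms this link connects


class ToyAtomSpace:
    """Toy container for nodes and links; not the OpenCog implementation."""

    def __init__(self):
        self.atoms = set()
        self.external = {}  # token atom -> item held in a specialized store

    def add_node(self, kind, name):
        atom = Atom(kind, name)
        self.atoms.add(atom)  # adding the same node twice keeps one copy
        return atom

    def add_link(self, kind, *targets):
        atom = Atom(kind, "", tuple(targets))
        self.atoms.add(atom)
        return atom

    def attach_external(self, token, payload):
        # The payload lives outside the graph; the token makes it findable.
        self.external[token] = payload

    def incoming(self, atom):
        # All links that point at the given atom.
        return [a for a in self.atoms if atom in a.outgoing]


space = ToyAtomSpace()
dog = space.add_node("ConceptNode", "dog")
animal = space.add_node("ConceptNode", "animal")
inheritance = space.add_link("InheritanceLink", dog, animal)

# Procedural knowledge kept in a separate store, reachable through a token.
fetch = space.add_node("ProcedureToken", "fetch_ball")
space.attach_external(fetch, ["walk_to(ball)", "grab(ball)", "walk_to(owner)"])
execution = space.add_link("ExecutionLink", fetch, dog)
```

Because the procedure is reachable from the "dog" node via incoming(dog), a process that starts from declarative knowledge can locate procedural knowledge, which is the kind of cross-memory association the paragraph describes.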
So for instance an OpenCog AI system has an AtomSpace, plus some specialized knowledge
stores linked into the AtomSpace; and it also has specific algorithms acting on the AtomSpace
and appropriate specialized stores corresponding to each type of memory. Each of these algo-
rithms is complex and has its own story; for instance (an incomplete list, for more detail see
the following section of this Introduction):
• Declarative knowledge is handled using Probabilistic Logic Networks, described in Chapter
34 and others;
• Procedural knowledge is handled using MOSES, a probabilistic evolutionary learning algo-
rithm described in Chapter 21 and others;
• Attentional knowledge is handled by ECAN (economic attention allocation), described in
Chapter 23 and others;
• OpenCog contains a language comprehension system called RelEx that takes English sentences
and turns them into nodes and links in the AtomSpace. It's currently being extended
to handle Chinese. RelEx handles mostly declarative knowledge but also involves
some procedural knowledge for linguistic phenomena like reference resolution and semantic
disambiguation.
But the crux of the CogPrime cognitive architecture is not any particular cognitive process,
but rather the way they all work together using cognitive synergy.
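The notion of processes helping each other out when they get stuck can be caricatured in a few lines. The sketch below is deliberately simplistic and is not how PLN, MOSES or ECAN interact; coarse_search, local_refinement and solve are hypothetical names, and the numeric task is just a stand-in for a shared problem state.

```python
def coarse_search(state, goal):
    """Hypothetical process A: makes big jumps but cannot fine-tune."""
    doubled = state * 2
    return doubled if doubled <= goal else None


def local_refinement(state, goal):
    """Hypothetical process B: makes small steps, but only near the goal."""
    return state + 1 if 0 < goal - state <= 2 else None


def solve(state, goal, processes, max_rounds=50):
    """Let each process try in turn; hand over whenever one gets stuck."""
    trace = []
    for _ in range(max_rounds):
        if state == goal:
            break
        for name, process in processes:
            new_state = process(state, goal)
            if new_state is not None:
                state = new_state
                trace.append(name)
                break
        else:
            break  # every process is stuck
    return state, trace


alone_a = solve(1, 10, [("coarse", coarse_search)])
alone_b = solve(1, 10, [("local", local_refinement)])
together = solve(1, 10, [("coarse", coarse_search), ("local", local_refinement)])
```

Each process alone stalls short of the goal, while the combination reaches it because one process picks up exactly where the other gets stuck. In a real architecture the hard part is designing the components and their shared representation so that such hand-offs are frequent and useful.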
1.8 Virtually and Robotically Embodied AI
Another issue that will arise frequently in these pages is embodiment. There's a lot of debate in
the AI community over whether embodiment is necessary for advanced AGI or not. Personally,
we doubt it's necessary but we think it's extremely convenient, and are thus considerably
interested in both virtual world and robotic embodiment. The CogPrime architecture itself is
neutral on the issue of embodiment, and it could be used to build a mathematical theorem
prover or an intelligent chat bot just as easily as an embodied AGI system. However, most of
our attention has gone into figuring out how to use CogPrime to control embodied agents in
virtual worlds, or else (to a lesser extent) physical robots. For instance, during 2011-2012 we
were involved in a Hong Kong government funded project using OpenCog to control video game
agents in a simple game world modeled on the game Minecraft IGPC±
Current virtual world technology has significant limitations that make it far less than
ideal from an AGI perspective, and in Chapter 16 we will discuss how they can be remedied.
However, for the medium-term future virtual worlds are not going to match the natural world
in terms of richness and complexity - and so there's also something to be said for physical
robots that interact with all the messiness of the real world.
With this in mind, in the Artificial Brain Lab at Xiamen University in 2009-2010, we conducted
some experiments using OpenCog to control the Nao humanoid robot [GD09]. The goal
of that work was to take the same code that controls the virtual dog and use it to control the
physical robot. But it's harder because in this context we need to do real vision processing
and real motor control. A similar project is being undertaken in Hong Kong at time of writing,
involving a collaboration between OpenCog AI developers and David Hanson's robotics
group. One of the key ideas involved in this project is explicit integration of subsymbolic and
more symbolic subsystems. For instance, one can use a purely subsymbolic, hierarchical pattern
recognition network for vision processing, and then link its internal structures into the nodes
and links in the AtomSpace that represent concepts. So the subsymbolic and symbolic systems
can work harmoniously and productively together, a notion we will review in more detail in
Chapter 26.
1.9 Language Learning
One of the subtler aspects of our current approach to teaching CogPrime is language learning.
Three relatively crisp and simple approaches to language learning would be:
• Build a language processing system using hand-coded grammatical rules, based on linguistic
theory;
• Train a language processing system using supervised, unsupervised or semisupervised learn-
ing, based on computational linguistics;
• Have an AI system learn language via experience, based on imitation and reinforcement and
experimentation, without any built-in distinction between linguistic behaviors and other
behaviors.
While the third approach is conceptually appealing, our current approach in CogPrime (de-
scribed in a series of chapters in Part 2) is none of the above, but rather a combination of the
above. OpenCog contains a natural language processing system built using a combination of
the rule-based and statistical approaches, which has reasonably adequate functionality; and our
plan is to use it as an initial condition for ongoing adaptive improvement based on embodied
communicative experience.
1.10 AGI Ethics
When discussing AGI work with the general public, ethical concerns often arise. Science fic-
tion films like the Terminator series have raised public awareness of the possible dangers of
advanced AGI systems without correspondingly advanced ethics. Non-profit organizations like
the Singularity Institute for AI (http://singinst.org) have arisen specifically to raise attention
about, and foster research on, these potential dangers.
Our main focus here is on how to create AGI, not how to teach an AGI human ethical
principles. However, we will address the latter issue explicitly in Chapter 12, and we do think it's
important to emphasize that AGI ethics has been at the center of the design process throughout
the conception and development of CogPrime and OpenCog.
Broadly speaking there are (at least) two major threats related to advanced AGI. One is
that people might use AGIs for bad ends; and the other is that, even if an AGI is made with
the best intentions, it might reprogram itself in a way that causes it to do something terrible.
If it's smarter than us, we might be watching it carefully while it does this, and have no idea
what's going on.
The best way to deal with this second "bad AGI" problem is to build ethics into your AGI
architecture - and we have done this with CogPrime, via creating a goal structure that explicitly
supports ethics-directed behavior, and via creating an overall architecture that supports "ethical
synergy" along with cognitive synergy. In short, the notion of ethical synergy is that there are
different kinds of ethical thinking associated with the different kinds of memory and you want
to be sure your AGI has all of them, and that it uses them together effectively.
In order to create AGI that is not only intelligent but beneficial to other sentient beings,
ethics has got to be part of the design and the roadmap. As we teach our AGI systems, we need
to lead them through a series of instructional and evaluative tasks that move from a primitive
level to the mature human level - in intelligence, but also in ethical judgment.
1.11 Structure of the Book
The book is divided into two parts. The technical particulars of CogPrime are discussed in Part
2; what we deal with in Part 1 are important preliminary and related matters such as:
• The nature of real-world general intelligence, both conceptually and from the perspective
of formal modeling (Section I).
• The nature of cognitive and ethical development for humans and AGIs (Section III).
• The high-level properties of CogPrime, including the overall architecture and the various
sorts of memory involved (Section IV).
• What kind of path may viably lead us from here to AGI, with focus laid on preschool-type
environments that easily foster humanlike cognitive development. Various advanced aspects
of AGI systems, such as the network and algebraic structures that may emerge from them,
the ways in which they may self-modify, and the degree to which their initial design may
constrain or guide their future state even after long periods of radical self-improvement
(Section V).
One point made repeatedly throughout Part 1, which is worth emphasizing here, is the current
lack of a really rigorous and thorough general technical theory of general intelligence. Such a
theory, if complete, would be incredibly helpful for understanding complex AGI architectures
like CogPrime. Lacking such a theory, we must work on CogPrime and other such systems using
a combination of theory, experiment and intuition. This is not a bad thing, but it will be very
helpful if the theory and practice of AGI are able to grow collaboratively together.
1.12 Key Claims of the Book
We will wrap up this Introduction with a systematic list of some of the key claims to be argued
for in these pages. Not all the terms and ideas in these claims have been mentioned in the
preceding portions of this Introduction, but we hope they will be reasonably clear to the reader
anyway, at least in a general sense. This list of claims will be revisited in Chapter 49 near the
end of Part 2, where we will look back at the ideas and arguments that have been put forth in
favor of them, in the intervening chapters.
In essence this is a list of claims such that, if the reader accepts these claims, they should
probably accept that the CogPrime approach to AGI is a viable one. On the other hand if the
reader rejects one or more of these claims, they may find one or more aspects of CogPrime
unacceptable for some reason.
Without further ado, now, the claims:
1. General intelligence (at the human level and ultimately beyond) can be achieved via creating
a computational system that seeks to achieve its goals, via using perception and memory
to predict which actions will achieve its goals in the contexts in which it finds itself.
2. To achieve general intelligence in the context of human-intelligence-friendly environments
and goals using feasible computational resources, it's important that an AGI system can
handle different kinds of memory (declarative, procedural, episodic, sensory, intentional,
attentional) in customized but interoperable ways.
3. Cognitive synergy: It's important that the cognitive processes associated with different kinds
of memory can appeal to each other for assistance in overcoming bottlenecks in a manner
that enables each cognitive process to act in a manner that is sensitive to the particularities
of each others' internal representations, and that doesn't impose unreasonable delays on
the overall cognitive dynamics.
4. As a general principle, neither purely localized nor purely global memory is sufficient for
general intelligence under feasible computational resources; "glocal" memory will be re-
quired.
5. To achieve human-like general intelligence, it's important for an intelligent agent to have
sensory data and motoric affordances that roughly emulate those available to humans.
We don't know exactly how close this emulation needs to be, which means that our AGI
systems and platforms need to support fairly flexible experimentation with virtual-world
and/or robotic infrastructures.
6. To work toward adult human-level, roughly human-like general intelligence, one fairly easily
comprehensible path is to use environments and goals reminiscent of human childhood, and
seek to advance one's AGI system along a path roughly comparable to that followed by
human children.
7. It is most effective to teach an AGI system aimed at roughly human-like general intelli-
gence via a mix of spontaneous learning and explicit instruction, and to instruct it via a
combination of imitation, reinforcement and correction, and a combination of linguistic and
nonlinguistic instruction.
8. One effective approach to teaching an AGI system human language is to supply it with
some in-built linguistic facility, in the form of rule-based and statistical-linguistics-based
NLP systems, and then allow it to improve and revise this facility based on experience.
9. An AGI system with adequate mechanisms for handling the key types of knowledge men-
tioned above, and the capability to explicitly recognize large-scale patterns in itself, should,
upon sustained interaction with an appropriate environment in pursuit of ap-
propriate goals, emerge a variety of complex structures in its internal knowledge network,
including, but not limited to:
• a hierarchical network, representing both a spatiotemporal hierarchy and an approxi-
mate "default inheritance" hierarchy, cross-linked
• a heterarchical network of associativity, roughly aligned with the hierarchical network
• a self network which is an approximate micro image of the whole network
• inter-reflecting networks modeling self and others, reflecting a "mirrorhouse" design
pattern
10. Given the strengths and weaknesses of current and near-future digital computers,
a. A (loosely) neural-symbolic network is a good representation for directly storing many
kinds of memory, and interfacing between those that it doesn't store directly;
b. Uncertain logic is a good way to handle declarative knowledge. To deal with the
problems facing a human-level AGI, an uncertain logic must integrate imprecise probability
and fuzziness with a broad scope of logical constructs. PLN is one good realization.
c. Programs are a good way to represent procedures (both cognitive and physical-action,
but perhaps not including low-level motor-control procedures).
d. Evolutionary program learning is a good way to handle difficult program learning prob-
lems. Probabilistic learning on normalized programs is one effective approach to evolu-
tionary program learning. MOSES is one good realization of this approach.
e. Multistart hill-climbing, with a strong Occam prior, is a good way to handle relatively
straightforward program learning problems.
f. Activation spreading and Hebbian learning comprise a reasonable way to handle atten-
tional knowledge (though other approaches, with greater overhead cost, may provide
better accuracy and may be appropriate in some situations).
• Artificial economics is an effective approach to activation spreading and Hebbian
learning in the context of neural-symbolic networks;
• ECAN is one good realization of artificial economics;
• A good trade-off between comprehensiveness and efficiency is to focus on two kinds
of attention: processor attention (represented in CogPrime by ShortTermImportance)
and memory attention (represented in CogPrime by LongTermImportance).
g. Simulation is a good way to handle episodic knowledge (remembered and imagined).
Running an internal world simulation engine is an effective way to handle simulation.
h. Hybridization of one's integrative neural-symbolic system with a spatiotemporally hier-
archical deep learning system is an effective way to handle representation and learning
of low-level sensorimotor knowledge. DeSTIN is one example of a deep learning system
of this nature that can be effective in this context.
i. One effective way to handle goals is to represent them declaratively, and allocate atten-
tion among them economically. CogPrime's PLN/ECAN based framework for handling
intentional knowledge is one good realization.
11. It is important for an intelligent system to have some way of recognizing large-scale pat-
terns in itself, and then embodying these patterns as new, localized knowledge items in
its memory. Given the use of a neural-symbolic network for knowledge representation, a
graph-mining based "map formation" heuristic is one good way to do this.
12. Occam's Razor: Intelligence is closely tied to the creation of procedures that achieve goals
in environments in the simplest possible way. Each of an AGI system's cognitive algorithms
should embody a simplicity bias in some explicit or implicit form.
13. An AGI system, if supplied with a commonsensically ethical goal system and an intentional
component based on rigorous uncertain inference, should be able to reliably achieve a much
higher level of commonsensically ethical behavior than any human being.
14. Once sufficiently advanced, an AGI system with a logic-based declarative knowledge ap-
proach and a program-learning-based procedural knowledge approach should be able to
radically self-improve via a variety of methods, including supercompilation and automated
theorem-proving.
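Claim 10e in the list above (multistart hill-climbing with a strong Occam prior) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: a "program" is reduced to a tuple of integer polynomial coefficients, the Occam prior is modeled as a fixed penalty per nonzero coefficient (OCCAM_PENALTY), and the target behavior y = 2x + 1 is invented. It is not the program representation or search procedure used in CogPrime.

```python
import random

XS = [-2, -1, 0, 1, 2]
YS = [2 * x + 1 for x in XS]   # toy target behavior: y = 2x + 1
OCCAM_PENALTY = 0.5            # assumed cost per nonzero coefficient


def predict(coeffs, x):
    """Evaluate the polynomial c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def score(coeffs):
    """Higher is better: negative squared error minus a simplicity penalty."""
    error = sum((predict(coeffs, x) - y) ** 2 for x, y in zip(XS, YS))
    size = sum(1 for c in coeffs if c != 0)
    return -error - OCCAM_PENALTY * size


def neighbours(coeffs):
    """All candidates that differ by +1 or -1 in exactly one coefficient."""
    for i in range(len(coeffs)):
        for delta in (-1, 1):
            yield coeffs[:i] + (coeffs[i] + delta,) + coeffs[i + 1:]


def hill_climb(start):
    """Move to the best neighbour until no neighbour improves the score."""
    current = start
    while True:
        candidate = max(neighbours(current), key=score)
        if score(candidate) <= score(current):
            return current
        current = candidate


def multistart(starts):
    """Run hill-climbing from every start and keep the best local optimum."""
    return max((hill_climb(s) for s in starts), key=score)


rng = random.Random(0)
starts = [(0, 0, 0, 0)] + [
    tuple(rng.randint(-3, 3) for _ in range(4)) for _ in range(5)
]
best = multistart(starts)
print(best)  # (1, 2, 0, 0)
```

The penalty term is what makes the search prefer the two-coefficient solution over any larger candidate with similar error; the restarts reduce the chance of reporting a poor local optimum.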
Section I
Artificial and Natural General Intelligence
Chapter 2
What Is Human-Like General Intelligence?
2.1 Introduction
CogPrime, the AGI architecture on which the bulk of this book focuses, is aimed at the creation
of artificial general intelligence that is vaguely human-like in nature, and possesses capabilities
at the human level and ultimately beyond.
Obviously this description begs some foundational questions, such as, for starters: What is
"general intelligence"? What is "human-like general intelligence"? What is "intelligence" at all?
Perhaps in the future there will exist a rigorous theory of general intelligence which applies
usefully to real-world biological and digital intelligences. In later chapters we will give some
ideas in this direction. But such a theory is currently nascent at best. So, given the present
state of science, these questions about intelligence must be handled via a combination of
formal and informal methods. This brief, informal chapter attempts to explain our view on the
nature of intelligence in sufficient detail to place the discussion of CogPrime in appropriate
context, without trying to resolve all the subtleties.
Psychologists sometimes define human general intelligence using IQ tests and related instru-
ments - so one might wonder: why not just go with that? But these sorts of intelligence testing
approaches have difficulty even extending to humans from diverse cultures [HHPO12] [Fis01].
So it's clear that to ground AGI approaches that are not based on precise modeling of human
cognition, one requires a more fundamental understanding of the nature of general intelligence.
On the other hand, if one conceives intelligence too broadly and mathematically, there's a risk
of leaving the real human world too far behind. In this chapter (followed up in Chapters 9 and
7 with more rigor), we present a highly abstract understanding of intelligence-in-general, and
then portray human-like general intelligence as a (particularly relevant) special case.
2.1.1 What Is General Intelligence?
Many attempts to characterize general intelligence have been made; Legg and Hutter [LH07a]
review over 70! Our preferred abstract characterization of intelligence is: the capability of a
system to choose actions maximizing its goal-achievement, based on its perceptions
and memories, and making reasonably efficient use of its computational resources.
A general intelligence is then understood as one that can do this for a variety of
complex goals in a variety of complex environments.
However, apart from positing definitions, it is difficult to say anything nontrivial about general
intelligence in general. Marcus Hutter [Hut05] has demonstrated, using a characterization
of general intelligence similar to the one above, that a very simple algorithm called AIXItl can
demonstrate arbitrarily high levels of general intelligence, if given sufficiently immense com-
putational resources. This is interesting because it shows that (if we assume the universe can
effectively be modeled as a computational system) general intelligence is basically a problem of
computational efficiency. The particular structures and dynamics that characterize real-world
general intelligences like humans arise because of the need to achieve reasonable levels of intel-
ligence using modest space and time resources.
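For orientation, one well-known formalization in this family is the universal intelligence measure associated with Legg and Hutter. The text above does not state this formula, so it is given here only as a reference point, with the symbols explained below being part of that formalization rather than of CogPrime:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \pi is the agent, E is a class of computable environments with bounded total reward, K(\mu) is the Kolmogorov complexity of environment \mu, and V_{\mu}^{\pi} is the expected total reward the agent obtains in \mu. The measure ignores the agent's computational cost, which is why an idealized, resource-unbounded agent can score arbitrarily well on it, and why the efficiency clause in the characterization above matters for real-world systems.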
The "patternist" theory of mind presented in [Goe06a] and briefly summarized in Chapter 3
below presents a number of emergent structures and dynamics that are hypothesized to
characterize pragmatic general intelligence, including such things as system-wide hierarchical
and heterarchical knowledge networks, and a dynamic and self-maintaining self-model. Much of
the thinking underlying CogPrime has centered on how to make multiple learning components
combine to give rise to these emergent structures and dynamics.
2.1.2 What Is Human-like General Intelligence?
General principles like "complex goals in complex environments" and patternism are not suf-
ficient to specify the nature of human-like general intelligence. Due to the harsh reality of
computational resource restrictions, real-world general intelligences are necessarily biased to
particular classes of environments. Human intelligence is biased toward the physical, social and
linguistic environments in which humanity evolved, and if AI systems are to possess humanlike
general intelligence they must to some extent share these biases.
But what are these biases, specifically? This is a large and complex question, which we seek
to answer in a theoretically grounded way in Chapter 9. However, before turning to abstract
theory, one may also approach the question in a pragmatic way, by looking at the categories of
things that humans do to manifest their particular variety of general intelligence. This is the
task of the following section.
2.2 Commonly Recognized Aspects of Human-like Intelligence
It would be nice if we could give some sort of "standard model of human intelligence" in this
chapter, to set the context for our approach to artificial general intelligence - but the truth is
that there isn't any. What the cognitive science field has produced so far is better described as:
a broad set of principles and platitudes, plus a long, loosely-organized list of ideas and results.
Chapter 5 below constitutes an attempt to present an integrative architecture diagram for
human-like general intelligence, synthesizing the ideas of a number of different AGI and cognitive
theorists. However, though the diagram given there attempts to be inclusive, it nonetheless
contains many features that are accepted by only a plurality of the research community.
The following list of key aspects of human-like intelligence has a better claim at truly being
generic and representing the consensus understanding of contemporary science. It was produced
by a very simple method: starting with the Wikipedia page for cognitive psychology, and then
adding a few items onto it based on scrutinizing the tables of contents of some top-ranked
cognitive psychology textbooks. There is some redundancy among list items, and perhaps also
some minor omissions (depending on how broadly one construes some of the items), but the
point is to give a broad indication of human mental functions as standardly identified in the
psychology field:
• Perception
- General perception
- Psychophysics
- Pattern recognition (the ability to correctly interpret ambiguous sensory information)
- Object and event recognition
- Time sensation (awareness and estimation of the passage of time)
• Motor Control
- Motor planning
- Motor execution
- Sensorimotor integration
• Categorization
- Category induction and acquisition
- Categorical judgement and classification
- Category representation and structure
- Similarity
• Memory
- Aging and memory
- Autobiographical memory
- Constructive memory
- Emotion and memory
- False memories
- Memory biases
- Long-term memory
- Episodic memory
- Semantic memory
- Procedural memory
- Short-term memory
- Sensory memory
- Working memory
• Knowledge representation
- Mental imagery
- Propositional encoding
- Imagery versus propositions as representational mechanisms
- Dual-coding theories
- Mental models
• Language
- Grammar and linguistics
- Phonetics and phonology
- Language acquisition
• Thinking
- Choice
- Concept formation
- Judgment and decision making
- Logic, formal and natural reasoning
- Problem solving
- Planning
- Numerical cognition
- Creativity
• Consciousness
- Attention and Filtering (the ability to focus mental effort on specific stimuli whilst
excluding other stimuli from consideration)
- Access consciousness
- Phenomenal consciousness
• Social Intelligence
- Distributed Cognition
- Empathy
If there's nothing surprising to you in the above list, I'm not surprised! If you've read a
bit in the modern cognitive science literature, the list may even seem trivial. But it's worth
reflecting that 50 years ago, no such list could have been produced with the same level of broad
acceptance. And less than 100 years ago, the Western world's scientific understanding of the
mind was dominated by Freudian thinking; and not too long after that, by behaviorist thinking,
which argued that theorizing about what went on inside the mind made no sense, and science
should focus entirely on analyzing external behavior. The progress of cognitive science hasn't
made as many headlines as contemporaneous progress in neuroscience or computing hardware
and software, but it's certainly been dramatic. One of the reasons that AGI is more achievable
now than in the 1950s and 60s when the AI field began, is that now we understand the structures
and processes characterizing human thinking a lot better.
In spite of all the theoretical and empirical progress in the cognitive science field, however,
there is still no consensus among experts on how the various aspects of intelligence in the above
"human intelligence feature list" are achieved and interrelated. In these pages, however, for
the purpose of motivating CogPrime, we assume a broad integrative understanding roughly as
follows:
• Perception: There is significant evidence that human visual perception occurs using a
spatiotemporal hierarchy of pattern recognition modules, in which higher-level modules
deal with broader spacetime regions, roughly as in the DeSTIN AGI architecture discussed
in Chapter 4. Further, there is evidence that each module carries out temporal predictive
pattern recognition as well as static pattern recognition. Audition likely utilizes a similar
hierarchy. Olfaction may use something more like a Hopfield attractor neural network, as
described in Chapter 13. The networks corresponding to different sense modalities have
multiple cross-linkages, more at the upper levels than the lower, and also link richly into
the parts of the mind dealing with other functions.
• Motor Control: This appears to be handled by a spatiotemporal hierarchy as well, in which
each level of the hierarchy corresponds to higher-level (in space and time) movements. The
hierarchy is very tightly linked in with the perceptual hierarchies, allowing sensorimotor
learning and coordination.
• Memory: There appear to be multiple distinct but tightly cross-linked memory systems,
corresponding to different sorts of knowledge such as declarative (facts and beliefs), procedural,
episodic, sensorimotor, attentional and intentional (goals).
• Knowledge Representation: There appear to be multiple base-level representational
systems; at least one corresponding to each memory system, but perhaps more than that.
Additionally there must be the capability to dynamically create new context-specific repre-
sentational systems founded on the base representational system.
• Language: While there is surely some innate biasing in the human mind toward learning
certain types of linguistic structure, it's also notable that language shares a great deal of
structure with other aspects of intelligence like social roles [CB00] and the physical world
[Cas04]. Language appears to be learned based on biases toward learning certain types of
relational role systems; and language processing seems a complex mix of generic reason-
ing and pattern recognition processes with specialized acoustic and syntactic processing
routines.
• Consciousness is pragmatically well-understood using Baars' "global workspace" theory,
in which a small subset of the mind's content is summoned at each time into a "working
memory" aka "workspace" aka "attentional focus" where it is heavily processed and used to
guide action selection.
• Thinking is a diverse combination of processes encompassing things like categorization,
(crisp and uncertain) reasoning, concept creation, pattern recognition, and others; these
processes must work well with all the different types of memory and must effectively inte-
grate knowledge in the global workspace with knowledge in long-term memory.
• Social Intelligence seems closely tied with language and also with self-modeling; we model
ourselves in large part using the same specialized biases we use to help us model others.
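The "global workspace" idea in the Consciousness item above, in which a small subset of the mind's content is summoned into an attentional focus, can be sketched in a few lines of Python. The function name attentional_focus, the importance values and the fixed capacity are hypothetical choices for illustration; this is a sketch of the general idea under those assumptions, not of CogPrime's attention allocation.

```python
import heapq


def attentional_focus(importance, capacity):
    """Return the `capacity` most important item names, highest first."""
    return heapq.nlargest(capacity, importance, key=importance.get)


# Assumed importance values for a handful of memory items.
memory = {
    "red ball": 0.9,
    "owner's voice": 0.7,
    "wall texture": 0.1,
    "hunger goal": 0.8,
}

focus = attentional_focus(memory, capacity=2)
print(focus)  # ['red ball', 'hunger goal']
```

Only the items in the returned focus would then be passed to the more expensive processes that guide action selection; everything else stays in long-term memory until its importance rises.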
None of the points in the above bullet list is particularly controversial, but neither are any
of them universally agreed-upon by experts. However, in order to make any progress on AGI
design one must make some commitments to particular cognition-theoretic understandings, at
this level and ultimately at more precise levels as well. Further, general philosophical analyses
like the patternist philosophy to be reviewed in the following chapter only provide limited
guidance here. Patternism provides a filter for theories about specific cognitive functions - it
rules out assemblages of cognitive-function-specific theories that don't fit together to yield a
mind that could act effectively as a pattern-recognizing, goal-achieving system with the right
internal emergent structures. But it's not a precise enough filter to serve as a sole guide for
cognitive theory even at the high level.
The above list of points leads naturally into the integrative architecture diagram presented
in Chapter 5. But that generic architecture diagram is fairly involved, and before presenting
it, we will go through some more background regarding human-like intelligence (in the rest
of this chapter), philosophy of mind (in Chapter 3) and contemporary AGI architectures (in
Chapter 4).
2.3 Further Characterizations of Humanlike Intelligence
We now present a few complementary approaches to characterizing the key aspects of human-
like intelligence, drawn from different perspectives in the psychology and AI literature. These
different approaches all overlap substantially, which is good, yet each gives a slightly different
slant.
2.3.1 Competencies Characterizing Human-like Intelligence
First we give a list of key competencies characterizing human-level intelligence, resulting from
the AGI Roadmap Workshop held at the University of Tennessee, Knoxville in October 2008, which
was organized by Ben Goertzel and Itamar Arel. In this list, each broad competency area is
listed together with a number of specific competency sub-areas within its scope:
1. Perception: vision, hearing, touch, proprioception, crossmodal
2. Actuation: physical skills, navigation, tool use
3. Memory: episodic, declarative, behavioral
4. Learning: imitation, reinforcement, interactive verbal instruction, written media, experi-
mentation
5. Reasoning: deductive, abductive, inductive, causal, physical, associational, categorization
6. Planning: strategic, tactical, physical, social
7. Attention: visual, social, behavioral
8. Motivation: subgoal creation, affect-based motivation, control of emotions
9. Emotion: expressing emotion, understanding emotion
10. Self: self-awareness, self-control, other-awareness
11. Social: empathy, appropriate social behavior, social communication, social inference, group
play, theory of mind
12. Communication: gestural, pictorial, verbal, language acquisition, cross-modal
13. Quantitative: counting, grounded arithmetic, comparison, measurement
14. Building/Creation: concept formation, verbal invention, physical construction, social
group formation
Clearly this list is getting at the same things as the textbook headings given in Section 2.2,
but with a different emphasis due to its origin among AGI researchers rather than cognitive
psychologists. As part of the AGI Roadmap project, specific tasks were created corresponding
to each of the sub-areas in the above list; we will describe some of these tasks in Chapter 17.
Note on the AGI Roadmap Workshop: see http://www.ece.utk.edu/~itamar/AGI_Roadmap.html;
participants included: Sam Adams, IBM Research; Ben Goertzel, Novamente LLC; Itamar Arel,
University of Tennessee; Joscha Bach, Institute of Cognitive Science, University of Osnabruck,
Germany; Robert Coop, University of Tennessee; Rod Furlan, Singularity Institute; Matthias
Scheutz, Indiana University; J. Storrs Hall, Foresight Institute; Alexei Samsonovich, George
Mason University; Matt Schlesinger, Southern Illinois University; John Sowa, Vivomind
Intelligence, Inc.; Stuart C. Shapiro, University at Buffalo.
2.3.2 Gardner's Theory of Multiple Intelligences
The diverse list of human-level "competencies" given above is reminiscent of Gardner's [Gar99]
multiple intelligences (MI) framework - a psychological approach to intelligence assessment
based on the idea that different people have mental strengths in different high-level domains,
so that intelligence tests should contain aspects that focus on each of these domains separately.
MI does not contradict the "complex goals in complex environments" view of intelligence, but
rather may be interpreted as making specific commitments regarding which complex tasks and
which complex environments are most important for roughly human-like intelligence.
MI does not seek an extreme generality, in the sense that it explicitly focuses on domains
in which humans have strong innate capability as well as general-intelligence capability; there
could easily be non-human intelligences that would exceed humans according to both the
commonsense human notion of "general intelligence" and the generic "complex goals in complex
environments" or Hutter/Legg-style definitions, yet would not equal humans on the MI crite-
ria. This strong anthropocentrism of MI is not a problem from an AGI perspective so long as
one uses MI in an appropriate way, i.e. only for assessing the extent to which an AGI system
displays specifically human-like general intelligence. This restrictiveness is the price one pays
for having an easily articulable and relatively easily implementable evaluation framework.
Table 2.1 summarizes the types of intelligence included in Gardner's MI theory.
Intelligence Type     Aspects
--------------------  ----------------------------------------------------------
Linguistic            Words and language, written and spoken; retention,
                      interpretation and explanation of ideas and information
                      via language; understands relationship between
                      communication and meaning
Logical-Mathematical  Logical thinking, detecting patterns, scientific reasoning
                      and deduction; analyse problems, perform mathematical
                      calculations, understands relationship between cause and
                      effect towards a tangible outcome
Musical               Musical ability, awareness, appreciation and use of sound;
                      recognition of tonal and rhythmic patterns, understands
                      relationship between sound and feeling
Bodily-Kinesthetic    Body movement control, manual dexterity, physical agility
                      and balance; eye and body coordination
Spatial-Visual        Visual and spatial perception; interpretation and creation
                      of images; pictorial imagination and expression;
                      understands relationship between images and meanings, and
                      between space and effect
Interpersonal         Perception of other people's feelings; relates to others;
                      interpretation of behaviour and communications;
                      understands relationships between people and their
                      situations
Table 2.1: Types of Intelligence in Gardner's Multiple Intelligence Theory
2.3.3 Newell's Criteria for a Human Cognitive Architecture
Finally, another related perspective is given by Allen Newell's "functional criteria for a human
cognitive architecture" [New90], which require that a humanlike AGI system should:
1. Behave as an (almost) arbitrary function of the environment
2. Operate in real time
3. Exhibit rational, i.e., effective adaptive behavior
4. Use vast amounts of knowledge about the environment
5. Behave robustly in the face of error, the unexpected, and the unknown
6. Integrate diverse knowledge
7. Use (natural) language
8. Exhibit self-awareness and a sense of self
9. Learn from its environment
10. Acquire capabilities through development
11. Arise through evolution
12. Be realizable within the brain
In our view, Newell's criterion 1 is poorly formulated, for while universal Turing computing
power is easy to come by, any finite AI system must inevitably be heavily adapted to some
particular class of environments for straightforward mathematical reasons [Hut05, CP1'10].
On the other hand, his criteria 11 and 12 are not relevant to the CogPrime approach as we are
not doing biological modeling but rather AGI engineering. However, Newell's criteria 2-10 are
essential in our view, and all will be covered in the following chapters.
2.3.4 Intelligence and Creativity
Creativity is a key aspect of intelligence. While sometimes associated especially with genius-
level intelligence in science or the arts, actually creativity is pervasive throughout intelligence,
at all levels. When a child makes a flying toy car by pasting paper bird wings on his toy car, and
when a bird figures out how to use a curved stick to get a piece of food out of a difficult corner
— this is creativity, just as much as the invention of a new physics theory or the design of a new
fashion line. The very nature of intelligence - achieving complex goals in complex environments
- requires creativity for its achievement, because the nature of complex environments and goals
is that they are always unveiling new aspects, so that dealing with them involves inventing
things beyond what worked for previously known aspects.
CogPrime contains a number of cognitive dynamics that are especially effective at creating
new ideas, such as: concept creation (which synthesizes new concepts via combining aspects
of previous ones), probabilistic evolutionary learning (which simulates evolution by natural
selection, creating new procedures via mutation, combination and probabilistic modeling based
on previous ones), and analogical inference (an aspect of the Probabilistic Logic Networks
subsystems). But ultimately creativity is about how a system combines all the processes at its
disposal to synthesize novel solutions to the problems posed by its goals in its environment.
There are times, of course, when the same goal can be achieved in multiple ways — some
more creative than others. In CogPrime this relates to the existence of multiple top-level goals,
one of which may be novelty. A system with novelty as one of its goals, alongside other more
specific goals, will have a tendency to solve other problems in creative ways, thus fulfilling its
novelty goal along with its other goals. This can be seen at the level of childlike behaviors, and
also at a much more advanced level. Salvador Dali wanted to depict his thoughts and feelings,
but he also wanted to do so in a striking and unusual way; this combination of aspirations
spurred him to produce his amazing art. A child who is asked to draw a house, but has a
goal of novelty, may draw a tower with a swimming pool on the roof rather than a typical
Colonial structure. A physicist motivated by novelty will seek a non-obvious solution to the
equation at hand, rather than just applying tried and true methods, and perhaps discover
some new phenomenon. Novelty can be measured formally in terms of information-theoretic
surprisingness based upon a given basis of knowledge and experience [Sch06]; something that
is novel and creative to a child may be familiar to the adult world, and a solution that seems
novel and creative to a brilliant scientist today may seem like clichéd, elementary school level
work 100 years from now.
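To make the dependence of novelty on the observer's knowledge concrete, the following is a minimal sketch that scores an observation by its Shannon surprisal relative to a body of prior experience. It is an illustration only, not the formulation of the cited reference: the function name, the add-one smoothing, and the toy "experience" lists are all assumptions introduced here.

```python
import math
from collections import Counter

def surprisal_bits(observation, experience, vocabulary_size):
    """Shannon surprisal -log2 p(observation) relative to prior experience.

    Add-one smoothing (an assumption of this sketch) keeps the probability
    of a never-seen observation above zero.
    """
    counts = Counter(experience)
    p = (counts[observation] + 1) / (len(experience) + vocabulary_size)
    return -math.log2(p)

# Hypothetical experience bases: the child has never seen the unusual design,
# while the adult world has seen it many times.
VOCABULARY_SIZE = 3
child_experience = ["house"] * 8 + ["tree"] * 2
adult_experience = (["house"] * 50 + ["tree"] * 30
                    + ["tower_with_rooftop_pool"] * 20)

child_bits = surprisal_bits("tower_with_rooftop_pool", child_experience, VOCABULARY_SIZE)
adult_bits = surprisal_bits("tower_with_rooftop_pool", adult_experience, VOCABULARY_SIZE)

print(f"surprisal for the child: {child_bits:.2f} bits")
print(f"surprisal for the adult: {adult_bits:.2f} bits")
```

The same drawing receives a higher surprisal score against the child's experience than against the adult's, which is the relativity of novelty described above.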
Measuring creativity is even more difficult and subjective than measuring intelligence. Qual-
itatively, however, we humans can recognize it; and we suspect that the qualitative emergence
of dramatic, multidisciplinary, computational creativity will be one of the things that makes the
human population feel emotionally that advanced AGI has finally arrived.
2.4 Preschool as a View into Human-like General Intelligence
One issue that arises when pursuing the grand goal of human-level general intelligence is how
to measure partial progress. The classic Turing Test of imitating human conversation remains
too difficult to usefully motivate immediate-term AI research (see [HF95, Fre90] for arguments
that it has been counterproductive for the AI field). The same holds true for comparable alter-
natives like the Robot College Test of creating a robot that can attend a semester of university
and obtain passing grades. However, some researchers have suggested intermediary goals, that
constitute partial progress toward the grand goal and yet are qualitatively different from the
highly specialized problems to which most current AI systems are applied.
In this vein, Sam Adams and his team at IBM have outlined a so-called "Toddler Turing
Test," in which one seeks to use AI to control a robot qualitatively displaying similar cognitive
behaviors to a young human child (say, a 3 year old) [AABL02]. In fact this sort of idea has a
long and venerable history in the AI field - Alan Turing's original 1950 paper on AI [Tur50],
where he proposed the Turing Test, contains the suggestion that
"Instead of trying to produce a programme to simulate the adult mind,
why not rather try to produce one which simulates the child's?"
We find this childlike cognition based approach promising for many reasons, including its in-
tegrative nature: what a young child does involves a combination of perception, actuation, lin-
guistic and pictorial communication, social interaction, conceptual problem solving and creative
imagination. Specifically, inspired by these ideas, in Chapter 16 we will suggest the approach
of teaching and testing early-stage AGI systems in environments that emulate the preschools
used for teaching human children.
Human intelligence evolved in response to the demands of richly interactive environments,
and a preschool is specifically designed to be a richly interactive environment with the capability
to stimulate diverse mental growth. So, we are currently exploring the use of CogPrime to control
virtual agents in preschool-like virtual world environments, as well as commercial humanoid
robot platforms such as the Nao (see Figure 2.1) or Robokind (see Figure 2.2) in physical preschool-like
robot labs.
Another advantage of focusing on childlike cognition is that child psychologists have created
a variety of instruments for measuring child intelligence. In Chapter 17, we will discuss an
approach to evaluating the general intelligence of human childlike AGI systems via combining
tests typically used to measure the intelligence of young human children, with additional tests
crafted based on cognitive science and the standard preschool curriculum.
To put it differently: While our long-term goal is the creation of genius machines with general
intelligence at the human level and beyond, we believe that every young child has a certain
genius; and by beginning with this childlike genius, we can build a platform capable of developing
into a genius machine with far more dramatic capabilities.
2.4.1 Design for an AGI Preschool
More precisely, we do not suggest placing a CogPrime system in an environment that is an
exact imitation of a human preschool - this would be inappropriate since current robotic or
virtual bodies are very differently abled than the body of a young human child. But we aim to
place CogPrime in an environment emulating the basic diversity and educational character of
a typical human preschool. We stress this now, at this early point in the book, because we will
use running examples throughout the book drawn from the preschool context.
The key notion in modern preschool design is the "learning center," an area designed and
outfitted with appropriate materials for teaching a specific skill. Learning centers are designed to
encourage learning by doing, which greatly facilitates learning processes based on reinforcement,
imitation and correction; and also to provide multiple techniques for teaching the same skills,
to accommodate different learning styles and prevent overfitting and overspecialization in the
learning of new skills.
Centers are also designed to cross-develop related skills. A "manipulatives center," for ex-
ample, provides physical objects such as drawing implements, toys and puzzles, to facilitate
development of motor manipulation, visual discrimination, and (through sequencing and clas-
sification games) basic logical reasoning. A "dramatics center" cross-trains interpersonal and
empathetic skills along with bodily-kinesthetic, linguistic, and musical skills. Other centers,
such as art, reading, writing, science and math centers are also designed to train not just one
area, but to center around a primary intelligence type while also cross-developing related areas.
For specific examples of the learning centers associated with particular contemporary preschools,
see [Nei98]. In many progressive, student-centered preschools, students are left largely to their
own devices to move from one center to another throughout the preschool room. Generally,
each center will be staffed by an instructor at some points in the day but not others, providing
a variety of learning experiences.
To imitate the general character of a human preschool, we will create several centers in our
robot lab. The precise architecture will be adapted via experience but initial centers will likely
be:
• a blocks center: a table with blocks on it
• a language center: a circle of chairs, intended for people to sit around and talk with the
robot
• a manipulatives center, with a variety of different objects of different shapes and sizes,
intended to teach visual and motor skills
• a ball play center: where balls are kept in chests and there is space for the robot to kick
the balls around
• a dramatics center where the robot can observe and enact various movements
One Running Example
As we proceed through the various component structures and dynamics of CogPrime in the
following chapters, it will be useful to have a few running examples to use to explain how the
various parts of the system are supposed to work. One example we will use fairly frequently is
drawn from the preschool context: the somewhat open-ended task of "Build me something
out of blocks, that you haven't built for me before, and then tell me what it is." This
is a relatively simple task that combines multiple aspects of cognition in a richly interconnected
way, and is the sort of thing that young children will naturally do in a preschool setting.
2.5 Integrative and Synergetic Approaches to Artificial General
Intelligence
In Chapter 1 we characterized CogPrime as an integrative approach. And we suggest that the
naturalness of integrative approaches to AGI follows directly from comparing the above lists of
capabilities and criteria to the array of available AI technologies. No single known algorithm
or data structure appears easily capable of carrying out all these functions, so if one wants
to proceed now with creating a general intelligence that is even vaguely humanlike, one must
integrate various AI technologies within some sort of unifying architecture.
For this reason and others, an increasing amount of work in the AI community these days
is integrative in one sense or another. Estimation of Distribution Algorithms integrate proba-
bilistic reasoning with evolutionary learning [Pel05]. Markov Logic Networks [RD06] integrate
formal logic and probabilistic inference, as does the Probabilistic Logic Networks framework
[GIGH08] utilized in CogPrime and explained further in the book, and other works in the
"Progic" area such as [ll'W00J. Leslie Pack Kaelbling has synthesized low-level robotics methods
(particle filtering) with logical inference 17.11071. Dozens of further examples could be given.
The construction of practical robotic systems like the Stanley system that won the DARPA
Grand Challenge IT ,a061 involves the integration of numerous components based on different
principles. These algorithmic and pragmatic innovations provide ample raw materials for the
construction of integrative cognitive architectures and are part of the reason why childlike AGI
is more approachable now than it was 50 or even 10 years ago.
Further, many of the cognitive architectures described in the current AI literature are "inte-
grative" in the sense of combining multiple, qualitatively different, interoperating algorithms.
Chapter 4 gives a high-level overview of existing cognitive architectures, dividing them into
symbolic, emergentist (e.g. neural network) and hybrid architectures. The hybrid architectures
generally integrate symbolic and neural components, often with multiple subcomponents within
each of these broad categories. However, we believe that even these excellent architectures are
not integrative enough, in the sense that they lack sufficiently rich and nuanced interactions
between the learning components associated with different kinds of memory, and hence are un-
likely to give rise to the emergent structures and dynamics characterizing general intelligence.
One of the central ideas underlying CogPrime is that of an integrative cognitive architecture
that combines multiple aspects of intelligence, achieved by diverse structures and algorithms,
within a common framework designed specifically to support robust synergetic interactions
between these aspects.
The simplest way to create an integrative AI architecture is to loosely couple multiple com-
ponents carrying out various functions, in such a way that the different components pass inputs
and outputs amongst each other but do not interfere with or modulate each other's internal
functioning in real-time. However, the human brain appears to be integrative in a much tighter
sense, involving rich real-time dynamical coupling between various components with distinct
but related functions. In [Goe09a] we have hypothesized that the brain displays a property of
cognitive synergy, according to which multiple learning processes can not only dispatch
subproblems to each other, but also share contextual understanding in real-time, so
that each one can get help from the others in a contextually savvy way. By imbuing AI ar-
chitectures with cognitive synergy, we hypothesize, one can get past the bottlenecks that have
plagued AI in the past. Part of the reasoning here, as elaborated in Chapter 9 and [Goe09b], is
that real physical and social environments display a rich dynamic interconnection between their
various aspects, so that richly dynamically interconnected integrative AI architectures will be
able to achieve goals within them more effectively.
And this brings us to the patternist perspective on intelligent systems, alluded to above and
fleshed out further in Chapter 3 with its focus on the emergence of hierarchically and heterarchi-
cally structured networks of patterns, and pattern-systems modeling self and others. Ultimately
the purpose of cognitive synergy in an AGI system is to enable the various AI algorithms and
structures composing the system to work together effectively enough to give rise to the right
system-wide emergent structures characterizing real-world general intelligence. The underlying
theory is that intelligence is not reliant on any particular structure or algorithm, but is reliant
on the emergence of appropriately structured networks of patterns, which can then be used to
guide ongoing dynamics of pattern recognition and creation. And the underlying hypothesis is
that the emergence of these structures cannot be achieved by a loosely interconnected assem-
blage of components, no matter how sensible the architecture; it requires a tightly connected,
synergetic system.
It is possible to make these theoretical ideas about cognition mathematically rigorous; for
instance, Appendix ?? briefly presents a formal definition of cognitive synergy that has been
analyzed as part of an effort to prove theorems about the importance of cognitive synergy for
giving rise to emergent system properties associated with general intelligence. However, while
we have found such formal analyses valuable for clarifying our designs and understanding their
qualitative properties, we have concluded that, for the present, the best way to explore our
hypotheses about cognitive synergy and human-like general intelligence is empirically - via
building and testing systems like CogPrime.
2.5.1 Achieving Humanlike Intelligence via Cognitive Synergy
Summing up: at the broadest level, there are four primary challenges in constructing an inte-
grative, cognitive synergy based approach to AGI:
1. Choosing an overall cognitive architecture that possesses adequate richness and flexi-
bility for the task of achieving childlike cognition.
2. Choosing appropriate AI algorithms and data structures to fulfill each of the func-
tions identified in the cognitive architecture (e.g. visual perception, audition, episodic mem-
ory, language generation, analogy,...)
3. Ensuring that these algorithms and structures, within the chosen cognitive architecture,
are able to cooperate in such a way as to provide appropriate coordinated, synergetic
intelligent behavior (a critical aspect since childlike cognition is an integrated functional
response to the world, rather than a loosely coupled collection of capabilities).
4. Embedding one's system in an environment that provides sufficiently rich stimuli and
interactions to enable the system to use this cooperation to ongoingly, creatively develop
an intelligent internal world-model and self-model.
We argue that CogPrime provides a viable way to address these challenges.
Fig. 2.1: The Nao humanoid robot
Fig. 2.2: The Robokind humanoid robot
Chapter 3
A Patternist Philosophy of Mind
3.1 Introduction
In the last chapter we discussed human intelligence from a fairly down-to-earth perspective,
looking at the particular intelligent functions that human beings carry out in their everyday
lives. And we strongly feel this practical perspective is important: Without this concreteness, it's
too easy for AGI research to get distracted by appealing (or frightening) abstractions of various
sorts. However, it's also important to look at the nature of mind and intelligence from a more
general and conceptual perspective, to avoid falling into an approach that follows the particulars
of human capability but ignores the deeper structures and dynamics of mind that ultimately
allow human minds to be so capable. In this chapter we very briefly review some ideas from the
patternist philosophy of mind, a general conceptual framework on intelligence which has
been inspirational for many key aspects of the CogPrime design, and which has been ongoingly
developed by one of the authors (Ben Goertzel) during the last two decades (in a series of
publications beginning in 1991, most recently The Hidden Pattern [Goe06a]). Some of the ideas
described are quite broad and conceptual, and are related to CogPrime only via serving as
general inspirations; others are more concrete and technical, and are actually utilized within
the design itself.
CogPrime is an integrative design formed via the combination of a number of different
philosophical, scientific and engineering ideas. The success or failure of the design doesn't depend
on any particular philosophical understanding of intelligence. In that sense, the more abstract
notions presented in this chapter should be considered "optional" rather than critical in a
CogPrime context. However, due to the core role patternism has played in the development of
CogPrime, understanding a few things about general patternist philosophy will be helpful for
understanding CogPrime, even for those readers who are not philosophically inclined. Those
readers who are philosophically inclined, on the other hand, are urged to read The Hidden
Pattern and then interpret the particulars of CogPrime in this light.
3.2 Some Patternist Principles
The patternist philosophy of mind is a general approach to thinking about intelligent systems.
It is based on the very simple premise that mind is made of pattern - and that a mind is a
system for recognizing patterns in itself and the world, critically including patterns regarding
which procedures are likely to lead to the achievement of which goals in which contexts.
Pattern as the basis of mind is not in itself a very novel idea; this concept is present, for
instance, in the 19th-century philosophy of Charles Peirce [Pei34], in the writings of contempo-
rary philosophers Daniel Dennett [Den91] and Douglas Hofstadter [Hof79, Hof96], in Benjamin
Whorf's tj linguistic philosophy and Gregory Bateson's [Bat79] systems theory of mind
and nature. Bateson spoke of the Metapattern: "that it is pattern which connects." In Goertzel's
writings on philosophy of mind, an effort has been made to pursue this theme more thoroughly
than has been done before, and to articulate in detail how various aspects of human mind and
mind in general can be well-understood by explicitly adopting a patternist perspective.
In the patternist perspective, "pattern" is generally defined as "representation as something
simpler." Thus, for example, if one measures simplicity in terms of bit-count, then a program
compressing an image would be a pattern in that image. But if one uses a simplicity measure
incorporating run-time as well as bit-count, then the compressed version may or may not be a
pattern in the image, depending on how one's simplicity measure weights the two factors. This
definition encompasses simple repeated patterns, but also much more complex ones. While
pattern theory has typically been elaborated in the context of computational theory, it is not
intrinsically tied to computation; rather, it can be developed in any context where there is a
notion of "representation" or "production" and a way of measuring simplicity. One just needs
to be able to assess the extent to which f represents or produces X, and then to compare the
simplicity of f and X; and then one can assess whether f is a pattern in X. A formalization of
this notion of pattern is given in [Goe06a] and briefly summarized at the end of this chapter.
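As a rough illustration of this definition, the sketch below treats a zlib-compressed byte string as a candidate pattern f in an entity X: f counts as a pattern if decompressing it produces X and f is simpler than X under the chosen simplicity measure. The function names, the additive form of the cost measure, and the run-time figure are assumptions made for this example; they are not the formalization referred to above.

```python
import zlib

def cost(size_in_bytes, run_time=0.0, time_weight=0.0):
    """Hypothetical simplicity measure: size plus weighted run-time (lower is simpler)."""
    return size_in_bytes + time_weight * run_time

def is_pattern(f, x, produce, run_time=0.0, time_weight=0.0):
    """Return True if f produces x and f is simpler than x under `cost`.

    The entity x is taken as given, so it is charged no run-time.
    """
    produces_x = produce(f) == x
    return produces_x and cost(len(f), run_time, time_weight) < cost(len(x))

# A highly repetitive "entity" and its compressed representation.
x = b"red block, blue block, " * 40
f = zlib.compress(x)

# Simplicity measured by size alone: the compressed form is a pattern in x.
by_size_only = is_pattern(f, x, zlib.decompress)

# Simplicity also charges for run-time (an illustrative figure of 1000 units):
# the same representation may no longer qualify as a pattern.
with_run_time = is_pattern(f, x, zlib.decompress, run_time=1000.0, time_weight=1.0)

print(len(x), len(f), by_size_only, with_run_time)
```

Whether the compressed form counts as a pattern thus depends on how the simplicity measure weights size against run-time, as the text notes.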
Next, in patternism the mind of an intelligent system is conceived as the (fuzzy) set of
patterns in that system, and the set of patterns emergent between that system and other
systems with which it interacts. The latter clause means that the patternist perspective is
inclusive of notions of distributed intelligence [Hut96]. Basically, the mind of a system is the
fuzzy set of different simplifying representations of that system that may be adopted.
Intelligence is conceived, much as in Marcus Hutter's [Hut05] recent work (and as elabo-
rated informally in Chapter 2 above, and formally in Chapter 7 below), as the ability to achieve
complex goals in complex environments; where complexity itself may be defined as the pos-
session of a rich variety of patterns. A mind is thus a collection of patterns that is associated
with a persistent dynamical process that achieves highly-patterned goals in highly-patterned
environments.
An additional hypothesis made within the patternist philosophy of mind is that reflection is
critical to intelligence. This lets us conceive an intelligent system as a dynamical system that
recognizes patterns in its environment and itself, as part of its quest to achieve complex goals.
While this approach is quite general, it is not vacuous; it gives a particular structure to the
tasks of analyzing and synthesizing intelligent systems. About any would-be intelligent system,
we are led to ask questions such as:
• How are patterns represented in the system? That is, how does the underlying infrastructure
of the system give rise to the displaying of a particular pattern in the system's behavior?
• What kinds of patterns are most compactly represented within the system?
• What kinds of patterns are most simply learned?
• What learning processes are utilized for recognizing patterns?
• What mechanisms are used to give the system the ability to introspect (so that it can
recognize patterns in itself)?
1 In some prior writings the term "psynet model of mind" has been used to refer to the application of patternist
philosophy to cognitive theory, but this term has been deprecated in recent publications as it seemed to
introduce more confusion than clarification.
Now, these same sorts of questions could be asked if one substituted the word "pattern" with
other words like "knowledge" or "information". However, we have found that asking these ques-
tions in the context of pattern leads to more productive answers, avoiding unproductive byways
and also tying in very nicely with the details of various existing formalisms and algorithms for
knowledge representation and learning.
Among the many kinds of patterns in intelligent systems, semiotic patterns are particularly
interesting ones. Peirce decomposed these into three categories:
• iconic patterns, which are patterns of contextually important internal similarity between
two entities (e.g. an iconic pattern binds a picture of a person to that person)
• indexical patterns, which are patterns of spatiotemporal co-occurrence (e.g. an indexical
pattern binds a wedding dress and a wedding)
• symbolic patterns, which are patterns indicating that two entities are often involved in
the same relationships (e.g. a symbolic pattern between the number "5" (the symbol) and
various sets of 5 objects (the entities that the symbol is taken to represent))
Of course, some patterns may span more than one of these semiotic categories; and there
are also some patterns that don't fall neatly into any of these categories. But the semiotic
patterns are particularly important ones; and symbolic patterns have played an especially large
role in the history of AI, because of the radically different approaches different researchers have
taken to handling them in their AI systems. Mathematical logic and related formalisms provide
sophisticated mechanisms for combining and relating symbolic patterns ("symbols"), and some
AI approaches have focused heavily on these, sometimes more so than on the identification of
symbolic patterns in experience or the use of them to achieve practical goals. We will look fairly
carefully at these differences in Chapter 4.
Pursuing the patternist philosophy in detail leads to a variety of particular hypotheses and
conclusions about the nature of mind. Following from the view of intelligence in terms of
achieving complex goals in complex environments, comes a view in which the dynamics of
a cognitive system are understood to be governed by two main forces:
• self-organization, via which system dynamics cause existing system patterns to give rise to
new ones
• goal-oriented behavior, which will be defined more rigorously in Chapter 7, but basically
amounts to a system interacting with its environment in a way that appears like an attempt
to maximize some reasonably simple function
Self-organized and goal-oriented behavior must be understood as cooperative aspects. If an
agent is asked to build a surprising structure out of blocks and does so, this is goal-oriented.
But the agent's ability to carry out this goal-oriented task will be greater if it has previously
played around with blocks a lot in an unstructured, spontaneous way. And the "nudge toward
creativity" given to it by asking it to build a surprising blocks structure may cause it to explore
some novel patterns, which then feed into its future unstructured blocks play.
Based on these concepts, as argued in detail in [Goe06a], several primary dynamical principles
may be posited, including:
• Evolution, conceived as a general process via which patterns within a large population
thereof are differentially selected and used as the basis for formation of new patterns, based
on some "fitness function" that is generally tied to the goals of the agent
- Example: If trying to build a blocks structure that will surprise Bob, an agent may
simulate several procedures for building blocks structures in its "mind's eye", assessing
for each one the expected degree to which it might surprise Bob. The search through
procedure space could be conducted as a form of evolution, via an algorithm such as
MOSES.
• Autopoiesis: the process by which a system of interrelated patterns maintains its integrity,
via a dynamic in which whenever one of the patterns in the system begins to decrease in
intensity, some of the other patterns increase their intensity in a manner that causes the
troubled pattern to increase in intensity again
- Example: An agent's set of strategies for building the base of a tower, and its set of
strategies for building the middle part of a tower, are likely to relate autopoietically. If
the system partially forgets how to build the base of a tower, then it may regenerate
this missing knowledge via using its knowledge about how to build the middle part
(i.e., it knows it needs to build the base in a way that will support good middle parts).
Similarly if it partially forgets how to build the middle part, then it may regenerate this
missing knowledge via using its knowledge about how to build the base (i.e. it knows a
good middle part should fit in well with the sorts of base it knows are good).
- This same sort of interdependence occurs between pattern-sets containing more than
two elements
- Sometimes (as in the above example) autopoietic interdependence in the mind is tied
to interdependencies in the physical world, sometimes not.
• Association. Patterns, when given attention, spread some of this attention to other pat-
terns that they have previously been associated with in some way. Furthermore, there is
Peirce's law of mind [Pei34], which could be paraphrased in modern terms as stating that
the mind is an associative memory network, whose dynamics dictate that every idea in
the memory is an active agent, continually acting on those ideas with which the memory
associates it.
- Example: Building a blocks structure that resembles a tower spreads attention to mem-
ories of prior towers the agent has seen, and also to memories of people the agent knows
have seen towers, and structures it has built at the same time as towers, structures that
resemble towers in various respects, etc.
• Differential attention allocation / credit assignment. Patterns that have been valu-
able for goal-achievement are given more attention, and are encouraged to participate in
giving rise to new patterns.
- Example: Perhaps in a prior instance of the task "build me a surprising structure out of
blocks," searching through memory for non-blocks structures that the agent has played
with has proved a useful cognitive strategy. In that case, when the task is posed to the
agent again, it should tend to allocate disproportionate resources to this strategy.
• Pattern creation. Patterns that have been valuable for goal-achievement are mutated and
combined with each other to yield new patterns.
- Example: Building towers has been useful in a certain context, but so has building
structures with a large number of triangles. Why not build a tower out of triangles?
Or maybe a vaguely tower-like structure that uses more triangles than a tower easily
could?
- Example: Building an elongated block structure resembling a table was successful in the
past, as was building a structure resembling a very flat version of a chair. Generalizing,
maybe building distorted versions of furniture is good. Or maybe it is building distorted
versions of any previously perceived objects that is good. Or maybe both, to different
degrees....
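The association and credit-assignment dynamics above can be sketched in a few lines. This is only an illustrative toy under stated assumptions: the function names `spread_attention` and `assign_credit`, the `rate` and `step` parameters, and the numeric link weights are all hypothetical, and none of this is meant to reproduce CogPrime's actual attention allocation.

```python
from collections import defaultdict


def spread_attention(attention, links, rate=0.5):
    """One association step: each pattern passes a share (`rate`) of its
    attention to the patterns it is linked with, in proportion to link weight."""
    updated = defaultdict(float, attention)
    for source, neighbors in links.items():
        total_weight = sum(neighbors.values())
        if total_weight == 0:
            continue
        # Use the pre-update attention so the step is synchronous.
        outgoing = rate * attention.get(source, 0.0)
        updated[source] -= outgoing
        for target, weight in neighbors.items():
            updated[target] += outgoing * weight / total_weight
    return dict(updated)


def assign_credit(attention, used_patterns, reward, step=0.1):
    """Differential attention allocation: patterns that took part in a
    goal-achieving episode gain attention in proportion to the reward."""
    boosted = dict(attention)
    for name in used_patterns:
        boosted[name] = boosted.get(name, 0.0) + step * reward
    return boosted


# Hypothetical example: attention on "tower" leaks to associated memories.
attention = {"tower": 1.0}
links = {"tower": {"prior tower": 3.0, "tall building": 1.0}}
after = spread_attention(attention, links)
rewarded = assign_credit(after, ["prior tower"], reward=1.0)
```

Total attention is conserved by the spreading step, while credit assignment injects new attention into whatever proved useful; a real system would also need some normalization or decay.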
Next, for a variety of reasons outlined in [Goe06a], it becomes appealing to hypothesize that the
network of patterns in an intelligent system must give rise to the following large-scale emergent
structures:
• Hierarchical network. Patterns are habitually in relations of control over other patterns that
represent more specialized aspects of themselves.
- Example: The pattern associated with "tall building" has some control over the pattern
associated with "tower", as the former represents a more general concept ... and "tower"
has some control over "Eiffel tower", etc.
• Heterarchical network. The system retains a memory of which patterns have previously
been associated with each other in any way.
- Example: "Tower" and "snake" are distant in the natural pattern hierarchy, but may be
associatively/heterarchically linked due to having a common elongated structure. This
heterarchical linkage may be used for many things, e.g. it might inspire the creative
construction of a tower with a snake's head.
• Dual network. Hierarchical and heterarchical structures are combined, with the dynamics
of the two structures working together harmoniously. Among many possible ways to hier-
archically organize a set of patterns, the one used should be one that causes hierarchically
nearby patterns to have many meaningful heterarchical connections; and of course, there
should be a tendency to search for heterarchical connections among hierarchically nearby
patterns.
- Example: While the set of patterns hierarchically nearby "tower" and the set of patterns
heterarchically nearby "tower" will be quite different, they should still have more overlap
than random pattern-sets of similar sizes. So, if looking for something else heterarchically
near "tower", using the hierarchical information about "tower" should be of some use,
and vice versa.
- In PLN, hierarchical relationships correspond to Atoms A and B such that Inheritance A B
and Inheritance B A have highly dissimilar strengths; and heterarchical relationships
correspond to IntensionalSimilarity relationships. The dual network structure then arises
when intensional and extensional inheritance approximately correlate with each other,
so that inference about either kind of inheritance assists with figuring out the
other kind.
• Self structure. A portion of the network of patterns forms into an approximate image of the
overall network of patterns.
- Example: Each time the agent builds a certain structure, it observes itself building
the structure, and its role as "builder of a tall tower" (or whatever the structure is)
becomes part of its self-model. Then when it is asked to build something new, it may
consult its self-model to see if it believes itself capable of building that sort of thing (for
instance, if it is asked to build something very large, its self-model may tell it that it
lacks persistence for such projects, so it may reply "I can try, but I may wind up not
finishing it").
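The PLN remark under the dual network item, that a hierarchical relationship shows up as a strong asymmetry between Inheritance A B and Inheritance B A, can be illustrated with a deliberately crude sketch. We assume here, purely for illustration, that the strength of Inheritance A B can be approximated extensionally as the fraction of A's members that are also members of B; the function names, the `gap` threshold and the example sets are hypothetical, and real PLN truth values are considerably richer.

```python
def inheritance_strength(a_members, b_members):
    """Crude extensional strength of 'Inheritance A B': the fraction of A's
    members that are also members of B (an illustrative assumption)."""
    if not a_members:
        return 0.0
    return len(a_members & b_members) / len(a_members)


def is_hierarchical(a_members, b_members, gap=0.5):
    """Treat the pair as hierarchically related when the two directions of
    inheritance differ in strength by at least `gap`."""
    forward = inheritance_strength(a_members, b_members)
    backward = inheritance_strength(b_members, a_members)
    return abs(forward - backward) >= gap


# Hypothetical toy categories.
towers = {"Eiffel tower", "blocks tower", "water tower"}
tall_things = towers | {"skyscraper", "giraffe", "redwood", "mast", "cliff", "crane"}

tower_to_tall = inheritance_strength(towers, tall_things)   # every tower is tall
tall_to_tower = inheritance_strength(tall_things, towers)   # few tall things are towers
```

Here "tower" sits below "tall thing" in the hierarchy because nearly all of the inheritance strength runs in one direction.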
As we proceed through the CogPrime design in the following pages, we will see how each
of these abstract concepts arises concretely from CogPrime's structures and algorithms. If the
theory of [Goe06a] is correct, then the success of CogPrime as a design will depend largely on
whether these high-level structures and dynamics can be made to emerge from the synergetic
interaction of CogPrime's representation and algorithms, when they are utilized to control an
appropriate agent in an appropriate environment.
3.3 Cognitive Synergy
Now we dig a little deeper and present a different sort of "general principle of feasible general
intelligence", already hinted at in earlier chapters: the cognitive synergy principle 2, which is both
a conceptual hypothesis about the structure of generally intelligent systems in certain classes of
environments, and a design principle used to guide the design of CogPrime. Chapter 8 presents
a mathematical formalization of the notion of cognitive synergy; here we present the conceptual
idea informally, which makes it more easily digestible but also more vague-sounding.
We will focus here on cognitive synergy specifically in the case of "multi-memory systems,"
which we define as intelligent systems whose combination of environment, embodiment and
motivational system make it important for them to possess memories that divide into partially
but not wholly distinct components corresponding to the categories of:
• Declarative memory
- Examples of declarative knowledge: Towers on average are taller than buildings. I gener-
ally am better at building structures I imagine, than at imitating structures I'm shown
in pictures.
• Procedural memory (memory about how to do certain things)
- Examples of procedural knowledge: Practical know-how regarding how to pick up an
elongated rectangular block, or a square one. Know-how regarding when to approach
a problem by asking "What would one of my teachers do in this situation" versus by
thinking through the problem from first principles.
• Sensory and episodic memory
- Example of sensory knowledge: memory of Bob's face; memory of what a specific tall
blocks tower looked like
2 While these points are implicit in the theory of mind given in [Goe06a], they are not articulated in this
specific form there. So the material presented in this section is a new development within patternist philosophy,
developed since [Goe06a] in a series of conference papers such as [Goe09a].
- Example of episodic knowledge: memory of the situation in which the agent first met
Bob; memory of a situation in which a specific tall blocks tower was built
• Attentional memory (knowledge about what to pay attention to in what contexts)
- Example of attentional knowledge: When involved with a new person, it's useful to pay
attention to whatever that person looks at
• Intentional memory (knowledge about the system's own goals and subgoals)
- Example of intentional knowledge: If my goal is to please some person whom I don't
know that well, then a subgoal may be figuring out what makes that person smile.
In Chapter 9 below we present a detailed argument as to how the requirement for a multi-
memory underpinning for general intelligence emerges from certain underlying assumptions
regarding the measurement of the simplicity of goals and environments. Specifically we argue
that each of these memory types corresponds to certain modes of communication, so that intel-
ligent agents which have to efficiently handle a sufficient variety of types of communication with
other agents, are going to have to handle all these types of memory. These types of communi-
cation overlap and are often used together, which implies that the different memories and their
associated cognitive processes need to work together. The points made in this section do not
rely on that argument regarding the relation of multiple memory types to the environmental
situation of multiple communication types. What they do rely on is the assumption that, in
the intelligent agent in question, the different components of memory are significantly but not
wholly distinct. That is, there are significant "family resemblances" between the memories of a
single type, yet there are also thoroughgoing connections between memories of different types.
Repeating the above points in a slightly more organized manner and then extending them, the
essential idea of cognitive synergy, in the context of multi-memory systems, may be expressed
in terms of the following points:
1. Intelligence, relative to a certain set of environments, may be understood as the capability
to achieve complex goals in these environments.
2. With respect to certain classes of goals and environments, an intelligent system requires a
"multi-memory" architecture, meaning the possession of a number of specialized yet interconnected
knowledge types, including: declarative, procedural, attentional, sensory, episodic
and intentional (goal-related). These knowledge types may be viewed as different sorts of
patterns that a system recognizes in itself and its environment.
3. Such a system must possess knowledge creation (i.e. pattern recognition / formation) mechanisms
corresponding to each of these memory types. These mechanisms are also called
"cognitive processes."
4. Each of these cognitive processes, to be effective, must have the capability to recognize when
it lacks the information to perform effectively on its own; and in this case, to dynamically
and interactively draw information from knowledge creation mechanisms dealing with other
types of knowledge.
5. This cross-mechanism interaction must have the result of enabling the knowledge creation
mechanisms to perform much more effectively in combination than they would if operated
non-interactively. This is "cognitive synergy."
Interactions as mentioned in Points 4 and 5 in the above list are the real conceptual meat
of the cognitive synergy idea. One way to express the key idea here, in an AI context, is that
most AI algorithms suffer from combinatorial explosions: the number of possible elements to
be combined in a synthesis or analysis is just too great, and the algorithms are unable to
filter through all the possibilities, given the lack of intrinsic constraint that comes along with
a "general intelligence" context (as opposed to a narrow-AI problem like chess-playing, where
the context is constrained and hence restricts the scope of possible combinations that needs
to be considered). In an AGI architecture based on cognitive synergy, the different learning
mechanisms must be designed specifically to interact in such a way as to palliate each other's
combinatorial explosions - so that, for instance, each learning mechanism dealing with a certain
sort of knowledge, must synergize with learning mechanisms dealing with the other sorts of
knowledge, in a way that decreases the severity of combinatorial explosion.
One prerequisite for cognitive synergy to work is that each learning mechanism must rec-
ognize when it is "stuck," meaning it's in a situation where it has inadequate information to
make a confident judgment about what steps to take next. Then, when it does recognize that
it's stuck, it may request help from other, complementary cognitive mechanisms.
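The "stuck, then ask for help" prerequisite can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than CogPrime's design: the `Learner` class, its `threshold` parameter, and the representation of knowledge as a map from questions to (answer, confidence) pairs are assumptions chosen for brevity.

```python
class Learner:
    """Hypothetical cognitive process that owns one kind of knowledge."""

    def __init__(self, name, knowledge, threshold=0.6):
        self.name = name
        self.knowledge = knowledge    # maps question -> (answer, confidence)
        self.threshold = threshold    # below this confidence the learner is "stuck"
        self.helpers = []             # complementary cognitive processes

    def own_judgment(self, question):
        """Best answer available from this learner's own knowledge alone."""
        return self.knowledge.get(question, (None, 0.0))

    def is_stuck(self, question):
        """Stuck means: not confident enough to decide on its own."""
        _, confidence = self.own_judgment(question)
        return self.threshold > confidence

    def answer(self, question):
        """Answer alone when confident; otherwise consult the helpers and
        adopt whichever judgment carries the highest confidence."""
        best = self.own_judgment(question)
        if self.is_stuck(question):
            for helper in self.helpers:
                candidate = helper.own_judgment(question)
                if candidate[1] > best[1]:
                    best = candidate
        return best


question = "is the tower stable?"
declarative = Learner("declarative", {question: ("unknown", 0.2)})
procedural = Learner("procedural", {question: ("yes", 0.9)})
declarative.helpers.append(procedural)
result = declarative.answer(question)
```

The point of the sketch is the control flow, not the scoring: help is requested only when the learner recognizes its own lack of information, which is what keeps the interaction cheaper than always running every process.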
3.4 The General Structure of Cognitive Dynamics: Analysis and
Synthesis
We have discussed the need for synergetic interrelation between cognitive processes correspond-
ing to different types of memory ... and the general high-level cognitive dynamics that a mind
must possess (evolution, autopoiesis). The next step is to dig further into the nature of the cog-
nitive processes associated with different memory types and how they give rise to the needed
high-level cognitive dynamics. In this section we present a general theory of cognitive processes
based on a decomposition of cognitive processes into the two categories of analysis and synthesis,
and a general formulation of each of these categories 3.
Specifically we focus here on what we call focused cognitive processes; that is, cognitive
processes that selectively focus attention on a subset of the patterns making up a mind. In
general these are not the only kind; there may also be global cognitive processes that act on
every pattern in a mind. An example of a global cognitive process in CogPrime is the basic
attention allocation process, which spreads "importance" among all knowledge in the system's
memory. Global cognitive processes are also important, but focused cognitive processes are
subtler to understand, which is why we spend more time on them here.
3.4.1 Component-Systems and Self-Generating Systems
We begin with autopoiesis - and, more specifically, with the concept of a "component-system",
as described in George Kampis's book Self-Modifying Systems in Biology and Cognitive Science
[Kam91], and as modified into the concept of a "self-generating system" or SGS in Goertzel's
book Chaotic Logic [Goe94]. Roughly speaking, a Kampis-style component-system consists of
a set of components that combine with each other to form other compound components. The
3 While these points are highly compatible with the theory of mind given in [Goe06a],
they are not articulated there.
The material presented in this section is a new development within patternist philosophy, presented previously
only in the article IC
metaphor Kampis uses is that of Lego blocks, combining to form bigger Lego structures. Com-
pound structures may in turn be combined together to form yet bigger compound structures.
A self-generating system is basically the same concept as a component-system, but understood
to be computable, whereas Kampis claims that component-systems are uncomputable.
Next, in SGS theory there is also a notion of reduction (not present in the Lego metaphor):
sometimes when components are combined in a certain way, a "reaction" happens, which may
lead to the elimination of some of the components. One relevant metaphor here is chemistry.
Another is abstract algebra: for instance, if we combine a component $f$ with its "inverse" component
$f^{-1}$, both components are eliminated. Thus, we may think about two stages in the
interaction of sets of components: combination, and reduction. Reduction may be thought of
as algebraic simplification, governed by a set of rules that apply to a newly created compound
component, based on the components that are assembled within it.
Formally, suppose $\{C_1, C_2, \ldots\}$ is the set of components present in a discrete-time
component-system at time $t$. Then, the components present at time $t+1$ are a subset of the set of components
of the form

$$\mathrm{Reduce}(\mathrm{Join}(C_{i(1)}, \ldots, C_{i(r)}))$$

where Join is a joining operation, and Reduce is a reduction operator. The joining operation
is assumed to map tuples of components into components, and the reduction operator is assumed
to map the space of components into itself. Of course, the specific nature of a component system
is totally dependent on the particular definitions of the reduction and joining operators; in
following chapters we will specify these for the CogPrime system, but for the purpose of the
broader theoretical discussion in this section they may be left general.
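To make the Join and Reduce operators concrete, here is a minimal sketch under strong simplifying assumptions. Components are taken to be tuples of symbols, Join is concatenation, and Reduce cancels any symbol that sits next to its inverse, in the spirit of the abstract algebra metaphor above. The "~" prefix for inverses and the function names are hypothetical choices for this toy, not the operators used in CogPrime.

```python
from itertools import product


def inverse(symbol):
    """Hypothetical convention: '~f' is the inverse of 'f', and vice versa."""
    return symbol[1:] if symbol.startswith("~") else "~" + symbol


def join(*components):
    """Join: map a tuple of components to one compound component."""
    joined = ()
    for component in components:
        joined += component
    return joined


def reduce_component(component):
    """Reduce: repeatedly eliminate adjacent symbol/inverse pairs."""
    stack = []
    for symbol in component:
        if stack and stack[-1] == inverse(symbol):
            stack.pop()          # the "reaction" eliminates both symbols
        else:
            stack.append(symbol)
    return tuple(stack)


def step(components, arity=2):
    """All candidates of the form Reduce(Join(...)); the components actually
    present at time t+1 would be some subset of this set."""
    return {
        reduce_component(join(*combo))
        for combo in product(components, repeat=arity)
    }


pool = {("f",), ("~f",), ("g",)}
next_pool = step(pool)
```

Joining `("f",)` with `("~f",)` reduces to the empty component, so nine possible pairings yield only eight distinct candidates.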
What is called the "cognitive equation" in Chaotic Logic [Goe94] is the case of an SGS where
the patterns in the system at time t have a tendency to correspond to components of the system
at future times t + s. So, part of the action of the system is to transform implicit knowledge
(patterns among system components) into explicit knowledge (specific system components). We
will see one version of this phenomenon in Chapter 14 where we model implicit knowledge using
mathematical structures called "derived hypergraphs"; and we will also later review several ways
in which CogPrime's dynamics explicitly encourage cognitive-equation type dynamics, e.g.:
• inference, which takes conclusions implicit in the combination of logical relationships, and
makes them explicit by deriving new logical relationships from them
• map formation, which takes concepts that have often been active together, and creates new
concepts grouping them
• association learning, which creates links representing patterns of association between entities
• probabilistic procedure learning, which creates new models embodying patterns regarding
which procedures tend to perform well according to particular fitness functions
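As one illustration of turning implicit knowledge into explicit knowledge, the map formation item above can be sketched as follows. This is a toy under stated assumptions: activity is recorded as a list of sets of concept names, and a new grouping is created for any pair of concepts that has been active together at least `min_count` times. The function name `form_maps` and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def form_maps(activation_history, min_count=2):
    """Return pairs of concepts that were active together at least
    `min_count` times; each pair stands for a newly created grouping."""
    counts = Counter()
    for active in activation_history:
        # Sorting gives every unordered pair a single canonical form.
        for pair in combinations(sorted(active), 2):
            counts[pair] += 1
    return {pair for pair, count in counts.items() if count >= min_count}


# Hypothetical record of which concepts were active at three moments.
history = [
    {"tower", "tall", "blocks"},
    {"tower", "tall"},
    {"snake", "tall"},
]
maps = form_maps(history)
```

The co-activation of "tall" and "tower" was a pattern among system components; after map formation it is a component in its own right.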
3.4.2 Analysis and Synthesis
Now we move on to the main point of this section: the argument that all or nearly all focused
cognitive processes are expressible using two general process-schemata we call synthesis and
analysis 4. The notion of "focused cognitive process" will be exemplified more thoroughly below,
but in essence what is meant is a cognitive process that begins with a small number of items
(drawn from memory) as its focus, and has as its goal discovering something about these
items, or discovering something about something else in the context of these items or in a way
strongly biased by these items. This is different from a global cognitive process whose goal is
more broadly-based and explicitly involves all or a large percentage of the knowledge in an
intelligent system's memory store.
Among the focused cognitive processes are those governed by the so-called cognitive schematic
implication
$$\mathrm{Context} \wedge \mathrm{Procedure} \rightarrow \mathrm{Goal}$$
where the Context involves sensory, episodic and/or declarative knowledge; and attentional
knowledge is used to regulate how much resource is given to each such schematic implication in
memory. Synergy among the learning processes dealing with the context, the procedure and the
goal is critical to the adequate execution of the cognitive schematic using feasible computational
resources. This sort of explicitly goal-driven cognition plays a significant though not necessarily
dominant role in CogPrime, and is also related to production rule systems and other traditional
AI systems, as will be articulated in Chapter 4.
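A minimal sketch of how a cognitive schematic might be stored and used for action selection follows. It assumes, for illustration only, that a context is a set of facts that must all hold, and that each schematic carries a single `strength` number standing in for the estimated likelihood that the procedure achieves the goal in that context. The `Schematic` class and `choose_procedure` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schematic:
    """One implication of the form Context AND Procedure -> Goal."""
    context: frozenset   # facts that must hold for the schematic to apply
    procedure: str
    goal: str
    strength: float      # assumed stand-in for the implication's likelihood


def choose_procedure(schematics, situation, goal):
    """Pick the procedure of the strongest schematic whose goal matches and
    whose context is satisfied by the current situation."""
    applicable = [
        s for s in schematics
        if s.goal == goal and s.context.issubset(situation)
    ]
    if not applicable:
        return None
    return max(applicable, key=lambda s: s.strength).procedure


schematics = [
    Schematic(frozenset({"blocks available"}), "stack blocks", "build tower", 0.8),
    Schematic(frozenset({"blocks available", "teacher present"}),
              "ask teacher", "build tower", 0.9),
]
alone = choose_procedure(schematics, {"blocks available"}, "build tower")
with_teacher = choose_procedure(
    schematics, {"blocks available", "teacher present"}, "build tower")
```

The sketch omits the attentional side entirely; in the text, attentional knowledge decides how much resource each schematic receives, which here would amount to deciding which schematics are even examined.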
The synthesis and analysis processes as we conceive them, in the general framework of SGS
theory, are as follows. First, synthesis, as shown in Figure 3.1, is defined as
synthesis: Iteratively build compounds from the initial component pool using the combinators,
greedily seeking compounds that seem likely to achieve the goal.
Or in more detail:
1. Begin with some initial components (the initial "current pool"), an additional set of com-
ponents identified as "combinators" (combination operators), and a goal function
2. Combine the components in the current pool, utilizing the combinators, to form product
components in various ways, carrying out reductions as appropriate, and calculating relevant
quantities associated with components as needed
3. Select the product components that seem most promising according to the goal function,
and add these to the current pool (or else simply define these as the current pool)
4. Return to Step 2
And analysis, as shown in Figure 3.2, is defined as
analysis: Iteratively search (the system's long-term memory) for component-sets that com-
bine using the combinators to form the initial component pool (or subsets thereof), greedily
seeking component-sets that seem likely to achieve the goal
or in more detail:
1. Begin with some components (the initial "current pool") and a goal function
2. Seek components so that, if one combines them to form product components using the
combinators and then performs appropriate reductions, one obtains (as many as possible
of) the components in the current pool
4 In [GPPG06], what is here called "analysis" was called "backward synthesis", a name which has some advantages
since it indicates that what's happening is a form of creation; but here we have opted for the more traditional
analysis/synthesis terminology.
[Figure: an initial focus (concepts, procedures, inference rules, etc.) leads to combinations of items in the initial focus, which lead in turn to combinations of combinations, and so on.]
Fig. 3.1: The General Process of Synthesis
3. Use the newly found constructions of the components in the current pool, to update the
quantitative properties of the components in the current pool, and also (via the current
pool) the quantitative properties of the components in the initial pool
4. Out of the components found in Step 2, select the ones that seem most promising according
to the goal function, and add these to the current pool (or else simply define these as the
current pool)
5. Return to Step 2
More formally, synthesis may be specified as follows. Let $X$ denote the set of combinators,
and let $Y_0$ denote the initial pool of components (the initial focus of the cognitive process).
Given $Y_i$, let $Z_i$ denote the set

$$\mathrm{Reduce}(\mathrm{Join}(C_{i(1)}, \ldots, C_{i(r)}))$$

where the $C_{i(k)}$ are drawn from $Y_i$ or from $X$. We may then say

$$Y_{i+1} = \mathrm{Filter}(Z_i)$$

where Filter is a function that selects a subset of its arguments.
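The synthesis iteration just specified can be sketched as a short loop. The following is a toy under several assumptions that are ours rather than the text's: components are integers, the combinators are ordinary addition and multiplication, reduction is the identity, Filter keeps the `keep` best candidates under the goal function, and selected products are added to the pool rather than replacing it. The function name `synthesize` and its parameters are hypothetical.

```python
import operator
from itertools import product


def synthesize(initial_pool, combinators, goal_score, rounds=3, keep=2):
    """Greedy forward synthesis: combine, score against the goal, keep the best."""
    pool = set(initial_pool)
    for _ in range(rounds):
        # Step 2: form product components from the current pool.
        products = {
            combine(left, right)
            for combine in combinators
            for left, right in product(pool, repeat=2)
        }
        # Step 3: Filter, here "the `keep` highest-scoring products",
        # with ties broken by the smaller value for determinism.
        ranked = sorted(products, key=lambda c: (-goal_score(c), c))
        pool |= set(ranked[:keep])
    return pool


# Hypothetical goal: get as close as possible to 24 starting from 2 and 3.
result = synthesize(
    {2, 3},
    [operator.add, operator.mul],
    goal_score=lambda c: -abs(c - 24),
)
```

With these settings the pool grows from {2, 3} to include 6 and 9, then 18 and 27, and the third round reaches 24 as 6 + 18. The greediness is visible in what is discarded: any component not among the top two in its round never gets a chance to combine.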
Analysis, on the other hand, begins with a set $W$ of components, and a set $X$ of combinators,
and tries to find a series $Y_i$ so that according to the process of synthesis, $Y_n = W$.
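A single backward step of analysis can be sketched in the same toy setting of integer components with addition and multiplication as combinators. The sketch below only searches a given memory for pairs that combine to yield one target component; the full process described in the text iterates this search and also updates quantitative properties of the components, both of which are omitted here. The function name `analyze` and the example memory are hypothetical.

```python
import operator
from itertools import product


def analyze(target, memory, combinators):
    """Return (combinator name, left, right) triples drawn from memory whose
    combination yields the target component."""
    explanations = []
    for combine in combinators:
        # Sorting makes the search order, and hence the output, deterministic.
        for left, right in product(sorted(memory), repeat=2):
            if combine(left, right) == target:
                explanations.append((combine.__name__, left, right))
    return explanations


# Hypothetical long-term memory and target.
found = analyze(24, {2, 3, 6, 8, 18}, [operator.add, operator.mul])
```

Each triple is a candidate "construction" of the target, which is the sense in which analysis figures out how something could have been grown.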
In practice, of course, the implementation of a synthesis process need not involve the explicit
construction of the full set $Z_i$. Rather, the filtering operation takes place implicitly during the
construction of $Y_{i+1}$. The result, however, is that one gets some subset of the compounds
producible via joining and reduction from the set of components present in $Y_i$ plus the combinators
$X$.
[Figure: an initial focus (concepts, procedures, inference rules, etc.) is traced back to a set of items that combine to yield items in the initial focus, and that set is traced back in turn to a further set of items that combine to yield it.]
Fig. 3.2: The General Process of Analysis
Conceptually one may view synthesis as a very generic sort of "growth process," and analysis
as a very generic sort of "figuring out how to grow something." The intuitive idea underlying
the present proposal is that these forward-going and backward-going "growth processes" are
among the essential foundations of cognitive control, and that a conceptually sound design for
cognitive control should explicitly make use of this fact. To abstract away from the details,
what these processes are about is:
• taking the general dynamic of compound-formation and reduction as outlined in Kampis
and Chaotic Logic
• introducing goal-directed pruning ("filtering") into this dynamic so as to account for the
limitations of computational resources that are a necessary part of pragmatic intelligence
3.4.3 The Dynamic of Iterative Analysis and Synthesis
While synthesis and analysis are both very useful on their own, they achieve their greatest power
when harnessed together. It is my hypothesis that the dynamic pattern of alternating synthesis
and analysis has a fundamental role in cognition. Put simply, synthesis creates new mental
forms by combining existing ones. Then, analysis seeks simple explanations for the forms in the
mind, including the newly created ones; and, this explanation itself then comprises additional
new forms in the mind, to be used as fodder for the next round of synthesis. Or, to put it yet
more simply:
Combine → Explain → Combine → Explain → Combine ...
It is not hard to express this alternating dynamic more formally, as well.
• Let X denote any set of components.
• Let F(X) denote a set of components which is the result of synthesis on X.
• Let B(X) denote a set of components which is the result of analysis of X. We assume also
a heuristic biasing the synthesis process toward simple constructs.
• Let S(t) denote a set of components at time t, representing part of a system's knowledge
base.
• Let I(t) denote components resulting from the external environment at time t.
Then, we may consider a dynamical iteration of the form
S(t+1) = B(F(S(t) + I(t)))
This expresses the notion of alternating synthesis and analysis formally, as a dynamical
iteration on the space of sets of components. We may then speak about attractors of this
iteration: fixed points, limit cycles and strange attractors. One of the key hypotheses I wish
to put forward here is that some key emergent cognitive structures are strange attractors of
this equation. The iterative dynamic of combination and explanation leads to the emergence
of certain complex structures that are, in essence, maintained when one recombines their parts
and then seeks to explain the recombinations. These structures are built in the first place
through iterative recombination and explanation, and then survive in the mind because they
are conserved by this process. They then ongoingly guide the construction and destruction of
various other temporary mental structures that are not so conserved.
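The idea of a structure that is conserved under "recombine, then explain" can be shown with a deliberately simple toy. We assume components are integers greater than 1, that synthesis F adds every pairwise product below a bound, and that analysis B keeps only those components that cannot be explained as a product of two components in the set. These choices, the bound `LIMIT`, and the function names are all hypothetical, and the example exhibits only a fixed point, the simplest kind of attractor, rather than the strange attractors hypothesized in the text.

```python
from itertools import product

LIMIT = 100  # hypothetical bound that keeps the toy component space finite


def forward(components):
    """F: synthesis, here 'add every pairwise product that stays within LIMIT'."""
    products = {
        a * b
        for a, b in product(components, repeat=2)
        if LIMIT >= a * b
    }
    return set(components) | products


def backward(components):
    """B: analysis, here 'keep only components that are not explained as a
    product of two components in the set' (assumes all components exceed 1)."""
    explained = {a * b for a, b in product(components, repeat=2)}
    return {c for c in components if c not in explained}


def iterate(state, inputs):
    """Apply S(t+1) = B(F(S(t) + I(t))) once per input set, reading + as union."""
    for incoming in inputs:
        state = backward(forward(state | incoming))
    return state


# The environment supplies 2, 3 and 12 once, then nothing.
final = iterate(set(), [{2, 3, 12}, set(), set()])
```

After the first step the state is {2, 3}, since 12 and every product formed from it are explained away, and further iterations leave that state unchanged.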
3.4.4 Self and Focused Attention as Approximate Attractors of the
Dynamic of Iterated Forward-Analysis
As noted above, patternist philosophy argues that two key aspects of intelligence are emergent
structures that may be called the "self" and the "attentional focus." These, it is suggested, are
aspects of intelligence that may not effectively be wired into the infrastructure of an intelligent
system, though of course the infrastructure may be configured in such a way as to encourage
their emergence. Rather, these aspects, by their nature, are only likely to be effective if they
emerge from the cooperative activity of various cognitive processes acting within a broad base
of knowledge.
Above we have described the pattern of ongoing habitual oscillation between synthesis and
analysis as a kind of "dynamical iteration." Here we will argue that both self and attentional
focus may be viewed as strange attractors of this iteration. The mode of argument is relatively
informal. The essential processes under consideration are ones that are poorly understood from
an empirical perspective, due to the extreme difficulty involved in studying them experimentally.
For understanding self and attentional focus, we are stuck in large part with introspection, which
is famously unreliable in some contexts, yet still dramatically better than having no information
at all. So, the philosophical perspective on self and attentional focus given here is a synthesis of
empirical and introspective notions, drawn largely from the published thinking and research of
others but with a few original twists. From a CogPrime perspective, its use has been to guide
the design process, to provide a grounding for what otherwise would have been fairly arbitrary
choices.
3.4.4.1 Self
Another high-level intelligent system pattern mentioned above is the "self", which we here will tie
in with analysis and synthesis processes. The term "self" as used here refers to the "phenomenal
self" [Met04] or "self-model". That is, the self is the model that a system builds internally,
reflecting the patterns observed in the (external and internal) world that directly pertain to
the system itself. As is well known in everyday human life, self-models need not be completely
accurate to be useful; and in the presence of certain psychological factors, a more accurate
self-model may not necessarily be advantageous. But a self-model that is too badly inaccurate
will lead to a badly-functioning system that is unable to effectively act toward the achievement
of its own goals.
The value of a self-model for any intelligent system carrying out embodied agentive cognition
is obvious. And beyond this, another primary use of the self is as a foundation for metaphors
and analogies in various domains. Patterns recognized pertaining to the self are analogically
extended to other entities. In some cases this leads to conceptual pathologies, such as the an-
thropomorphization of trees, rocks and other such objects that one sees in some precivilized
cultures. But in other cases this kind of analogy leads to robust sorts of reasoning - for instance,
in reading Lakoff and Núñez's [LN00] intriguing explorations of the cognitive foundations of
mathematics, it is pretty easy to see that most of the metaphors on which they hypothesize
mathematics to be based, are grounded in the mind's conceptualization of itself as a spatiotem-
porally embedded entity, which in turn is predicated on the mind's having a conceptualization
of itself (a self) in the first place.
A self-model can in many cases form a self-fulfilling prophecy (to make an obvious
double-entendre!). Actions are generated based on one's model of what sorts of actions one can and/or
should take; and the results of these actions are then incorporated into one's self-model. If a
self-model proves a generally bad guide to action selection, this may never be discovered, unless
said self-model includes the knowledge that semi-random experimentation is often useful.
In what sense, then, may it be said that self is an attractor of iterated analysis? Analysis
infers the self from observations of system behavior. The system asks: What kind of system
might I be, in order to give rise to these behaviors that I observe myself carrying out? Based
on asking itself this question, it constructs a model of itself, i.e. it constructs a self. Then, this
self guides the system's behavior: it builds new logical relationships between its self-model and various
other entities, in order to guide its future actions oriented toward achieving its goals. Based on
the behaviors newly induced via this constructive, forward-synthesis activity, the system may
then engage in analysis again and ask: What must I be now, in order to have carried out these
new actions? And so on.
Our hypothesis is that after repeated iterations of this sort, in infancy, finally during early
childhood a kind of self-reinforcing attractor occurs, and we have a self-model that is resilient
and doesn't change dramatically when new instances of action- or explanation-generation occur.
This is not strictly a mathematical attractor, though, because over a long period of time the self
may well shift significantly. But, for a mature self, many hundreds of thousands or millions of
forward-analysis cycles may occur before the self-model is dramatically modified. For relatively
long periods of time, small changes within the context of the existing self may suffice to allow
the system to control itself intelligently.
Humans can also develop what are known as subselves [Row90]. A subself is a partially
autonomous self-network focused on particular tasks, environments or interactions. It contains
a unique model of the whole organism, and generally has its own set of episodic memories,
consisting of memories of those intervals during which it was the primary dynamic mode con-
trolling the organism. One common example is the creative subself - the subpersonality that
takes over when a creative person launches into the process of creating something. In these
times, a whole different personality sometimes emerges, with a different sort of relationship
to the world. Among other factors, creativity requires a certain open-ness that is not always
productive in an everyday life context, so it's natural for the self-system of a highly creative
person to bifurcate into one self-system for everyday life, and another for the protected context
of creative activity. This sort of phenomenon might emerge naturally in CogPrime systems as
well if they were exposed to appropriate environments and social situations.
Finally, it is interesting to speculate regarding how self may differ in future AI systems as
opposed to in humans. The relative stability we see in human selves may not exist in AI systems
that can self-improve and change more fundamentally and rapidly than humans can. There may
be a situation in which, as soon as a system has understood itself decently, it radically modifies
itself and hence violates its existing self-model. Thus: intelligence without a long-term stable self.
In this case the "attractor-ish" nature of the self holds only over much shorter time scales than
for human minds or human-like minds. But the alternating process of synthesis and analysis
for self-construction is still critical, even though no reasonably stable self-constituting attractor
ever emerges. The psychology of such intelligent systems will almost surely be beyond human
beings' capacity for comprehension and empathy.
3.4.4.2 Attentional Focus
Finally, we turn to the notion of an "attentional focus" similar to Baars' [Baa97] notion of a
Global Workspace, which will be reviewed in more detail in Chapter 4: a collection of mental
entities that are, at a given moment, receiving far more than the usual share of an intelligent
system's computational resources. Due to the amount of attention paid to items in the atten-
tional focus, at any given moment these items are in large part driving the cognitive processes
going on elsewhere in the mind as well - because the cognitive processes acting on the items in
the attentional focus are often involved in other mental items, not in attentional focus, as well
(and sometimes this results in pulling these other items into attentional focus). An intelligent
system must constantly shift its attentional focus from one set of entities to another based on
changes in its environment and based on its own shifting discoveries.
In the human mind, there is a self-reinforcing dynamic pertaining to the collection of entities
in the attentional focus at any given point in time, resulting from the observation that: If A
is in the attentional focus, and A and B have often been associated in the past, then odds
are increased that B will soon be in the attentional focus. This basic observation has been
refined tremendously via a large body of cognitive psychology work; and neurologically it follows
not only from Hebb's [Heb49] classic work on neural reinforcement learning, but also from
numerous more modern refinements [SB98]. But it implies that two items A and B, if both in
the attentional focus, can reinforce each others' presence in the attentional focus, hence forming
a kind of conspiracy to keep each other in the limelight. But of course, this kind of dynamic
must be counteracted by a pragmatic tendency to remove items from the attentional focus if
giving them attention is not providing sufficient utility in terms of the achievement of system
goals.
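The interplay just described, in which association pulls related items into focus while a utility-based pressure pushes unhelpful items out, can be illustrated with a toy update rule. This is only a sketch under assumed names and parameters (`step_attention`, `spread`, `decay` and `floor` are all hypothetical); it is not a model taken from the cognitive psychology literature or from CogPrime.

```python
from collections import defaultdict


def step_attention(attention, assoc, utility, spread=0.2, decay=0.3, floor=0.05):
    """One update of a toy attentional-focus dynamic.

    attention: dict item -> current attention level
    assoc:     dict (a, b) -> association strength in [0, 1]
    utility:   dict item -> recent usefulness toward system goals in [0, 1]
    """
    new = defaultdict(float)
    for item, level in attention.items():
        # An item loses attention in proportion to how little utility it gives.
        new[item] += level * (1.0 - decay * (1.0 - utility.get(item, 0.0)))
    for (a, b), strength in assoc.items():
        # If A is in focus and A and B are associated, B gains attention.
        new[b] += spread * strength * attention.get(a, 0.0)
    # Items whose attention falls below the floor leave the focus.
    return {k: v for k, v in new.items() if v >= floor}


attention = {"A": 1.0, "C": 0.06}
assoc = {("A", "B"): 0.9, ("B", "A"): 0.9}
utility = {"A": 1.0, "B": 1.0, "C": 0.0}
focus = step_attention(attention, assoc, utility)
```

In this run B enters the focus through its association with A, while C, which contributes nothing to goals, decays below the floor and drops out.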
The synthesis and analysis perspective provides a more systematic view of this self-
reinforcing dynamic. Synthesis occurs in the attentional focus when two or more items in the
focus are combined to form new items, new relationships, new ideas. This happens continually,
as one of the main purposes of the attentional focus is combinational. Analysis, on the other
hand, occurs when a combination that has been speculatively formed is then linked
in with the remainder of the mind (the "unconscious", the vast body of knowledge that is not
in the attentional focus at the given moment in time). Analysis basically checks to see what
support the new combination has within the existing knowledge store of the system. Thus,
forward-analysis basically comes down to "generate and test", where the testing takes the form
of attempting to integrate the generated structures with the ideas in the unconscious long-
term memory. One of the most obvious examples of this kind of dynamic is creative thinking
(Boden, 2003; Goertzel, 1997), where the attentional focus continually combinationally creates
new ideas, which are then tested via checking which ones can be validated in terms of (built up
from) existing knowledge.
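The "generate and test" reading of this dynamic can be sketched in a few lines. The representation is an assumption made purely for illustration: items are sets of features, synthesis forms pairwise unions, and analysis scores a candidate by the fraction of long-term memory entries that contain it. The names `synthesize` and `generate_and_test` are hypothetical.

```python
from itertools import combinations


def synthesize(focus):
    """Synthesis: speculatively combine every pair of items in the focus."""
    return [a | b for a, b in combinations(focus, 2)]


def generate_and_test(focus, memory, threshold=0.5):
    """Analysis: keep only combinations supported by long-term memory."""
    kept = []
    for candidate in synthesize(focus):
        # Support = fraction of stored items that contain the whole candidate.
        support = sum(candidate <= known for known in memory) / len(memory)
        if support >= threshold:
            kept.append((candidate, support))
    return kept


focus = [frozenset({"wings"}), frozenset({"feathers"}), frozenset({"gills"})]
memory = [
    frozenset({"wings", "feathers", "beak"}),
    frozenset({"wings", "feathers", "song"}),
    frozenset({"gills", "fins"}),
]
validated = generate_and_test(focus, memory)
```

Only the combination of "wings" and "feathers" survives the test here, since the other two candidate combinations find no support in the stored knowledge.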
The analysis stage may result in items being pushed out of the attentional focus, to be
replaced by others. Likewise may the synthesis stage: the combinations may overshadow and
then replace the things combined. However, in human minds and functional AI minds, the
attentional focus will not be a complete chaos with constant turnover: Sometimes the same set of
ideas - or a shifting set of ideas within the same overall family of ideas - will remain in focus for a
while. When this occurs it is because this set or family of ideas forms an approximate attractor
for the dynamics of the attentional focus, in particular for the forward-analysis dynamic of
speculative combination and integrative explanation. Often, for instance, a small "core set" of
ideas will remain in the attentional focus for a while, but will not exhaust the attentional focus:
the rest of the attentional focus will then, at any point in time, be occupied with other ideas
related to the ones in the core set. Often this may mean that, for a while, the whole of the
attentional focus will move around quasi-randomly through a "strange attractor" consisting of
the set of ideas related to those in the core set.
3.4.5 Conclusion
The ideas presented above (the notions of synthesis and analysis, and the hypothesis of self and
attentional focus as attractors of the iterative forward-analysis dynamic) are quite generic and
are hypothetically proposed to be applicable to any cognitive system, natural or artificial. Later
chapters will discuss the manifestation of the above ideas in the context of CogPrime. We have
found that the analysis/synthesis approach is a valuable tool for conceptualizing CogPrime's
cognitive dynamics, and we conjecture that a similar utility may be found more generally.
Next, so as not to end the section on too blasé a note, we will also make a stronger
hypothesis: that, in order for a physical or software system to achieve intelligence that is roughly
human-level in both capability and generality, using computational resources on the same order
of magnitude as the human brain, this system must
• manifest the dynamic of iterated synthesis and analysis, as modes of an underlying "self-
generating system" dynamic
• do so in such a way as to lead to self and attentional focus as emergent structures that serve
as approximate attractors of this dynamic, over time periods that are long relative to the
basic "cognitive cycle time" of the system's forward-analysis dynamics
To prove the truth of a hypothesis of this nature would seem to require mathematics fairly
far beyond anything that currently exists. Nonetheless, we feel it is important to
formulate and discuss such hypotheses, so as to point the way for future investigations both
theoretical and pragmatic.
3.5 Perspectives on Machine Consciousness
Finally, we can't let a chapter on philosophy - even a brief one - end without some discussion
of the thorniest topic in the philosophy of mind: consciousness. Rather than seeking to resolve
or comprehensively review this most delicate issue, we will restrict ourselves to giving, in
Appendix ??, an overview of many of the common views on the subject; and here in the main text,
discussing the relationship between consciousness theory, patternist philosophy of cognition, and
the practical work of designing and building AGI.
One fairly concrete idea about consciousness, that relates closely to certain aspects of the
CogPrime design, is that the subjective experience of being conscious of some entity X, is corre-
lated with the presence of a very intense pattern in one's overall mind-state, corresponding to X.
This simple idea is also the essence of neuroscientist Susan Greenfield's theory of consciousness
[Gre01] (but in her theory, "overall mind-state" is replaced with "brain-state"), and has much
deeper historical roots in philosophy of mind which we shall not venture to unravel here.
This observation relates to the idea of "moving bubbles of awareness" in intelligent systems.
If an intelligent system consists of multiple processing or data elements, and during each (suf-
ficiently long) interval of time some of these elements get much more attention than others,
then one may view the system as having a certain "attentional focus" during each interval. The
attentional focus is itself a significant pattern in the system (the pattern being "these elements
habitually get more processor and memory", roughly speaking). As the attentional focus shifts
over time one has a "moving bubble of pattern" which then corresponds experientially to a
"moving bubble of awareness."
This notion of a "moving bubble of awareness" ties in very closely to global workspace
theory [Baa97] (briefly mentioned above), a cognitive theory that has broad support from
neuroscience and cognitive science and has also served as the motivation for Stan Franklin's
LIDA AI system [BF09], to be discussed in Chapter 4. The global workspace theory views the
mind as consisting of a large population of small, specialized processes - a society of agents.
These agents organize themselves into coalitions, and coalitions that are relevant to contextually
novel phenomena, or contextually important goals, are pulled into the global workspace (which
is identified with consciousness). This workspace broadcasts the message of the coalition to all
the unconscious agents, and recruits other agents into consciousness. Various sorts of contexts
- e.g. goal contexts, perceptual contexts, conceptual contexts and cultural contexts - play a
role in determining which coalitions are relevant, and form the unconscious "background" of
the conscious global workspace. New perceptions are often, but not necessarily, pushed into the
workspace. Some of the agents in the global workspace are concerned with action selection, i.e.
with controlling and passing parameters to a population of possible actions. The contents of
the workspace at any given time have a certain cohesiveness and interdependency, the so-called
"unity of consciousness." In essence the contents of the global workspace form a moving bubble
of attention or awareness.
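The competition-and-broadcast cycle of global workspace theory can be caricatured as follows. This is a deliberately minimal sketch, not LIDA's algorithm: the scoring rule (a context-weighted sum of novelty and goal relevance) and all names such as `workspace_cycle` are assumptions introduced only for illustration.

```python
def workspace_cycle(coalitions, context):
    """Pick the most relevant coalition and broadcast its message.

    coalitions: dict name -> (novelty, goal_relevance, message)
    context:    dict with weights for "novelty" and "goal"
    """
    def score(name):
        novelty, relevance, _ = coalitions[name]
        return context["novelty"] * novelty + context["goal"] * relevance

    # The winning coalition is pulled into the workspace.
    winner = max(coalitions, key=score)
    message = coalitions[winner][2]
    # Its message is broadcast to all the other, unconscious coalitions.
    broadcast = {name: message for name in coalitions if name != winner}
    return winner, broadcast


coalitions = {
    "loud_noise": (0.9, 0.2, "unexpected noise"),
    "lunch_plan": (0.1, 0.5, "time to eat"),
}
winner, broadcast = workspace_cycle(coalitions, {"novelty": 1.0, "goal": 1.0})
```

Changing the context weights changes which coalition wins, which is one simple way to picture the role that goal and perceptual contexts play in the theory.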
In CogPrime, this moving bubble is achieved largely via economic attention network (ECAN)
equations [GPI+10] that propagate virtual currency between nodes and links representing
elements of memories, so that the attentional focus consists of the wealthiest nodes and links.
Figures 3.3 and 3.4 illustrate the existence and flow of attentional focus in OpenCog. On the
other hand, in Hameroff's recent model of the brain [Ham10], the brain's moving bubble of
attention is achieved through dendro-dendritic connections and the emergent dendritic web.
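The idea of attention as a conserved virtual currency can be made concrete with a toy diffusion step. The sketch below is not the ECAN equations themselves; it assumes, purely for illustration, that each node pays out a fixed fraction of its funds to its neighbors in proportion to link weight, and that the focus is simply the `k` wealthiest nodes. The names `spread_currency`, `attentional_focus`, `rate` and `k` are hypothetical.

```python
def spread_currency(funds, links, rate=0.1):
    """Move a fraction of each node's currency along its outgoing links.

    funds: dict node -> currency held
    links: dict source -> {target: link weight}
    The total amount of currency is conserved.
    """
    new = dict(funds)
    for src, targets in links.items():
        total_weight = sum(targets.values())
        if total_weight == 0:
            continue
        payout = rate * funds[src]
        new[src] -= payout
        for dst, weight in targets.items():
            new[dst] = new.get(dst, 0.0) + payout * weight / total_weight
    return new


def attentional_focus(funds, k=2):
    """The focus is taken to be the k wealthiest nodes."""
    return sorted(funds, key=funds.get, reverse=True)[:k]


funds = {"table": 10.0, "raise_arm": 2.0, "red_pixel": 0.0}
links = {"table": {"raise_arm": 1.0, "red_pixel": 3.0}}
after = spread_currency(funds, links)
focus = attentional_focus(after)
```

Repeating the step moves wealth, and hence the bubble of attention, gradually through the network while the total currency stays fixed.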
[Figure 3.3 (image not reproduced). Recoverable labels: "Perception, Action, Feeling Nodes"; "Specific Objects, Composite Actions, Complex Feelings (some corresponding to named concepts, some not)"; example nodes "table" and "raise arm".]
Fig. 3.3: Graphical depiction of the momentary bubble of attention in the memory of an
OpenCog AI system. Circles and lines represent nodes and links in OpenCogPrime's memory,
and stars denote those nodes with a high level of attention (represented in OpenCog by
the ShortTermImportance node variable) at the particular point in time.
In this perspective, self, free will and reflective consciousness are specific phenomena occur-
ring within the moving bubble of awareness. They are specific ways of experiencing awareness,
corresponding to certain abstract types of physical structures and dynamics, which we shall
endeavor to identify in detail in Appendix ??.
[Figure 3.4 (image not reproduced). Recoverable labels: "Perception, Action, Feeling Nodes"; "Specific Objects, Composite Actions, Complex Feelings (some corresponding to named concepts, some not)"; example nodes "pixel at (100,50) is RED at 1:42:01", "table" and "raise arm".]
Fig. 3.4: Graphical depiction of the momentary bubble of attention in the memory of an
OpenCog AI system, a few moments after the bubble shown in Figure 3.3, indicating the
moving of the bubble of attention. Depictive conventions are the same as in Figure 3.3. This shows
an idealized situation where the declarative knowledge remains invariant from one moment to
the next but only the focus of attention shifts. In reality both will evolve together.
3.6 Postscript: Formalizing Pattern
Finally, before winding up our very brief tour through patternist philosophy of mind, we will
briefly visit patternism's more formal side. Many of the key aspects of patternism have been
rigorously formalized. Here we give only a few very basic elements of the relevant mathematics,
which will be used later on in the exposition of CogPrime. (Specifically, the formal definition of
pattern emerges in the CogPrime design in the definition of a fitness function for "pattern min-
ing" algorithms and Occam-based concept creation algorithms, and the definition of intensional
inheritance within PLN.)
We give some definitions, drawn from Appendix 1 of [Goe06a]:
Definition 1 Given a metric space $(M, d)$, and two functions $c : M \to [0, \infty]$ (the "simplicity
measure") and $F : M \to M$ (the "production relationship"), we say that $P \in M$ is a pattern
in $X \in M$ to the degree

$$\iota_X^P = \left( \left( 1 - \frac{d(F(P), X)}{c(X)} \right) \frac{c(X) - c(P)}{c(X)} \right)^{+}$$

This degree is called the pattern intensity of P in X. It quantifies the extent to which P
is a pattern in X. Supposing that $F(P) = X$, then the first factor in the definition equals 1,
and we are left with only the second term, which measures the degree of compression obtained
via representing X as the result of P rather than simply representing X directly. The greater
the compression ratio obtained via using P to represent X, the greater the intensity of P as a
pattern in X. The first term, in the case $F(P) \neq X$, adjusts the pattern intensity downwards to
account for the amount of error with which $F(P)$ approximates X. If one holds the second
factor fixed and thinks about varying the first factor, then: The greater the error, the lossier
the compression, and the lower the pattern intensity.
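A small numerical illustration may help. The sketch below computes the pattern intensity as the product of a fidelity factor and a compression factor, clipped at zero (assuming the superscript + in the definition denotes the positive part). The concrete choices are assumptions made only for this example: X is a string, P is a (unit, repeat count) pair, F expands the pair, c counts characters (plus one for the count), and d is Hamming distance.

```python
def pattern_intensity(P, X, F, c, d):
    """Degree to which P is a pattern in X: fidelity times compression."""
    cx = c(X)
    fidelity = 1.0 - d(F(P), X) / cx       # first factor: 1 when F(P) == X
    compression = (cx - c(P)) / cx         # second factor: relative saving
    return max(fidelity * compression, 0.0)


def expand(P):
    """Toy production relationship: repeat a unit string a number of times."""
    unit, count = P
    return unit * count


def simplicity(obj):
    """Toy simplicity measure: string length, or unit length + 1 for a pair."""
    return len(obj) if isinstance(obj, str) else len(obj[0]) + 1


def hamming(a, b):
    """Toy metric: mismatched positions plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


P = ("ab", 6)
exact = pattern_intensity(P, "abababababab", expand, simplicity, hamming)
lossy = pattern_intensity(P, "abababababax", expand, simplicity, hamming)
```

With an exact reproduction the intensity is just the compression factor (12 - 3) / 12; a single mismatched character lowers the fidelity factor and hence the intensity.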
For instance, if one wishes one may take c to denote algorithmic information measured on
some reference Turing machine, and F(X) to denote what appears on the second tape of a
two-tape Turing machine t time-steps after placing X on its first tape. Other more naturalistic
computational models are also possible here and are discussed extensively in Appendix 1 of
[Goe06a].
Definition 2 The structure of $X \in M$ is the fuzzy set $St_X$ defined via the membership
function

$$\chi_{St_X}(P) = \iota_X^P$$

This lets us formalize our definition of "mind" alluded to above: the mind of X as the set
of patterns associated with X. We can formalize this, for instance, by considering P to belong
to the mind of X if it is a pattern in some Y that includes X. There are then two numbers
to look at: $\iota_Y^P$ and $P(Y|X)$ (the percentage of Y that is also contained in X). To define the
degree to which P belongs to the mind of X we can then combine these two numbers using
some function f that is monotone increasing in both arguments. This highlights the somewhat
arbitrary semantics of "of" in the phrase "the mind of X." Which of the patterns binding X to
its environment are part of X's mind, and which are part of the world? This isn't necessarily
a good question, and the answer seems to depend on what perspective you choose, represented
formally in the present framework by what combination function f you choose (for instance if
$f(a, b) = a^r b^{2-r}$ then it depends on the choice of $0 < r < 1$).
Next, we can formalize the notion of a "pattern space" by positing a metric on patterns, thus
making pattern space a metric space, which will come in handy in some places in later chapters:
Definition 3 Assuming $M$ is a countable space, the structural distance is a metric $d_{St}$
defined on $M$ via

$$d_{St}(X, Y) = T(\chi_{St_X}, \chi_{St_Y})$$

where $T$ is the Tanimoto distance.
The Tanimoto distance between two real vectors $A$ and $B$ is defined as

$$T(A, B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}$$

and since $M$ is countable this can be applied to fuzzy sets such as $St_X$ via considering the
latter as vectors. (As an aside, this can be generalized to uncountable $M$ as well, but we will
not require this here.)
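The following sketch evaluates the Tanimoto expression on two small fuzzy sets, each represented as a dictionary from pattern name to membership degree (a representation assumed here for illustration, as are the names `tanimoto` and `structure_vectors`). One remark, offered with some caution: the expression as printed equals 1 for identical vectors and 0 for orthogonal ones, so it behaves as a similarity coefficient; a quantity that is zero for identical structures is commonly obtained as one minus this value, and the example computes both.

```python
def tanimoto(a, b):
    """Tanimoto coefficient A.B / (|A|^2 + |B|^2 - A.B) of two real vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical (a convention chosen here).
    return dot / denom if denom else 1.0


def structure_vectors(st_x, st_y):
    """Align two fuzzy sets (dict pattern -> membership) as equal-length vectors."""
    keys = sorted(set(st_x) | set(st_y))
    return [st_x.get(k, 0.0) for k in keys], [st_y.get(k, 0.0) for k in keys]


st_x = {"p1": 0.75, "p2": 0.5}
st_y = {"p1": 0.75, "p3": 0.5}
a, b = structure_vectors(st_x, st_y)
coefficient = tanimoto(a, b)          # shared structure of X and Y
one_minus = 1.0 - coefficient         # assumed distance-like variant
```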
Using this definition of pattern, combined with the formal theory of intelligence given in
Chapter 7, one may formalize the various hypotheses made in the previous section, regarding
the emergence of different kinds of networks and structures as patterns in intelligent systems.
However, it appears quite difficult to prove the formal versions of these hypotheses given current
mathematical tools, which renders such formalizations of limited use.
Finally, consider the case where the metric space $M$ has a partial ordering $<$ on it; we may
then define
Definition 3.1. $R \in M$ is a subpattern in $X \in M$ to the degree

$$\kappa_X^R = \frac{\int_{P \in M} \mathrm{true}(R < P)\, d\iota_X^P}{\int_{P \in M} d\iota_X^P}$$

This degree is called the subpattern intensity of R in X.
Roughly speaking, the subpattern intensity measures the percentage of patterns in X that
contain R (where "containment" is judged by the partial ordering <). But the percentage is
measured using a weighted average, where each pattern is weighted by its intensity as a pattern
in X. A subpattern may or may not be a pattern on its own. A nonpattern that happens to
occur within many patterns may be an intense subpattern.
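For a finite set of patterns the integrals reduce to sums, and the subpattern intensity becomes an intensity-weighted fraction. The sketch below assumes, for illustration only, that patterns are strings ordered by proper-substring containment and that their intensities in X are already known; `subpattern_intensity` and `proper_substring` are hypothetical names.

```python
def subpattern_intensity(R, intensities, contains):
    """Weighted fraction of the patterns in X that contain R.

    intensities: dict pattern P -> pattern intensity of P in X
    contains:    function (R, P) -> True when R < P in the partial ordering
    """
    total = sum(intensities.values())
    if total == 0:
        return 0.0
    hits = sum(w for P, w in intensities.items() if contains(R, P))
    return hits / total


def proper_substring(R, P):
    """Toy partial ordering: R is strictly contained in P as a substring."""
    return R != P and R in P


intensities = {"abab": 0.5, "ab": 0.25, "cd": 0.25}
kappa = subpattern_intensity("ba", intensities, proper_substring)
```

Here "ba" is not itself one of the listed patterns, yet it is a fairly intense subpattern because it occurs inside the most intense pattern.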
Whether the subpatterns in X are to be considered part of the "mind" of X is a somewhat
superfluous question of semantics. Here we choose to extend the definition of mind given in
[Goe06a] to include subpatterns as well as patterns, because this makes it simpler to describe
the relationship between hypersets and minds, as we will do in Appendix ??.
Chapter 4
Brief Survey of Cognitive Architectures
4.1 Introduction
While we believe CogPrime is the most thorough attempt at an architecture for advanced AGI
to date, we certainly recognize there have been many valuable attempts in the past with similar
aims; and we also have great respect for other AGI efforts occurring in parallel with Cog-
Prime development, based on alternative, sometimes overlapping, theoretical presuppositions
and practical choices. In most of this book we will ignore these other current and historical
efforts except where they are directly useful for CogPrime - there are many literature reviews
already published, and this is a research treatise not a textbook. In this chapter, however, we
will break from this pattern and give a rough high-level overview of the various AGI archi-
tectures at play in the field today. The overview definitely has a bias toward other work with
some direct relevance to CogPrime, but not an overwhelming bias; we also discuss a number of
approaches that are unrelated to, and even in some cases conceptually orthogonal to, our own.
CogPrime builds on prior AI efforts in a variety of ways. Most of the specific algorithms
and structures in CogPrime have their roots in prior AI work; and in addition, the CogPrime
cognitive architecture has been heavily inspired by some other holistic cognitive architectures,
especially (but not exclusively) MicroPsi [Bac09], LIDA [BF09] and DeSTIN [ARK09a, ARC09].
In this chapter we will briefly review some existing cognitive architectures, with especial but
not exclusive emphasis on the latter three.
We will articulate some rough mappings between elements of these other architectures and
elements of CogPrime - some in this chapter, and some in Chapter 5. However, these mappings
will mostly be left informal and very incompletely specified. The articulation of detailed inter-
architecture mappings is an important project, but would be a substantial additional project
going well beyond the scope of this book. We will not give a thorough review of the similarities
and differences between CogPrime and each of these architectures, but only mention some of
the highlights.
The reader desiring a more thorough review of cognitive architectures is referred to Wlodek
Duch's review paper from the AGI-08 conference [DOP08]; and also to Alexei Samsonovich's
review paper [Sam10], which compares a number of cognitive architectures in terms of a feature
checklist, and was created collaboratively with the creators of the architectures.
Duch, in his survey of cognitive architectures [DOP08], divides existing approaches into three
paradigms - symbolic, emergentist and hybrid - as broadly indicated in Figure 4.1. Drawing on
his survey and updating slightly, we give here some key examples of each, and then explain why
CogPrime represents a significantly more effective approach to embodied human-like general
intelligence. In our treatment of emergentist architectures, we pay particular attention to devel-
opmental robotics architectures, which share considerably with CogPrime in terms of underlying
philosophy, but differ via not integrating a symbolic "language and inference" component such
as CogPrime includes.
In brief, we believe that the hybrid approach is the most pragmatic one given the current state
of AI technology, but that the emergentist approach gets something fundamentally right, by
focusing on the emergence of complex dynamics and structures from the interactions of simple
components. So CogPrime is a hybrid architecture which (according to the cognitive synergy
principle) binds its components together very tightly dynamically, allowing the emergence of
complex dynamics and structures in the integrated system. Most other hybrid architectures are
less tightly coupled and hence seem ill-suited to give rise to the needed emergent complexity. The
other hybrid architectures that do possess the needed tight coupling, such as MicroPsi [Bac09],
strike us as underdeveloped and founded on insufficiently powerful learning algorithms.
[Figure 4.1 (image not reproduced). Recoverable labels, grouped by paradigm:
Symbolic: memory (rule-based, graph-based); learning (inductive, analytical).
Emergentist: memory (globalist, localist); learning (associative, competitive).
Hybrid: memory (localist-distributed, symbolic-connectionist); learning (bottom-up, top-down).]
Fig. 4.1: Duch's simplified taxonomy of cognitive architectures. CogPrime falls into the "hy-
brid" category, but differs from other hybrid architectures in its focus on synergetic interactions
between components and their potential to give rise to appropriate system-wide emergent struc-
tures enabling general intelligence.
4.2 Symbolic Cognitive Architectures
A venerable tradition in AI focuses on the physical symbol system hypothesis [New90], which
states that minds exist mainly to manipulate symbols that represent aspects of the world or
themselves. A physical symbol system has the ability to input, output, store and alter symbolic
entities, and to execute appropriate actions in order to reach its goals. Generally, symbolic
cognitive architectures focus on "working memory" that draws on long-term memory as needed,
and utilize a centralized control over perception, cognition and action. Although in principle
such architectures could be arbitrarily capable (since symbolic systems have universal
representational and computational power, in theory), in practice symbolic architectures tend to be
weak in learning, creativity, procedure learning, and episodic and associative memory. Decades
of work in this tradition have not resolved these issues, which has led many researchers to
explore other options. A few of the more important symbolic cognitive architectures are:
• SOAR [LRN87], a classic example of expert rule-based cognitive architecture designed to
model general intelligence. It has recently been extended to handle sensorimotor functions,
though in a somewhat cognitively unnatural way; and is not yet strong in areas such as
episodic memory, creativity, handling uncertain knowledge, and reinforcement learning.
• ACT-R [AL03] is fundamentally a symbolic system, but Duch classifies it as a hybrid system
because it incorporates connectionist-style activation spreading in a significant role; and
there is an experimental thoroughly connectionist implementation to complement the pri-
mary mainly-symbolic implementation. Its combination of SOAR-style "production rules"
with large-scale connectionist dynamics allows it to simulate a variety of human psycholog-
ical phenomena, but abstract reasoning, creativity and transfer learning are still missing.
• EPIC IRM0'', a cognitive architecture aimed at capturing human perceptual, cognitive
and motor activities through several interconnected processors working in parallel. The
system is controlled by production rules for cognitive processors and a set of perceptual
(visual, auditory, tactile) and motor processors operating on symbolically coded features
rather than raw sensory data. It has been connected to SOAR for problem solving, planning
and learning.
• ICARUS [Lan05], an integrated cognitive architecture for physical agents, with knowledge
specified in the form of reactive skills, each denoting goal-relevant reactions to a class of
problems. The architecture includes a number of modules: a perceptual system, a planning
system, an execution system, and several memory systems. Concurrent processing is absent,
attention allocation is fairly crude, and uncertain knowledge is not thoroughly handled.
• SNePS (Semantic Network Processing System) [SE07] is a logic, frame and network-based
knowledge representation, reasoning, and acting system that has undergone over three
decades of development. While it has been used for some interesting prototype experi-
ments in language processing and virtual agent control, it has not yet been used for any
large-scale or real-world application.
• Cyc [LG90] is an AGI architecture based on predicate logic as a knowledge representation,
and using logical reasoning techniques to answer questions and derive new knowledge from
old. It has been connected to a natural language engine, and designs have been created
for the connection of Cyc with Albus's 4D-RCS. Cyc's most distinctive aspect is the
large database of commonsense knowledge that Cycorp has accumulated (millions of pieces
of knowledge, entered by specially trained humans in predicate logic format); part of the
philosophy underlying Cyc is that once a sufficient quantity of knowledge is accumulated in
the knowledge base, the problem of creating human-level general intelligence will become
much less difficult due to the ability to leverage this knowledge.
While these architectures contain many valuable ideas and have yielded some interesting results,
we feel they are incapable on their own of giving rise to the emergent structures and dynamics
required to yield humanlike general intelligence using feasible computational resources. However,
we are more sanguine about the possibility of ideas and components from symbolic architectures
playing a role in human-level AGI via incorporation in hybrid architectures.
We now review a few symbolic architectures in slightly more detail.
4.2.1 SOAR
The cognitive architectures best known among AI academics are probably Soar and ACT-R,
both of which are explicitly being developed with the dual goals of creating human-level AGI
and modeling all aspects of human psychology. Neither the Soar nor ACT-R communities feel
themselves particularly near these long-term goals, yet they do take them seriously.
Soar is based on IF-THEN rules, otherwise known as "production rules." On the surface this
makes it similar to old-style expert systems, but Soar is much more than an expert system; it's
at minimum a sophisticated problem-solving engine. Soar explicitly conceives problem solving
as a search through solution space for a "goal state" representing a (precise or approximate)
problem solution. It uses a methodology of incremental search, where each step is supposed to
move the system a little closer to its problem-solving goal, and each step involves a potentially
complex "decision cycle."
In the simplest case, the decision cycle has two phases:
• Gathering appropriate information from the system's long-term memory (LTM) into its
working memory (WM)
• A decision procedure that uses the gathered information to decide an action
If the knowledge available in LTM isn't enough to solve the problem, then the decision
procedure invokes search heuristics like hill-climbing, which try to create new knowledge (new
production rules) that will help move the system closer to a solution. If a solution is found by
chaining together multiple production rules, then a chunking mechanism is used to combine
these rules together into a single rule for future use. One could view the chunking mechanism
as a way of converting explicit knowledge into implicit knowledge, similar to "map formation"
in CogPrime (see Chapter 42 of Part 2), but in the current Soar design and implementation it
is a fairly crude mechanism.
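The chaining-then-chunking idea can be sketched with a very small forward-chaining loop. This is an illustration of the general mechanism only, not Soar's actual decision cycle: rules are assumed to be (conditions, result) pairs over atomic facts, the first applicable rule always fires, and no search heuristics such as hill-climbing are modeled. The name `solve` and the rule format are hypothetical.

```python
def solve(goal, facts, rules, max_steps=20):
    """Chain production rules toward a goal; chunk the chain if it succeeds.

    rules: list of (frozenset of condition facts, result fact)
    Returns (solved, rules), where rules may have gained one chunked rule.
    """
    wm = set(facts)                       # working memory
    fired = []
    for _ in range(max_steps):
        if goal in wm:
            break
        applicable = [(c, r) for c, r in rules if c <= wm and r not in wm]
        if not applicable:
            return False, rules           # knowledge in memory is not enough
        cond, result = applicable[0]
        wm.add(result)
        fired.append((cond, result))
    if goal not in wm:
        return False, rules
    if len(fired) > 1:
        # Chunk: one rule from the non-derived conditions straight to the goal.
        # It uses the conditions of every fired rule, so it may be more
        # specific than strictly necessary.
        derived = {result for _, result in fired}
        needed = frozenset().union(*(cond for cond, _ in fired)) - derived
        rules = [(needed, goal)] + list(rules)
    return True, rules


rules = [(frozenset({"a"}), "b"), (frozenset({"b"}), "c")]
solved, rules_after = solve("c", {"a"}, rules)
solved_again, rules_final = solve("c", {"a"}, rules_after)
```

After the first call the two-step chain from "a" to "c" has been compiled into a single rule, so the second call reaches the goal in one step and creates no further chunk.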
In recent years Soar has acquired a number of additional methods and modalities, including
some visual reasoning methods and some mechanisms for handling episodic and procedural
knowledge. These expand the scope of the system but the basic production rule and chunking
mechanisms as briefly described above remain the core "cognitive algorithm" of the system.
From a CogPrime perspective, what Soar offers is certainly valuable, e.g.
• heuristics for transferring knowledge from LTM into WM
• chaining and chunking of implications
• methods for interfacing between other forms of knowledge and implications
However, a very short and very partial list of the major differences between Soar and Cog-
Prime would include
• CogPrime contains a variety of other core cognitive mechanisms beyond the management
and chunking of implications
• the variety of "chunking" type methods in CogPrime goes far beyond the sort of localized
chunking done in Soar
• CogPrime is committed to representing uncertainty at the base level whereas Soar's pro-
duction rules are crisp
• The mechanisms for LTM-WM interaction are rather different in CogPrime, being based
on complex nonlinear dynamics as represented in Economic Attention Allocation (ECAN)
• Currently Soar does not contain creativity-focused heuristics like blending or evolutionary
learning in its core cognitive dynamic.
4.2.2 ACT-R
In the grand scope of cognitive architectures, ACT-R is quite similar to Soar, but there are
many micro-level differences. ACT-R is defined in terms of declarative and procedural knowledge,
where procedural knowledge takes the form of Soar-like production rules, and declarative
knowledge takes the form of chunks. It contains a variety of mechanisms for learning new rules
and chunks from old; and also contains sophisticated probabilistic equations for updating the
activation levels associated with items of knowledge (these equations being roughly analogous
in function to, though quite different from, the ECAN equations in CogPrime).
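As one concrete instance of such an activation equation, the base-level learning equation commonly given in the ACT-R literature sets a chunk's base-level activation to B = ln(sum over j of t_j^(-d)), where t_j is the time elapsed since the j-th use of the chunk and d is a decay parameter (0.5 is a frequently cited default). The sketch below computes only this term; the function and argument names are illustrative, and the spreading-activation and noise terms of the full activation equation are omitted.

```python
import math


def base_level_activation(use_times, now, decay=0.5):
    """Base-level activation: log of summed power-law-decayed past uses.

    use_times: times at which the chunk was previously used (all before now)
    now:       current time, in the same units
    """
    if not use_times or any(t >= now for t in use_times):
        raise ValueError("need at least one use, all strictly before 'now'")
    return math.log(sum((now - t) ** -decay for t in use_times))


single_use = base_level_activation([3.0], now=4.0)
two_uses = base_level_activation([1.0, 3.0], now=4.0)
older_use = base_level_activation([0.0], now=4.0)
```

Frequent and recent use both raise activation: the chunk used twice scores higher than the one used once, and a single use further in the past scores lower.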
Figure 4.2 displays the current architecture of ACT-R. The flow of cognition in the system is
in response to the current goal, currently active information from declarative memory, informa-
tion attended to in perceptual modules (vision and audition are implemented), and the current
state of motor modules (hand and speech are implemented). The early work with ACT-R was
based on comparing system performance to human behavior, using only behavioral measures,
such as the timing of keystrokes or patterns of eye movements. Using such measures, it was not
possible to test detailed assumptions about which modules were active in the performance of
a task. More recently the ACT-R community has been engaged in a process of using imaging
data to provide converging data on module activity. Figure 4.3 illustrates the associations they
have made between the modules in Figure 4.2 and brain regions. Coordination among all of
these components occurs through actions of the procedural module, which is mapped to the
basal ganglia.
Fig. 4.2: High-level architecture of ACT-R
Fig. 4.3: Conjectured Mapping Between ACT-R and the Brain
In practice ACT-R, even more so than Soar, seems to be used more as a programming
framework for cognitive modeling than as an AI system. One can fairly easily use ACT-R
to program models of specific human mental behaviors, which may then be matched against
psychological data. Opinions differ as to whether this sort of modeling is valuable for achieving
AGI goals. CogPrime is not designed to support this kind of modeling, as it intentionally does
many things very differently from humans.
ACT-R in its original form did not say much about perceptual and motor operations, but
recent versions have incorporated EPIC, an independent cognitive architecture focused on mod-
eling these aspects of human behavior.
4.2.3 Cyc and Texai
Our review of cognitive architectures would be incomplete without mentioning Cyc [LG90],
one of the best known and best funded AGI-oriented projects in history. While the main focus
of the Cyc project has been on the hand-coding of large amounts of declarative knowledge,
there is also a cognitive architecture of sorts there. The center of Cyc is an engine for logical
deduction, acting on knowledge represented in predicate logic. A natural language engine has
been associated with the logic engine, which enables one to ask English questions and get
English replies.
Stephen Reed, while an engineer at Cycorp, designed a perceptual-motor front end for Cyc
based on James Albus' Reference Model Architecture; the ensuing system, called Cognitive-
Cyc, would have been the first full-fledged cognitive architecture based on Cyc, but was not
implemented. Reed left Cycorp and is now building a system called Texai, which has many
similarities to Cyc (and relies upon the OpenCyc knowledge base, a subset of Cyc's overall
knowledge base), but incorporates a CognitiveCyc style cognitive architecture.
4.2.4 NARS
Pei Wang's NARS logic [Wan06] played a large role in the development of PLN, CogPrime's
uncertain logic component, a relationship that is discussed in depth in [GMIH08] and won't
be re-emphasized here. However, NARS is more than just an uncertain logic; it is also an
overall cognitive architecture (which is centered on NARS logic, but also includes other aspects).
CogPrime bears little relation to NARS except in the specific similarities between PLN logic
and NARS logic, but the other aspects of NARS are worth briefly recounting here.
NARS is formulated as a system for processing tasks, where a task consists of a question or a
piece of new knowledge. The architecture is focused on declarative knowledge, but some pieces
of knowledge may be associated with executable procedures, which allows NARS to carry out
control activities (in roughly the same way that a Prolog program can).
At any given time a NARS system contains
• working memory: a small set of tasks which are active, kept for a short time, and closely
related to new questions and new knowledge
• long-term memory: a huge set of knowledge which is passive, kept for a long time, and not
necessarily related to current questions and knowledge
The working and long term memory spaces of NARS may each be thought of as a set of
chunks, where each chunk consists of a set of tasks and a set of knowledge. NARS's basic
cognitive process is:
1. choose a chunk
2. choose a task from that chunk
3. choose a piece of knowledge from that chunk
4. use the task and knowledge to do inference
5. send the new tasks to corresponding chunks
Depending on the nature of the task and knowledge, the inference involved may be one of
the following:
• if the task is a question, and the knowledge happens to be an answer to the question, a
copy of the knowledge is generated as a new task
• backward inference
• revision (merging two pieces of knowledge with the same form but different truth value)
• forward inference
• execution of a procedure associated with a piece of knowledge
Unlike many other systems, NARS doesn't decide what type of inference is used to process
a task when the task is accepted, but works in a data-driven way - that is, it is the task and
knowledge that dynamically determine what type of inference will be carried out.
The "choice" processes mentioned above are done via assigning relative priorities to
• chunks (where they are called activity)
• tasks (where they are called urgency)
• knowledge (where they are called importance)
and then distributing the system's resources accordingly, based on a probabilistic algorithm.
(It's interesting to note that while NARS uses probability theory as part of its control mecha-
nism, the logic it uses to represent its own knowledge about the world is nonprobabilistic. This
is considered conceptually consistent, in the context of NARS theory, because system control
is viewed as a domain where the system's knowledge is more complete, thus more amenable to
probabilistic reasoning.)
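The control cycle and priority-driven "choice" processes just described can be sketched in a few lines of Python. This is an illustrative toy rather than NARS itself: the class names (Item, Chunk), the use of priority-proportional sampling via random.choices, and the placeholder infer function are all assumptions introduced here, and the real system's inference rules and resource allocation are considerably richer.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Item:
    """A task or a piece of knowledge, with a relative priority."""
    content: str
    priority: float  # plays the role of "urgency" (tasks) or "importance" (knowledge)

@dataclass
class Chunk:
    """A set of tasks and a set of knowledge; its priority is its "activity"."""
    name: str
    activity: float
    tasks: list = field(default_factory=list)
    knowledge: list = field(default_factory=list)

def pick(items, weight, rng):
    """Choose one item with probability proportional to its priority."""
    return rng.choices(items, weights=[weight(i) for i in items], k=1)[0]

def infer(task, belief):
    """Hypothetical stand-in for inference: derive one new, less urgent task."""
    return [Item(f"({task.content} & {belief.content})", 0.5 * task.priority)]

def cycle(chunks, route, rng):
    """One pass through the five steps of the basic cognitive process."""
    candidates = [c for c in chunks if c.tasks and c.knowledge]
    if not candidates:
        return []
    chunk = pick(candidates, lambda c: c.activity, rng)         # 1. choose a chunk
    task = pick(chunk.tasks, lambda t: t.priority, rng)         # 2. choose a task
    belief = pick(chunk.knowledge, lambda k: k.priority, rng)   # 3. choose knowledge
    derived = infer(task, belief)                               # 4. do inference
    for new_task in derived:                                    # 5. send new tasks on
        route(new_task).tasks.append(new_task)
    return derived

rng = random.Random(0)
birds = Chunk("birds", 0.9, [Item("robin flies?", 0.8)], [Item("robin is a bird", 0.6)])
fish = Chunk("fish", 0.1, [Item("trout swims?", 0.3)], [Item("trout is a fish", 0.4)])
derived = cycle([birds, fish], route=lambda t: birds, rng=rng)
```

Sampling in proportion to priority, rather than always taking the top item, is one simple way to realize "distributing the system's resources accordingly": low-priority chunks, tasks and knowledge still receive occasional attention.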
4.2.5 GLAIR and SNePS
Another logic-focused cognitive architecture, very different from NARS in detail, is Stuart
Shapiro's GLAIR cognitive architecture, which is centered on the SNePS paraconsistent logic
[SE07].
Like NARS, the core "cognitive loop" of GLAIR is based on reasoning: either thinking about
some percept (e.g. linguistic input, or sense data from the virtual or physical world), or answer-
ing some question. This inference based cognition process is turned into an intelligent agent
control process via coupling it with an acting component, which operates according to a set of
policies, each one of which tells the system when to take certain internal or external actions
(including internal reasoning actions) in response to its observed internal and external situation.
GLAIR contains multiple layers:
• the Knowledge Layer (KL), which contains the beliefs of the agent, and is where reasoning,
planning, and act selection are performed
• the Sensori-Actuator Layer (SAL), which contains the controllers of the sensors and effectors of
the hardware or software robot.
• the Perceptuo-Motor Layer (PML), which grounds the KL symbols in perceptual structures
and subconscious actions, contains various registers for providing the agent's sense of situ-
atedness in the environment, and handles translation and communication between the KL
and the SAL.
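The division of labor among the three layers can be illustrated with a small Python sketch. Everything concrete in it is hypothetical: the sensor reading, the symbol names (ObstacleAhead, TurnLeft, and so on), the command strings, and the representation of a policy as a condition paired with an act are assumptions made here for illustration, and no claim is made that GLAIR or SNePS is implemented this way.

```python
class SensoriActuatorLayer:
    """SAL: controllers for the (here simulated) sensors and effectors."""
    def __init__(self):
        self.executed = []              # log of low-level commands sent to effectors

    def read_sensor(self):
        return {"range_cm": 12}         # hypothetical raw sensor reading

    def actuate(self, command):
        self.executed.append(command)


class PerceptuoMotorLayer:
    """PML: translates between raw SAL data and the symbols used by the KL."""
    def __init__(self, sal):
        self.sal = sal

    def perceive(self):
        raw = self.sal.read_sensor()
        return "ObstacleAhead" if raw["range_cm"] < 20 else "PathClear"

    def perform(self, act_symbol):
        # Each act symbol is grounded in a short sequence of low-level commands.
        commands = {"TurnLeft": ["stop", "rotate:-90"], "GoForward": ["drive:10"]}
        for command in commands[act_symbol]:
            self.sal.actuate(command)


class KnowledgeLayer:
    """KL: holds the agent's beliefs and selects acts by applying policies."""
    def __init__(self, pml):
        self.pml = pml
        self.beliefs = set()
        # Each policy pairs a condition on the beliefs with an act symbol.
        self.policies = [
            (lambda b: "ObstacleAhead" in b, "TurnLeft"),
            (lambda b: "PathClear" in b, "GoForward"),
        ]

    def step(self):
        self.beliefs.add(self.pml.perceive())
        for condition, act in self.policies:
            if condition(self.beliefs):
                self.pml.perform(act)
                return act
        return None


sal = SensoriActuatorLayer()
agent = KnowledgeLayer(PerceptuoMotorLayer(sal))
chosen = agent.step()
```

The point of the sketch is only the direction of data flow: percepts travel from the SAL through the PML up to the KL, and selected acts travel back down the same path.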
The logical Knowledge Layer incorporates multiple memory types using a common
representation (including declarative, procedural, episodic, attentional and intentional
knowledge, and meta-knowledge). To support this broad range of knowledge types, a broad
range of logical inference mechanisms are used, so that the KL may be variously viewed as
predicate logic based, frame based, semantic network based, or from other perspectives.
What makes GLAIR more robust than most logic based AI approaches is the novel
paraconsistent logical formalism used in the knowledge base, which means (among other things)
that uncertain, speculative or erroneous knowledge may exist in the system's memory without
leading the system to create a broadly erroneous view of the world or carry out egregiously
unintelligent actions. CogPrime is not thoroughly logic-focused like GLAIR is, but in its logical
aspect it seeks a similar robustness through its use of PLN logic, which embodies properties
related to paraconsistency.
Compared to CogPrime, we see that GLAIR has a similarly integrative approach, but that
the integration of different sorts of cognition is done more strictly within the framework of
logical knowledge representation.
4.3 Emergentist Cognitive Architectures
Another species of cognitive architecture expects abstract symbolic processing to emerge from
lower-level "subsymbolic" dynamics, which sometimes (but not always) are designed to simu-
late neural networks or other aspects of human brain function. These architectures are typically
strong at recognizing patterns in high-dimensional data, reinforcement learning and associative
memory; but no one has yet shown how to achieve high-level functions such as abstract reason-
ing or complex language processing using a purely subsymbolic approach. A few of the more
important subsymbolic, emergentist cognitive architectures are:
• DeSTIN [ARK09a, ARC09], which is part of CogPrime, may also be considered as an
autonomous AGI architecture, in which case it is emergentist and contains mechanisms
to encourage language, high-level reasoning and other abstract aspects of intelligence to
emerge from hierarchical pattern recognition and related self-organizing network dynamics.
In CogPrime DeSTIN is used as part of a hybrid architecture, which greatly reduces the
reliance on DeSTIN's emergent properties.
• Hierarchical Temporal Memory (HTM) [HB06] is a hierarchical temporal pattern
recognition architecture, presented as both an AI approach and a model of the cortex. So
far it has been used exclusively for vision processing and we will discuss its shortcomings
later in the context of our treatment of DeSTIN.
• SAL [JL08], based on the earlier and related IBCA (Integrated Biologically-based
Cognitive Architecture) is a large-scale emergent architecture that seeks to model distributed
information processing in the brain, especially the posterior and frontal cortex and the
hippocampus. So far the architectures in this lineage have been used to simulate various
human psychological and psycholinguistic behaviors, but haven't been shown to give rise to
higher-level behaviors like reasoning or subgoaling.
• NOMAD (Neurally Organized Mobile Adaptive Device) automata and its successors
[KE06] are based on Edelman's "Neural Darwinism" model of the brain, and feature large
numbers of simulated neurons evolving by natural selection into configurations that carry
out sensorimotor and categorization tasks. The emergence of higher-level cognition from
this approach seems rather unlikely.
• Ben Kuipers and his colleagues [MK07, MK08, MK09] have pursued an extremely innovative
research program which combines qualitative reasoning and reinforcement learning to enable
an intelligent agent to learn how to act, perceive and model the world. Kuipers' notion of
"bootstrap learning" involves allowing the robot to learn almost everything about its world,
including for instance the structure of 3D space and other things that humans and other
animals obtain via their genetic endowments. Compared to Kuipers' approach, CogPrime
falls in line with most other approaches which provide more "hard-wired" structure, following
the analogy to biological organisms that are born with more innate biases.
There is also a set of emergentist architectures focused specifically on developmental robotics,
which we will review below in a separate subsection, as all of these share certain common
characteristics.
Our general perspective on the emergentist approach is that it is philosophically correct
but currently pragmatically inadequate. Eventually, some emergentist approach could surely
succeed at giving rise to humanlike general intelligence - the human brain, after all, is plainly
an emergentist system. However, we currently lack understanding of how the brain gives rise
to abstract reasoning and complex language, and none of the existing emergentist systems
seem remotely capable of giving rise to such phenomena. It seems to us that the creation of
a successful emergentist AGI will have to wait for either a detailed understanding of how the
brain gives rise to abstract thought, or a much more thorough mathematical understanding of
the dynamics of complex self-organizing systems.
The concept of cognitive synergy is more relevant to emergentist than to symbolic archi-
tectures. In a complex emergentist architecture with multiple specialized components, much of
the emergence is expected to arise via synergy between different richly interacting components.
Symbolic systems, at least in the forms currently seen in the literature, seem less likely to give
rise to cognitive synergy as their dynamics tend to be simpler. And hybrid systems, as we shall
see, are somewhat diverse in this regard: some rely heavily on cognitive synergies and others
consist of more loosely coupled components.
We now review the DeSTIN emergentist architecture in more detail, and then turn to the
developmental robotics architectures.
4.3.1 DeSTIN: A Deep Reinforcement Learning Approach to AGI
The DeSTIN architecture, created by Itamar Arel and his colleagues, addresses the problem
of general intelligence using hierarchical spatiotemporal networks designed to enable scalable
perception, state inference and reinforcement-learning-guided action in real-world environments.
DeSTIN has been developed with the plan of gradually extending it into a complete system for
humanoid robot control, founded on the same qualitative information-processing principles as
the human brain (though without striving for detailed biological realism). However, the practical
work with DeSTIN to date has focused on visual and auditory processing; and in the context of
the present proposal, the intention is to utilize DeSTIN for perception and actuation oriented
processing, hybridizing it with CogPrime which will handle abstract cognition and language.
Here we will discuss DeSTIN primarily in the perception context, only briefly mentioning the
application to actuation which is conceptually similar.
In DeSTIN (see Figure 4.4), perception is carried out by a deep spatiotemporal inference
network, which is connected to a similarly architected critic network that provides feedback on
the inference network's performance, and an action network that controls actuators based on the
activity in the inference network (Figure 4.5 depicts a standard action hierarchy, of which the
hierarchy in DeSTIN is an example). The nodes in these networks perform probabilistic pattern
recognition according to algorithms to be described below; and the nodes in each of the networks
may receive states of nodes in the other networks as inputs, providing rich interconnectivity
and synergetic dynamics.
4.3.1.1 Deep versus Shallow Learning for Perceptual Data Processing
The most critical feature of DeSTIN is its uniquely robust approach to modeling the world
based on perceptual data. Mimicking the efficiency and robustness by which the human brain
analyzes and represents information has been a core challenge in AI research for decades. For
instance, humans are exposed to massive amounts of visual and auditory data every second
of every day, and are somehow able to capture critical aspects of it in a way that allows for
appropriate future recollection and action selection. For decades, it has been known that the
[Figure 4.4 diagram; legible labels: Environment, Observations, Deep Learning System (state inference), Inferred state, Actor, Actions, Rewards, Action/Correction]
Fig. 4.4: High-level architecture of DeSTIN
brain is a massively parallel fabric, in which computation processes and memory storage are
highly distributed. But massive parallelism is not in itself a solution - one also needs the right
architecture; which DeSTIN provides, building on prior work in the area of deep learning.
Humanlike intelligence is heavily adapted to the physical environments in which humans
evolved; and one key aspect of sensory data coming from our physical environments is its
hierarchical structure. However, most machine learning and pattern recognition systems are
"shallow" in structure, not explicitly incorporating the hierarchical structure of the world in
their architecture. In the context of perceptual data processing, the practical result of this is
the need to couple each shallow learner with a pre-processing stage, wherein high-dimensional
sensory signals are reduced to a lower-dimension feature space that can be understood by the
shallow learner. The hierarchical structure of the world is thus crudely captured in the hierarchy
of "preprocessor plus shallow learner." In this sort of approach, much of the intelligence of the
system shifts to the feature extraction process, which is often imperfect and always application-
domain specific.
Deep machine learning has emerged as a more promising framework for dealing with complex,
high-dimensional real-world data. Deep learning systems possess a hierarchical structure that
intrinsically biases them to recognize the hierarchical patterns present in real-world data. Thus,
they hierarchically form a feature space that is driven by regularities in the observations, rather
than by hand-crafted techniques. They also offer robustness to many of the distortions and
transformations that characterize real-world signals, such as noise, displacement, scaling, etc.
Deep belief networks [HOT06] and Convolutional Neural Networks [LBD+90] have been
demonstrated to successfully address pattern inference in high dimensional data (e.g. images).
They owe their success to their underlying paradigm of partitioning large data structures into
smaller, more manageable units, and discovering the dependencies that may or may not exist
Fig. 4.5: A standard, general-purpose hierarchical control architecture. DeSTIN's control hi-
erarchy exemplifies this architecture, with the difference lying mainly in the DeSTIN control
hierarchy's tight integration with the state inference (perception) and critic (reinforcement)
hierarchies.
between such units. However, this paradigm has its limitations; for instance, these approaches
do not represent temporal information with the same ease as spatial structure. Moreover, some
key constraints are imposed on the learning schemes driving these architectures, namely the
need for layer-by-layer training, and oftentimes pre-training. DeSTIN overcomes the limitations
of prior deep learning approaches to perception processing, and also extends beyond perception
to action and reinforcement learning.
4.3.1.2 DeSTIN for Perception Processing
The hierarchical architecture of DeSTIN's spatiotemporal inference network comprises an ar-
rangement into multiple layers of "nodes" comprising multiple instantiations of an identical
cortical circuit. Each node corresponds to a particular spatiotemporal region, and uses a sta-
tistical learning algorithm to characterize the sequences of patterns that are presented to it by
nodes in the layer beneath it. More specifically,
• At the very lowest layer of the hierarchy nodes receive as input raw data (e.g. pixels of an
image) and continuously construct a belief state that attempts to characterize the sequences
of patterns viewed.
• The second layer, and all those above it, receive as input the belief states of nodes at their
corresponding lower layers, and attempt to construct belief states that capture regularities
in their inputs.
• Each node also receives as input the belief state of the node above it in the hierarchy (which
constitutes "contextual" information).
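The wiring implied by these three points can be made concrete with a short Python sketch. The 32x32 input size follows the example used in this section's figure; the 4x4 pixel patch per bottom-layer node and the 2x2 fan-in between layers are assumptions chosen here for illustration, as are all function names.

```python
def build_hierarchy(image_size=32, patch=4, fan=2):
    """Return the grid width of each layer, bottom layer first."""
    widths = []
    width = image_size // patch
    while width >= 1:
        widths.append(width)
        if width == 1:
            break
        width //= fan
    return widths

def children(layer, row, col, fan=2):
    """Nodes in the layer below whose belief states feed node (layer, row, col)."""
    if layer == 0:
        return []  # bottom-layer nodes read raw pixels instead
    return [(layer - 1, fan * row + dr, fan * col + dc)
            for dr in range(fan) for dc in range(fan)]

def parent(layer, row, col, top, fan=2):
    """Node in the layer above whose belief state supplies context, if any."""
    if layer == top:
        return None  # the top node has no parent and hence no context
    return (layer + 1, row // fan, col // fan)

def pixel_patch(row, col, patch=4):
    """Pixel bounds (top, left, bottom, right; bottom/right exclusive) of a bottom node."""
    return (row * patch, col * patch, (row + 1) * patch, (col + 1) * patch)

widths = build_hierarchy()      # grid width per layer for a 32x32 image
top = len(widths) - 1
```

Under these assumptions a 32x32 image yields four layers of 8x8, 4x4, 2x2 and 1x1 nodes, so nodes higher in the hierarchy correspond to progressively larger image regions.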
[Figure 4.6 diagram; legible labels: Feedback (contextual) signals; P(S'|S,C) and P(O|S') in each node; Observation (e.g. 32x32 image)]
Fig. 4.6: Small-scale instantiation of the DeSTIN perceptual hierarchy. Each box represents a
node, which corresponds to a spatiotemporal region (nodes higher in the hierarchy corresponding
to larger regions). O denotes the current observation in the region, C is the state of the higher-
layer node, and S and S' denote state variables pertaining to two subsequent time steps. In
each node, a statistical learning algorithm is used to predict subsequent states based on prior
states, current observations, and the state of the higher-layer node.
More specifically, each of the DeSTIN nodes, referring to a specific spacetime region, contains
a set of state variables conceived as clusters, each corresponding to a set of previously-observed
sequences of events. These clusters are characterized by centroids (and are hence assumed
roughly spherical in shape), and each of them comprises a certain "spatiotemporal form"
recognized by the system in that region. Each node then has the task of predicting the likelihood
of a certain centroid being most apropos in the near future, based on the past history of ob-
servations in the node. This prediction may be done by simple probability tabulation, or via
application of supervised learning algorithms such as recurrent neural networks. These cluster-
ing and prediction processes occur separately in each node, but the nodes are linked together
via bidirectional dynamics: each node feeds input to its parents, and receives "advice" from its
parents that is used to condition its probability calculations in a contextual way.
These processes are executed formally by the following basic belief update rule, which governs
the learning process and is identical for every node in the architecture. The belief state is a
probability mass function over the sequences of stimuli that the node learns to represent.
Consequently, each node is allocated a predefined number of state variables each denoting a
dynamic pattern, or sequence, that is autonomously learned. The DeSTIN update rule maps
the current observation (o), belief state (b), and the belief state of a higher-layer node or context
(c), to a new (updated) belief state (b'), such that

$$ b'(s') = \Pr(s' \mid o, b, c) = \frac{\Pr(s' \cap o \cap b \cap c)}{\Pr(o \cap b \cap c)} \tag{4.1} $$

alternatively expressed as

$$ b'(s') = \frac{\Pr(o \mid s', b, c)\,\Pr(s' \mid b, c)\,\Pr(b, c)}{\Pr(o \mid b, c)\,\Pr(b, c)} \tag{4.2} $$

Under the assumption that observations depend only on the true state, or
$\Pr(o \mid s', b, c) = \Pr(o \mid s')$, we can further simplify the expression such that

$$ b'(s') = \frac{\Pr(o \mid s')\,\Pr(s' \mid b, c)}{\Pr(o \mid b, c)} \tag{4.3} $$

where $\Pr(s' \mid b, c) = \sum_{s \in S} \Pr(s' \mid s, c)\, b(s)$, yielding the belief update rule

$$ b'(s') = \frac{\Pr(o \mid s') \sum_{s \in S} \Pr(s' \mid s, c)\, b(s)}{\sum_{s'' \in S} \Pr(o \mid s'') \sum_{s \in S} \Pr(s'' \mid s, c)\, b(s)} \tag{4.4} $$

where S denotes the sequence set (i.e. belief dimension) such that the denominator term is a
normalization factor.
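To make the update rule concrete, the following Python sketch applies eq. (4.4) once for a single node with a small discrete state set. The function name, the list-based representation and the numerical values are illustrative assumptions; the likelihood and transition tables are simply given here, whereas in DeSTIN they are learned.

```python
def belief_update(b, likelihood, transition):
    """
    One application of eq. (4.4) for a single node.

    b[s]               -- current belief b(s) over states s = 0..n-1
    likelihood[s2]     -- Pr(o | s') for the current observation o
    transition[s][s2]  -- Pr(s' | s, c) for the current context c
    """
    n = len(b)
    # Predicted belief: sum over s of Pr(s' | s, c) b(s)
    predicted = [sum(transition[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Numerator of eq. (4.4): Pr(o | s') times the predicted belief
    unnormalized = [likelihood[s2] * predicted[s2] for s2 in range(n)]
    # Denominator of eq. (4.4): the same quantity summed over all states
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnormalized]

b = [0.5, 0.5]                       # prior belief over two states
transition = [[0.9, 0.1],            # row s gives Pr(s' | s, c)
              [0.2, 0.8]]
likelihood = [0.7, 0.1]              # Pr(o | s') for the observed o
b_new = belief_update(b, likelihood, transition)
```

In this example the dynamics alone would predict state 0 with probability 0.55; because the observation is seven times more likely under state 0, the updated belief shifts much more strongly toward it.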
One interpretation of eq. (4.4) would be that the static pattern similarity metric, $\Pr(o \mid s')$,
is modulated by a construct that reflects the system dynamics, $\Pr(s' \mid s, c)$. As such, the belief
state inherently captures both spatial and temporal information. In our implementation, the
belief state of the parent node, c, is chosen using the selection rule

$$ c = \arg\max_{s} b_p(s), \tag{4.5} $$

where $b_p$ is the belief distribution of the parent node.
A close look at eq. (4.4) reveals that there are two core constructs to be learned, $\Pr(o \mid s')$
and $\Pr(s' \mid s, c)$. In the current DeSTIN design, the former is learned via online clustering while
the latter is learned based on experience by inductively learning a rule that predicts the next
state s' given the prior state s and c.
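A minimal Python sketch of how these two constructs, together with the context selection rule (4.5), might be learned is given below. It is a toy under stated assumptions, not the DeSTIN implementation: the winner-take-all centroid update, the Gaussian-like likelihood exp(-distance squared), the Laplace-smoothed count table used in place of an inductively learned rule, and all names are choices made here for illustration.

```python
import math

class NodeLearner:
    """Toy learner for the two constructs in eq. (4.4); all details are assumptions."""

    def __init__(self, centroids, n_contexts, rate=0.1):
        self.centroids = [list(c) for c in centroids]
        self.rate = rate
        n = len(centroids)
        # counts[c][s][s2] starts at 1 (Laplace smoothing) so no transition is ruled out
        self.counts = [[[1.0] * n for _ in range(n)] for _ in range(n_contexts)]

    def _sq_dist(self, o, centroid):
        return sum((x - m) ** 2 for x, m in zip(o, centroid))

    def likelihood(self, o):
        """Unnormalized stand-in for Pr(o | s'): larger when o is nearer centroid s'."""
        return [math.exp(-self._sq_dist(o, c)) for c in self.centroids]

    def update_clusters(self, o):
        """Online clustering: move the nearest centroid a small step toward o."""
        dists = [self._sq_dist(o, c) for c in self.centroids]
        winner = dists.index(min(dists))
        centroid = self.centroids[winner]
        for i, x in enumerate(o):
            centroid[i] += self.rate * (x - centroid[i])
        return winner

    def update_transitions(self, s, s_next, c):
        """Tabulate that state s_next followed state s under context c."""
        self.counts[c][s][s_next] += 1.0

    def transition(self, c):
        """Estimate Pr(s' | s, c) by normalizing each row of the count table."""
        return [[v / sum(row) for v in row] for row in self.counts[c]]

def select_context(parent_belief):
    """Selection rule (4.5): the context is the parent's most probable state."""
    return max(range(len(parent_belief)), key=lambda s: parent_belief[s])

node = NodeLearner(centroids=[[0.0, 0.0], [1.0, 1.0]], n_contexts=2)
c = select_context([0.2, 0.8])              # parent belief selects a context index
prev = node.update_clusters([0.1, 0.0])     # index of the centroid nearest this input
nxt = node.update_clusters([0.9, 1.0])      # index of the centroid nearest the next input
node.update_transitions(prev, nxt, c)
```

Because the belief update normalizes over states, the likelihood only needs to be correct up to a constant factor, which is why an unnormalized similarity score suffices in this sketch.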
The overall result is a robust framework that autonomously (i.e. with no human engineered
pre-processing of any type) learns to represent complex data patterns, and thus serves the
critical role of building and maintaining a model of the state of the world. In a vision processing
context, for example, it allows for powerful unsupervised classification. If shown a variety of
real-world scenes, it will automatically form internal structures corresponding to the various
natural categories of objects shown in the scenes, such as trees, chairs, people, etc.; and also
the various natural categories of events it sees, such as reaching, pointing, falling. And, as will
be discussed below, it can use feedback from DeSTIN's action and critic networks to further
shape its internal world-representation based on reinforcement signals.
Benefits of DeSTIN for Perception Processing
DeSTIN's perceptual network offers multiple key attributes that render it more powerful than
other deep machine learning approaches to sensory data processing:
1. The belief space that is formed across the layers of the perceptual network inherently
captures both spatial and temporal regularities in the data. Given that many applications
require that temporal information be discovered for robust inference, this is a key advantage
over existing schemes.
2. Spatiotemporal regularities in the observations are captured in a coherent manner (rather
than being represented via two separate mechanisms).
3. All processing is both top-down and bottom-up, and both hierarchical and heterarchical,
based on nonlinear feedback connections directing activity and modulating learning in
multiple directions through DeSTIN's cortical circuits.
4. Support for multi-modal fusing is intrinsic within the framework, yielding a powerful state
inference system for real-world, partially-observable settings.
5. Each node is identical, which makes it easy to map the design to massively parallel platforms,
such as graphics processing units.
Points 2-4 in the above list describe how DeSTIN's perceptual network displays its own
"cognitive synergy" in a way that fits naturally into the synergetic dynamics of the overall
CogPrime architecture. Using this cognitive synergy, DeSTIN's perceptual network addresses
a key aspect of general intelligence: the ability to robustly infer the state of the world, with
which the system interacts, in an accurate and timely manner.
4.3.1.3 DeSTIN for Action and Control
DeSTIN's perceptual network performs unsupervised world-modeling, which is a critical aspect
of intelligence but of course is not the whole story. DeSTIN's action network, coupled with the
perceptual network, orchestrates actuator commands into complex movements, but also carries
out other functions that are more cognitive in nature.
For instance, people learn to distinguish between cups and bowls in part via hearing other
people describe some objects as cups and others as bowls. To emulate this kind of learning,
DeSTIN's critic network provides positive or negative reinforcement signals based on whether
the action network has correctly identified a given object as a cup or a bowl, and this signal
then impacts the nodes in the action network. The critic network takes a simple external "degree
of success or failure" signal and turns it into multiple reinforcement signals to be fed into the
multiple layers of the action network. The result is that the action network self-organizes so
as to include an implicit "cup versus bowl" classifier, whose inputs are the outputs of some of
the nodes in the higher levels of the perceptual network. This classifier belongs in the action
network because it is part of the procedure by which the DeSTIN system carries out the action
of identifying an object as a cup or a bowl.
This example illustrates how the learning of complex concepts and procedures is divided
fluidly between the perceptual network, which builds a model of the world in an unsupervised
way, and the action network, which learns how to respond to the world in a manner that will
receive positive reinforcement from the critic network.
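The cup-versus-bowl example can be mimicked with a deliberately tiny reinforcement learner in Python. The sketch collapses the multi-layer action network into a single softmax node and the critic into a +1/-1 signal, and it uses a standard policy-gradient (REINFORCE-style) update; none of this is claimed to be DeSTIN's actual learning rule. The perceptual feature (a height-to-width ratio) and all names are hypothetical.

```python
import math
import random

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

class ActionNode:
    """Toy stand-in for the action network: a softmax policy over labels."""

    def __init__(self, n_features, labels, lr=0.5):
        self.labels = labels
        self.lr = lr
        self.w = [[0.0] * n_features for _ in labels]

    def probs(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(w, x)) for w in self.w])

    def act(self, x, rng):
        """Sample a label index from the current policy (exploration)."""
        return rng.choices(range(len(self.labels)), weights=self.probs(x), k=1)[0]

    def predict(self, x):
        """Greedy choice: the label the policy currently considers most likely."""
        p = self.probs(x)
        return self.labels[p.index(max(p))]

    def reinforce(self, x, action, reward):
        """Policy-gradient step: rewarded actions become more likely for inputs like x."""
        p = self.probs(x)
        for a in range(len(self.labels)):
            grad = (1.0 if a == action else 0.0) - p[a]
            for i, xi in enumerate(x):
                self.w[a][i] += self.lr * reward * grad * xi

def critic(action_label, true_label):
    """External success-or-failure signal, reduced here to +1 or -1."""
    return 1.0 if action_label == true_label else -1.0

rng = random.Random(0)
labels = ["cup", "bowl"]
node = ActionNode(n_features=2, labels=labels)
for _ in range(2000):
    truth = rng.choice(labels)
    # Hypothetical perceptual feature: height/width ratio (cups taller), plus a bias term.
    ratio = rng.uniform(1.1, 2.0) if truth == "cup" else rng.uniform(0.2, 0.9)
    x = [ratio - 1.0, 1.0]
    a = node.act(x, rng)
    node.reinforce(x, a, critic(labels[a], truth))
```

Note that the learner is never told the correct label directly; it only receives the critic's success or failure signal for the label it chose, which is the sense in which the classifier "belongs in the action network."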
4.3.2 Developmental Robotics Architectures
A particular subset of emergentist cognitive architectures are sufficiently important that we
consider them separately here: these are developmental robotics architectures, focused on con-
trolling robots without significant "hard-wiring" of knowledge or capabilities, allowing robots
to learn (and learn how to learn, etc.) via their engagement with the world. A significant focus
is often placed here on "intrinsic motivation," wherein the robot explores the world guided by
internal goals like novelty or curiosity, forming a model of the world as it goes along, based
on the modeling requirements implied by its goals. Many of the foundations of this research
area were laid by Juergen Schmidhuber's work in the 1990s [Sch91b, Sch91a, Sch95, Schq], but
now with more powerful computers and robots the area is leading to more impressive practical
demonstrations.
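As a rough illustration of what a novelty-based intrinsic reward can look like, the Python sketch below rewards an agent with the surprisal of each observation under a running frequency model. This is one simple choice made here for illustration; it is not Schmidhuber's curiosity measure, nor the measure used in any of the systems surveyed below, and the class name, the finite observation alphabet and the Laplace smoothing are all assumptions.

```python
import math
from collections import Counter

class NoveltyReward:
    """Intrinsic reward = surprisal (in bits) of an observation under running counts."""

    def __init__(self, n_symbols):
        self.n_symbols = n_symbols      # size of the (assumed finite) observation alphabet
        self.counts = Counter()
        self.total = 0

    def reward(self, obs):
        # Laplace-smoothed estimate, so a never-seen observation has finite surprisal
        p = (self.counts[obs] + 1) / (self.total + self.n_symbols)
        self.counts[obs] += 1
        self.total += 1
        return -math.log2(p)

novelty = NoveltyReward(n_symbols=4)
rewards = [novelty.reward(o) for o in ["wall", "wall", "wall", "ball"]]
```

Repeated observations earn progressively less reward while a new kind of observation earns more, which gives the agent an internal incentive to seek out parts of the world it has not yet modeled.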
We mention here a handful of the important initiatives in this area:
• Juyang Weng's Dav [HZT+02] and SAIL [WHZ+00] projects involve mobile robots that
explore their environments autonomously, and learn to carry out simple tasks by building up
their own world-representations through both unsupervised and teacher-driven processing
of high-dimensional sensorimotor data. The underlying philosophy is based on human child
development [WH06], the knowledge representations involved are neural network based,
and a number of novel learning algorithms are involved, especially in the area of vision
processing.
• FLOWERS [BO09], an initiative at the French research institute INRIA, led by Pierre-Yves
Oudeyer, is also based on a principle of trying to reconstruct the processes of development
of the human child's mind, spontaneously driven by intrinsic motivations. Kaplan
[Kap08] has taken this project in a direction closely related to our own via the creation
of a "robot playroom." Experiential language learning has also been a focus of the project
[OK06], driven by innovations in speech understanding.
• IM-CLEVER (http://im-clever.noze.it/project/project-description), a new European
project coordinated by Gianluca Baldassarre and conducted by a large team of researchers
at different institutions, is focused on creating software enabling an iCub [MSV+08]
humanoid robot to explore the environment and learn to carry out human childlike
behaviors based on its own intrinsic motivations. As this project is the closest to our own
we will discuss it in more depth below.
Like CogPrime, IM-CLEVER is a humanoid robot intelligence architecture guided by intrinsic
motivations, and using hierarchical architectures for reinforcement learning and sensory
abstraction. IM-CLEVER's motivational structure is based in part on Schmidhuber's
information-theoretic model of curiosity [Sch06]; and CogPrime's Psi-based motivational
structure utilizes
probabilistic measures of novelty, which are mathematically related to Schmidhuber's mea-
sures. On the other hand, IM-CLEVER's use of reinforcement learning follows Schmidhuber's
earlier work on RL for cognitive robotics [BS04, BZGS06], Barto's work on intrinsically motivated
reinforcement learning [SB06, SM05], and Lee's [LMC07b, LMC07a] work on developmental
reinforcement learning; whereas CogPrime's assemblage of learning algorithms is more diverse,
including probabilistic logic, concept blending and other symbolic methods (in the OCP compo-
nent) as well as more conventional reinforcement learning methods (in the DeSTIN component).
In many respects IM-CLEVER bears a moderately strong resemblance to DeSTIN, whose
integration with CogPrime is discussed in Chapter 26 of Part 2 (although IM-CLEVER has
much more focus on biological realism than DeSTIN). Apart from numerous technical differ-
ences, the really big distinction between IM-CLEVER and CogPrime is that in the latter we
are proposing to hybridize a hierarchical-abstraction/reinforcement-learning system (such as
DeSTIN) with a more abstract symbolic cognition engine that explicitly handles probabilistic
logic and language. IM-CLEVER lacks the aspect of hybridization with a symbolic system, taking
more of a pure emergentist strategy. Like DeSTIN considered as a standalone architecture,
IM-CLEVER does entail a high degree of cognitive synergy between components dealing with
perception, world-modeling, action and motivation. However, the "emergentist versus hybrid"
distinction is a large qualitative difference between the two approaches.
In all, while we largely agree with the philosophy underlying developmental robotics, our
intuition is that the learning and representational mechanisms underlying the current systems
in this area are probably not powerful enough to lead to human child level intelligence. We
expect that these systems will develop interesting behaviors but fall short of robust preschool
level competency, especially in areas like language and reasoning where symbolic systems have
typically proved more effective. This intuition is what impels us to pursue a hybrid approach,
such as CogPrime. But we do feel that eventually, once the mechanisms underlying brains are
better understood and robotic bodies are richer in sensation and more adept in actuation, some
sort of emergentist, developmental-robotics approach can be successful at creating humanlike,
human-level AGI.
4.4 Hybrid Cognitive Architectures
In response to the complementary strengths and weaknesses of the symbolic and emergentist
approaches, in recent years a number of researchers have turned to integrative, hybrid
architectures, which combine subsystems operating according to the two different paradigms. The
combination may be done in many different ways, e.g. connection of a large symbolic subsystem
with a large subsymbolic system, or the creation of a population of small agents each of which
is both symbolic and subsymbolic in nature.
Nils Nilsson expressed the motivation for hybrid AGI systems very clearly in his article at
the AI-50 conference (which celebrated the 50th anniversary of the AI field) [Nil00]. While
affirming the value of the Physical Symbol System Hypothesis that underlies symbolic AI, he
argues that "the PSSH explicitly assumes that, whenever necessary, symbols will be grounded
in objects in the environment through the perceptual and effector capabilities of a physical
symbol system." Thus, he continues,
"I grant the need for non-symbolic processes in some intelligent systems, but I think they sup-
plement rather than replace symbol systems. I know of no examples of reasoning, understanding
language, or generating complex plans that are best understood as being performed by systems
using exclusively non-symbolic processes....
AI systems that achieve human-level intelligence will involve a combination of symbolic and
non-symbolic processing."
A few of the more important hybrid cognitive architectures are:
• CLARION [SZ04] is a hybrid architecture that combines a symbolic component for reasoning
on "explicit knowledge" with a connectionist component for managing "implicit knowledge."
Learning of implicit knowledge may be done via neural net, reinforcement learning,
or other methods. The integration of symbolic and subsymbolic methods is powerful, but a
great deal is still missing, such as episodic knowledge and learning, and creativity. Learning
in the symbolic and subsymbolic portions is carried out separately rather than dynamically
coupled, minimizing "cognitive synergy" effects.
• DUAL [NK04] is the most impressive system to come out of Marvin Minsky's "Society of
Mind" paradigm. It features a population of agents, each of which combines symbolic and
connectionist representation, self-organizing to collectively carry out tasks such as percep-
tion, analogy and associative memory. The approach seems innovative and promising, but
it is unclear how the approach will scale to high-dimensional data or complex reasoning
problems due to the lack of a more structured high-level cognitive architecture.
• LIDA [BF09] is a comprehensive cognitive architecture heavily based on Bernard Baars'
"Global Workspace Theory". It articulates a "cognitive cycle" integrating various forms of
memory and intelligent processing in a single processing loop. The architecture ties in well
with both neuroscience and cognitive psychology, but it deals most thoroughly with "lower
level" aspects of intelligence, handling more advanced aspects like language and reasoning
only somewhat sketchily. There is a clear mapping between LIDA structures and processes
and corresponding structures and processing in OCP; so that it's only a mild stretch to view
CogPrime as an instantiation of the general LIDA approach that extends further both in
the lower level (to enable robot action and sensation via DeSTIN) and the higher level (to
enable advanced language and reasoning via OCP mechanisms that have no direct LIDA
analogues).
• MicroPsi [Bac09] is an integrative architecture based on Dietrich Dörner's Psi model of
motivation, emotion and intelligence. It has been tested on some practical control applications,
and also on simulating artificial agents in a simple virtual world. MicroPsi's comprehen-
siveness and basis in neuroscience and psychology are impressive, but in the current version
of MicroPsi, learning and reasoning are carried out by algorithms that seem unlikely to
scale. OCP incorporates the Psi model for motivation and emotion, so that MicroPsi and
CogPrime may be considered very closely related systems. But similar to LIDA, MicroPsi
currently focuses on the "lower level" aspects of intelligence, not yet directly handling ad-
vanced processes like language and abstract reasoning.
• PolyScheme [Cas07] integrates multiple methods of representation, reasoning and inference
schemes for general problem solving. Each PolyScheme "specialist" models a different
aspect of the world using specific representation and inference techniques, interacting with
other specialists and learning from them. PolyScheme has been used to model infant reasoning
including object identity, events, causality, and spatial relations. The integration of
reasoning methods is powerful, but the overall cognitive architecture is simplistic compared
to other systems and seems focused more on problem-solving than on the broader problem
of intelligent agent control.
• Shruti [SA93] is a fascinating biologically-inspired model of human reflexive inference,
which represents relations, types, entities and causal rules in a connectionist architecture
using focal-clusters. However, much like Hofstadter's earlier Copycat architecture [Hof95],
Shruti seems more interesting as a prototype exploration of ideas than as a practical AGI
system; at least, after a significant time of development it has not proved significantly
effective in any applications.
• James Albus's 4D/RCS robotics architecture shares a great deal with some of the emergentist
architectures discussed above, e.g. it has the same hierarchical pattern recognition
structure as DeSTIN and HTM, and the same three cross-connected hierarchies as DeSTIN,
and shares with the developmental robotics architectures a focus on real-time adaptation to
the structure of the world. However, 4D/RCS is not foundationally learning-based but relies
on hard-wired architecture and algorithms, intended to mimic the qualitative structure of
relevant parts of the brain (and intended to be augmented by learning), which differentiates
it from emergentist approaches.
As our own CogPrime approach is a hybrid architecture, it will come as no surprise that
we believe several of the existing hybrid architectures are fundamentally going in the right
direction. However, nearly all the existing hybrid architectures have severe shortcomings which
we feel will prevent them from achieving robust humanlike AGI.
Many of the hybrid architectures are in essence "multiple, disparate algorithms carrying out
separate functions, encapsulated in black boxes and communicating results with each other."
For instance, PolyScheme, ACT-R and CLARION all display this "modularity" property to a
significant extent. These architectures lack the rich, real-time interaction between the internal
dynamics of various memory and learning processes that we believe is critical to achieving
humanlike general intelligence using realistic computational resources. On the other hand, those
architectures that feature richer integration - such as DUAL, Shruti, LIDA and MicroPsi - have
the flaw of relying (at least in their current versions) on overly simplistic learning algorithms,
which drastically limits their scalability.
It does seem plausible to us that some of these hybrid architectures could be dramatically
extended or modified so as to produce humanlike general intelligence. For instance, one could
replace LIDA's learning algorithms with others that interrelate with each other in a nuanced
synergetic way; or one could replace MicroPsi's simple learning and reasoning methods with
much more powerful and scalable ones acting on the same data structures. However, making
these changes would dramatically alter the cognitive architectures in question on multiple levels.
4.4.1 Neural versus Symbolic; Global versus Local
The "symbolic versus emergentist" dichotomy that we have used to structure our review of cogni-
tive architectures is not absolute nor fully precisely defined; it is more of a heuristic distinction.
In this section, before plunging into the details of particular hybrid cognitive architectures, we
review two other related dichotomies that are useful for understanding hybrid systems: neural
versus symbolic systems, and globalist versus localist knowledge representation.
4.4.1.1 Neural-Symbolic Integration
The distinction between neural and symbolic systems has gotten fuzzier and fuzzier in recent
years, with developments such as
• Logic-based systems being used to control embodied agents (hence using logical terms to
deal with data that is apparently perception or actuation-oriented in nature, rather than
being symbolic in the semiotic sense); see [SS03a] and [GMIH08].
• Hybrid systems combining neural net and logical parts, or using logical or neural net
components interchangeably in the same role [LAon].
• Neural net systems being used for strongly symbolic tasks such as automated grammar
learning ([Elm01], [Elm91], plus more recent work).
Figure 4.7 presents a schematic diagram of a generic neural-symbolic system, generalizing
from [BH05], a paper that gives an elegant categorization of neural-symbolic AI systems. Figure
4.8 depicts several broad categories of neural-symbolic architecture.
[Figure 4.7: diagram with the labels Symbolic System (Localist), Neural System (Globalist), Learning, Interaction and Representation.]
Fig. 4.7: Generic neural-symbolic architecture
Bader and Hitzler categorize neural-symbolic systems according to three orthogonal axes:
interrelation, language and usage. "Language" refers to the type of language used in the symbolic
component, which may be logical, automata-based, formal grammar-based, etc. "Usage" refers
to the purpose to which the neural-symbolic interrelation is put. We tend to use "learning" as
an encompassing term for all forms of ongoing knowledge-creation, whereas Bader and Hitzler
distinguish learning from reasoning.
Of Bader and Hitzler's three axes the one that interests us most here is "interrelation", which
refers to the way the neural and symbolic components of the architecture intersect with each
other. They distinguish "hybrid" architectures which contain separate but equal, interacting
neural and symbolic components; versus "integrative" architectures in which the symbolic com-
ponent essentially rides piggyback on the neural component, extracting information from it and
helping it carry out its learning, but playing a clearly derived and secondary role. We prefer
Sun's (2001) term "monolithic" to Bader and Hitzler's "integrative" to describe this type of
system, as the latter term seems best preserved in its broader meaning.
[Figure 4.8: diagram panels relating World, Neural and Symbolic boxes, with the following panel titles.]
Monolithic: symbolic component "sits on top of" neural component and helps it do abstraction.
Hybrid: neural and symbolic components confront the world side by side, interacting.
Tightly interactive hybrid: neural and symbolic components interact frequently, on the same
time scale as their internal learning operations.
Fig. 4.8: Broad categories of neural-symbolic architecture
Within the scope of hybrid neural-symbolic systems, there is another axis which Bader and
Hitzler do not focus on, because the main interest of their review is in monolithic systems. We
call this axis "interactivity", and what we are referring to is the frequency of high-information-
content, high-influence interaction between the neural and symbolic components in the hybrid
system. In a low-interaction hybrid system, the neural and symbolic components don't exchange
large amounts of mutually influential information all that frequently, and basically act like
independent system components that do their learning/reasoning/thinking, periodically sending
each other their conclusions. In some cases, interaction may be asymmetric: one component may
frequently send a lot of influential information to the other, but not vice versa. However, our
hypothesis is that the most capable neural-symbolic systems are going to be the symmetrically
highly interactive ones.
In a symmetric high-interaction hybrid neural-symbolic system, the neural and symbolic
components exchange influential information sufficiently frequently that each one plays a major
role in the other one's learning/reasoning/thinking processes. Thus, the learning processes of
each component must be considered as part of the overall dynamic of the hybrid system. The
two components aren't just feeding their outputs to each other as inputs, they're mutually
guiding each other's internal processing.
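To make the notion of interactivity a little more tangible, here is a minimal sketch in Python. It is a toy under stated assumptions, not a description of any real neural-symbolic system: the Component class, its one-number "state", the blending rule and the exchange_period parameter are all hypothetical. Each component learns from its own evidence, and the two exchange their current states every exchange_period steps, so that a period of 1 places the exchange on the same time scale as internal learning.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Toy stand-in for a neural or symbolic learner (hypothetical, one-number state)."""
    state: float
    inbox: list = field(default_factory=list)

    def learn_step(self, evidence: float, rate: float = 0.2) -> None:
        # Internal learning: move toward local evidence, blended with any
        # partner messages that arrived since the previous step.
        target = evidence
        if self.inbox:
            hint = sum(self.inbox) / len(self.inbox)
            target = 0.5 * evidence + 0.5 * hint
            self.inbox.clear()
        self.state += rate * (target - self.state)


def run_hybrid(steps: int, exchange_period: int):
    """Return (neural state, symbolic state, messages per internal learning step)."""
    neural, symbolic = Component(state=0.0), Component(state=10.0)
    messages = 0
    for t in range(1, steps + 1):
        neural.learn_step(evidence=2.0)
        symbolic.learn_step(evidence=8.0)
        if t % exchange_period == 0:
            # Symmetric exchange: each side sends its current state to the other.
            neural.inbox.append(symbolic.state)
            symbolic.inbox.append(neural.state)
            messages += 2
    return neural.state, symbolic.state, messages / (2 * steps)


high = run_hybrid(steps=100, exchange_period=1)    # exchange on every learning step
low = run_hybrid(steps=100, exchange_period=10)    # exchange only occasionally
```

In this toy the high-interaction pair settles on states that are shaped by each other throughout learning, whereas the low-interaction pair stays close to what each would have learned alone. An asymmetric variant would simply fill only one of the two inboxes.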
One can make a speculative argument for the relevance of this kind of architecture to neuro-
science. It seems plausible that this kind of neural-symbolic system roughly emulates the kind
of interaction that exists between the brain's neural subsystems implementing localist symbolic
processing, and the brain's neural subsystems implementing globalist, classically "connection-
ist" processing. It seems most likely that, in the brain, symbolic functionality emerges from
an underlying layer of neural dynamics. However, it is also reasonable to conjecture that this
symbolic functionality is confined to a functionally distinct subsystem of the brain, which then
interacts with other subsystems in the brain much in the manner that the symbolic and neural
components of a symmetric high-interaction neural-symbolic system interact.
Neuroscience speculations aside, however, our key conjecture regarding neural-symbolic in-
tegration is that this sort of neural-symbolic system presents a promising direction for artificial
general intelligence research. In Chapter 26 of Volume 2 we will give a more concrete idea of
what a symmetric high-interaction hybrid neural-symbolic architecture might look like, explor-
ing the potential for this sort of hybridization between the OpenCogPrime AGI architecture
(which is heavily symbolic in nature) and hierarchical attractor neural net based architectures
such as DeSTIN.
4.5 Globalist versus Localist Representations
Another interesting distinction, related to but different from "symbolic versus emergentist"
and "neural versus symbolic", may be drawn between cognitive systems (or subsystems) where
memory is essentially global, and those where memory is essentially local. In this section
we will pursue this distinction in various guises, along with the less familiar notion of glocal
memory.
This globalist/localist distinction is most easily conceptualized by reference to memories
corresponding to categories of entities or events in an external environment. In an AI system
that has an internal notion of "activation" - i.e. in which some of its internal elements are more
active than others, at any given point in time — one can define the internal image of an external
event or entity as the fuzzy set of internal elements that tend to be active when that event or
entity is presented to the system's sensors. If one has a particular set S of external entities or
events of interest, then, the degree of memory localization of such an AI system relative to S
may be conceived in terms of the percentage of the system's internal elements that have a high
degree of membership in the internal image of an average element of S: the smaller this
percentage, the more localized the memory.
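As a rough illustration of this characterization, the following Python sketch estimates the internal image of each entity from recorded activations and then scores localization. Everything specific here is an assumption made for the example: the activity threshold, the "high membership" cutoff, the toy recordings, and the choice to score localization as one minus the average fraction of elements in an image, so that a small image yields a high score.

```python
def internal_image(trials, active_above=0.5):
    """Fuzzy membership per element: fraction of presentations in which it was active."""
    n_elements = len(trials[0])
    return [
        sum(1 for activations in trials if activations[i] > active_above) / len(trials)
        for i in range(n_elements)
    ]


def memory_localization(recordings, high_membership=0.8):
    """recordings maps each entity in S to a list of activation vectors."""
    fractions = []
    for trials in recordings.values():
        image = internal_image(trials)
        fractions.append(sum(1 for m in image if m >= high_membership) / len(image))
    # Assumed scoring: one minus the mean fraction of elements in an image.
    return 1.0 - sum(fractions) / len(fractions)


# Localist toy system: each entity lights up one of ten elements.
localist = {
    "cat": [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]] * 3,
    "dog": [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]] * 3,
}
# Globalist toy system: each entity lights up eight of ten elements.
globalist = {
    "cat": [[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 3,
    "dog": [[0, 0, 1, 1, 1, 1, 1, 1, 1, 1]] * 3,
}
```

Calling memory_localization on the two toy systems gives a score near 0.9 for the localist one and near 0.2 for the globalist one. The sketch shares the limitations of the definition itself, in particular the need to decide in advance what counts as a system element.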
Of course, this characterization of localization has its limitations, such as the possibility of
ambiguity regarding what are the "system elements" of a given AI system; and the exclusive
focus on internal images of external phenomena rather than representation of internal abstract
concepts. However, our goal here is not to formulate an ultimate, rigorous and thorough ontology
of memory systems, but only to pose a "rough and ready" categorization so as to properly frame
our discussion of some specific AGI issues relevant to CogPrime. Clearly the ideas pursued here
will benefit from further theoretical exploration and elaboration.
In this sense, a Hopfield neural net [Ami89] would be considered "globalist" since it has a low
degree of memory localization (most internal images heavily involve a large number of system
elements); whereas Cyc would be considered "localist" as it has a very high degree of memory
localization (most internal images are heavily focused on a small set of system elements).
However, although Hopfield nets and Cyc form handy examples, the "globalist vs. localist"
distinction as described above is not identical to the "neural vs. symbolic" distinction. For it is
in principle quite possible to create localist systems using formal neurons, and also to create
globalist systems using formal logic. And "globalist-localist" is not quite identical to "symbolic vs
emergentist" either, because the latter is about coordinated system dynamics and behavior not
just about knowledge representation. CogPrime combines both symbolic and (loosely) neural
representations, and also combines globalist and localist representations in a way that we will
call "glocal" and analyze more deeply in Chapter 13; but there are many other ways these various
properties could be manifested by AI systems. Rigorously studying the corpus of existing (or
hypothetical!) cognitive architectures using these ideas would be a large task, which we do not
undertake here.
In the next sections we review several hybrid architectures in more detail, focusing most
deeply on LIDA and MicroPsi which have been directly inspirational for CogPrime.
4.5.1 CLARION
Ron Sun's CLARION architecture (see Figure 4.9) is interesting in its combination of symbolic
and neural aspects - a combination that is used in a sophisticated way to embody the distinction
and interaction between implicit and explicit mental processes. From a CLARION perspective,
architectures like Soar and ACT-R are severely limited in that they deal only with explicit
knowledge and associated learning processes.
CLARION consists of a number of distinct subsystems, each of which contains a dual
representational structure, including a "rules and chunks" symbolic knowledge store somewhat
similar to ACT-R, and a neural net knowledge store embodying implicit knowledge. The main
subsystems are:
• An action-centered subsystem to control actions;
• A non-action-centered subsystem to maintain general knowledge;
• A motivational subsystem to provide underlying motivations for perception, action, and
cognition;
• A meta-cognitive subsystem to monitor, direct, and modify the operations of all the other
subsystems.
[Figure 4.9: diagram with a top level containing the action-centered and non-action-centered explicit representations, and a bottom level containing the action-centered and non-action-centered implicit representations.]
Fig. 4.9: The CLARION cognitive architecture.
4.5.2 The Society of Mind and the Emotion Machine
In his influential but controversial book The Society of Mind [Min88], Marvin Minsky described
a model of human intelligence as something that is built up from the interactions of numerous
simple agents. He spells out in great detail how various particular cognitive functions may be
achieved via agents and their interactions. He leaves no room for any central algorithms or
structures of thought, famously arguing: "What magical trick makes us intelligent? The trick
is that there is no trick. The power of intelligence stems from our vast diversity, not from any
single, perfect principle."
This perspective was extended in the more recent work The Emotion Machine [Min07], where
Minsky argued that emotions are "ways to think" evolved to handle different "problem types"
that exist in the world. The brain is posited to have rule-based mechanisms (selectors) that
turn on emotions to deal with various problems.
Overall, both of these works serve better as works of speculative cognitive science than as
works of AI or cognitive architecture per se. As neurologist Richard Restak said in his review
of Emotion Machine, "Minsky does a marvelous job parsing other complicated mental activities
into simpler elements. ... But he is less effective in relating these emotional functions to what's
going on in the brain." As Restak added, he is also not so effective at relating these emotional
functions to straightforwardly implementable algorithms or data structures.
Push Singh, in his PhD thesis and followup work [SBC05], did the best job so far of creating
a concrete AI design based on Minsky's ideas. While Singh's system was certainly interesting,
it was also noteworthy for its lack of any learning mechanisms, and its exclusive focus on
explicit rather than implicit knowledge. Due to Singh's tragic death, his work was never brought
anywhere near completion. It seems fair to say that there has not yet been a serious cognitive
architecture posed based closely on Minsky's ideas.
4.5.3 DUAL
The closest thing to a Minsky-ish cognitive architecture is probably DUAL, which takes the
Society of Mind concept and adds to it a number of other interesting ideas. DUAL integrates
symbolic and connectionist approaches at a deeper level than CLARION, and has been used
to model various cognitive functions such as perception, analogy and judgment. Computations
in DUAL emerge from the self-organized interaction of many micro-agents, each of which is
a hybrid symbolic/connectionist device. Each DUAL agent plays the role of a neural network
node, with an activation level and activation spreading dynamics; but also plays the role of
a symbol, manipulated using formal rules. The agents exchange messages and activation via
links that can be learned and modified, and they form coalitions which collectively represent
concepts, episodes, and facts.
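The dual role of each agent can be sketched in a few lines of Python. This is only an illustration of the general idea, not DUAL's actual implementation: the MicroAgent class, the slot-based "rule", the decay constant and the coalition threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MicroAgent:
    symbol: str                                  # symbolic side: a name rules can match
    slots: dict = field(default_factory=dict)    # symbolic side: simple attribute slots
    activation: float = 0.0                      # connectionist side: activation level
    links: dict = field(default_factory=dict)    # connectionist side: target symbol -> weight


def spread_activation(agents, decay=0.5):
    """One synchronous step: each agent keeps part of its activation and receives link input."""
    incoming = {name: 0.0 for name in agents}
    for agent in agents.values():
        for target, weight in agent.links.items():
            incoming[target] += agent.activation * weight
    for name, agent in agents.items():
        agent.activation = min(1.0, decay * agent.activation + incoming[name])


def coalition(agents, threshold=0.3):
    """Agents active enough to take part in the current collective representation."""
    return sorted(name for name, agent in agents.items() if agent.activation >= threshold)


def matches(agent, slot, value):
    """A minimal symbolic rule: test a slot on an agent, whatever its activation."""
    return agent.slots.get(slot) == value


agents = {
    "cup": MicroAgent("cup", {"is_a": "container"}, activation=1.0,
                      links={"handle": 0.6, "drink": 0.5}),
    "handle": MicroAgent("handle", {"part_of": "cup"}, links={"cup": 0.4}),
    "drink": MicroAgent("drink", {"is_a": "action"}),
    "car": MicroAgent("car", {"is_a": "vehicle"}),
}
spread_activation(agents)
active = coalition(agents)   # "cup", "drink" and "handle" pass the threshold; "car" does not
```

The same object is thus both a node in a spreading-activation network and a symbol that rules can inspect. Learning would correspond to changing the link weights over time, which this sketch leaves out.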
The structure of the model is sketchily depicted in Figure 4.10, which covers the application
of DUAL to a toy environment called TextWorld. The visual input corresponding to a stim-
ulus is presented on a two-dimensional visual array representing the front end of the system.
Perceptual primitives like blobs and terminations are immediately generated by cheap parallel
computations. Attention is controlled at each time by an object which allocates it selectively
to some area of the stimulus. A detailed symbolic representation is constructed for this area
which tends to fade away as attention is withdrawn from it and allocated to another one. Cate-
gorization of visual memory contents takes place by retrieving object and scene categories from
DUAL's semantic memory and mapping them onto current visual memory representations.
Fig. 4.10: The three main components of the DUAL model: the retinotopic visual array (RVA),
the visual working memory (VWM) and DUAL's semantic memory. Attention is allocated to
an area of the visual array by the object in VWM controlling attention, while scene and object
categories corresponding to the contents of VWM are retrieved from the semantic memory.
In principle the DUAL framework seems quite powerful; using the language of CogPrime,
however, it seems to us that the learning mechanisms of DUAL have not been formulated in
such a way as to give rise to powerful, scalable cognitive synergy. It would likely be possible
to create very powerful AGI systems within DUAL, and perhaps some very CogPrime-like
systems as well. But the systems that have been created or designed for use within DUAL so
far seem not to be that powerful in their potential or scope.
4.5.4 4D/RCS
In a rather different direction, James Albus, while at the National Bureau of Standards, de-
veloped a very thorough and impressive architecture for intelligent robotics called 4D/RCS,
which was implemented in a number of machines including unmanned automated vehicles. This
architecture lacks critical aspects of intelligence such as learning and creativity, but combines
perception, action, planning and world-modeling in a highly effective and tightly-integrated
fashion.
The architecture has three hierarchies of memory/processing units: one for perception, one
for action and one for modeling and guidance. Each unit has a certain spatiotemporal scope,
and (except for the lowest level) supervenes over children whose spatiotemporal scope is a sub-
set of its own. The action hierarchy takes care of decomposing tasks into subtasks; whereas the
sensation hierarchy takes care of grouping signals into entities and events. The modeling/guid-
ance hierarchy mediates interactions between perception and action based on its understanding
of the world and the system's goals.
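The nesting of scopes and the top-down decomposition of tasks can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the 4D/RCS design itself: scope is reduced to a single time horizon in seconds, the unit names and horizon values are invented, and "decomposition" merely hands each child a numbered subtask.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    horizon_s: float                      # temporal scope in seconds (illustrative values)
    children: list = field(default_factory=list)

    def add(self, child):
        # A child's scope must fall within its parent's scope.
        if child.horizon_s > self.horizon_s:
            raise ValueError(f"{child.name} exceeds the scope of {self.name}")
        self.children.append(child)
        return child


def decompose(unit, task, depth=0):
    """Action hierarchy: hand each child one subtask, recursively, as (depth, unit, task)."""
    plan = [(depth, unit.name, task)]
    for index, child in enumerate(unit.children, start=1):
        plan.extend(decompose(child, f"{task}.{index}", depth + 1))
    return plan


vehicle = Unit("vehicle", horizon_s=600.0)
mobility = vehicle.add(Unit("mobility", horizon_s=60.0))
gaze = vehicle.add(Unit("gaze control", horizon_s=60.0))
mobility.add(Unit("steering servo", horizon_s=1.0))
plan = decompose(vehicle, "reach waypoint")
```

A perception hierarchy would run in the opposite direction, grouping signals from the lowest units into entities and events for the units above them.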
In his book [AM01] Albus describes methods for extending 4D/RCS into a complete cognitive
architecture, but these extensions have not been elaborated in full detail nor implemented.
[Figure 4.11: hierarchy diagram; labels not recoverable from the scan.]
Fig. 4.11: Albus's 4D-RCS architecture for a single vehicle
4.5.5 PolyScheme
Nick Cassimatis's PolyScheme architecture [Cas07] shares with GLAIR the use of multiple
logical reasoning methods on a common knowledge store. While its underlying ideas are quite
general, currently PolyScheme is being developed in the context of the "object tracking" domain
(construed very broadly). As a logic framework PolyScheme is fairly conventional (unlike GLAIR
or NARS with their novel underlying formalisms), but PolyScheme has some unique conceptual
aspects, for instance its connection with Cassimatis's theory of mind, which holds that the same
core set of logical concepts and relationships underlies both language and physical reasoning
I]. This ties in with the use of a common knowledge store for multiple cognitive processes;
for instance it suggests that
• the same core relationships can be used for physical reasoning and parsing, but that each
of these domains may involve some additional relationships.
• language processing may be done via physical-reasoning-based cognitive processes, plus the
additional activity of some language-specific processes
[Figure 4.12: diagram whose legible labels include Sensory Processing, World Modeling, Value Judgment, Planner and Executor.]
Fig. 4.12: Albus's perceptual, motor and modeling hierarchies
4.5.6 Joshua Blue
Sam Adams and his colleagues at IBM have created a cognitive architecture called Joshua Blue
[AABL02], which has some significant similarities to CogPrime. Similar to our current research
direction with CogPrime, Joshua Blue was created with loose emulation of child cognitive
development in mind; and, also similar to CogPrime, it features a number of cognitive processes
acting on a common neural-symbolic knowledge store. The specific cognitive processes involved
in Joshua Blue and CogPrime are not particularly similar, however. At time of writing (2012)
Joshua Blue is not under active development and has not been for some time; however, the
project may be reanimated in future.
Joshua Blue's core knowledge representation is a semantic network of nodes connected by
links along which activation spreads. Although many of the nodes have specific semantic refer-
ents, as in a classical semantic net, the spread of activation through the network is designed to
lead to the emergence of "assemblies" (which could also be thought of as dynamical attractors)
in a manner more similar to an attractor neural network.
A major difference from typical semantic or neural network models is the central role that
affect plays in the system's dynamics. The weights of the links in the knowledge base are adjusted
dynamically based on the emotional context - a very direct way of ensuring that cognitive
processes and mental representations are continuously influenced by affect. Qualitatively, this
mimics the way that particular emotions in the human brain correlate with the dissemination
throughout the brain of particular neurotransmitters, which then affect synaptic activity.
A result of this architecture is that in Joshua Blue, emotion directs attention in a very direct
way: affective weighting is important in determining which associated objects will become part of
the focus of attention, or will be retrieved from memory. A notable similarity between CogPrime
and Joshua Blue is that in both systems, nodes are assigned two quantitative attention values,
one governing allocation of current system resources (mainly processor time; this is CogPrime's
ShortTermImportance) and one governing the long-term allocation of memory (CogPrime's
LongTermImportance).
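The following Python sketch illustrates why two separate attention values are useful, and how affect might modulate a link weight. It is a toy under stated assumptions and does not reproduce Joshua Blue or CogPrime code: the Node class, the field names sti and lti (standing in for ShortTermImportance and LongTermImportance), the ranking rules and the affective_weight formula are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    sti: float   # short-term importance: claim on processor time right now
    lti: float   # long-term importance: claim on staying in memory


def attentional_focus(nodes, size):
    """Names of the nodes with the highest short-term importance."""
    ranked = sorted(nodes, key=lambda node: node.sti, reverse=True)
    return [node.name for node in ranked[:size]]


def retained(nodes, capacity):
    """Names of the nodes kept when memory holds only `capacity` nodes."""
    ranked = sorted(nodes, key=lambda node: node.lti, reverse=True)
    return [node.name for node in ranked[:capacity]]


def affective_weight(base_weight, link_valence, mood):
    """Strengthen a link whose valence agrees with the current mood (both in [-1, 1])."""
    return base_weight * (1.0 + link_valence * mood)


nodes = [Node("alarm", sti=0.9, lti=0.1),
         Node("home address", sti=0.2, lti=0.9),
         Node("lunch plan", sti=0.5, lti=0.5)]
focus = attentional_focus(nodes, size=1)   # the alarm wins attention now
kept = retained(nodes, capacity=2)         # but is the first to be forgotten
```

With a single importance value, an item that is urgent but ephemeral could not be told apart from one that is rarely used but worth keeping. In the affect sketch, a positively valenced link is strengthened in a positive mood and weakened in a negative one, which is one simple way in which emotional context could bias what enters the focus of attention.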
The concrete work done with Joshua Blue involved using it to control a simple agent in a sim-
ulated world, with the goal that via human interaction, the agent would develop a complex and
humanlike emotional and motivational structure from its simple in-built emotions and drives,
and would then develop complex cognitive capabilities as part of this development process.
4.5.7 LIDA
The LIDA architecture developed by Stan Franklin and his colleagues [BF09] is based on the
concept of the "cognitive cycle" - a notion that is important to nearly every BICA (Biologically
Inspired Cognitive Architectures) and also to the brain, but that plays a particularly central
role in LIDA. As Franklin says, "as a matter of principle, every autonomous agent, be it human,
animal, or artificial, must frequently sample (sense) its environment, process (make sense of)
this input, and select an appropriate response (action). The agent's "life" can be viewed as
consisting of a continual sequence of iterations of these cognitive cycles. Such cycles constitute
the indivisible elements of attention, the least sensing and acting to which we can attend. A
cognitive cycle can be thought of as a moment of cognition, a cognitive "moment"."
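The sense, process, act loop that Franklin describes can be written down in its barest form as follows. This Python sketch is a generic illustration, not LIDA's implementation: the thermostat-like agent, the three-reading memory window, the setpoint and the function names are all assumptions chosen to keep the example self-contained.

```python
def sense(environment):
    """Sample the environment."""
    return environment["temperature"]


def interpret(percept, memory):
    """Make sense of the input: smooth it against the last few remembered readings."""
    memory.append(percept)
    recent = memory[-3:]
    return sum(recent) / len(recent)


def select_action(understanding, setpoint=20.0):
    """Choose a response to the interpreted situation."""
    if understanding < setpoint - 1.0:
        return "heat"
    if understanding > setpoint + 1.0:
        return "cool"
    return "idle"


def run_cycles(environment, n_cycles):
    """Run a sequence of cognitive cycles; each action feeds back into the environment."""
    memory, actions = [], []
    effect = {"heat": 1.0, "cool": -1.0, "idle": 0.0}
    for _ in range(n_cycles):
        action = select_action(interpret(sense(environment), memory))
        actions.append(action)
        environment["temperature"] += effect[action]
    return actions


room = {"temperature": 17.0}
actions = run_cycles(room, n_cycles=5)
```

Each pass through the loop body is one cycle in the sense described above. What distinguishes LIDA from this skeleton is what happens inside the "process" step, which draws on several memory systems and on the global workspace.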
4.5.8 The Global Workspace
LIDA is heavily based on the "global workspace" concept developed by Bernard Baars. As this
concept is also directly relevant to CogPrime it is worth briefly describing here.
In essence Baars' Global Workspace Theory (GWT) is a particular hypothesis about how
working memory works and the role it plays in the mind. Baars conceives working memory as the
"inner domain in which we can rehearse telephone numbers to ourselves or, more interestingly,
in which we carry on the narrative of our lives. It is usually thought to include inner speech
and visual imagery." Baars uses the term "consciousness" to refer to the contents of working
memory - a theoretical commitment that is not part of the CogPrime design. In this section
we will use the term "consciousness" in Baars' way, but not throughout the rest of the book.
Baars conceives working memory and consciousness in terms of a "theater metaphor" - ac-
cording to which, in the "theater of consciousness" a "spotlight of selective attention" shines
a bright spot on stage. The bright spot reveals the global workspace - the contents of consciousness,
which may be metaphorically considered as a group of actors moving in and out of
consciousness, making speeches or interacting with each other. The unconscious is represented
by the audience watching the play ... and there is also a role for the director (the mind's ex-
ecutive processes) behind the scenes, along with a variety of helpers like stage hands, script
writers, scene designers, etc.
GWT describes a fleeting memory, with a duration of a few seconds. This is much shorter
than the 10-30 seconds of classical working memory - according to GWT there is a very brief
"cognitive cycle" in which the global workspace is refreshed, and the time period an item remains
in working memory generally spans a large number of these elementary "refresh" actions. GWT
contents are proposed to correspond to what we are conscious of, and are said to be broadcast
to a multitude of unconscious cognitive brain processes. Unconscious processes, operating in
parallel, can form coalitions which can act as input processes to the global workspace. Each
unconscious process is viewed as relating to certain goals, and seeking to get involved with
coalitions that will get enough importance to become part of the global workspace - because
once they're in the global workspace they'll be allowed to broadcast out across the mind as a
whole, which includes broadcasting to the internal and external actuators that allow the mind
to do things. Getting into the global workspace is a process's best shot at achieving its goals.
Obviously, the theater metaphor used to describe the GWT is evocative but limited; for
instance, the unconscious in the mind does a lot more than the audience in a theater. The
unconscious comes up with complex creative ideas sometimes, which feed into consciousness -
almost as if the audience is also the scriptwriter. Baars' theory, with its understanding of uncon-
scious dynamics in terms of coalition-building, fails to describe the subtle dynamics occurring
within the various forms of long-term memory, which result in subtle nonlinear interactions
between long term memory and working memory. But nevertheless, GWT successfully models
a number of characteristics of consciousness, including its role in handling novel situations, its
limited capacity, its sequential nature, and its ability to trigger a vast range of unconscious
brain processes. It is the framework on which LIDA's theory of the cognitive cycle is built.
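The competition-and-broadcast dynamic described above can be illustrated with a minimal Python sketch. The names `Coalition` and `broadcast_winner`, and the use of a single scalar importance score, are illustrative assumptions rather than part of GWT or of any LIDA implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Coalition:
    """A group of unconscious processes bidding for the workspace (hypothetical)."""
    name: str
    content: str
    importance: float  # assumed scalar score; GWT itself does not fix a formula


def broadcast_winner(coalitions: List[Coalition],
                     listeners: List[Callable[[str], None]]) -> Coalition:
    """Pick the most important coalition and broadcast its content to every listener."""
    winner = max(coalitions, key=lambda c: c.importance)
    for listen in listeners:
        listen(winner.content)
    return winner


# Two unconscious processes receive whatever enters the workspace.
received: List[str] = []
coalitions = [
    Coalition("hunger", "find food", 0.4),
    Coalition("threat", "loud noise to the left", 0.9),
]
winner = broadcast_winner(coalitions, [received.append, received.append])
```

Only the winning coalition's content reaches the listeners, which is the sense in which entering the workspace is a process's best shot at influencing the rest of the mind.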
4.5.9 The LIDA Cognitive Cycle
The simplest cognitive cycle is that of an animal, which senses the world, compares sensation to
memory, and chooses an action, all in one fluid subjective moment. But the same cognitive cycle
structure/process applies to higher-level cognitive processes as well. The LIDA architecture is
based on the LIDA model of the cognitive cycle, which posits a particular structure underlying
the cognitive cycle that possesses the generality to encompass both simple and complex cognitive
moments.
The LIDA cognitive cycle itself is a theoretical construct that can be implemented in many
ways, and indeed other BICAs like CogPrime and Psi also manifest the LIDA cognitive cycle
in their dynamics, though utilizing different particular structures to do so.
Figure 4.13 shows the cycle pictorially, starting in the upper left corner and proceeding
clockwise. At the start of a cycle, the LIDA agent perceives its current situation and allocates
attention differentially to various parts of it. It then broadcasts information about the most
important parts (which constitute the agent's consciousness), and this information gets features
extracted from it; these are then passed along to episodic and semantic memory, which interact
in the "global workspace" to create a model for the agent's current situation. This model then,
in interaction with procedural memory, enables the agent to choose an appropriate action and
execute it - the critical "action-selection" phase!
Fig. 4.13: The LIDA Cognitive Cycle
The LIDA Cognitive Cycle in More Depth²
We now run through the cognitive cycle in more detail. It begins with sensory stimuli from
the agent's external or internal environment. Low-level feature detectors in sensory memory begin
the process of making sense of the incoming stimuli. These low-level features are passed to
perceptual memory where higher-level features, objects, categories, relations, actions, situations,
etc. are recognized. These recognized entities, called percepts, are passed to the workspace,
where a model of the agent's current situation is assembled.
2 This section paraphrases heavily from [Fra06].
Workspace structures serve as cues to the two forms of episodic memory, yielding both short
and long term remembered local associations. In addition to the current percept, the workspace
contains recent percepts that haven't yet decayed away, and the agent's model of the then-
current situation previously assembled from them. The model of the agent's current situation is
updated from the previous model using the remaining percepts and associations. This updating
process will typically require looking back to perceptual memory and even to sensory memory,
to enable the understanding of relations and situations. This assembled new model constitutes
the agent's understanding of its current situation within its world. Via constructing the model,
the agent has made sense of the incoming stimuli.
Now attention allocation comes into play, because a real agent lacks the computational re-
sources to work with all parts of its world-model with maximal mental focus. Portions of the
model compete for attention. These competing portions take the form of (potentially overlapping)
coalitions of structures comprising parts of the model. Once one such coalition wins the
competition, the agent has decided what to focus its attention on.
And now comes the purpose of all this processing: to help the agent to decide what to do
next. The winning coalition passes to the global workspace, the namesake of Global Workspace
Theory, from which it is broadcast globally. Though the contents of this conscious broadcast
are available globally, the primary recipient is procedural memory, which stores templates of
possible actions including their context and possible results.
Procedural memory also stores an activation value for each such template - a value that
attempts to measure the likelihood of an action taken within its context producing the ex-
pected result. It's worth noting that LIDA makes a rather specific assumption here. LIDA's
"activation" values are like the probabilistic truth values of the implications in CogPrime's
Context ∧ Procedure → Goal triples. However, in CogPrime this probability is not the same as
the ShortTermImportance "attention value" associated with the Implication link representing
that implication. Here LIDA merges together two concepts that in CogPrime are separate.
Templates whose contexts intersect sufficiently with the contents of the conscious broadcast
instantiate copies of themselves with their variables specified to the current situation. These
instantiations are passed to the action selection mechanism, which chooses a single action from
these instantiations and those remaining from previous cycles. The chosen action then goes to
sensorimotor memory, where it picks up the appropriate algorithm by which it is then executed.
The action so taken affects the environment, and the cycle is complete.
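The last stage of the cycle, from conscious broadcast to a chosen action, can be sketched as follows. This is a simplified illustration under stated assumptions: contexts are modeled as sets of symbols, "intersect sufficiently" is taken to mean a fractional overlap above a hypothetical `min_overlap` threshold, and variable binding and instantiations carried over from previous cycles are omitted. `Template` and `select_action` are names introduced here, not LIDA identifiers.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Template:
    """Procedural-memory template: context, action, expected result, activation."""
    context: FrozenSet[str]
    action: str
    expected_result: str
    activation: float  # estimated chance the action yields the expected result


def select_action(broadcast: FrozenSet[str], templates: List[Template],
                  min_overlap: float = 0.5) -> Optional[Template]:
    """Keep templates whose context overlaps the broadcast enough,
    then choose the one with the highest activation (None if nothing matches)."""
    candidates = [
        t for t in templates
        if t.context and len(t.context & broadcast) / len(t.context) >= min_overlap
    ]
    return max(candidates, key=lambda t: t.activation, default=None)


templates = [
    Template(frozenset({"cup", "thirsty"}), "drink", "not thirsty", 0.8),
    Template(frozenset({"cup", "table"}), "push cup", "cup moved", 0.6),
    Template(frozenset({"door", "closed"}), "open door", "door open", 0.9),
]
chosen = select_action(frozenset({"cup", "thirsty", "table"}), templates)
```

Note that the "open door" template has the highest activation but is never considered, because its context does not match the broadcast; activation only ranks the templates that the current situation has already made relevant.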
The LIDA model hypothesizes that all human cognitive processing is via a continuing iter-
ation of such cognitive cycles. It acknowledges that other cognitive processes may also occur,
refining and building on the knowledge used in the cognitive cycle (for instance, the cognitive
cycle itself doesn't mention abstract reasoning or creativity). But the idea is that these other
processes occur in the context of the cognitive cycle, which is the main loop driving the internal
and external activities of the organism.
4.5.9.1 Avoiding Combinatorial Explosion via Adaptive Attention Allocation
LIDA avoids combinatorial explosions in its inference processes via two methods, both of which
are also important in CogPrime:
• combining reasoning via association with reasoning via deduction
• foundational use of uncertainty in reasoning
One can create an analogy between LIDA's workspace structures and codelets and a logic-
based architecture's assertions and functions. However, LIDA's codelets only operate on the
structures that are active in the workspace during any given cycle. This includes recent percep-
tions, their closest matches in other types of memory, and structures recently created by other
codelets. The results with the highest estimate of success, i.e. activation, will then be selected.
Uncertainty plays a role in LIDA's reasoning in several ways, most notably through the base
activation of its behavior codelets, which depends on the model's estimated probability of the
codelet's success if triggered. LIDA observes the results of its behaviors and updates the base
activation of the responsible codelets dynamically.
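One simple way to picture this dynamic updating is an exponential moving average that nudges a codelet's base activation toward 1 after a success and toward 0 after a failure. The update rule and the `learning_rate` parameter below are assumptions chosen for illustration; the text does not specify which rule LIDA actually uses.

```python
def update_base_activation(activation: float, succeeded: bool,
                           learning_rate: float = 0.1) -> float:
    """Move activation toward 1.0 after a success and toward 0.0 after a failure."""
    target = 1.0 if succeeded else 0.0
    return activation + learning_rate * (target - activation)


# A codelet starts with no evidence either way and observes four outcomes.
activation = 0.5
for outcome in [True, True, False, True]:
    activation = update_base_activation(activation, outcome)
```

Under this rule the activation always stays within [0, 1] and tracks a recency-weighted success rate, so it can be read as a rough estimate of the probability of success.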
We note that for this kind of uncertain inference/activation interplay to scale well, some
level of cognitive synergy must be present; and based on our understanding of LIDA it is not
clear to us whether the particular inference and association algorithms used in LIDA possess
the requisite synergy.
4.5.9.2 LIDA versus CogPrime
The LIDA cognitive cycle, broadly construed, exists in CogPrime as in other cognitive archi-
tectures. To see how, it suffices to map the key LIDA structures into corresponding CogPrime
structures, as is done in Table 4.1. Of course this table does not cover all CogPrime processes,
as LIDA does not constitute a thorough explanation of CogPrime structure and dynamics. And
in most cases the corresponding CogPrime and LIDA processes don't work in exactly the same
way; for instance, as noted above, LIDA's action selection relies solely on LIDA's "activation"
values, whereas CogPrime's action selection process is more complex, relying on aspects of
CogPrime that lack LIDA analogues.
4.5.10 Psi and MicroPsi
We have saved for last the architecture that has the most in common with CogPrime: Joscha
Bach's MicroPsi architecture, closely based on Dietrich Dörner's Psi theory. CogPrime has
borrowed substantially from Psi in its handling of emotion and motivation; but Psi also has
other aspects that differ considerably from CogPrime. Here we will focus more heavily on the
points of overlap, but will mention the key points of difference as well.
The overall Psi cognitive architecture, which is centered on the Psi model of the motivational
system, is roughly depicted in Figure 4.14.
Psi's motivational system begins with Demands, which are the basic factors that motivate
the agent. For an animal these would include things like food, water, sex, novelty, socialization,
protection of one's children, and so forth. For an intelligent robot they might include things
like electrical power, novelty, certainty, socialization, well-being of others and mental growth.
Psi also specifies two fairly abstract demands and posits them as psychologically fundamental
(see Figure 4.15):
• competence, the effectiveness of the agent at fulfilling its Urges
• certainty, the confidence of the agent's knowledge
LIDA                              CogPrime
--------------------------------  ------------------------------------------------------
Declarative memory                Atomspace
attentional codelets              Schema that adjust importance of Atoms explicitly
coalitions                        maps
global workspace                  attentional focus
behavior codelets                 schema
procedural memory (scheme net)    procedures in ProcedureRepository; and network of
                                  SchemaNodes in the Atomspace
action selection (behavior net)   propagation of STICurrency from goals to actions, and
                                  action selection process
transient episodic memory         perceptual atoms entering AT with high STI, which
                                  rapidly decreases in most cases
local workspaces                  bubbles of interlinked Atoms with moderate importance,
                                  focused on by a subset of MindAgents (defined in
                                  Chapter 19 of Part 2) for a period of time
perceptual associative memory     HebbianLinks in the AT
sensory memory                    spaceserver/timeserver, plus auxiliary stores for other
                                  senses
sensorimotor memory               Atoms storing record of actions taken, linked in with
                                  Atoms indexed in sensory memory
Table 4.1 CogPrime Analogues of Key LIDA Features
Each demand is assumed to come with a certain "target level" or "target range" (and these
may fluctuate over time, or may change as a system matures and develops). An Urge is said to
develop when a demand deviates from its target range: the urge then seeks to return the demand
to its target range. For instance, in an animal-like agent the demand related to food is more
clearly described as "fullness," and there is a target range indicating that the agent is neither too
hungry nor too full of food. If the agent's fullness deviates from this range, an Urge to return
the demand to its target range arises. Similarly, if an agent's novelty deviates from its target
range, this means the agent's life has gotten either too boring or too disconcertingly weird, and
the agent gets an Urge for either more interesting activities (in the case of below-range novelty)
or more familiar ones (in the case of above-range novelty).
There is also a primitive notion of Pleasure (and its opposite, displeasure), which is consid-
ered as different from the complex emotion of "happiness." Pleasure is understood as associated
with Urges: pleasure occurs when an Urge is (at least partially) satisfied, whereas displeasure
occurs when an urge gets increasingly severe. The degree to which an Urge is satisfied is not
necessarily defined instantaneously; it may be defined, for instance, as a time-decaying weighted
average of the proximity of the demand to its target range over the recent past.
So, for instance if an agent is bored and gets a lot of novel stimulation, then it experiences
some pleasure. If it's bored and then the monotony of its stimulation gets even more extreme,
then it experiences some displeasure.
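The relationship between demands, urges and pleasure can be made concrete with a small sketch. It assumes an instantaneous measure of urge strength (distance from the target range) rather than the time-decaying average mentioned above, and the class and function names are hypothetical, not taken from any Psi or MicroPsi implementation.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """A Psi-style demand with a target range (field names are illustrative)."""
    name: str
    level: float
    low: float
    high: float

    def urge(self) -> float:
        """Urge strength: distance of the level from the target range (0 inside it)."""
        if self.level < self.low:
            return self.low - self.level
        if self.level > self.high:
            return self.level - self.high
        return 0.0


def pleasure_signal(demand: Demand, new_level: float) -> float:
    """Positive when the urge shrinks (pleasure), negative when it grows (displeasure)."""
    before = demand.urge()
    demand.level = new_level
    return before - demand.urge()


novelty = Demand("novelty", level=0.1, low=0.4, high=0.7)  # a bored agent
relief = pleasure_signal(novelty, 0.5)    # novel stimulation arrives
steady = pleasure_signal(novelty, 0.6)    # level stays within the target range
overload = pleasure_signal(novelty, 0.9)  # things get disconcertingly weird
```

Here `relief` is positive, `steady` is exactly zero and `overload` is negative, matching the observation that remaining within the acceptable range produces no pleasure signal at all.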
Note that, according to this relatively simplistic approach, any decrease in the amount of
dissatisfaction causes some pleasure; whereas if everything always continues within its accept-
able range, there isn't any pleasure. This may seem a little counterintuitive, but it's important
to understand that these simple definitions of "pleasure" and "displeasure" are not intended to
fully capture the natural language concepts associated with those words. The natural language
terms are used here simply as heuristics to convey the general character of the processes involved.
These are very low level processes whose analogues in human experience are largely
below the conscious level.
[Figure components: Protocol and Situation Memory; Perception; Modulators; Action selection; Planning; Currently active motive; Motive selection; Urges (Drives); Action execution]
Fig. 4.14: High-Level Architecture of the Psi Model
A Goal is considered as a statement that the system may strive to make true at some future
time. A Motive is an (urge, goal) pair, consisting of a goal whose satisfaction is predicted to
imply the satisfaction of some urge. In fact one may consider Urges as top-level goals, and the
agent's other goals as their subgoals.
In Psi an agent has one "ruling motive" at any point in time, but this seems an oversimplification
more applicable to simple animals than to human-like or other advanced AI systems.
In general one may think of different motives having different weights indicating the amount of
resources that will be spent on pursuing them.
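The contrast between a single ruling motive and weighted motives can be sketched in a few lines. Proportional allocation is an assumption made here for illustration; the text only says that weights indicate the amount of resources spent, not how the split is computed.

```python
from typing import Dict


def ruling_motive(weights: Dict[str, float]) -> str:
    """Psi-style choice: the single strongest motive gets all the resources."""
    return max(weights, key=weights.get)


def resource_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted alternative: each motive gets resources in proportion to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one motive must have positive weight")
    return {name: w / total for name, w in weights.items()}


motive_weights = {"recharge": 3.0, "explore": 1.0, "socialize": 1.0}
leader = ruling_motive(motive_weights)
shares = resource_shares(motive_weights)
```

With these example weights the ruling-motive scheme devotes everything to "recharge", while the weighted scheme still leaves two fifths of the agent's resources for the other two motives.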
Emotions in Psi are considered as complex systemic response-patterns rather than explicitly
constructed entities. An emotion is the set of mental entities activated in response to a certain
set of urges. Dörner conceived theories about how various common emotions emerge from the
dynamics of urges and motives as described in the Psi model. "Intentions" are also considered as
composite entities: an intention at a given point in time consists of the active motives, together
with their related goals, behavior programs and so forth.
The basic logic of action in Psi is carried out by "triples" that are very similar to CogPrime's
Context ∧ Procedure → Goal triples. However, an important role is played by four modulators
that control how the processes of perception, cognition and action selection are regulated at a
given time:
• activation, which determines the degree to which the agent is focused on rapid, intensive
activity versus reflective, cognitive activity
• resolution level, which determines how accurately the system tries to perceive the world
• certainty, which determines how hard the system tries to achieve definite, certain knowledge
• selection threshold, which determines how willing the system is to change its choice of which
goals to focus on
These modulators characterize the system's emotional and cognitive state at a very abstract
level; they are not emotions per se, but they have a large effect on the agent's emotions. Their
intended interaction is depicted in Figure 4.15.
[Figure labels recoverable from the scan: efficiency signals; inefficiency signals; securing behavior; certainty signals (confirmation of expectations); uncertainty signals (disconfirmation of expectations)]
Fig. 4.15: Primary Interrelationships Between Psi Modulators
4.5.11 The Emergence of Emotion in the Psi Model
We now briefly review the specifics of how Psi models the emergence of emotion. The basic idea is
to define a small set of proto-emotional dimensions in terms of basic Urges and modulators.
Then, emotions are identified with regions in the space spanned by these dimensions.
The simplest approach uses a six-dimensional continuous space:
1. pleasure
2. arousal
3. resolution level
4. selection threshold (i.e. degree of dominance of the leading motive)
5. level of background checks (the rate of the securing behavior)
6. level of goal-directed behavior
Figure 4.16 shows how the latter 5 of these dimensions are derived from underlying urges and
modulators. Note that these dimensions are not orthogonal; for instance resolution is mainly in-
versely related to arousal. Additional dimensions are also discussed, for instance it is postulated
that to deal with social emotions one may wish to introduce two more demands corresponding
to inner and outer obedience to social norms, and then define dimensions in terms of these.
Fig. 4.16: Five Proto-Emotional Dimensions Implicit in the Psi Model
Specific emotions are then characterized in terms of these dimensions. According to [Bac09],
for instance, "Anger ... is characterized by high arousal, low resolution, strong motive dominance,
few background checks and strong goal-orientedness; sadness by low arousal, high resolution,
strong dominance, few background-checks and low goal-orientedness."
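To illustrate what identifying emotions with regions of this space could mean, the sketch below places the two quoted emotions as prototype points and labels a state by its nearest prototype. The numeric coordinates (0.9 for "high" or "strong", 0.1 for "low" or "few"), the nearest-prototype rule and the omission of the pleasure dimension are all assumptions made for this example, not part of the Psi model.

```python
import math
from typing import Dict, Tuple

# Order: arousal, resolution, motive dominance, background checks, goal-directedness.
# The 0.9 / 0.1 coordinates are assumed stand-ins for "high" / "low".
PROTOTYPES: Dict[str, Tuple[float, ...]] = {
    "anger":   (0.9, 0.1, 0.9, 0.1, 0.9),
    "sadness": (0.1, 0.9, 0.9, 0.1, 0.1),
}


def nearest_emotion(state: Tuple[float, ...]) -> str:
    """Return the prototype closest to the state in Euclidean distance."""
    return min(PROTOTYPES, key=lambda name: math.dist(PROTOTYPES[name], state))


# An aroused, coarse-grained, strongly goal-directed state.
label = nearest_emotion((0.8, 0.2, 0.7, 0.2, 0.8))
```

A nearest-prototype rule partitions the space into one region per emotion, which is one crude reading of the region-based account; it says nothing about which specific behaviors accompany each emotion.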
I'm a bit skeptical of the contention that these dimensions fully characterize the relevant
emotions. Anger for instance seems to have some particular characteristics not implied by the
above list of dimensional values. The list of dimensional values associated with anger doesn't
tell us that an angry person is more likely to punch someone than to bounce up and down,
for example. However, it does seem that the dimensional values associated with an emotion are
informative about the emotion, so that positioning an emotion on the given dimensions tells
one a lot.
4.5.12 Knowledge Representation, Action Selection and Planning in Psi
In addition to the basic motivation/emotion architecture of Psi, which has been adopted (with
some minor changes) for use in CogPrime, Psi has a number of other aspects that are somewhat
different from their CogPrime analogues.
First of all, on the micro level, Psi represents knowledge using structures called "quads." Each
quad is a cluster of 5 neurons containing a core neuron, and four other neurons representing
before/after and part-of/has-part relationships in regard to that core neuron. Quads are natu-
rally assembled into spatiotemporal hierarchies, though they are not required to form part of
such a structure.
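A quad can be pictured as a small record with four kinds of outgoing links. The sketch below is a symbolic simplification under stated assumptions: it replaces the five neurons with one core label plus four link lists, and the helper names `add_part` and `add_sequence` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity-based equality avoids recursing through cyclic links
class Quad:
    """A core concept with four link sets (a simplification of the five-neuron cluster)."""
    core: str
    before: List["Quad"] = field(default_factory=list)    # temporal predecessors
    after: List["Quad"] = field(default_factory=list)     # temporal successors
    part_of: List["Quad"] = field(default_factory=list)   # wholes containing this quad
    has_part: List["Quad"] = field(default_factory=list)  # parts of this quad


def add_part(whole: Quad, part: Quad) -> None:
    """Link a part to its whole in both directions."""
    whole.has_part.append(part)
    part.part_of.append(whole)


def add_sequence(first: Quad, second: Quad) -> None:
    """Link two quads in temporal order in both directions."""
    first.after.append(second)
    second.before.append(first)


face, eye, nose = Quad("face"), Quad("eye"), Quad("nose")
add_part(face, eye)
add_part(face, nose)
add_sequence(eye, nose)  # e.g. the order in which the features are scanned
```

The part-of links give the hierarchy and the before/after links give the ordering within a level, which is how quads assemble into spatiotemporal hierarchies.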
Psi stores knowledge using quads arranged in three networks, which are conceptually similar
to the networks in Albus's 4D/RCS and Arel's DeSTIN architectures:
• A sensory network, which stores declarative knowledge: schemas representing images, ob-
jects, events and situations as hierarchical structures.
• A motor network, which contains procedural knowledge by way of hierarchical behavior
programs
• A motivational network handling demands
Perception in Psi, which is centered in the sensory network, follows principles similar to
DeSTIN (which are shared also by other systems), for instance the principle of perception as
prediction. Psi's "HyPercept" mechanism performs hypothesis-based perception: it attempts to
predict what is there to be perceived and then attempts to verify these predictions using sen-
sation and memory. Furthermore HyPercept is intimately coupled with actions in the external
world, according to the concept of "Neisser's perceptual cycle," the cycle between exploration
and representation of reality. Perceptually acquired information is translated into schemas ca-
pable of guiding behaviors, and these are enacted (sometimes affecting the world in significant
ways) and in the process used to guide further perception. Imaginary perceptions are handled
via a "mental stage" analogous to CogPrime's internal simulation world.
Action selection in Psi works based on what are called "triplets," each of which consists of
• a sensor schema (pre-conditions, "condition schema"; like CogPrime's "context")
• a subsequent motor schema (action, effector; like CogPrime's "procedure")
• a final sensor schema (post-conditions, expectations; like a CogPrime predicate or goal)
What distinguishes these triplets from classic production rules as used in (say) Soar and
ACT-R is that the triplets may be partial (some of the three elements may be missing) and
may be uncertain. However, there seems no fundamental difference between these triplets and
CogPrime's context/procedure/goal triplets, at a high level; the difference lies in the underlying
knowledge representation used for the schemata, and the probabilistic logic used to represent
the implication.
The work of figuring out what schema to execute to achieve the chosen goal in the current
context is done in Psi using a combination of processes called the "Rasmussen ladder" (named
after Danish psychologist Jens Rasmussen). The Rasmussen ladder describes the organization
of action as a movement between the stages of skill-based behavior, rule-based behavior and
knowledge-based behavior, as follows:
• If a given task amounts to a trained routine, an automatism or skill is activated; it can
usually be executed without conscious attention and deliberative control.
• If there is no automatism available, a course of action might be derived from rules; before a
known set of strategies can be applied, the situation has to be analyzed and the strategies
have to be adapted.
• In those cases where the known strategies are not applicable, a way of combining the
available manipulations (operators) into reaching a given goal has to be explored at first.
This stage usually requires a recomposition of behaviors, that is, a planning process.
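The three stages listed above amount to a fallback chain, which can be sketched as follows. The lookup tables, the `situation` dictionary and the toy planner are hypothetical stand-ins; the sketch shows only the order in which the stages are tried, not how Psi represents skills or rules.

```python
from typing import Callable, Dict, List, Tuple

Plan = List[str]
Situation = Dict[str, str]


def choose_behavior(task: str, situation: Situation,
                    skills: Dict[str, Plan],
                    rules: Dict[str, Callable[[Situation], Plan]],
                    planner: Callable[[str, Situation], Plan]) -> Tuple[str, Plan]:
    """Try skill-based, then rule-based, then knowledge-based (planning) behavior."""
    if task in skills:                       # trained routine: run it as is
        return "skill", skills[task]
    if task in rules:                        # known strategy: adapt it to the situation
        return "rule", rules[task](situation)
    return "knowledge", planner(task, situation)  # nothing applies: plan from operators


skills = {"walk": ["step", "step"]}
rules = {"open door": lambda s: ["push"] if s.get("hinge") == "outward" else ["pull"]}


def toy_planner(task: str, situation: Situation) -> Plan:
    """Placeholder for a real planning process."""
    return ["explore", "compose operators for " + task]


results = [choose_behavior(t, {"hinge": "outward"}, skills, rules, toy_planner)
           for t in ("walk", "open door", "cross river")]
```

Each of the three example tasks lands on a different rung of the ladder, with the costly planning stage reached only when neither a skill nor a rule is available.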
The planning algorithm used in the Psi and MicroPsi implementations is a fairly simple
hill-climbing planner. While it's hypothesized that a more complex planner may be needed for
advanced intelligence, part of the Psi theory is the hypothesis that most real-life planning an
organism needs to do is fairly simple, once the organism has the right perceptual representations
and goals.
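For readers unfamiliar with the technique, a generic hill-climbing planner looks roughly like the sketch below. It is not the Psi or MicroPsi planner: the function signature, the scoring function and the toy number-line domain are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, TypeVar

S = TypeVar("S")


def hill_climb_plan(start: S, is_goal: Callable[[S], bool],
                    operators: Dict[str, Callable[[S], S]],
                    score: Callable[[S], float],
                    max_steps: int = 100) -> Optional[List[str]]:
    """Greedily apply the operator whose result scores best; return the operator
    names used, or None if no move improves the score or the step budget runs out."""
    state, plan = start, []
    for _ in range(max_steps):
        if is_goal(state):
            return plan
        name, nxt = max(((n, op(state)) for n, op in operators.items()),
                        key=lambda pair: score(pair[1]))
        if score(nxt) <= score(state):
            return None  # stuck at a local optimum
        state = nxt
        plan.append(name)
    return plan if is_goal(state) else None


# Toy domain: reach 10 on a number line, scoring states by closeness to 10.
ops = {"+1": lambda x: x + 1, "+3": lambda x: x + 3, "-1": lambda x: x - 1}
plan = hill_climb_plan(0, lambda x: x == 10, ops, score=lambda x: -abs(10 - x))
```

The planner never backtracks, so it fails whenever every available move makes the score worse. That is cheap and adequate when the scoring function is well shaped, which fits the hypothesis that good perceptual representations and goals make most planning simple.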
4.5.13 Psi versus CogPrime
On a high level, the similarities between Psi and CogPrime are quite strong:
• interlinked declarative, procedural and intentional knowledge structures, represented using
neural-symbolic methods (though, the knowledge structures have somewhat different high-
level structures and low-level representational mechanisms in the two systems)
• perception via prediction and perception/action integration
• action selection via triplets that resemble uncertain, potentially partial production rules
• similar motivation/emotion framework, since CogPrime incorporates a variant of Psi for
this
On the nitty-gritty level there are many differences between the systems, but on the big-
picture level the main difference lies in the way the cognitive synergy principle is pursued in
the two different approaches. Psi and MicroPsi rely on very simple learning algorithms that are
closely tied to the "quad" neurosymbolic knowledge representation, and hence interoperate in
a fairly natural way without need for subtle methods of "synergy engineering." CogPrime uses
much more diverse and sophisticated learning algorithms which thus require more sophisticated
methods of interoperation in order to achieve cognitive synergy.
Chapter 5
A Generic Architecture of Human-Like Cognition
5.1 Introduction
When writing the first draft of this book, some years ago, we had the idea to explain CogPrime
by aligning its various structures and processes with the ones in the "standard architecture
diagram" of the human mind. After a bit of investigation, though, we gradually came to the
realization that no such thing existed. There was no standard flowchart or other sort of di-
agram explaining the modern consensus on how human thought works. Many such diagrams
existed, but each one seemed to represent some particular focus or theory, rather than an overall
integrative understanding.
Since there are multiple opinions regarding nearly every aspect of human intelligence, it
would be difficult to get two cognitive scientists to fully agree on every aspect of an overall
human cognitive architecture diagram. Prior attempts to outline detailed mind architectures
have tended to follow highly specific theories of intelligence, and hence have attracted only
moderate interest from researchers not adhering to these theories. An example is Minsky's work
presented in The Emotion Machine [Min07], which arguably does constitute an architecture
diagram for the human mind, but which is only loosely grounded in current empirical knowledge
and stands more as a representation of Minsky's own intuitive understanding.
But nevertheless, it seemed to us that a reasonable attempt at an integrative, relatively
theory-neutral "human cognitive architecture diagram" would be better than nothing. So nat-
urally, we took it on ourselves to create such a diagram. This chapter is the result - it draws on
the thinking of a number of cognitive science and AGI researchers, integrating their perspectives
in a coherent, overall architecture diagram for human, and human-like, general intelligence. The
specific architecture diagram of CogPrime, given in Chapter 6 below, may then be understood
as a particular instantiation of this generic architecture diagram of human-like cognition.
There is no getting around the fact that, to a certain extent, the diagram presented here
reflects our particular understanding of how the mind works. However, it was intentionally
constructed with the goal of not being just an abstracted version of the CogPrime architecture
diagram! It reflects not so much our own idiosyncratic understanding of human intelligence
as a combination of understandings previously presented by multiple researchers (including
ourselves), arranged according to our own taste in a manner we find conceptually coherent.
With this in mind, we call it the "Integrative Human-Like Cognitive Architecture Diagram," or
for short "the integrative diagram." We have made an effort to ensure that as many pieces of
the integrative diagram as possible are well grounded in psychological and even neuroscientific
data, rather than mainly embodying speculative notions; however, given the current state of
knowledge, this could not be done to a complete extent, and there is still some speculation
involved here and there.
While based on understandings of human intelligence, the integrative diagram is intended to
serve as an architectural outline for human-like general intelligence more broadly. For example,
CogPrime is explicitly not intended as a precise emulation of human intelligence, and does many
things quite differently than the human mind, yet can still fairly straightforwardly be mapped
into the integrative diagram.
The integrative diagram focuses on structure, but this should not be taken to represent a
valuation of structure over dynamics in our approach to intelligence. Following chapters treat
various dynamical phenomena in depth.
5.2 Key Ingredients of the Integrative Human-Like Cognitive
Architecture Diagram
The main ingredients we've used in assembling the integrative diagram are as follows:
• Our own views on the various types of memory critical for human-like cognition, and the
need for tight, "synergetic" interactions between the cognitive processes focused on these
• Aaron Sloman's high-level architecture diagram of human intelligence [Slo01], drawn from
his CogAff architecture, which strikes me as a particularly clear embodiment of "modern
common sense" regarding the overall architecture of the human mind. We have added only
a couple items to Sloman's high-level diagram, which we felt deserved an explicit high-level
role that he did not give them: emotion, language and reinforcement.
• The LIDA architecture diagram presented by Stan Franklin and Bernard Baars [BF09].
We think LIDA is an excellent model of working memory and what Sloman calls "reactive
processes", with well-researched grounding in the psychology and neuroscience literature.
We have adapted the LIDA diagram only very slightly for use here, changing some of
the terminology on the arrows, and indicating where parts of the LIDA diagram indicate
processes elaborated in more detail elsewhere in the integrative diagram.
• The architecture diagram of the Psi model of motivated cognition, presented by Joscha
Bach in [Bac09], based on prior work by Dietrich Dörner [Dör02]. This diagram is presented
without significant modification; however it should be noted that Bach and Dörner present
this diagram in the context of larger and richer cognitive models, the other aspects of which
are not all incorporated in the integrative diagram.
• James Albus's three-hierarchy model of intelligence [Alb01], involving coupled perception,
action and reinforcement hierarchies. Albus's model, utilized in the creation of intelligent
unmanned automated vehicles, is a crisp embodiment of many ideas emergent from the field
of intelligent control systems.
• Deep learning networks as a model of perception (and action and reinforcement learning),
as embodied for example in the work of Itamar Arel [ARC09] and Jeff Hawkins [HB06]. The
integrative diagram adopts this as the basic model of the perception and action subsystems
of human intelligence. Language understanding and generation are also modeled according
to this paradigm.
One possible negative reaction to the integrative diagram might be to say that it's a kind
of Frankenstein monster diagram, piecing together aspects of different theories in a way that
violates the theoretical notions underlying all of them! For example, the integrative diagram
takes LIDA as a model of working memory and reactive processing, but from the papers on
LIDA it's unclear whether the creators of LIDA construe it more broadly than that. The deep
learning community tends to believe that the architecture of current deep learning networks,
in itself, is close to sufficient for human-level general intelligence - whereas the integrative
diagram appropriates the ideas from this community mainly for handling perception, action
and language, etc.
On the other hand, in a more positive perspective, one could view the integrative diagram
as consistent with LIDA, but merely providing much more detail on some of the boxes in the
LIDA diagram (e.g. dealing with perception and long-term memory). And one could view the
integrative diagram as consistent with the deep learning paradigm - via viewing it, not as
a description of components to be explicitly implemented in an AGI system, but rather as a
description of the key structures and processes that must emerge in a deep learning network, based
on its engagement with the world, in order for it to achieve human-like general intelligence.
Our own view, underlying the creation of the integrative diagram, is that different commu-
nities of cognitive science researchers have focused on different aspects of intelligence, and have
thus each created models that are more fully fleshed out in some aspects than others. But these
various models all link together fairly cleanly, which is not surprising as they are all grounded
in the same data regarding human intelligence. Many judgment calls must be made in fusing
multiple models in the way that the integrative diagram does, but we feel these can be made
without violating the spirit of the component models. In assembling the integrative diagram, we
have made these judgment calls as best we can, but we're well aware that different judgments
would also be feasible and defensible. Revisions are likely as time goes on, not only due to
new data about human intelligence but also to evolution of understanding regarding the best
approach to model integration.
Another possible argument against the ideas presented here is that there's nothing new - all
the ingredients presented have been given before elsewhere. To this our retort is to quote Pascal:
"Let no one say that I have said nothing new ... the arrangement of the subject is new." The
various architecture diagrams incorporated into the integrative diagram are either extremely
high level (Sloman's diagram) or focus primarily on one aspect of intelligence, treating the
others very concisely by summarizing large networks of distinct structures and processes in
small boxes. The integrative diagram seeks to cover all aspects of human-like intelligence at a
roughly equal granularity - a different arrangement.
This kind of high-level diagramming exercise is not precise enough, nor dynamics-focused
enough, to serve as a guide for creating human-level or more advanced AGI. But it can be a
useful tool for explaining and interpreting a concrete AGI design, such as CogPrime.
5.3 An Architecture Diagram for Human-Like General Intelligence
The integrative diagram is presented here in a series of seven Figures.
Fig. 5.1: High-Level Architecture of a Human-Like Mind (legible box labels include Metacognitive Processes, Self/Social, Reactive Processes, Reinforcement and Environment)
Figure 5.1 gives a high-level breakdown into components, based on Sloman's high-level
cognitive-architectural sketch [Slo01]. This diagram represents, roughly speaking, "modern
common sense" about how a human-like mind is architected. The separation between structures
and processes, embodied in having separate boxes for Working Memory vs. Reactive Processes,
and for Long Term Memory vs. Deliberative Processes, could be viewed as somewhat artificial,
since in the human brain and most AGI architectures, memory and processing are closely inte-
grated. However, the tradition in cognitive psychology is to separate out Working Memory and
Long Term Memory from the cognitive processes acting thereupon, so we have adhered to that
convention. The other changes from Sloman's diagram are the explicit inclusion of language,
representing the hypothesis that language processing is handled in a somewhat special way in
the human brain; and the inclusion of a reinforcement component parallel to the perception and
action hierarchies, as inspired by intelligent control systems theory (e.g. Albus as mentioned
above) and deep learning theory. Of course Sloman's high level diagram in its original form is
intended as inclusive of language and reinforcement, but we felt it made sense to give them
more emphasis.
Figure 5.2, modeling working memory and reactive processing, is essentially the LIDA diagram
as given in prior papers by Stan Franklin, Bernard Baars and colleagues [BF09]. The
boxes in the upper left corner of the LIDA diagram pertain to sensory and motor processing,
which LIDA does not handle in detail, and which are modeled more carefully by deep learning
theory. The bottom left corner box refers to action selection, which in the integrative diagram
is modeled in more detail by Psi. The top right corner box refers to Long-Term Memory, which
the integrative diagram models in more detail as a synergetic multi-memory system (Figure
5.4).
The original LIDA diagram refers to various "codelets", a key concept in LIDA theory. We
have replaced "attention codelets" here with "attention flow", a more generic term. We suggest
one can think of an attention codelet as: a piece of information stating that, for a certain group
of items, it's currently pertinent to pay attention to this group as a collective.
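To make this reading concrete, the following minimal Python sketch represents an attention flow as a record naming a group of items together with a salience value. The class name AttentionFlow, the salience field and its 0-to-1 scale, and the helper most_pertinent are illustrative assumptions of ours; they are not taken from LIDA or from the integrative diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionFlow:
    """Hypothetical record: 'attend to this group of items as a collective'."""
    items: frozenset   # identifiers of the grouped items
    salience: float    # assumed pertinence score, on a 0..1 scale


def most_pertinent(flows):
    """Return the group that currently most deserves collective attention."""
    return max(flows, key=lambda flow: flow.salience)


flows = [
    AttentionFlow(frozenset({"red", "ball", "rolling"}), 0.8),
    AttentionFlow(frozenset({"wall", "shadow"}), 0.2),
]
winner = most_pertinent(flows)
```

The point of the sketch is only that the unit of attention is the group, not the individual items; how salience is computed is left open.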
Fig. 5.2: Architecture of Working Memory and Reactive Processing, closely modeled on the
LIDA architecture (legible box labels include Sensory Memory, Sensory-Motor Memory, Perceptual Associative Memory, Transient Episodic Memory, Procedural Memory, Long-Term Memory, Global Workspace and Action Selection)
Figure 5.3, modeling motivation and action selection, is a lightly modified version of the
Psi diagram from Joscha Bach's book Principles of Synthetic Intelligence [Bac09]. The main
difference from Psi is that in the integrative diagram the Psi motivated action framework is
embedded in a larger, more complex cognitive model. Psi comes with its own theory of working
and long-term memory, which is related to but different from the one given in the integrative
diagram - it views the multiple memory types distinguished in the integrative diagram as
emergent from a common memory substrate. Psi comes with its own theory of perception and
action, which seems broadly consistent with the deep learning approach incorporated in the
integrative diagram. Psi's handling of working memory lacks the detailed, explicit workflow of
LIDA, though it seems broadly conceptually consistent with LIDA.
In Figure 5.3, the box labeled "Other portions of working memory" is labeled "Protocol and
situation memory" in the original Psi diagram. The Perception, Action Execution and Action
Selection boxes have fairly similar semantics to the similarly labeled boxes in the LIDA-like
Figure 5.2, so that these diagrams may be viewed as overlapping. The LIDA model doesn't
explain action selection and planning in as much detail as Psi, so the Psi-like Figure 5.3 could
be viewed as an elaboration of the action-selection portion of the LIDA-like Figure 5.2. In
Psi, reinforcement is considered as part of the learning process involved in action selection and
planning; in Figure 5.3 an explicit "reinforcement box" has been added to the original Psi
diagram, to emphasize this.
Fig. 5.3: Architecture of Motivated Action (legible box labels include Other Portions of Working Memory, Perception, Action Selection, Planning, Motive Selection, Urges (Drives) and Reinforcement)
Figure 5.4, modeling long-term memory and deliberative processing, is derived from our own
prior work studying the "cognitive synergy" between different cognitive processes associated
with different types of memory. The division into types of memory is fairly standard. Declarative,
procedural, episodic and sensorimotor memory are routinely distinguished; we like to distinguish
attentional memory and intentional (goal) memory as well, and view these as the interface
between long-term memory and the mind's global control systems. One focus of our AGI design
work has been on designing learning algorithms, corresponding to these various types of memory,
that interact with each other in a synergetic way [Goe09c], helping each other to overcome
their intrinsic combinatorial explosions. There is significant evidence that these various types
of long-term memory are differently implemented in the brain, but the degree of structural and
dynamical commonality underlying these different implementations remains unclear.
Fig. 5.4: Architecture of Long-Term Memory and Deliberative and Metacognitive Thinking (legible labels include Working Memory and "are regulated by emotion, attention")
Each of these long-term memory types has its analogue in working memory as well. In some
cognitive models, the working memory and long-term memory versions of a memory type, and the
corresponding cognitive processes, are basically the same thing. CogPrime is mostly like this
- it implements working memory as a subset of long-term memory consisting of items with
particularly high importance values. The distinctive nature of working memory is enforced via
using slightly different dynamical equations to update the importance values of items with
importance above a certain threshold. On the other hand, many cognitive models treat working
and long term memory as more distinct than this, and there is evidence for significant functional
and anatomical distinctness in the brain in some cases. So for the purpose of the integrative
diagram, it seemed best to leave working and long-term memory subcomponents as parallel but
distinguished.
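The idea of working memory as a high-importance subset of long-term memory can be illustrated with a small Python sketch. The threshold, the two decay constants and the function names below are assumptions chosen for illustration only; they are not CogPrime's actual importance-updating equations, which are treated in Part 2.

```python
WM_THRESHOLD = 0.5   # assumed importance cutoff for working memory
DECAY_LTM = 0.99     # assumed slow decay below the threshold
DECAY_WM = 0.90      # assumed faster decay at or above the threshold


def update_importance(importance, stimulus=0.0):
    """One update step; the rule differs on either side of the threshold."""
    decay = DECAY_WM if importance >= WM_THRESHOLD else DECAY_LTM
    return min(1.0, importance * decay + stimulus)


def working_memory(store):
    """Working memory is a view onto the single store, not a separate container."""
    return {name for name, imp in store.items() if imp >= WM_THRESHOLD}


def step(store):
    """Apply one unstimulated update to every item in the store."""
    return {name: update_importance(imp) for name, imp in store.items()}


store = {"ball": 0.9, "teacher": 0.6, "last_week_lunch": 0.1}
wm_start = working_memory(store)               # ball and teacher
wm_later = working_memory(step(step(store)))   # teacher has decayed out
```

In this toy version an item leaves working memory simply by decaying below the threshold, and re-enters if a stimulus pushes its importance back up; no copying between stores is ever needed.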
Figure 5.4 also encompasses metacognition, under the hypothesis that in human beings and
human-like minds, metacognitive thinking is carried out using basically the same processes as
plain ordinary deliberative thinking, perhaps with various tweaks optimizing them for thinking
about thinking. If it turns out that humans have, say, a special kind of reasoning faculty
exclusively for metacognition, then the diagram would need to be modified. Modeling of self
and others is understood to occur via a combination of metacognition and deliberative thinking,
as well as via implicit adaptation based on reactive processing.
Fig. 5.5: Architecture for Multimodal Perception (legible label: More Abstract Aspects of Sensorimotor Memory)
Figure 5.5 models perception, according to the basic ideas of deep learning theory. Vision and
audition are modeled as deep learning hierarchies, with bottom-up and top-down dynamics. The
lower layers in each hierarchy refer to more localized patterns recognized in, and abstracted from,
sensory data. Output from these hierarchies to the rest of the mind is not just through the top
layers, but via some sort of sampling from various layers, with a bias toward the top layers. The
different hierarchies cross-connect, and are hence to an extent dynamically coupled together. It
is also recognized that there are some sensory modalities that aren't strongly hierarchical, e.g.
touch and smell (the latter being better modeled as something like an asymmetric Hopfield net,
prone to frequent chaotic dynamics [LLW+05]) - these may also cross-connect with each other
and with the more hierarchical perceptual subnetworks. Of course the suggested architecture
could include any number of sensory modalities; the diagram is restricted to four just for
simplicity.
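The remark above about output being sampled from various layers, with a bias toward the top, can be sketched in a few lines of Python. The geometric weighting (each layer weighted by a fixed multiple of the one below it), the bias value and the function names are our assumptions for illustration; the integrative diagram does not specify how the sampling is done.

```python
import random


def layer_weights(num_layers, bias=2.0):
    """Weight layer k (0 = bottom) by bias**k, normalized to sum to 1."""
    raw = [bias ** k for k in range(num_layers)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_outputs(hierarchy, n, rng, bias=2.0):
    """Draw n patterns, choosing each source layer with a top-heavy bias."""
    weights = layer_weights(len(hierarchy), bias)
    layers = rng.choices(range(len(hierarchy)), weights=weights, k=n)
    return [(k, rng.choice(hierarchy[k])) for k in layers]


# Toy visual hierarchy: localized patterns at the bottom, abstract ones on top.
hierarchy = [["edge", "blob"], ["corner", "curve"], ["face", "cup"]]
weights = layer_weights(len(hierarchy))
samples = sample_outputs(hierarchy, 5, random.Random(0))
```

With a bias of 2 and three layers, the top layer supplies a little over half of the samples on average, while the lower layers still contribute; raising the bias approaches the "top layer only" case.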
The self-organized patterns in the upper layers of perceptual hierarchies may become quite
complex and may develop advanced cognitive capabilities like episodic memory, reasoning, lan-
guage learning, etc. A pure deep learning approach to intelligence argues that all the aspects
of intelligence emerge from this kind of dynamics (among perceptual, action and reinforcement
hierarchies). Our own view is that the heterogeneity of human brain architecture argues against
this perspective, and that deep learning systems are probably better as models of perception
and action than of general cognition. However, the integrative diagram is not committed to
our perspective on this - a deep-learning theorist could accept the integrative diagram, but
argue that all the other portions besides the perceptual, action and reinforcement hierarchies
should be viewed as descriptions of phenomena that emerge in these hierarchies due to their
interaction.
Fig. 5.6: Architecture for Action and Reinforcement (legible box labels include Action and Reinforcement Subsystem, Motor Planning, Motivation/Higher Level, Right Arm Hierarchy, Right Leg Hierarchy, Reinforcement Hierarchy and Perception Hierarchy)
Figure 5.6 shows an action subsystem and a reinforcement subsystem, parallel to the perception
subsystem. Two action hierarchies, one for an arm and one for a leg, are shown for
concreteness, but of course the architecture is intended to be extended more broadly. In the
hierarchy corresponding to an arm, for example, the lowest level would contain control patterns
corresponding to individual joints, the next level up to groupings of joints (like fingers), the
next level up to larger parts of the arm (hand, elbow). The different hierarchies corresponding
to different body parts cross-link, enabling coordination among body parts; and they also con-
nect at multiple levels to perception hierarchies, enabling sensorimotor coordination. Finally
there is a module for motor planning, which links tightly with all the motor hierarchies, and
also overlaps with the more cognitive, inferential planning activities of the mind, in a manner
that is modeled in different ways by different theorists. Albus [Alb01] has described this kind of
hierarchy in considerable detail.
The reward hierarchy in Figure 5.6 provides reinforcement to actions at various levels on
the hierarchy, and includes dynamics for propagating information about reinforcement up and
down the hierarchy.
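As a rough illustration of reinforcement propagating up and down a hierarchy, consider the Python sketch below. The Node class, the attenuation factor and the specific rule (credit is halved at each step away from the reinforced node) are assumptions of ours, not a rule taken from Albus or from deep learning theory.

```python
class Node:
    """A hypothetical node in an action hierarchy, e.g. arm, hand or finger."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.credit = 0.0
        for child in self.children:
            child.parent = self


def _spread_down(node, reward, attenuation):
    """Pass attenuated reward to every descendant of node."""
    for child in node.children:
        child.credit += reward
        _spread_down(child, reward * attenuation, attenuation)


def reinforce(node, reward, attenuation=0.5):
    """Credit node, then propagate attenuated reward up and down the hierarchy."""
    node.credit += reward
    ancestor, passed = node.parent, reward * attenuation
    while ancestor is not None:          # upward: toward more abstract actions
        ancestor.credit += passed
        ancestor, passed = ancestor.parent, passed * attenuation
    _spread_down(node, reward * attenuation, attenuation)  # downward


finger, thumb = Node("finger"), Node("thumb")
hand = Node("hand", [finger, thumb])
arm = Node("arm", [hand])
reinforce(hand, 1.0)
```

Here rewarding a hand-level action also gives partial credit to the arm-level pattern that invoked it and to the finger-level patterns that realized it.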
Fig. 5.7: Architecture for Language Processing
Figure 5.7 deals with language, treating it as a special case of coupled perception and action.
The traditional architecture of a computational language comprehension system is a pipeline
[JM09] [Goe10d], which is equivalent to a hierarchy with the lowest-level linguistic features (e.g.
sounds, words) at the bottom, and the highest level features (semantic abstractions) at the top,
and syntactic features in the middle. Feedback connections enable semantic and cognitive mod-
ulation of lower-level linguistic processing. Similarly, language generation is commonly modeled
hierarchically, with the top levels being the ideas needing verbalization, and the bottom level
corresponding to the actual sentence produced. In generation the primary flow is top-down,
with bottom-up flow providing modulation of abstract concepts by linguistic surface forms.
So, that's it - an integrative architecture diagram for human-like general intelligence, split
among seven different pictures, formed by judiciously merging together architecture diagrams
produced by a number of cognitive theorists with different, overlapping foci and research
paradigms.
Is anything critical left out of the diagram? A quick perusal of the table of contents of
cognitive psychology textbooks suggests to me that if anything major is left out, it's also
unknown to current cognitive psychology. However, one could certainly make an argument for
explicit inclusion of certain other aspects of intelligence, that in the integrative diagram are
left as implicit emergent phenomena. For instance, creativity is obviously very important to
intelligence, but there is no "creativity" box in any of these diagrams - because in our view,
and the view of the cognitive theorists whose work we've directly drawn on here, creativity
is best viewed as a process emergent from other processes that are explicitly included in the
diagrams.
5.4 Interpretation and Application of the Integrative Diagram
A tongue-partly-in-cheek definition of a biological pathway is "a subnetwork of a biological
network, that fits on a single journal page." Cognitive architecture diagrams have a similar
property - they are crude abstractions of complex structures and dynamics, sculpted in ac-
cordance with the size of the printed page, and the tolerance of the human eye for absorbing
diagrams, and the tolerance of the human author for making diagrams.
However, sometimes constraints - even arbitrary ones - are useful for guiding creative ef-
forts, due to the fact that they force choices. Creating an architecture for human-like general
intelligence that fits in a few (okay, seven) fairly compact diagrams, requires one to make many
choices about what features and relationships are most essential. In constructing the integrative
diagram, we have sought to make these choices, not purely according to our own tastes in cognitive
theory or AGI system design, but according to a sort of blend of the taste and judgment
of a number of scientists whose views we respect, and who seem to have fairly compatible,
complementary perspectives.
What is the use of a cognitive architecture diagram like this? It can help to give newcomers
to the field a basic idea about what is known and suspected about the nature of human-like
general intelligence. Also, it could potentially be used as a tool for cross-correlating different
AGI architectures. If everyone who authored an AGI architecture would explain how their archi-
tecture accounts for each of the structures and processes identified in the integrative diagram,
this would give a means of relating the various AGI designs to each other.
The integrative diagram could also be used to help connect AGI and cognitive psychology
to neuroscience in a more systematic way. In the case of LIDA, a fairly careful correspondence
has been drawn up between the LIDA diagram nodes and links and various neural structures
and processes [FB08]. Similar knowledge exists for the rest of the integrative diagram, though
not organized in such a systematic fashion. A systematic curation of links between the nodes
and links in the integrative diagram and current neuroscience knowledge, would constitute an
interesting first approximation of the holistic cognitive behavior of the human brain.
Finally (and harking forward to later chapters), the big omission in the integrative diagram
is dynamics. Structure alone will only get you so far, and you could build an AGI system with
reasonable-looking things in each of the integrative diagram's boxes, interrelating according to
the given arrows, and yet still fail to make a viable AGI system. Given the limitations the
real world places on computing resources, it's not enough to have adequate representations
and algorithms in all the boxes, communicating together properly and capable of doing the right
things given sufficient resources. Rather, one needs to have all the boxes filled in properly
with structures and processes that, when they act together using feasible computing resources,
will yield appropriately intelligent behaviors via their cooperative activity. And this has to do
with the complex interactive dynamics of all the processes in all the different boxes - which is
something the integrative diagram doesn't touch at all. This brings us again to the network of
ideas we've discussed under the name of "cognitive synergy," to be discussed later on.
It might be possible to make something similar to the integrative diagram on the level of
dynamics rather than structures, complementing the structural integrative diagram given here;
but this would seem significantly more challenging, because we lack a standard set of tools for
depicting system dynamics. Most cognitive theorists and AGI architects describe their structural
ideas using boxes-and-lines diagrams of some sort, but there is no standard method for depicting
complex system dynamics. So to make a dynamical analogue to the integrative diagram, via
a similar integrative methodology, one would first need to create appropriate diagrammatic
formalizations of the dynamics of the various cognitive theories being integrated - a fascinating
but onerous task.
When we first set out to make an integrated cognitive architecture diagram, via combining
the complementary insights of various cognitive science and AGI theorists, we weren't sure how
well it would work. But now we feel the experiment was generally a success - the resultant
integrated architecture seems sensible and coherent, and reasonably complete. It doesn't come
close to telling you everything you need to know to understand or implement a human-like
mind - but it tells you the various processes and structures you need to deal with, and which of
their interrelations are most critical. And, perhaps just as importantly, it gives a concrete way
of understanding the insights of a specific but fairly diverse set of cognitive science and AGI
theorists as complementary rather than contradictory. In a CogPrime context, it provides a
way of tying in the specific structures and dynamics involved in CogPrime, with a more generic
portrayal of the structures and dynamics of human-like intelligence.
Chapter 6
A Brief Overview of CogPrime
6.1 Introduction
Just as there are many different approaches to human flight - airplanes, helicopters, balloons,
spacecraft, and doubtless many methods no person has thought of yet - similarly, there are likely
many different approaches to advanced artificial general intelligence. All the different approaches
to flight exploit the same core principles of aerodynamics in different ways; and similarly, the
various different approaches to AGI will exploit the same core principles of general intelligence
in different ways.
In the chapters leading up to this one, we have taken a fairly broad view of the project
of engineering AGI. We have presented a conception and formal model of intelligence, and
described environments, teaching methodologies and cognitive and developmental pathways
that we believe are collectively appropriate for the creation of AGI at the human level and
ultimately beyond, and with a roughly human-like bias to its intelligence. These ideas stand
alone and may be compatible with a variety of approaches to engineering AGI systems. However,
they also set the stage for the presentation of CogPrime, the particular AGI design on which
we are currently working.
The thorough presentation of the CogPrime design is the job of Part 2 of this book - where,
not only are the algorithms and structures involved in CogPrime reviewed in more detail,
but their relationship to the theoretical ideas underlying CogPrime is pursued more deeply.
The job of this chapter is a smaller one: to give a high-level overview of some key aspects of the
CogPrime architecture at a mostly nontechnical level, so as to enable you to approach Part
2 with a little more idea of what to expect. The remainder of Part 1, following this chapter,
will present various theoretical notions enabling the particulars, intent and consequences of the
CogPrime design to be more thoroughly understood.
6.2 High-Level Architecture of CogPrime
Figures 6.1, 6.2, 6.4 and 6.5 depict the high-level architecture of CogPrime, which involves
the use of multiple cognitive processes associated with multiple types of memory to enable
an intelligent agent to execute the procedures that it believes have the best probability of
working toward its goals in its current context. In a robot preschool context, for example, the
top-level goals will be simple things such as pleasing the teacher, learning new information
and skills, and protecting the robot's body. Figure 6.3 shows part of the architecture via which
cognitive processes interact with each other, via commonly acting on the AtomSpace knowledge
repository.
Comparing these diagrams to the integrative human cognitive architecture diagrams given
in Chapter 5, one sees the main difference is that the CogPrime diagrams commit to specific
structures (e.g. knowledge representations) and processes, whereas the generic integrative archi-
tecture diagram refers merely to types of structures and processes. For instance, the integrative
diagram refers generally to declarative knowledge and learning, whereas the CogPrime diagram
refers to PLN, as a specific system for reasoning and learning about declarative knowledge. Ta-
ble 6.1 articulates the key connections between the components of the CogPrime diagram and
those of the integrative diagram, thus indicating the general cognitive functions instantiated by
each of the CogPrime components.
6.3 Current and Prior Applications of OpenCog
Before digging deeper into the theory, and elaborating some of the dynamics underlying the
above diagrams, we pause to briefly discuss some of the practicalities of work done with the
OpenCog system currently implementing parts of the CogPrime architecture.
OpenCog, the open-source software framework underlying the "OpenCogPrime" (currently
partial) implementation of the CogPrime architecture, has been used for commercial applications
in the area of natural language processing and data mining; for instance, see [GPPG06],
where OpenCogPrime's PLN reasoning and RelEx language processing are combined to do
automated biological hypothesis generation based on information gathered from PubMed ab-
stracts. Most relevantly to the present work, it has also been used to control virtual agents in
virtual worlds [GEA08].
Prototype work done during 2007-2008 involved using an OpenCog variant called the Open-
PetBrain to control virtual dogs in a virtual world (see Figure 6.6 for a screenshot of an
OpenPetBrain-controlled virtual dog). While these OpenCog virtual dogs did not display in-
telligence closely comparable to that of real dogs (or human children), they did demonstrate a
variety of interesting and relevant functionalities including:
• learning new behaviors based on imitation and reinforcement
• responding to natural language commands and questions, with appropriate actions and
natural language replies
• spontaneous exploration of their world, remembering their experiences and using them to
bias future learning and linguistic interaction
One current OpenCog initiative involves extending the virtual dog work via using OpenCog
to control virtual agents in a game world inspired by the game Minecraft. These agents are
initially specifically concerned with achieving goals in a game world via constructing structures
with blocks and carrying out simple English communications. Representative example tasks
would be:
• Learning to build steps or ladders to get desired objects that are high up
• Learning to build a shelter to protect itself from aggressors
• Learning to build structures resembling structures that it's shown (even if the available
materials are a bit different)
• Learning how to build bridges to cross chasms
Fig. 6.1: High-Level Architecture of CogPrime. This is a conceptual depiction, not a
detailed flowchart (which would be too complex for a single image). Figures 6.2, 6.4 and 6.5
highlight specific aspects of this diagram.
Of course, the AI significance of learning tasks like this all depends on what kind of feedback
the system is given, and how complex its environment is. It would be relatively simple to make
an AI system do things like this in a trivial and highly specialized way, but that is not the intent
of the project; the goal is to have the system learn to carry out tasks like this using general
learning mechanisms and a general cognitive architecture, based on embodied experience and
only scant feedback from human teachers. If successful, this will provide an outstanding platform
for ongoing AGI development, as well as a visually appealing and immediately meaningful demo
for OpenCog.
Specific, particularly simple tasks that are the focus of this project team's current work at
time of writing include:
• Watch another character build steps to reach a high-up object
• Figure out via imitation of this that, in a different context, building steps to reach a high-up
object may be a good idea
• Also figure out that, if it wants a certain high-up object but there are no materials for
building steps available, finding some other way to get elevated will be a good idea that
may help it get the object
6.3.1 Transitioning from Virtual Agents to a Physical Robot
Preliminary experiments have also been conducted using OpenCog to control a Nao robot as well
as a virtual dog [GdG08]. This involves hybridizing OpenCog with a separate (but interlinked)
subsystem handling low-level perception and action. In the experiments done so far, this has
been accomplished in an extremely simplistic way. How to do this right is a topic treated in
detail in Chapter 26 of Part 2.
We suspect that a reasonable level of capability will be achievable by simply interposing DeSTIN
(or some other system in its place) as a perception/action "black box" between OpenCog
and a robot. Some preliminary experiments in this direction have already been carried out, con-
necting the OpenPetBrain to a Nao robot using simpler, less capable software than DeSTIN in
the intermediary role (off-the-shelf speech-to-text, text-to-speech and visual object recognition
software).
However, we also suspect that to achieve robustly intelligent robotics we must go beyond this
approach, and connect robot perception and actuation software with OpenCogPrime in a "white
box" manner that allows intimate dynamic feedback between perceptual, motoric, cognitive
and linguistic functions. We will achieve this via the creation and real-time utilization of links
between the nodes in CogPrime's and DeSTIN's internal networks (a topic to be explored in
more depth later in this chapter).
6.4 Memory Types and Associated Cognitive Processes in CogPrime
Now we return to the basic description of the CogPrime approach, turning to aspects of the
relationship between structure and dynamics. Architecture diagrams are all very well, but
ultimately it is dynamics that makes an architecture come alive. Intelligence is all about learning,
which is by definition about change, about dynamical response to the environment and internal
self-organizing dynamics.
CogPrime relies on multiple memory types and, as discussed above, is founded on the premise
that the right course in architecting a pragmatic, roughly human-like AGI system is to handle
different types of memory differently in terms of both structure and dynamics.
CogPrime's memory types are the declarative, procedural, sensory, and episodic memory
types that are widely discussed in cognitive neuroscience urcom, plus attentional memory for
allocating system resources generically, and intentional memory for allocating system resources
in a goal-directed way. Table 6.2 overviews these memory types, giving key references and indi-
cating the corresponding cognitive processes, and also indicating which of the generic patternist
cognitive dynamics each cognitive process corresponds to (pattern creation, association, etc.).
Figure 6.7 illustrates the relationships between several of the key memory types in the context
of a simple situation involving an OpenCogPrime-controlled agent in a virtual world.
In terms of patternist cognitive theory, the multiple types of memory in CogPrime should be
considered as specialized ways of storing particular types of patterns, optimized for spacetime
efficiency. The cognitive processes associated with a certain type of memory deal with creating
and recognizing patterns of the type for which the memory is specialized. While in principle all
the different sorts of pattern could be handled in a unified memory and processing architecture,
the sort of specialization used in CogPrime is necessary in order to achieve acceptably efficient
general intelligence using currently available computational resources. And as we have argued
in detail in Chapter 7, efficiency is not a side-issue but rather the essence of real-world AGI
(since as Hutter has shown, if one casts efficiency aside, arbitrary levels of general intelligence
can be achieved via a trivially simple program).
The essence of the CogPrime design lies in the way the structures and processes associated
with each type of memory are designed to work together in a closely coupled way, yielding coop-
erative intelligence going beyond what could be achieved by an architecture merely containing
the same structures and processes in separate "black boxes."
The inter-cognitive-process interactions in OpenCog are designed so that
• conversion between different types of memory is possible, though sometimes computation-
ally costly (e.g. an item of declarative knowledge may with some effort be interpreted
procedurally or episodically, etc.)
• when a learning process concerned centrally with one type of memory encounters a situation
where it learns very slowly, it can often resolve the issue by converting some of the relevant
knowledge into a different type of memory: i.e. cognitive synergy
6.4.1 Cognitive Synergy in PLN
To put a little meat on the bones of the "cognitive synergy" idea, discussed repeatedly in prior
chapters and more extensively in later chapters, we now elaborate a little on the role it plays
in the interaction between procedural and declarative learning.
While MOSES handles much of CogPrime's procedural learning, and CogPrime's internal
simulation engine handles most episodic knowledge, CogPrime's primary tool for handling
declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks
(PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIH08], and
are summarized in Chapter 34; here we will eschew most details and focus mainly on pointing
out how PLN seeks to achieve efficient inference control via integration with other cognitive
processes.
As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard
predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities
called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate
through its logic rules, so that uncertain premises give rise to conclusions with reasonably
accurately estimated uncertainty values. This careful management of uncertainty is critical for
the application of logical inference in the robotics context, where most knowledge is abstracted
from experience and is hence highly uncertain.
PLN can be used in either forward or backward chaining mode; and in the language intro-
duced above, it can be used for either analysis or synthesis. As an example, we will consider
backward chaining analysis, exemplified by the problem of a robot preschool-student trying to
determine whether a new playmate "Bob" is likely to be a regular visitor to its preschool or not
(evaluating the truth value of the implication $Bob \rightarrow regular\_visitor$). The basic backward
chaining process for PLN analysis looks like:
1. Given an implication $L \equiv A \rightarrow B$ whose truth value must be estimated (for instance
   $L \equiv Context \wedge Procedure \rightarrow Goal$ as discussed above), create a list $(A_1, \ldots, A_n)$ of
   (inference rule, stored knowledge) pairs that might be used to produce $L$
2. Using analogical reasoning to prior inferences, assign each $A_i$ a probability of success
   • If some of the $A_i$ are estimated to have reasonable probability of success at generating
     reasonably confident estimates of $L$'s truth value, then invoke Step 1 with $A_i$ in place
     of $L$ (at this point the inference process becomes recursive)
   • If none of the $A_i$ looks sufficiently likely to succeed, then inference has "gotten stuck"
     and another cognitive process should be invoked, e.g.
     - Concept creation may be used to infer new concepts related to $A$ and $B$, and then
       Step 1 may be revisited, in the hope of finding a new, more promising $A_i$ involving
       one of the new concepts
     - MOSES may be invoked with one of several special goals, e.g. the goal of finding
       a procedure $P$ so that $P(X)$ predicts whether $X \rightarrow B$. If MOSES finds such a
       procedure $P$ then this can be converted to declarative knowledge understandable
       by PLN and Step 1 may be revisited.
     - Simulations may be run in CogPrime's internal simulation engine, so as to observe
       the truth value of $A \rightarrow B$ in the simulations; and then Step 1 may be revisited.
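The control flow of the steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not PLN's actual inference controller: the names `Candidate`, `estimate_truth`, `propose` and `procedure_learning` are hypothetical, the recursion of Step 2 is omitted for brevity, and the "other cognitive processes" are represented by plain callbacks that add knowledge before Step 1 is revisited.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """A hypothetical (inference rule, stored knowledge) pair A_i."""
    name: str
    success_prob: float                  # assigned by analogy to prior inferences
    run: Callable[[], Optional[float]]   # truth-value estimate, or None on failure


def estimate_truth(target: str,
                   propose: Callable[[str], List[Candidate]],
                   fallbacks: List[Callable[[str], None]],
                   threshold: float = 0.5) -> Optional[float]:
    """Estimate the truth value of `target`, deferring to other processes when stuck."""
    remaining = list(fallbacks)
    while True:
        # Step 1: list candidate inference steps for the target.
        candidates = propose(target)
        # Step 2: keep only candidates judged likely enough to succeed.
        promising = sorted((c for c in candidates if c.success_prob >= threshold),
                           key=lambda c: c.success_prob, reverse=True)
        for cand in promising:
            value = cand.run()
            if value is not None:
                return value
        # Inference has "gotten stuck": invoke another process, then revisit Step 1.
        if not remaining:
            return None
        remaining.pop(0)(target)


# Toy usage: nothing promising is known about Bob until a stand-in for
# procedure learning contributes a new rule to the knowledge store.
knowledge = set()


def propose(target: str) -> List[Candidate]:
    cands = [Candidate("direct-lookup", 0.1, lambda: None)]
    if "learned-rule" in knowledge:
        cands.append(Candidate("apply-learned-rule", 0.8, lambda: 0.3))
    return cands


def procedure_learning(target: str) -> None:
    knowledge.add("learned-rule")   # stand-in for a MOSES-style contribution


result = estimate_truth("Bob -> regular_visitor", propose, [procedure_learning])
```

The design point the sketch tries to capture is that the controller never forces a low-confidence inference step; it hands the problem to another process and then retries with whatever new knowledge that process produced.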
The combinatorial explosion of inference control is combatted by the capability to defer to
other cognitive processes when the inference control procedure is unable to make a sufficiently
confident choice of which inference steps to take next. Note that just as MOSES may rely
on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create
complex knowledge about the terms in its logical implications. This is just one example of the
multiple ways in which the different cognitive processes in CogPrime interact synergetically; a
more thorough treatment of these interactions is given in [Goe09a].
In the "new playmate" example, the interesting case is where the robot initially seems not
to know enough about Bob to make a solid inferential judgment (so that none of the $A_i$ seem
particularly promising). For instance, it might carry out a number of possible inferences and not
come to any reasonably confident conclusion, so that the reason none of the $A_i$ seem promising
is that all the decent-looking ones have been tried already. So it might then have recourse to MOSES,
simulation or concept creation.
For instance, the PLN controller could make a list of everyone who has been a regular
visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure
for distinguishing these two categories. This procedure could then be used directly to make the
needed assessment, or else be translated into logical rules to be used within PLN inference. For
example, perhaps MOSES would discover that older males wearing ties tend not to become
regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable.
But if the current playmate is wearing a tuxedo, then PLN may be helpful via reasoning that
even though a tuxedo is not a tie, it's a similar form of fancy dress - so PLN may extend the
MOSES-learned rule to the present case and infer that the new playmate is not likely to be a
regular visitor.
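The two-stage pattern in this example (learn a distinguishing rule, then stretch it to a near-miss case) can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the exhaustive search over feature conjunctions merely stands in for MOSES, the similarity-discounted match merely stands in for PLN's similarity-based reasoning (it does not use PLN's truth value formulas), and the data, the 0.7 similarity value and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical observations: (features of a person, became a regular visitor?)
examples = [
    ({"older", "male", "tie"}, False),
    ({"older", "male", "tie"}, False),
    ({"older", "female", "tie"}, True),
    ({"young", "male", "tie"}, True),
    ({"older", "male"}, True),
    ({"young", "female"}, True),
]


def learn_rule(examples, max_size=3):
    """Find the feature set F for which 'has all of F' best predicts a non-regular."""
    features = sorted(set().union(*(f for f, _ in examples)))
    best, best_acc = None, -1.0
    for k in range(1, max_size + 1):
        for combo in combinations(features, k):
            rule = frozenset(combo)
            correct = sum((rule <= f) == (not regular) for f, regular in examples)
            acc = correct / len(examples)
            if acc > best_acc:
                best, best_acc = rule, acc
    return best, best_acc


# Assumed background knowledge: a tuxedo is a similar form of fancy dress to a tie.
SIMILAR = {("tuxedo", "tie"): 0.7}


def feature_match(rule_feature, person):
    """1.0 for an exact match, otherwise the best similarity to any feature present."""
    if rule_feature in person:
        return 1.0
    return max((SIMILAR.get((f, rule_feature), 0.0) for f in person), default=0.0)


def match_degree(rule, person):
    """Degree to which the learned rule applies to a possibly novel person."""
    degree = 1.0
    for rule_feature in rule:
        degree *= feature_match(rule_feature, person)
    return degree


rule, accuracy = learn_rule(examples)
exact = match_degree(rule, {"older", "male", "tie"})
extended = match_degree(rule, {"older", "male", "tuxedo"})
```

In this toy data the learned rule is the conjunction of "older", "male" and "tie"; it applies fully to a tie-wearer and with reduced degree to a tuxedo-wearer, which mirrors the way the text has PLN extend the MOSES-learned rule.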
6.5 Goal-Oriented Dynamics in CogPrime
CogPrime's dynamics has both goal-oriented and "spontaneous" aspects; here for simplicity's
sake we will focus on the goal-oriented ones. The basic goal-oriented dynamic of the CogPrime
system, within which the various types of memory are utilized, is driven by implications known
as "cognitive schematics", which take the form
$Context \wedge Procedure \rightarrow Goal \langle p \rangle$
(summarized $C \wedge P \rightarrow G$). Semi-formally, this implication may be interpreted to mean: "If the
context C appears to hold currently, then if I enact the procedure P, I can expect to achieve the
goal G with certainty p." Cognitive synergy means that the learning processes corresponding to
the different types of memory actively cooperate in figuring out what procedures will achieve
the system's goals in the relevant contexts within its environment.
CogPrime's cognitive schematic is significantly similar to production rules in classical
architectures like SOAR and ACT-R (as reviewed in Chapter 4); however, there are significant
differences which are important to CogPrime's functionality. Unlike with classical production
rules systems, uncertainty is core to CogPrime's knowledge representation, and each CogPrime
cognitive schematic is labeled with an uncertain truth value, which is critical to its utilization by
CogPrime's cognitive processes. Also, in CogPrime, cognitive schematics may be incomplete,
missing one or two of the terms, which may then be filled in by various cognitive processes
(generally in an uncertain way). A stronger similarity is to MicroPsi's triplets; the differences
in this case are more low-level and technical and have already been mentioned in Chapter 4.
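As a rough illustration of the points above, a cognitive schematic can be pictured as a record with three optional terms and an attached uncertain truth value. The sketch below is an assumption-laden simplification, not CogPrime's Atomspace representation: the class names are hypothetical, and the two-number truth value (strength plus confidence) is a simple stand-in chosen here for whatever uncertain truth value the system actually attaches.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class TruthValue:
    """Assumed two-component uncertain truth value."""
    strength: float     # probability-like estimate p
    confidence: float   # how much evidence backs the estimate


@dataclass(frozen=True)
class CognitiveSchematic:
    """Context AND Procedure -> Goal; any term may be missing (None)."""
    context: Optional[str]
    procedure: Optional[str]
    goal: Optional[str]
    tv: TruthValue

    def missing_terms(self) -> List[str]:
        return [name for name in ("context", "procedure", "goal")
                if getattr(self, name) is None]

    def is_complete(self) -> bool:
        return not self.missing_terms()


# An incomplete schematic: context and goal are known, the procedure is not.
fetch = CognitiveSchematic(
    context="owner says fetch, ball present",
    procedure=None,
    goal="please owner",
    tv=TruthValue(strength=0.0, confidence=0.0),
)

# Some cognitive process later fills in the missing term, in an uncertain way.
filled = replace(fetch, procedure="bring ball to owner",
                 tv=TruthValue(strength=0.8, confidence=0.4))
```

Making the terms optional is what separates this from a classical production rule: an incomplete schematic is a well-formed object that poses a learning problem, rather than an error.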
Finally, the biggest difference between CogPrime's cognitive schematics and production rules
or other similar constructs, is that in CogPrime this level of knowledge representation is not
the only important one. CLARION [SZ04], as reviewed above, is an example of a cognitive
architecture that uses production rules for explicit knowledge representation and then uses a
totally separate subsymbolic knowledge store for implicit knowledge. In CogPrime
both explicit and implicit knowledge are stored in the same graph of nodes and links, with
• explicit knowledge stored in probabilistic logic based nodes and links such as cognitive
schematics (see Figure 6.8 for a depiction of some explicit linguistic knowledge.)
• implicit knowledge stored in patterns of activity among these same nodes and links, defined
via the activity of the "importance" values (see Figure 6.9 for an illustrative example thereof)
associated with nodes and links and propagated by the ECAN attention allocation process
The meaning of a cognitive schematic in CogPrime is hence not entirely encapsulated in its
explicit logical form, but resides largely in the activity patterns that ECAN causes its activation
or exploration to give rise to. And this fact is important because the synergetic interactions
of system components are in large part modulated by ECAN activity. Without the real-time
combination of explicit and implicit knowledge in the system's knowledge graph, the synergetic
interaction of different cognitive processes would not work so smoothly, and the emergence of
effective high-level hierarchical, heterarchical and self structures would be less likely.
6.6 Analysis and Synthesis Processes in CogPrime
We now return to CogPrime's fundamental cognitive dynamics, using examples from the "virtual
dog" application to motivate the discussion.
The cognitive schematic $Context \wedge Procedure \rightarrow Goal$ leads to a conceptualization of the
internal action of an intelligent system as involving two key categories of learning:
• Analysis: Estimating the probability $p$ of a posited $C \wedge P \rightarrow G$ relationship
• Synthesis: Filling in one or two of the variables in the cognitive schematic, given as-
sumptions regarding the remaining variables, and directed by the goal of maximizing the
probability of the cognitive schematic
More specifically, where synthesis is concerned,
• The MOSES probabilistic evolutionary program learning algorithm is applied to find P,
given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation
embodying C and seeing which P lead to the simulated achievement of G.
- Example: A virtual dog learns a procedure P to please its owner (the goal G) in the
context C where there is a ball or stick present and the owner is saying "fetch".
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and
G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also
be used for this purpose.
- Example: A virtual dog wants to achieve the goal G of getting food, and it knows that
the procedure P of begging has been successful at this before, so it seeks a context C
where begging can be expected to get it food. Probably this will be a context involving a
friendly person.
• PLN-based goal refinement is used to create new subgoals G to sit on the right hand side
of instances of the cognitive schematic.
- Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of
following other dogs, due to observing that other dogs are often heading toward their
food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but
especially for choosing C (via providing new candidates for C). They are also used for
choosing P, via a process called "predicate schematization" that turns logical predicates
(declarative knowledge) into procedures.
- Example: At first a virtual dog may have a hard time predicting which other dogs are
going to be mean to it. But it may eventually observe common features among a number
of mean dogs, and thus form its own concept of "pit bull," without anyone ever teaching
it this concept explicitly.
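The first synthesis case above (find P, given fixed C and G, using internal simulation) can be sketched as follows. This is a minimal illustration under explicit assumptions: a fixed list of candidate procedures is scored exhaustively, which is a stand-in for MOSES rather than an implementation of it, and the `simulate` function with its hard-coded success rates is a hypothetical toy replacing CogPrime's internal simulation engine.

```python
import random

# Hypothetical success rates of each candidate procedure in the "fetch" context.
SUCCESS_RATE = {"bark": 0.1, "roll_over": 0.3, "bring_ball_to_owner": 0.9}


def simulate(context, procedure, goal, rng):
    """Toy internal simulation: returns True if the goal is reached in one run."""
    return rng.random() < SUCCESS_RATE[procedure]


def synthesize_procedure(context, goal, candidates, trials=200, seed=0):
    """Pick the procedure P whose simulated runs most often achieve the goal G."""
    rng = random.Random(seed)
    scores = {}
    for procedure in candidates:
        successes = sum(simulate(context, procedure, goal, rng)
                        for _ in range(trials))
        scores[procedure] = successes / trials   # estimated p for C AND P -> G
    best = max(scores, key=scores.get)
    return best, scores


best, scores = synthesize_procedure(
    context="ball present, owner says fetch",
    goal="please owner",
    candidates=list(SUCCESS_RATE),
)
```

The same scoring loop doubles as a crude analysis step, since each score is an estimate of the probability attached to one fully specified schematic.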
Where analysis is concerned:
• PLN inference, acting on declarative knowledge, is used for estimating the probability of
the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge
is also used in this regard, via enabling estimation of the probability via simple similarity
matching against past experience. Simulation is also used: multiple simulations may be run,
and statistics may be captured therefrom.
- Example: To estimate the degree to which asking Bob for food (the procedure P is "asking
for food", the context C is "being with Bob") will achieve the goal G of getting food, the
virtual dog may study its memory to see what happened on previous occasions where it
or other dogs asked Bob for food or other things, and then integrate the evidence from
these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN
inference, can be useful for estimating the probability of the implication $C \wedge P \rightarrow G$, in cases
where the probability of $C \wedge P_1 \rightarrow G$ is known for some $P_1$ related to $P$.
- Example: knowledge of the internal similarity between the procedure of asking for food
and the procedure of asking for toys, allows the virtual dog to reason that if asking Bob
for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the
probability of the implication $C \wedge P \rightarrow G$, in cases where the probability of $C_1 \wedge P \rightarrow G$ is
known for some $C_1$ related to $C$.
- Example: if Bob and Jim have a lot of features in common, and Bob often responds
positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication $C \wedge P \rightarrow G$,
in cases where the probability of $C \wedge P \rightarrow G_1$ is known for some $G_1$ related to $G$. Concept
creation can be useful indirectly in calculating these probability estimates, via providing
new concepts that can be used to make useful inference trails more compact and hence
easier to construct.
- Example: The dog may reason that because Jack likes to play, and Jack and Jill are both
children, maybe Jill likes to play too. It can carry out this reasoning only if its concept
creation process has invented the concept of "child" via analysis of observed data.
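The analysis cases listed above share one pattern: evidence about a related context, procedure or goal is reused, discounted by how related it is. The sketch below shows that pattern as a similarity-weighted success frequency over remembered episodes. It is a simplified stand-in, not PLN inference; the episode format, the similarity table and its numeric values, and the function names are all assumptions made for the example.

```python
# Assumed pairwise similarities between terms (symmetric; identical terms score 1.0).
SIM = {
    frozenset(["with_Bob", "with_Jim"]): 0.8,
    frozenset(["ask_food", "ask_toys"]): 0.5,
    frozenset(["get_food", "get_toys"]): 0.5,
}


def term_similarity(a, b):
    return 1.0 if a == b else SIM.get(frozenset([a, b]), 0.0)


def similarity(episode_terms, query_terms):
    """Product of the per-term similarities of (context, procedure, goal)."""
    degree = 1.0
    for a, b in zip(episode_terms, query_terms):
        degree *= term_similarity(a, b)
    return degree


def estimate_probability(query, episodes):
    """Similarity-weighted success frequency; None if no episode is relevant."""
    weighted_success = 0.0
    total_weight = 0.0
    for context, procedure, goal, succeeded in episodes:
        weight = similarity((context, procedure, goal), query)
        weighted_success += weight * (1.0 if succeeded else 0.0)
        total_weight += weight
    return weighted_success / total_weight if total_weight else None


# Remembered episodes: (context, procedure, goal, did the goal get achieved?)
episodes = [
    ("with_Bob", "ask_food", "get_food", True),
    ("with_Bob", "ask_food", "get_food", True),
    ("with_Bob", "ask_food", "get_food", True),
    ("with_Bob", "ask_food", "get_food", False),
    ("with_Bob", "ask_toys", "get_toys", True),
]

# The dog has never asked Jim for food, but Jim resembles Bob.
p_jim = estimate_probability(("with_Jim", "ask_food", "get_food"), episodes)
```

An unrelated query gets no estimate at all, which corresponds to the situation in which inference is stuck and another cognitive process has to be invoked.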
In these examples we have focused on cases where two terms in the cognitive schematic are
fixed and the third must be filled in; but just as often, the situation is that only one of the
terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively
learn C and P. This requires either a procedure learning method that works interactively with a
declarative-knowledge-focused concept learning or reasoning method; or a declarative learning
method that works interactively with a procedure learning method. That is, it requires the sort
of cognitive synergy built into the CogPrime design.
6.7 Conclusion
To thoroughly describe a comprehensive, integrative AGI architecture in a brief chapter would
be an impossible task; all we have attempted here is a brief overview, to be elaborated on in
the 800-odd pages of Part 2 of this book. We do not expect this brief summary to be enough to
convince the skeptical reader that the approach described here has reasonable odds of success
at achieving its stated goals, or even of fulfilling the conceptual notions outlined in the preceding
chapters. However, we hope to have given the reader at least a rough idea of what sort of AGI
design we are advocating, and why and in what sense we believe it can lead to advanced artificial
general intelligence. For more details on the structure, dynamics and underlying concepts of
CogPrime, the reader is encouraged to proceed to Part 2 - after completing Part 1, of course.
Please be patient - building a thinking machine is a big topic, and we have a lot to say about
it!
Fig. 6.2: Key Explicitly Implemented Processes of CogPrime. The large box at the
center is the Atomspace, the system's central store of various forms of (long-term and working)
memory, which contains a weighted labeled hypergraph whose nodes and links are "Atoms" of
various sorts. The hexagonal boxes at the bottom denote various hierarchies devoted to recog-
nition and generation of patterns: perception, action and linguistic. Intervening between these
recognition/generation hierarchies and the Atomspace, we have a pattern mining/imprinting
component (that recognizes patterns in the hierarchies and passes them to the Atomspace; and
imprints patterns from the Atomspace on the hierarchies); and also OpenPsi, a special dynam-
ical framework for choosing actions based on motivations. Above the Atomspace we have a
host of cognitive processes, which act on the Atomspace, some continually and some only as
context dictates, carrying out various sorts of learning and reasoning (pertinent to various sorts
of memory) that help the system fulfill its goals and motivations.
Fig. 6.3: MindAgents and AtomSpace in OpenCog. This is a conceptual depiction of
one way cognitive processes may interact in OpenCog - they may be wrapped in MindAgent
objects, which interact via cooperatively acting on the AtomSpace.
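The arrangement described in this caption can be sketched as a set of agent objects that never call each other and cooperate only by reading from and writing to one shared store. The classes below are hypothetical illustrations written for this sketch; they are not OpenCog's actual classes or API, and the dictionary-based store is a drastic simplification of a weighted labeled hypergraph.

```python
class AtomSpace:
    """Toy shared store: atom name -> dictionary of attached values."""

    def __init__(self):
        self.atoms = {}

    def add(self, name, **values):
        self.atoms.setdefault(name, {}).update(values)


class MindAgent:
    """Wrapper for one cognitive process; subclasses act on the AtomSpace."""

    def run(self, atomspace):
        raise NotImplementedError


class PerceptionAgent(MindAgent):
    def run(self, atomspace):
        atomspace.add("ball", seen=True)


class GoalAgent(MindAgent):
    def run(self, atomspace):
        # Reacts only to what another agent has left in the shared store.
        if atomspace.atoms.get("ball", {}).get("seen"):
            atomspace.add("fetch_ball", goal=True)


def run_cycles(agents, atomspace, cycles):
    """Give every agent one turn per cycle, in a fixed order."""
    for _ in range(cycles):
        for agent in agents:
            agent.run(atomspace)


space = AtomSpace()
agents = [GoalAgent(), PerceptionAgent()]

run_cycles(agents, space, 1)
after_one = "fetch_ball" in space.atoms   # goal agent ran before the ball was seen

run_cycles(agents, space, 1)
after_two = "fetch_ball" in space.atoms   # now it can build on the percept
```

Because the agents are coupled only through the store, adding or removing a cognitive process does not require changing any of the others.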
Fig. 6.4: Links Between Cognitive Processes and the Atomspace. The cognitive processes
depicted all act on the Atomspace, in the sense that they operate by observing certain
Atoms in the Atomspace and then modifying (or in rare cases deleting) them, and potentially
adding new Atoms as well. Atoms represent all forms of knowledge, but some forms of knowledge
are additionally represented by external data stores connected to the Atomspace, such as
the Procedure Repository; these are also shown as linked to the Atomspace.
Fig. 6.5: Invocation of Atom Operations By Cognitive Processes. This diagram depicts
some of the Atom modification, creation and deletion operations carried out by the abstract
cognitive processes in the CogPrime architecture.
CogPrime Component | Int. Diag. Sub-Diagram | Int. Diag. Component
-------------------|------------------------|---------------------
Procedure Repository | Long-Term Memory | Procedural
Procedure Repository | Working Memory | Active Procedural
Associative Episodic Memory | Long-Term Memory | Episodic
Associative Episodic Memory | Working Memory | Transient Episodic
Backup Store | Long-Term Memory | no correlate: a function not necessarily possessed by the human mind
Spacetime Server | Long-Term Memory | Declarative and Sensorimotor
Dimensional Embedding Space | no clear correlate: a tool for helping multiple types of LTM |
Dimensional Embedding Agent | no clear correlate |
Blending | Long-Term and Working Memory | Concept Formation
Clustering | Long-Term and Working Memory | Concept Formation
PLN Probabilistic Inference | Long-Term and Working Memory | Reasoning and Plan Learning/Optimization
MOSES / Hillclimbing | Long-Term and Working Memory | Procedure Learning
World Simulation | Long-Term and Working Memory | Simulation
Episodic Encoding / Recall | Long-Term Memory | Story-telling
Episodic Encoding / Recall | Working Memory | Consolidation
Forgetting / Freezing / Defrosting | Long-Term and Working Memory | no correlate: a function not necessarily possessed by the human mind
Map Formation | Long-Term Memory | Concept Formation and Pattern Mining
Attention Allocation | Long-Term and Working Memory | Hebbian/Attentional Learning
Attention Allocation | High-Level Mind Architecture | Reinforcement
Attention Allocation | Working Memory | Perceptual Associative Memory and Local Association
AtomSpace | High-Level Mind Architecture | no clear correlate: a general tool for representing memory including long-term and working, plus some of perception and action
AtomSpace | Working Memory | Global Workspace (the high-STI portion of AtomSpace) & other Workspaces
Declarative Atoms | Long-Term and Working Memory | Declarative and Sensorimotor
Procedure Atoms | Long-Term and Working Memory | Procedural
Hebbian Atoms | Long-Term and Working Memory | Attentional
Goal Atoms | Long-Term and Working Memory | Intentional
Feeling Atoms | Long-Term and Working Memory | spanning Declarative, Intentional and Sensorimotor
OpenPsi | High-Level Mind Architecture | Motivation / Action Selection
OpenPsi | Working Memory | Action Selection
Pattern Miner | High-Level Mind Architecture | arrows between perception and working and long-term memory
Pattern Miner | Working Memory | arrows between sensory memory and perceptual associative and transient episodic memory
 | | arrows between action selection and
Fig. 6.6: Screenshot of OpenCog-controlled virtual dog
Fig. 6.7: Relationship Between Multiple Memory Types. The bottom left corner shows
a program tree, constituting procedural knowledge. The upper left shows declarative nodes and
links in the Atomspace. The upper right corner shows a relevant system goal. The lower right
corner contains an image symbolizing relevant episodic and sensory knowledge. All the various
types of knowledge link to each other and can be approximatively converted to each other.
Memory Type | Specific Cognitive Processes | General Cognitive Functions
------------|------------------------------|----------------------------
Declarative | Probabilistic Logic Networks (PLN) [GMIH08]; conceptual blending [FT02] | pattern creation
Procedural | MOSES (a novel probabilistic evolutionary program learning algorithm) [Loo06] | pattern creation
Episodic | internal simulation engine V:1,.. N.01 | association, pattern creation
Attentional | Economic Attention Networks (ECAN) :1'1' In | association, credit assignment
Intentional | probabilistic goal hierarchy refined by PLN and ECAN, structured according to MicroPsi Phi( MI | credit assignment, pattern creation
Sensory | In CogBot, this will be supplied by the DeSTIN component | association, attention allocation, pattern creation, credit assignment
Table 6.2: Memory Types and Cognitive Processes in CogPrime. The third column indicates
the general cognitive function that each specific cognitive process carries out, according to the
patternist theory of cognition.
Fig. 6.8: Example of Explicit Knowledge in the Atomspace. One simple example of
explicitly represented knowledge in the Atomspace is linguistic knowledge, such as words and
the concepts directly linked to them. Not all of a CogPrime system's concepts correlate to
words, but some do.
Fig. 6.9: Example of Implicit Knowledge in the Atomspace. A simple example of implicit
knowledge in the Atomspace. The "chicken" and "food" concepts are represented by "maps"
of ConceptNodes interconnected by HebbianLinks, where the latter tend to form between Con-
ceptNodes that are often simultaneously important. The bundle of links between nodes in the
chicken map and nodes in the food map represents an "implicit, emergent link" between the
two concept maps. This diagram also illustrates "glocal" knowledge representation, in that the
chicken and food concepts are each represented by individual nodes, but also by distributed
maps. The "chicken" ConceptNode, when important, will tend to make the rest of the map
important - and vice versa. Part of the overall chicken concept possessed by the system is ex-
pressed by the explicit links coming out of the chicken ConceptNode, and part is represented
only by the distributed chicken map as a whole.
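The idea in this caption, that links tend to form between nodes which are often simultaneously important and that a bundle of such links acts as an implicit link between two maps, can be illustrated with a small Python sketch. The update rule (a saturating increment), the importance threshold, the learning rate and the function names are all assumptions chosen for the example; they are not ECAN's actual equations, which this chapter does not give.

```python
from collections import defaultdict
from itertools import combinations


def update_hebbian(weights, importance, threshold=0.5, rate=0.1):
    """Strengthen links between all pairs of atoms that are important together."""
    active = sorted(atom for atom, sti in importance.items() if sti >= threshold)
    for a, b in combinations(active, 2):      # keys are stored with a < b
        weights[(a, b)] += rate * (1.0 - weights[(a, b)])
    return weights


def implicit_link(weights, map_a, map_b):
    """Mean strength of the bundle of links running between two maps."""
    pairs = [(min(a, b), max(a, b)) for a in map_a for b in map_b]
    return sum(weights.get(pair, 0.0) for pair in pairs) / len(pairs)


weights = defaultdict(float)

# Two moments at which the chicken-related and food-related nodes are
# important at the same time, while an unrelated node is not.
snapshot = {"chicken": 0.9, "drumstick": 0.8, "food": 0.7, "meal": 0.6, "car": 0.1}
update_hebbian(weights, snapshot)
update_hebbian(weights, snapshot)

chicken_map = ["chicken", "drumstick"]
food_map = ["food", "meal"]

chicken_food = implicit_link(weights, chicken_map, food_map)
chicken_car = implicit_link(weights, chicken_map, ["car"])
```

No explicit link between "chicken" and "food" is ever asserted here; the association exists only as the aggregate strength of links between the two maps.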
Section II
Toward a General Theory of General Intelligence
Chapter 7
A Formal Model of Intelligent Agents
7.1 Introduction
The artificial intelligence field is full of sophisticated mathematical models and equations, but
most of these are highly specialized in nature - e.g. formalizations of particular logic systems,
analyses of the dynamics of specific sorts of neural nets, etc. On the other hand, a number of
highly general models of intelligent systems also exist, including Hutter's recent formalization
of universal intelligence [Hut05] and a large body of work in the disciplines of systems science
and cybernetics - but these have tended not to yield many specific lessons useful for engineering
AGI systems, serving more as conceptual models in mathematical form.
It would be fantastic to have a mathematical theory bridging these extremes - a real "general
theory of general intelligence," allowing the derivation and analysis of specific structures and
processes playing a role in practical AGI systems, from broad mathematical models of general
intelligence in various situations and under various constraints. However, the path to such a
theory is not entirely clear at present; and, as valuable as such a theory would be, we don't
believe such a thing to be necessary for creating advanced AGI. One possibility is that the
development of such a theory will occur contemporaneously and synergetically with the advent
of practical AGI technology.
Lacking a mature, pragmatically useful "general theory of general intelligence," however, we
have still found it valuable to articulate certain theoretical ideas about the nature of general
intelligence, with a level of rigor a bit greater than the wholly informal discussions of the previous
chapters. The chapters in this section of the book articulate some ideas we have developed in
pursuit of a general theory of general intelligence; ideas that, even in their current relatively
undeveloped form, have been very helpful in guiding our concrete work on the CogPrime design.
This chapter presents a more formal version of the notion of intelligence as "achieving complex
goals in complex environments," based on a formal model of intelligent agents. These formal-
izations of agents and intelligence will be used in later chapters as a foundation for formalizing
other concepts like inference and cognitive synergy. Chapters 8 and 9 pursue the notion of cog-
nitive synergy a little more thoroughly than was done in previous chapters. Chapter 10 sketches
a general theory of general intelligence using tools from category theory — not bringing it to the
level where one can use it to derive specific AGI algorithms and structures; but still, presenting
ideas that will be helpful in interpreting and explaining specific aspects of the CogPrime design
in Part 2. Finally, Appendix ?? explores an additional theoretical direction, in which the mind
of an intelligent system is viewed in terms of certain curved spaces - a novel way of thinking
about the dynamics of general intelligence, which has been useful in guiding development of the
ECAN component of CogPrime, and we expect will have more general value in future.
Despite the intermittent use of mathematical formalism, the ideas presented in this section
are fairly speculative, and we do not propose them as constituting a well-demonstrated theory
of general intelligence. Rather, we propose them as an interesting way of thinking about general
intelligence, which appears to be consistent with available data, and which has proved inspirational
to us in conceiving concrete structures and dynamics for AGI, as manifested for example
in the CogPrime design. Understanding the way of thinking described in these chapters is valu-
able for understanding why the CogPrime design is the way it is, and for relating CogPrime to
other practical and intellectual systems, and extending and improving CogPrime.
7.2 A Simple Formal Agents Model (SRAM)
We now present a formalization of the concept of "intelligent agents" - beginning with a for-
malization of "agents" in general.
Drawing on Ilitit05, LHO7aJ, we consider a class of active agents which observe and explore
their environment and also take actions in it, which may affect the environment. Formally,
the agent sends information to the environment by sending symbols from some finite alphabet
called the action space E; and the environment sends signals to the agent with symbols from
an alphabet called the perception space, denoted P. Agents can also experience rewards, which
lie in the reward space, denoted R, which for each agent is a subset of the rational unit interval.
The agent and environment are understood to take turns sending signals back and forth,
yielding a history of actions, observations and rewards, which may be denoted
$$a_1 o_1 r_1 a_2 o_2 r_2 \ldots$$
or else
$$a_1 x_1 a_2 x_2 \ldots$$
if $x$ is introduced as a single symbol to denote both an observation and a reward. The
complete interaction history up to and including cycle $t$ is denoted $ax_{1:t}$; and the history before
cycle $t$ is denoted $ax_{<t} = ax_{1:t-1}$.
The agent is represented as a function $\pi$ which takes the current history as input, and produces
an action as output. Agents need not be deterministic; an agent may for instance induce a
probability distribution over the space of possible actions, conditioned on the current history. In
this case we may characterize the agent by a probability distribution $\pi(a_t \mid ax_{<t})$. Similarly, the
environment may be characterized by a probability distribution $\mu(x_k \mid ax_{<k} a_k)$. Taken together,
the distributions $\pi$ and $\mu$ define a probability measure over the space of interaction sequences.
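The turn-taking between a stochastic agent and a stochastic environment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `run_interaction`, `sample`, `uniform_agent` and `toy_environment` are hypothetical, distributions are represented as plain dictionaries, and the toy reward scheme (a shrinking rational payoff) is chosen simply so that rewards stay in the rational unit interval.

```python
import random
from fractions import Fraction


def sample(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [float(dist[o]) for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def run_interaction(agent, environment, steps, seed=0):
    """Alternate agent and environment turns; return the history a1 x1 a2 x2 ..."""
    rng = random.Random(seed)
    history = []  # one (action, (observation, reward)) pair per cycle
    for _ in range(steps):
        # The agent's action distribution is conditioned on the history so far.
        action = sample(agent(history), rng)
        # The environment's percept distribution is conditioned on history and action.
        percept = sample(environment(history, action), rng)
        history.append((action, percept))
    return history


def uniform_agent(history):
    """Hypothetical agent: ignores the history and picks either action equally often."""
    return {"left": 0.5, "right": 0.5}


def toy_environment(history, action):
    """Hypothetical environment: rewards 'right' with a shrinking rational payoff."""
    t = len(history) + 1
    reward = Fraction(1, 2 ** t) if action == "right" else Fraction(0)
    return {("tick", reward): 1.0}


history = run_interaction(uniform_agent, toy_environment, steps=5)
total = sum(reward for _, (_, reward) in history)
```

Here the pair of functions plays the role of the two conditional distributions, and repeated sampling from them is what induces a probability measure over interaction sequences.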
Next, we extend this model in a few ways, intended to make it better reflect the realities of
intelligent computational agents. The first modification is to allow agents to maintain memories
(of finite size), via adding memory actions drawn from a set M into the history of actions,
observations and rewards. The second modification is to introduce the notion of goals.
7.2.1 Goals
We define goals as mathematical functions (to be specified below) associated with symbols
drawn from the alphabet $\mathcal{G}$; and we consider the environment as sending goal-symbols to the
agent along with regular observation-symbols. (Note however that the presentation of a goal-
symbol to an agent does not necessarily entail the explicit communication to the agent of the
contents of the goal function. This must be provided by other, correlated observations.) We also
introduce a conditional distribution $\gamma(g, \mu)$ that gives the weight of a goal $g$ in the context of
a particular environment $\mu$.
In this extended framework, an interaction sequence looks like
$$a_1 o_1 g_1 r_1 a_2 o_2 g_2 r_2 \ldots$$
or else
$$a_1 y_1 a_2 y_2 \ldots$$
where the $g_i$ are symbols corresponding to goals, and $y$ is introduced as a single symbol to denote
the combination of an observation, a reward and a goal.
Each goal function maps each finite interaction sequence $I_{g,s,t} = ay_{s:t}$, with $g_s$ to $g_t$
corresponding to $g$, into a value $r_g(I_{g,s,t}) \in [0, 1]$ indicating the value or "raw reward" of achieving
the goal during that interaction sequence. The total reward $r_t$ obtained by the agent is the sum
of the raw rewards obtained at time $t$ from all goals whose symbols occur in the agent's history
before $t$.
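As a rough illustration of how goal functions turn an interaction sequence into a total reward, consider the following Python sketch. Everything named here is hypothetical (`total_reward`, `reach_door`, `stay_quiet`, and the toy history), and one simplifying assumption is made that the text does not state: each goal's interaction sequence is taken to start at the first step where its symbol appears.

```python
def total_reward(history, goal_functions, t):
    """Sum the raw rewards at step t over every goal whose symbol occurs before t.

    history: list of (action, observation, goal_symbol) steps, 0-indexed
    goal_functions: dict mapping goal_symbol -> function(segment) -> value in [0, 1]
    """
    first_seen = {}
    for index, (_, _, goal) in enumerate(history[:t]):
        first_seen.setdefault(goal, index)
    # Assumption: a goal's interaction sequence runs from its first appearance to t.
    return sum(goal_functions[g](history[s:t + 1]) for g, s in first_seen.items())


def reach_door(segment):
    """Raw reward 1.0 if the segment ends with the agent observing the door."""
    return 1.0 if segment[-1][1] == "door" else 0.0


def stay_quiet(segment):
    """Raw reward equal to the fraction of steps in which the agent did not bark."""
    return sum(action != "bark" for action, _, _ in segment) / len(segment)


goals = {"reach_door": reach_door, "stay_quiet": stay_quiet}
history = [
    ("walk", "hall", "reach_door"),
    ("bark", "hall", "stay_quiet"),
    ("walk", "door", "stay_quiet"),
]
```

With this toy history, `total_reward(history, goals, 2)` adds the raw reward of both goals, while `total_reward(history, goals, 1)` only counts the goal that was presented before step 1.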
This formalism of goal-seeking agents allows us to formalize the notion of intelligence as
"achieving complex goals in complex environments" - a direction that is pursued in Section 7.3
below.
Note that this is an external perspective of system goals, which is natural from the perspective
of formally defining system intelligence in terms of system behavior, but is not necessarily very
natural in terms of system design. From the point of view of AGI design, one is generally more
concerned with the (implicit or explicit) representation of goals inside an AGI system, as in
CogPrime's Goal Atoms to be reviewed in Chapter 22 below.
Further, it is important to also consider the case where an AGI system has no explicit goals,
and the system's environment has no immediately identifiable goals either. But in this case, we
don't see any clear way to define a system's intelligence, except via approximating the system in
terms of other theoretical systems which do have explicit goals. This approximation approach
is developed in Section 7.3.5 below.
The awkwardness of linking the general formalism of intelligence theory presented here, with
the practical business of creating and designing AGI systems, may indicate a shortcoming on
the part of contemporary intelligence theory or AGI designs. On the other hand, this sort of
situation often occurs in other domains as well - e.g. the leap from quantum theory to the
analysis of real-world systems like organic molecules also involves a lot of awkwardness and
large leaps.
7.2.2 Memory Stores
As well as goals, we introduce into the model a long-term memory and a workspace. Regarding
long-term memory we assume the agent's memory consists of multiple memory stores corresponding
to various types of memory, e.g.: procedural ($K_{Proc}$), declarative ($K_{Dec}$), episodic
($K_{Ep}$), attentional ($K_{Att}$) and intentional ($K_{Int}$). In Appendix ?? a category-theoretic model
of these memory stores is introduced; but for the moment, we need only assume the existence
of
• an injective mapping $\Theta_{Ep} : K_{Ep} \to \mathcal{H}$, where $\mathcal{H}$ is the space of fuzzy sets of subhistories
(subhistories being "episodes" in this formalism)
• an injective mapping $\Theta_{Proc} : K_{Proc} \times M \times W \to A$, where $M$ is the set of memory states,
$W$ is the set of (observation, goal, reward) triples, and $A$ is the set of actions (this maps
each procedure object into a function that enacts actions in the environment or memory,
based on the memory state and current world-state)
• an injective mapping $\Theta_{Dec} : K_{Dec} \to \mathcal{L}$, where $\mathcal{L}$ is the set of expressions in some formal
language (which may for example be a logical language), which possesses words corresponding
to the observations, goals, reward values and actions in our agent formalism
• an injective mapping $\Theta_{Int} : K_{Int} \to \mathcal{G}$, where $\mathcal{G}$ is the space of goals mentioned above
• an injective mapping $\Theta_{Att} : K_{Int} \cup K_{Dec} \cup K_{Proc} \cup K_{Ep} \to V$, where $V$ is the space of
"attention values" (structures that gauge the importance of paying attention to an item of
knowledge over various time-scales or in various contexts)
We also assume that the vocabulary of actions contains memory-actions corresponding to the
operations of inserting the current observation, goal, reward or action into the episodic and/or
declarative memory store. And, we assume that the activity of the agent, at each time-step,
includes the enaction of one or more of the procedures in the procedural memory store. If several
procedures are enacted at once, then the end result is still formally modeled as a single action
$a = a_{[1]} * \cdots * a_{[n]}$, where $*$ is an operator on action-space that composes multiple actions into a
single one.
Finally, we assume that, at each time-step, the agent may carry out an external action $a_i$
on the environment, a memory action $m_i$ on the (long-term) memory, and an action $b_i$ on its
internal workspace. Among the actions that can be carried out on the workspace are the
insertion and deletion of observations, goals, actions or reward-values.
The workspace can be thought of as a sort of short-term memory or else in terms of Baars'
"global workspace" concept mentioned above. The workspace provides a medium for interaction
between the different memory types.
The workspace provides a mechanism by which declarative, episodic and procedural memory
may interact with each other. For this mechanism to work, we must assume that there are
actions corresponding to query operations that allow procedures to look into declarative and
episodic memory. The nature of these query operations will vary, among different agents, but
we can assume that in general an agent has
• one or more procedures $Q_{Dec}(x)$ serving as declarative queries, meaning that when $Q_{Dec}$ is
enacted on some $x$ that is an ordered set of items in the workspace, the result is that one
or more items from declarative memory are entered into the workspace
• one or more procedures $Q_{Ep}(x)$ serving as episodic queries, meaning that when $Q_{Ep}$ is
enacted on some $x$ that is an ordered set of items in the workspace, the result is that one
or more items from episodic memory are entered into the workspace
One additional aspect of CogPrime's knowledge representation that is important to PLN is
the attachment of nonnegative weights $n_i$ corresponding to elementary observations $o_i$. These
weights denote the amount of evidence contained in the observation. For instance, in the context
of a robotic agent, one could use these values to encode the assumption that an elementary visual
observation has more evidential value than an elementary olfactory observation.
We now have a model of an agent with long-term memory comprising procedural, declarative
and episodic aspects, an internal cognitive workspace, and the capability to use procedures to
drive actions based on items in memory and the workspace, and to move items between long-
term memory and the workspace.
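The agent model just summarized can be made concrete with a small Python sketch. It is a toy under explicit assumptions, not a rendering of any real system: the class `SketchAgent` and its methods are hypothetical names, memory items are plain tuples, queries match on shared terms, and composing several enacted procedures into one action is modeled simply as forming a tuple.

```python
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    declarative: list = field(default_factory=list)  # facts, as tuples of terms
    episodic: list = field(default_factory=list)     # episodes, as tuples of events
    procedural: dict = field(default_factory=dict)   # name -> callable(agent) -> action
    workspace: list = field(default_factory=list)    # items currently in focus

    def remember(self, store, item):
        """Memory action: insert an item into the episodic or declarative store."""
        if store not in ("episodic", "declarative"):
            raise ValueError("unknown memory store: " + store)
        getattr(self, store).append(item)

    def _enter(self, items):
        """Workspace action: insert items that are not already present."""
        for item in items:
            if item not in self.workspace:
                self.workspace.append(item)

    def q_dec(self, cues):
        """Declarative query: enter every fact mentioning a cue into the workspace."""
        self._enter([f for f in self.declarative if any(c in f for c in cues)])

    def q_ep(self, cues):
        """Episodic query: enter every episode mentioning a cue into the workspace."""
        self._enter([e for e in self.episodic if any(c in e for c in cues)])

    def enact(self, names):
        """Enact several procedures and compose their outputs into one action."""
        return tuple(self.procedural[name](self) for name in names)


agent = SketchAgent()
agent.remember("declarative", ("cat", "is_a", "animal"))
agent.remember("declarative", ("ball", "is_a", "toy"))
agent.remember("episodic", ("saw", "cat", "garden"))
agent.procedural["approach"] = (
    lambda a: "approach" if ("saw", "cat", "garden") in a.workspace else "wait"
)
agent.procedural["wag"] = lambda a: "wag"

agent.workspace.append("cat")
agent.q_dec(["cat"])
agent.q_ep(["cat"])
action = agent.enact(["approach", "wag"])
```

The point of the sketch is the flow of information: queries move items from long-term stores into the workspace, and procedures then read the workspace when choosing what to do.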
7.2.2.1 Modeling CogPrime
Of course, this formal model may be realized differently in various real-world AGI systems. In
CogPrime we have
• a weighted, labeled hypergraph structure called the AtomSpace used to store declarative
knowledge (this is the representation used by PLN)
• a collection of programs in a LISP-like language called Combo, stored in a ProcedureRepository
data structure, used to store procedural knowledge
• a collection of partial "movies" of the system's experience, played back using an internal
simulation engine, used to store episodic knowledge
• AttentionValue objects, minimally containing ShortTermImportance (STI) and LongTermImportance
(LTI) values, used to store attentional knowledge
• Goal Atoms for intentional knowledge, stored in the same format as declarative knowledge
but whose dynamics involve a special form of artificial currency that is used to govern action
selection
The AtomSpace is the central repository, and procedures and episodes are linked to Atoms
in the AtomSpace which serve as their symbolic representatives. The "workspace" in CogPrime
exists only virtually: each item in the AtomSpace has a "short term importance" (STI) level, and
the workspace consists of those items in the AtomSpace with highest STI, and those procedures
and episodes whose symbolic representatives in the AtomSpace have highest STI.
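The idea of a workspace that exists only virtually, as the highest-STI portion of a larger store, can be sketched as follows. This is not the OpenCog API: `virtual_workspace`, the STI numbers and the `represents` table are all hypothetical, and a fixed workspace size is assumed purely for illustration.

```python
import heapq


def virtual_workspace(sti, size):
    """Return the names of the `size` items with the highest STI, highest first."""
    return heapq.nlargest(size, sti, key=sti.get)


# Hypothetical STI levels for a few Atoms; two of them stand for non-declarative items.
sti = {"cat": 0.9, "ball": 0.2, "fetch_procedure": 0.7, "walk_episode": 0.4}
represents = {"fetch_procedure": "procedure", "walk_episode": "episode"}

workspace = virtual_workspace(sti, size=2)
# Procedures and episodes enter the workspace via their symbolic representatives.
linked = {name: represents[name] for name in workspace if name in represents}
```

No items are copied anywhere in this sketch: changing an STI value is enough to move an item into or out of the workspace.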
On the other hand, as we saw above, the LIDA architecture uses separate representations for
procedural, declarative and episodic memory, but also has an explicit workspace component,
where the most currently contextually relevant items from all different types of memory are
gathered and used together in the course of actions. However, compared to CogPrime, it lacks
comparably fine-grained methods for integrating the different types of memory.
Systematically mapping various existing cognitive architectures, or human brain structure,
into this formal agents model would be a substantial though quite plausible exercise; but we
will not undertake this here.
7.2.3 The Cognitive Schematic
Next we introduce an additional specialization into SRAM: the cognitive schematic, written
informally as
Context & Procedure -> Goal
and considered more formally as holds(C) & ex(P) -> h, where h may be an externally specified
goal $g_i$ or an internally specified goal derived as a (possibly uncertain) subgoal of one or more
$g_i$; C is a piece of declarative or episodic knowledge and P is a procedure that the agent can
internally execute to generate a series of actions. ex(P) is the proposition that P is successfully
executed. If C is episodic then holds(C) may be interpreted as the current context (i.e. some
finite slice of the agent's history) being similar to C; if C is declarative then holds(C) may be
interpreted as the truth value of C evaluated at the current context. Note that C may refer to
some part of the world quite distant from the agent's current sensory observations; but it may
still be formally evaluated based on the agent's history.
In the standard CogPrime notation as introduced formally in Chapter 20 (where indentation
has function-argument syntax similar to that in Python, and relationship types are prepended
to their relata without parentheses), for the case C is declarative this would be written as
PredictiveExtensionalImplication
    AND
        C
        Execution P
    G
and in the case C is episodic one replaces C in this formula with a predicate expressing C's
similarity to the current context. The semantics of the PredictiveExtensionalImplication relation
will be discussed below. The Execution relation simply denotes the proposition that procedure
P has been executed.
For the class of SRAM agents who (like CogPrime) use the cognitive schematic to govern
many or all of their actions, a significant fragment of agent intelligence boils down to estimating
the truth values of PredictiveExtensionalImplication relationships. Action selection procedures
can be used, which choose procedures to enact based on which ones are judged most likely
to achieve the current external goals $g_i$ in the current context. Rather than enter into the
particularities of action selection or other cognitive architecture issues, we will restrict ourselves
to PLN inference, which in the context of the present agent model is a method for handling
PredictiveImplication in the cognitive schematic.
Consider an agent in a virtual world, such as a virtual dog, one of whose external goals is to
please its owner. Suppose its owner has asked it to find a cat, and it can translate this into a
subgoal "find cat." If the agent operates according to the cognitive schematic, it will search for
P so that
PredictiveExtensionalImplication
    AND
        C
        Execution P
    Evaluation
        found
        cat
holds.
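To make the search for a suitable procedure more tangible, here is a deliberately crude Python sketch of action selection over cognitive schematics. It is not PLN: the function `select_procedure` is hypothetical, each schematic carries a single made-up strength number, and the degree to which a context holds is combined with that strength by plain multiplication, which is an assumption adopted only to keep the example short.

```python
def select_procedure(schematics, context_degree, goal):
    """Return the procedure with the best estimated chance of achieving `goal`.

    schematics: list of (context, procedure, goal, implication_strength) tuples
    context_degree: dict mapping each context to the degree to which it holds now
    """
    best, best_score = None, 0.0
    for context, procedure, target, strength in schematics:
        if target != goal:
            continue
        # Crude stand-in for uncertain inference: multiply the two degrees.
        score = context_degree.get(context, 0.0) * strength
        if score > best_score:
            best, best_score = procedure, score
    return best


# Hypothetical knowledge of the virtual dog.
schematics = [
    ("in_garden", "search_bushes", "found_cat", 0.6),
    ("in_house", "look_under_sofa", "found_cat", 0.8),
    ("in_garden", "bark", "owner_pleased", 0.1),
]
context_degree = {"in_garden": 0.9, "in_house": 0.2}
chosen = select_procedure(schematics, context_degree, "found_cat")
```

Although looking under the sofa has the stronger implication, the dog is most likely in the garden, so searching the bushes scores higher in the current context.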
7.3 Toward a Formal Characterization of Real-World General Intelligence
Having defined what we mean by an agent acting in an environment, we now turn to the
question of what it means for such an agent to be "intelligent."
As we have reviewed extensively in Chapter 2 above, "intelligence" is a commonsense, "folk
psychology" concept, with all the imprecision and contextuality that this generally entails.
One cannot expect any compact, elegant formalism to capture all of its meanings. Even in
the psychology and AI research communities, divergent definitions abound; Legg and Hutter
[LH07a] list and organize 70+ definitions from the literature.
Practical study of natural intelligence in humans and other organisms, and practical de-
sign, creation and instruction of artificial intelligences, can proceed perfectly well without an
agreed-upon formalization of the "intelligence" concept. Some researchers may conceive their
own formalisms to guide their own work, others may feel no need for any such thing.
But nevertheless, it is of interest to seek formalizations of the concept of intelligence, which
capture useful fragments of the commonsense notion of intelligence, and provide guidance for
practical research in cognitive science and AI. A number of such formalizations have been given
in recent decades, with varying degrees of mathematical rigor. Perhaps the most carefully-wrought
formalization of intelligence so far is the theory of "universal intelligence" presented by
Shane Legg and Marcus Hutter in [LH07b], which draws on ideas from algorithmic information
theory.
Universal intelligence captures a certain aspect of the "intelligence" concept very well, and
has the advantage of connecting closely with ideas in learning theory, decision theory, and
computation theory. However, the kind of general intelligence it captures best, is a kind which
is in a sense more general in scope than human-style general intelligence. Universal intelligence
does capture the sense in which humans are more intelligent than worms, which are more
intelligent than rocks; and the sense in which theoretical AGI systems like Hutter's AIXI or
AIXI$^{tl}$ would be much more intelligent than humans. But it misses essential aspects
of the intelligence concept as it is used in the context of intelligent natural systems like humans
or real-world AI systems.
Our main goal in this section is to present variants of universal intelligence that better
capture the notion of intelligence as it is typically understood in the context of real-world
natural and artificial systems. The first variant we describe is pragmatic general intelligence,
which is inspired by the intuitive notion of intelligence as "the ability to achieve complex goals
in complex environments," given in [Goe93a]. After assuming a prior distribution over the
space of possible environments, and one over the space of possible goals, one then defines the
pragmatic general intelligence as the expected level of goal-achievement of a system relative
to these distributions. Rather than measuring truly broad mathematical general intelligence,
pragmatic general intelligence measures intelligence in a way that's specifically biased toward
certain environments and goals.
Another variant definition is then presented, the efficient pragmatic general intelligence,
which takes into account the amount of computational resources utilized by the system in
achieving its intelligence. Some argue that making efficient use of available resources is a defining
characteristic of intelligence; see e.g. [Wan06].
A critical question left open is the characterization of the prior distributions corresponding
to everyday human reality; we give a semi-formal sketch of some ideas on this in Chapter 9
below, where we present the notion of a "communication prior," which assigns a probability
weight to a situation S based on the ease with which one agent in a society can communicate
S to another agent in that society, using multimodal communication (including verbalization,
demonstration, dramatic and pictorial depiction, etc.).
Finally, we present a formal measure of the "generality" of an intelligence, which precisiates
the informal distinction between "general AI" and "narrow AI."
7.3.1 Biased Universal Intelligence
To define universal intelligence, Legg and Hutter consider the class of environments that are
reward-summable, meaning that the total amount of reward they return to any agent is bounded
by 1. Where $r_i$ denotes the reward experienced by the agent from the environment at time $i$,
the expected total reward for the agent $\pi$ from the environment $\mu$ is defined as
$$V_\mu^\pi \equiv E\left(\sum_{i=1}^{\infty} r_i\right) \le 1$$
To extend their definition in the direction of greater realism, we first introduce a second-order
probability distribution $\nu$, which is a probability distribution over the space of environments
$\mu$. The distribution $\nu$ assigns each environment a probability. One such distribution $\nu$ is the
Solomonoff-Levin universal distribution, in which one sets $\nu = 2^{-K(\mu)}$, where $K(\mu)$ is the
Kolmogorov complexity of $\mu$; but this is not the only
distribution $\nu$ of interest. In fact a great deal of real-world general intelligence consists of the
adaptation of intelligent systems to particular distributions $\nu$ over environment-space, differing
from the universal distribution.
We then define
Definition 4 The biased universal intelligence of an agent $\pi$ is its expected performance
with respect to the distribution $\nu$ over the space of all computable reward-summable environments,
$E$, that is,
$$\Upsilon(\pi) \equiv \sum_{\mu \in E} \nu(\mu) V_\mu^\pi$$
Legg and Hutter's universal intelligence is obtained by setting $\nu$ equal to the universal
distribution.
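For a finite toy set of environments, the definition reduces to a weighted sum that is easy to compute. The following Python sketch is illustrative only: the function name, the three environments, their expected values and their "description lengths" are all made up, and since Kolmogorov complexity is uncomputable the lengths are merely hypothetical stand-ins used to build a universal-style weighting for comparison with a biased one.

```python
def biased_universal_intelligence(nu, value):
    """Sum nu(mu) * V_mu over a finite set of environments.

    nu: dict mapping environment name -> weight assigned by the distribution
    value: dict mapping environment name -> expected total reward of the agent
    """
    return sum(nu[mu] * value[mu] for mu in nu)


# Hypothetical expected total rewards of one agent in three environments.
value = {"maze": 0.5, "chat": 0.25, "chess": 0.75}

# Universal-style weights 2^(-length), with made-up description lengths.
lengths = {"maze": 3, "chat": 5, "chess": 4}
nu_universal = {mu: 2.0 ** -k for mu, k in lengths.items()}

# A biased distribution that cares mostly about the "chat" environment.
nu_biased = {"maze": 0.125, "chat": 0.75, "chess": 0.125}

score_universal = biased_universal_intelligence(nu_universal, value)
score_biased = biased_universal_intelligence(nu_biased, value)
```

The same agent receives different scores under the two weightings, which is the sense in which the measure is relative to the chosen distribution over environments.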
This framework is more flexible than it might seem. E.g. suppose one wants to incorporate
agents that die. Then one may create a special action, say $a_{666}$, corresponding to the state of
death, to create agents that
• in certain circumstances output action $a_{666}$
• have the property that if their previous action was $a_{666}$, then all of their subsequent actions
must be $a_{666}$
and to define a reward structure so that actions $a_{666}$ always bring zero reward. It then follows
that death is generally a bad thing if one wants to maximize intelligence. Agents that die will
not get rewarded after they're dead; and agents that live only 70 years, say, will be restricted
from getting rewards involving long-term patterns and will hence have specific limits on their
intelligence.
7.3.2 Connecting Legg and Hutter's Model of Intelligent Agents to the Real World
A notable aspect of the Legg and Hutter formalism is the separation of the reward mechanism
from the cognitive mechanisms of the agent. While commonplace in the reinforcement learning
literature, this seems psychologically unrealistic in the context of biological intelligences and
many types of machine intelligences. Not all human intelligent activity is specifically reward-
seeking in nature; and even when it is, humans often pursue complexly constructed rewards,
that are defined in terms of their own cognitions rather than separately given. Suppose a certain
human's goals are true love, or world peace, and the proving of interesting theorems - then these
goals are defined by the human herself, and only she knows if she's achieved them. An externally-
provided reward signal doesn't capture the nature of this kind of goal-seeking behavior, which
characterizes much human goal-seeking activity (and will presumably characterize much of the
goal-seeking activity of advanced engineered intelligences also) ... let alone human behavior that
is spontaneous and unrelated to explicit goals, yet may still appear commonsensically intelligent.
One could seek to bypass this complaint about the reward mechanisms via a sort of "neo-
Freudian" argument, via
• associating the reward signal, not with the "external environment" as typically conceived,
but rather with a portion of the intelligent agent's brain that is separate from the cognitive
component
• viewing complex goals like true love, world peace and proving interesting theorems as in-
direct ways of achieving the agent's "basic goals", created within the agent's memory via
subgoaling mechanisms
but it seems to us that a general formalization of intelligence should not rely on such strong
assumptions about agents' cognitive architectures. So below, after introducing the pragmatic
and efficient pragmatic general intelligence measures, we will propose an alternate interpreta-
tion wherein the mechanism of external rewards is viewed as a theoretical test framework for
assessing agent intelligence, rather than a hypothesis about intelligent agent architecture.
In this alternate interpretation, formal measures like the universal, pragmatic and efficient
pragmatic general intelligence are viewed as not directly applicable to real-world intelligences,
because they involve the behaviors of agents over a wide variety of goals and environments,
whereas in real life the opportunities to observe agents are more limited. However, they are
viewed as being indirectly applicable to real-world agents, in the sense that an external intelli-
gence can observe an agent's real-world behavior and then infer its likely intelligence according
to these measures.
In a sense, this interpretation makes our formalized measures of intelligence the opposite of
real-world IQ tests. An IQ test is a quantified, formalized test which is designed to approxi-
mately predict the informal, qualitative achievement of humans in real life. On the other hand,
the formal definitions of intelligence we present here are quantified, formalized tests that are
designed to capture abstract notions of intelligence, but which can be approximately evaluated
on a real-world intelligent system by observing what it does in real life.
7.3.3 Pragmatic General Intelligence
The above concept of biased universal intelligence is perfectly adequate for many purposes, but
it is also interesting to explicitly introduce the notion of a goal into the calculation. This allows
us to formally capture the notion presented in [Goe93a] of intelligence as "the ability to achieve
complex goals in complex environments."
If the agent is acting in environment $\mu$, and is provided with $g_s$ corresponding to $g$ at the
start and the end of the time-interval $T = \{i \in (s, \ldots, t)\}$, then the expected goal-achievement
of the agent, relative to $g$, during the interval is the expectation
$$V_{\mu,g,T}^\pi \equiv E\left(\sum_{i=s}^{t} r_g(I_{g,s,i})\right)$$
where the expectation is taken over all interaction sequences $I_{g,s,i}$ drawn according to $\mu$. We
then propose
Definition 5 The pragmatic general intelligence of an agent $\pi$, relative to the distribution
$\nu$ over environments and the distribution $\gamma$ over goals, is its expected performance with respect
to goals drawn from $\gamma$ in environments drawn from $\nu$, over the time-scales natural to the goals;
that is,
$$\Pi(\pi) \equiv \sum_{\mu \in E,\, g \in \mathcal{G},\, T} \nu(\mu)\, \gamma(g, \mu)\, V_{\mu,g,T}^\pi$$
(in those cases where this sum is convergent).
This definition formally captures the notion that "intelligence is achieving complex goals in
complex environments," where "complexity" is gauged by the assumed measures $\nu$ and $\gamma$.
If $\nu$ is taken to be the universal distribution, and $\gamma$ is defined to weight goals according to
the universal distribution, then pragmatic general intelligence reduces to universal intelligence.
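Over a finite toy set of environments, goals and time-intervals, the pragmatic general intelligence is again a weighted sum. The Python sketch below assumes all of its inputs: the function name, the environments "home" and "park", the goals, the intervals and every number are hypothetical, and the expected goal-achievement values are simply given rather than estimated from interaction.

```python
def pragmatic_general_intelligence(nu, gamma, value):
    """Sum nu(mu) * gamma(g, mu) * V over all (mu, g, T) triples in `value`.

    nu: dict mapping environment -> weight
    gamma: dict mapping (goal, environment) -> weight of the goal in that environment
    value: dict mapping (environment, goal, interval) -> expected goal-achievement
    """
    return sum(nu[mu] * gamma[(g, mu)] * v for (mu, g, _), v in value.items())


# Hypothetical distributions over environments and over goals within each environment.
nu = {"home": 0.75, "park": 0.25}
gamma = {("fetch", "home"): 0.5, ("find_cat", "home"): 0.5, ("fetch", "park"): 1.0}

# Hypothetical expected goal-achievement of one agent, per environment, goal, interval.
value = {
    ("home", "fetch", (0, 10)): 0.5,
    ("home", "find_cat", (0, 20)): 0.25,
    ("park", "fetch", (0, 10)): 1.0,
}

score = pragmatic_general_intelligence(nu, gamma, value)
```

Changing `nu` or `gamma` changes the score without changing the agent, which mirrors the way the measure is biased toward particular environments and goals.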
Furthermore, it is clear that a universal algorithmic agent like AIXI [Hut05] would also
have a high pragmatic general intelligence, under fairly broad conditions. As the interaction
history grows longer, the pragmatic general intelligence of AIXI would approach the theoretical
maximum, as AIXI would implicitly infer the relevant distributions via experience. However,
if significant reward discounting is involved, so that near-term rewards are weighted much
higher than long-term rewards, then AIXI might compare very unfavorably in pragmatic general
intelligence to other agents designed with prior knowledge of $\nu$, $\gamma$ and $r$ in mind.
The most interesting case to consider is where $\nu$ and $\gamma$ are taken to embody some particular
bias in a real-world space of environments and goals, and this bias is appropriately reflected
in the internal structure of an intelligent agent. Note that an agent need not lack universal
intelligence in order to possess pragmatic general intelligence with respect to some non-universal
distribution over goals and environments. However, in general, given limited resources, there
may be a tradeoff between universal intelligence and pragmatic intelligence. This leads to the
next point: how to incorporate resource limitations into the definition.
One might argue that the definition of Pragmatic General Intelligence is already encompassed
by Legg and Hutter's definition because one may bias the distribution of environments within
the latter by considering different Turing machines underlying the Kolmogorov complexity.
However, this is not a general equivalence because the Solomonoff-Levin measure intrinsically
decays exponentially, whereas an assumptive distribution over environments might decay at
some other rate. This issue seems to merit further mathematical investigation.
7.3.4 Incorporating Computational Cost
Let $\eta_{\pi,\mu,g,T}$ be a probability distribution describing the amount of computational resources consumed
by an agent $\pi$ while achieving goal $g$ over time-scale $T$. This is a probability distribution
because we want to account for the possibility of nondeterministic agents. So $\eta_{\pi,\mu,g,T}(Q)$ gives
the probability that $Q$ units of resources are consumed. For simplicity we amalgamate space
and time resources, energetic resources, etc. into a single number $Q$, which is assumed to live
in some subset of the positive reals. Space resources of course have to do with the size of the
system's memory. Then we may define
Definition 6 The efficient pragmatic general intelligence of an agent $\pi$ with resource
consumption $\eta_{\pi,\mu,g,T}$, relative to the distribution $\nu$ over environments and the distribution $\gamma$
over goals, is its expected performance with respect to goals drawn from $\gamma$ in environments drawn
from $\nu$, over the time-scales natural to the goals, normalized by the amount of computational
effort expended to achieve each goal; that is,
$$\Pi_{Eff}(\pi) \equiv \sum_{\mu \in E,\, g \in \mathcal{G},\, Q,\, T} \frac{\nu(\mu)\, \gamma(g, \mu)\, \eta_{\pi,\mu,g,T}(Q)}{Q}\, V_{\mu,g,T}^\pi$$
(in those cases where this sum is convergent).
This is a measure that rates an agent's intelligence higher if it uses fewer computational
resources to do its business. Roughly, it measures reward achieved per spacetime computation
unit.
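The effect of the resource normalization can be seen in a small Python sketch that compares two agents achieving the same expected goal-achievement at different cost. All names and numbers are hypothetical, and the resource distribution is assumed to be discrete (a dictionary from resource amounts to probabilities) so that the sum over resource amounts is finite.

```python
def efficient_pragmatic_general_intelligence(nu, gamma, eta, value):
    """Sum nu(mu) * gamma(g, mu) * eta(Q) / Q * V over environments, goals, Q and T.

    eta: dict mapping (environment, goal, interval) -> {resource amount Q: probability}
    value: dict mapping (environment, goal, interval) -> expected goal-achievement
    """
    total = 0.0
    for (mu, g, T), v in value.items():
        for Q, probability in eta[(mu, g, T)].items():
            total += nu[mu] * gamma[(g, mu)] * probability / Q * v
    return total


# One hypothetical environment, goal and interval, to isolate the effect of cost.
key = ("home", "fetch", (0, 10))
nu = {"home": 1.0}
gamma = {("fetch", "home"): 1.0}
value = {key: 0.5}

# Two hypothetical agents with equal goal-achievement but different resource usage.
eta_frugal = {key: {2.0: 0.5, 4.0: 0.5}}
eta_costly = {key: {8.0: 1.0}}

frugal = efficient_pragmatic_general_intelligence(nu, gamma, eta_frugal, value)
costly = efficient_pragmatic_general_intelligence(nu, gamma, eta_costly, value)
```

Both agents achieve the goal equally well, but the frugal one is rated higher because each unit of achievement costs it fewer resources.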
Note that, by abandoning the universal prior, we have also abandoned the proof of conver-
gence that comes with it. In general the sums in the above definitions need not converge; and
exploration of the conditions under which they do converge is a complex matter.
7.3.5 Assessing the Intelligence of Real-World Agents
The pragmatic and efficient pragmatic general intelligence measures are more "realistic" than
the Legg and Hutter universal intelligence measure, in that they take into account the innate
biasing and computational resource restrictions that characterize real-world intelligence. But as
discussed earlier, they still live in "fantasy-land" to an extent - they gauge the intelligence of an
agent via a weighted average over a wide variety of goals and environments; and they presume
a simplistic relationship between agents and rewards that does not reflect the complexities
of real-world cognitive architectures. It is not obvious from the foregoing how to apply these
measures to real-world intelligent systems, which lack the ability to exist in such a wide variety
of environments within their often brief lifespans, and mostly go about their lives doing things
other than pursuing quantified external rewards. In this brief section we describe an approach
to bridging this gap. The treatment is left semi-formal in places.
We suggest viewing the definitions of pragmatic and efficient pragmatic general intelligence
in terms of a "possible worlds" semantics - i.e. viewing them as asking, counterfactually, how
an agent would perform on a series of tests (the tests being goals, defined in
relation to environments and reward signals).
Real-world intelligent agents don't normally operate in terms of explicit goals and rewards;
these are abstractions that we use to think about intelligent agents. However, this is no objection
to characterizing various sorts of intelligence in terms of counterfactuals like: how would system
S operate if it were trying to achieve this or that goal, in this or that environment, in order to
seek reward? We can characterize various sorts of intelligence in terms of how it can be inferred
an agent would perform on certain tests, even though the agent's real life does not consist of
taking these tests.
This conceptual approach may seem a bit artificial but we don't currently see a better
alternative, if one wishes to quantitatively gauge intelligence (which is, in a sense, an "artificial"
thing to do in the first place). Given a real-world agent X and a mandate to assess its intelligence,
the obvious alternative to looking at possible worlds in the manner of the above definitions,
is just looking directly at the properties of the things X has achieved in the real world during
its lifespan. But this isn't an easy solution, because it doesn't disambiguate which aspects of
X's achievements were due to its own actions versus due to the rest of the world that X was
interacting with when it made its achievements. To distinguish the amount of achievement that
X "caused" via its own actions requires a model of causality, which is a complex can of worms in
itself; and, critically, the standard models of causality also involve counterfactuals (asking "what
would have been achieved in this situation if the agent X hadn't been there", etc.) [MW07].
Regardless of the particulars, it seems impossible to avoid counterfactual realities in assessing
intelligence.
The approach we suggest - given a real-world agent X with a history of actions in a particular
world, and a mandate to assess its intelligence - is to introduce an additional player, an inference
agent $\delta$, into the picture. The agent $\pi$ modeled above is then viewed as $\pi_X$: the model of X that
$\delta$ constructs, in order to explore X's inferred behaviors in various counterfactual environments.
In the test situations embodied in the definitions of pragmatic and efficient pragmatic general
intelligence, the environment gives $\pi_X$ rewards, based on specifically configured goals. In X's
real life, the relation between goals, rewards and actions will generally be significantly subtler
and perhaps quite different.
We model the real world similarly to the "fantasy world" of the previous section, but with
the omission of goals and rewards. We define a naturalistic context as one in which all goals and
rewards are constant, i.e. $g_i = g_0$ and $r_i = r_0$ for all i. This is just a mathematical convention
for stating that there are no precisely-defined external goals and rewards for the agent. In a
naturalistic context, we then have a situation where agents create actions based on the past
history of actions and perceptions, and if there is any relevant notion of reward or goal, it
is within the cognitive mechanism of some agent. A naturalistic agent X is then an agent $\pi$
which is restricted to one particular naturalistic context, involving one particular environment
$\mu$ (formally, we may achieve this within the framework of agents described above via dictating
that X issues constant "null actions" $a_0$ in all environments except $\mu$).
Next, we posit a metric space $(\Sigma_\mu, d)$ of naturalistic agents defined on a naturalistic context
involving environment $\mu$, and a subspace $\Delta \subseteq \Sigma_\mu$ of inference agents, which are naturalistic
agents that output predictions of other agents' behaviors (a notion we will not fully formalize
here). If agents are represented as program trees, then d may be taken as edit distance on tree
space [Bil05]. Then, for each agent $\delta \in \Delta$, we may assess
• the prior probability $\theta(\delta)$ according to some assumed distribution
• the effectiveness $p(\delta, X)$ of $\delta$ at predicting the actions of an agent $X \in \Sigma_\mu$
We may then define
Definition 7 The inference ability of the agent $\delta$, relative to $\mu$ and X, is
$$q_{\mu,X}(\delta) = \theta(\delta) \cdot \frac{\sum_{Y \in \Sigma_\mu} \mathrm{sim}(X,Y)\, p(\delta, Y)}{\sum_{Y \in \Sigma_\mu} \mathrm{sim}(X,Y)}$$
where sim is a specified decreasing function of d(X,Y), such as $\mathrm{sim}(X,Y) = \frac{1}{1 + d(X,Y)}$.
To construct $\pi_X$, we may then use the model of X created by the agent $\delta \in \Delta$ with the
highest inference ability relative to $\mu$ and X (using some specified ordering, in case of a tie).
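As an illustration only, the selection rule in Definition 7 can be sketched in Python. The sketch assumes that the prior $\theta(\delta)$, the prediction effectiveness $p(\delta, Y)$ and the distances d(X,Y) are already available as plain numbers; the agent names and all numeric values are made up, and sim is taken to be the example function 1/(1 + d) mentioned in the definition.

```python
from typing import Dict, List, Optional, Tuple


def sim(distance: float) -> float:
    """Similarity as a decreasing function of distance: 1 / (1 + d)."""
    return 1.0 / (1.0 + distance)


def inference_ability(prior: float,
                      effectiveness: Dict[str, float],
                      distances: Dict[str, float]) -> float:
    """prior * (sum_Y sim(X,Y) * p(delta,Y)) / (sum_Y sim(X,Y)).

    `distances` maps each agent Y to d(X, Y); `effectiveness` maps Y to p(delta, Y).
    """
    weights = {y: sim(d) for y, d in distances.items()}
    weighted = sum(weights[y] * effectiveness[y] for y in weights)
    return prior * weighted / sum(weights.values())


def best_inference_agent(agents: List[Tuple[str, float, Dict[str, float]]],
                         distances: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Pick the inference agent with the highest inference ability.

    `agents` is an ordered list of (name, prior, effectiveness); the list order
    plays the role of the "specified ordering" used to break ties.
    """
    best_name, best_q = None, float("-inf")
    for name, prior, effectiveness in agents:
        q = inference_ability(prior, effectiveness, distances)
        if q > best_q:  # strict '>' keeps the earlier agent on a tie
            best_name, best_q = name, q
    return best_name, best_q


# Hypothetical data: distances from X to three agents (X itself at distance 0).
example_distances = {"X": 0.0, "Y1": 1.0, "Y2": 3.0}
example_agents = [
    ("delta_a", 0.5, {"X": 0.9, "Y1": 0.8, "Y2": 0.4}),
    ("delta_b", 0.25, {"X": 1.0, "Y1": 1.0, "Y2": 1.0}),
]
chosen, ability = best_inference_agent(example_agents, example_distances)
```

In this toy setting the a priori more probable agent wins even though the other one predicts perfectly, which shows how the prior and the similarity-weighted effectiveness trade off against each other.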
Having constructed $\pi_X$, we can then say that
Definition 8 The inferred pragmatic general intelligence (relative to $\nu$ and $\gamma$) of a naturalistic
agent X defined relative to an environment $\mu$, is defined as the pragmatic general intelligence
of the model $\pi_X$ of X produced by the agent $\delta \in \Delta$ with maximal inference ability relative to $\mu$
(and in the case of a tie, the first of these in the ordering defined over $\Delta$). The inferred efficient
pragmatic general intelligence of X relative to $\mu$ is defined similarly.
This provides a precise characterization of the pragmatic and efficient pragmatic intelligence
of real-world systems, based on their observed behaviors. It's a bit messy; but the real world
tends to be like that.
7.4 Intellectual Breadth: Quantifying the Generality of an Agent's
Intelligence
We turn now to a related question: How can one quantify the degree of generality that an
intelligent agent possesses? Above we have discussed the qualitative distinction between AGI
and "Narrow AI", and intelligence as we have formalized it above is specifically intended as
a measure of general intelligence. But quantifying intelligence is different than quantifying
generality versus narrowness.
To make the discussion simpler, we introduce the term "context" as a shorthand for "environment/goal/interval
triple $(\mu, g, T)$." Given a context $(\mu, g, T)$, and a set $\Sigma$ of agents, one may
construct a fuzzy set $Ag_{\mu,g,T}$ gathering those agents that are intelligent relative to the context;
and given a set of contexts, one may also define a fuzzy set $Con_\pi$ gathering those contexts with
respect to which a given agent $\pi$ is intelligent. The relevant formulas are:
$$\chi_{Ag_{\mu,g,T}}(\pi) = \chi_{Con_\pi}(\mu, g, T) = \frac{1}{N} \sum_Q \frac{\eta_{\mu,g,T}(Q)\, V^{\pi}_{\mu,g,T}}{Q}$$
where $N = N(\mu, g, T)$ is a normalization factor defined appropriately, e.g. via $N(\mu, g, T) = \max_{\pi} V^{\pi}_{\mu,g,T}$.
One could make similar definitions leaving out the computational cost factor Q, but we
suspect that incorporating Q is a more promising direction. We then propose
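Definition 9 The intellectual breadth of an agent $\pi$, relative to the distribution $\nu$ over
environments and the distribution $\gamma$ over goals, is
$$H\left(\chi^{P}_{Con_\pi}(\mu, g, T)\right)$$
where H is the entropy and
$$\chi^{P}_{Con_\pi}(\mu, g, T) = \frac{\nu(\mu)\,\gamma(g,\mu)\,\chi_{Con_\pi}(\mu, g, T)}{\sum_{(\mu_\alpha, g_\beta, T_\omega)} \nu(\mu_\alpha)\,\gamma(g_\beta,\mu_\alpha)\,\chi_{Con_\pi}(\mu_\alpha, g_\beta, T_\omega)}$$
is the probability distribution formed by normalizing the fuzzy set $\chi_{Con_\pi}(\mu, g, T)$.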
A similar definition of the intellectual breadth of a context $(\mu, g, T)$, relative to the distribution
$\sigma$ over agents, may be posited. A weakness of these definitions is that they don't try to
account for dependencies between agents or contexts; perhaps more refined formulations may
be developed that account explicitly for these dependencies.
Note that the intellectual breadth of an agent as defined here is largely independent of
the (efficient or not) pragmatic general intelligence of that agent. One could have a rather
(efficiently or not) pragmatically generally intelligent system with little breadth: this would be
a system very good at solving a fair number of hard problems, yet wholly incompetent on a
larger number of hard problems. On the other hand, one could also have a terribly (efficiently or
not) pragmatically generally stupid system with great intellectual breadth: i.e. a system roughly
equally dumb in all contexts!
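The independence of breadth from overall intelligence can be checked numerically with a small Python sketch of Definition 9. It assumes a finite list of contexts, each given as a triple of the environment weight $\nu(\mu)$, the goal weight $\gamma(g,\mu)$ and the fuzzy membership $\chi_{Con_\pi}(\mu, g, T)$; the use of base-2 entropy, the convention that an all-zero agent has breadth 0, and every numeric value are assumptions made for illustration.

```python
import math
from typing import List, Tuple

# One entry per context (mu, g, T): (nu(mu), gamma(g, mu), chi_Con(mu, g, T)).
Context = Tuple[float, float, float]


def intellectual_breadth(contexts: List[Context]) -> float:
    """Entropy (in bits) of the normalized, weighted fuzzy membership values."""
    weighted = [nu * gamma * chi for nu, gamma, chi in contexts]
    total = sum(weighted)
    if total == 0:
        return 0.0  # assumed convention: no competence anywhere means no breadth
    probabilities = [w / total for w in weighted]
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Four equally weighted hypothetical contexts.
specialist = [(0.25, 1.0, chi) for chi in (0.97, 0.01, 0.01, 0.01)]
uniformly_weak = [(0.25, 1.0, chi) for chi in (0.05, 0.05, 0.05, 0.05)]

specialist_breadth = intellectual_breadth(specialist)
weak_breadth = intellectual_breadth(uniformly_weak)
```

The specialist scores far higher in its one strong context yet has much lower breadth than the uniformly weak agent, whose breadth reaches the maximum of two bits available with four contexts.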
Thus, one can characterize an intelligent agent as "narrow" with respect to distribution $\nu$ over
environments and the distribution $\gamma$ over goals, based on evaluating it as having low intellectual
breadth. A "narrow AI" relative to $\nu$ and $\gamma$ would then be an AI agent with a relatively high
efficient pragmatic general intelligence but a relatively low intellectual breadth.
7.5 Conclusion
Our main goal in this chapter has been to push the formal understanding of intelligence in a more
pragmatic direction. Much more work remains to be done, e.g. in specifying the environment,
goal and efficiency distributions relevant to real-world systems, but we believe that the ideas
presented here constitute nontrivial progress.
If the line of research suggested in this chapter succeeds, then eventually, one will be able to
do AGI research as follows: Specify an AGI architecture formally, and then use the mathematics
of general intelligence to derive interesting results about the environments, goals and hardware
platforms relative to which the AGI architecture will display significant pragmatic or efficient
pragmatic general intelligence, and intellectual breadth. The remaining chapters in this section
present further ideas regarding how to work toward this goal. For the time being, such a mode
of AGI research remains mainly for the future, but we have still found the formalism given in
these chapters useful for formulating and clarifying various aspects of the CogPrime design as
will be presented in later chapters.
Chapter 8
Cognitive Synergy
8.1 Cognitive Synergy
As we have seen, the formal theory of general intelligence, in its current form, doesn't really
tell us much that's of use for creating real-world AGI systems. It tells us that creating extraor-
dinarily powerful general intelligence is almost trivial if one has unrealistically huge amounts
of computational resources; and that creating moderately powerful general intelligence using
feasible computational resources is all about creating AI algorithms and data structures that
(explicitly or implicitly) match the restrictions implied by a certain class of situations, to which
the general intelligence is biased.
We've also described, in various previous chapters, some non-rigorous, conceptual principles
that seem to explain key aspects of feasible general intelligence: the complementary reliance on
evolution and autopoiesis, the superposition of hierarchical and heterarchical structures, and so
forth. These principles can be considered as broad strategies for achieving general intelligence
in certain broad classes of situations. However, a lot of research still needs to be done to figure out
good ways to describe, for instance, in what class of situations evolution is an effective learning
strategy, in what class of situations dual hierarchical/heterarchical structure is an effective way
to organize memory, etc.
In this chapter we'll dig deeper into one of the "general principles of feasible general intelligence"
briefly alluded to earlier: the cognitive synergy principle, which is both a conceptual
hypothesis about the structure of generally intelligent systems in certain classes of environments,
and a design principle used to guide the architecting of CogPrime.
We will focus here on cognitive synergy specifically in the case of "multi-memory systems,"
which we define as intelligent systems (like CogPrime) whose combination of environment,
embodiment and motivational systems make it important for them to possess memories that
divide into partially but not wholly distinct components corresponding to the categories of:
• Declarative memory
• Procedural memory (memory about how to do certain things)
• Sensory and episodic memory
• Attentional memory (knowledge about what to pay attention to in what contexts)
• Intentional memory (knowledge about the system's own goals and subgoals)
In Chapter 9 below we present a detailed argument as to how the requirement for a multi-
memory underpinning for general intelligence emerges from certain underlying assumptions
regarding the measurement of the simplicity of goals and environments; but the points made
here do not rely on that argument. What they do rely on is the assumption that, in the
intelligence in question, the different components of memory are significantly but not wholly
distinct. That is, there are significant "family resemblances" between the memories of a single
type, yet there are also thoroughgoing connections between memories of different types.
The cognitive synergy principle, if correct, applies to any AI system demonstrating intelli-
gence in the context of embodied, social communication. However, one may also take the theory
as an explicit guide for constructing AGI systems; and of course, the bulk of this book describes
one AGI architecture, CogPrime, designed in such a way.
It is possible to cast these notions in mathematical form, and we make some efforts in this
direction in Appendix ??, using the languages of category theory and information geometry.
However, this formalization has not yet led to any rigorous proof of the generality of cognitive
synergy nor any other exciting theorems; with luck this will come as the mathematics is further
developed. In this chapter the presentation is kept on the heuristic level, which is all that is
critically needed for motivating the CogPrime design.
8.2 Cognitive Synergy
The essential idea of cognitive synergy, in the context of multi-memory systems, may be ex-
pressed in terms of the following points:
1. Intelligence, relative to a certain set of environments, may be understood as the capability
to achieve complex goals in these environments.
2. With respect to certain classes of goals and environments (see Chapter 9 for a hypothe-
sis in this regard), an intelligent system requires a "multi-memory" architecture, meaning
the possession of a number of specialized yet interconnected knowledge types, including:
declarative, procedural, attentional, sensory, episodic and intentional (goal-related). These
knowledge types may be viewed as different sorts of patterns that a system recognizes in
itself and its environment. Knowledge of these various different types must be interlinked,
and in some cases may represent differing views of the same content (see Figure ??).
3. Such a system must possess knowledge creation (i.e. pattern recognition / formation) mechanisms
corresponding to each of these memory types. These mechanisms are also called
"cognitive processes."
4. Each of these cognitive processes, to be effective, must have the capability to recognize when
it lacks the information to perform effectively on its own; and in this case, to dynamically
and interactively draw information from knowledge creation mechanisms dealing with other
types of knowledge.
5. This cross-mechanism interaction must have the result of enabling the knowledge creation
mechanisms to perform much more effectively in combination than they would if operated
non-interactively. This is "cognitive synergy."
While these points are implicit in the theory of mind given in [Goe06a], they are not articulated
in this specific form there.
Interactions as mentioned in Points 4 and 5 in the above list are the real conceptual meat
of the cognitive synergy idea. One way to express the key idea here is that most AI algorithms
suffer from combinatorial explosions: the number of possible elements to be combined in a
Fig. 8.1: Illustrative example of the interactions between multiple types of knowledge, in repre-
senting a simple piece of knowledge. Generally speaking, one type of knowledge can be converted
to another, at the cost of some loss of information. The synergy between cognitive processes
associated with corresponding pieces of knowledge of different types is a critical aspect
of general intelligence.
synthesis or analysis is just too great, and the algorithms are unable to filter through all the
possibilities, given the lack of intrinsic constraint that comes along with a "general intelligence"
context (as opposed to a narrow-AI problem like chess-playing, where the context is constrained
and hence restricts the scope of possible combinations that need to be considered). In an AGI
architecture based on cognitive synergy, the different learning mechanisms must be designed
specifically to interact in such a way as to palliate each other's combinatorial explosions - so
that, for instance, each learning mechanism dealing with a certain sort of knowledge, must
synergize with learning mechanisms dealing with the other sorts of knowledge, in a way that
decreases the severity of combinatorial explosion.
One prerequisite for cognitive synergy to work is that each learning mechanism must rec-
ognize when it is "stuck," meaning it's in a situation where it has inadequate information to
make a confident judgment about what steps to take next. Then, when it does recognize that
it's stuck, it may request help from other, complementary cognitive mechanisms.
A theoretical notion closely related to cognitive synergy is the cognitive schematic, formalized
in Chapter 7 above, which states that the activity of the different cognitive processes involved
in an intelligent system may be modeled in terms of the schematic implication
Context $\wedge$ Procedure $\rightarrow$ Goal
where the Context involves sensory, episodic and/or declarative knowledge; and attentional
knowledge is used to regulate how much resource is given to each such schematic implication in
memory. Synergy among the learning processes dealing with the context, the procedure and the
goal is critical to the adequate execution of the cognitive schematic using feasible computational
resources.
Finally, drilling a little deeper into Point 3 above, one arrives at a number of possible knowl-
edge creation mechanisms (cognitive processes) corresponding to each of the key types of knowl-
edge. Figure ?? below gives a high-level overview of the main types of cognitive process con-
sidered in the current version of Cognitive Synergy Theory, categorized according to the type
of knowledge with which each process deals.
8.3 Cognitive Synergy in CogPrime
Different cognitive systems will use different processes to fulfill the various roles identified in
Figure ?? above. Here we briefly preview the basic cognitive processes that the CogPrime AGI
design uses for these roles, and the synergies that exist between these.
8.3.1 Cognitive Processes in CogPrime
: a Cognitive Synergy Based Architecture..." from ICCI 2009
Table 8.1: default
[Table will go here]
Table 8.2: The OpenCogPrime data structures used to represent the key knowledge types in-
volved
Table 8.3: default
[Table will go here]
Table 8.4: Key cognitive processes, and the algorithms that play their roles in CogPrime
Tables 8.1 and 8.3 present the key structures and processes involved in CogPrime, identifying
each one with a certain memory/process type as considered in cognitive synergy theory. That
is: each of these cognitive structures or processes deals with one or more types of memory -
declarative, procedural, sensory, episodic or attentional. Table 8.5 describes the key CogPrime
Fig. 8.2: High-level overview of the key cognitive dynamics considered here in the context of
cognitive synergy. The cognitive synergy principle describes the behavior of a system as it
pursues a set of goals (which in most cases may be assumed to be supplied to the system
"a priori", but then refined by inference and other processes). The assumed intelligent agent
model is roughly as follows: At each time the system chooses a set of procedures to execute,
based on its judgments regarding which procedures will best help it achieve its goals in the
current context. These procedures may involve external actions (e.g. involving conversation,
or controlling an agent in a simulated world) and/or internal cognitive actions. In order to
make these judgments it must effectively manage declarative, procedural, episodic, sensory
and attentional memory, each of which is associated with specific algorithms and structures
as depicted in the diagram. There are also global processes spanning all the forms of memory,
including the allocation of attention to different memory items and cognitive processes, and the
identification and reification of system-wide activity patterns (the latter referred to as "map
formation").
Table 8.5: default
[Table will go here]
Table 8.6: Key OpenCogPrime cognitive processes categorized according to knowledge type and
process type
processes in terms of the "analysis vs. synthesis" distinction. Finally, Tables ?? and ?? exemplify
these structures and processes in the context of embodied virtual agent control.
In the CogPrime context, a procedure in this cognitive schematic is a program tree stored
in the system's procedural knowledge base; and a context is a (fuzzy, probabilistic) logical
predicate stored in the AtomSpace, that holds, to a certain extent, during each interval of time.
A goal is a fuzzy logical predicate that has a certain value at each interval of time, as well.
Attentional knowledge is handled in CogPrime by the ECAN artificial economics mechanism,
which continually updates ShortTermImportance and LongTermImportance values associated
with each item in the CogPrime system's memory. These values control the amount of attention other
cognitive mechanisms pay to the item, and how much motive the system has to keep the
item in memory. HebbianLinks are then created between knowledge items that often possess
ShortTermImportance at the same time; this is CogPrime's version of traditional Hebbian
learning.
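The idea of linking items that are important at the same time can be illustrated with a toy Python sketch. This is not ECAN's actual update rule, which is not specified in this chapter: the snapshot format, the focus threshold, the function name and the co-occurrence ratio used as link strength are all assumptions chosen for simplicity.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def hebbian_strengths(snapshots: List[Dict[str, float]],
                      focus_threshold: float) -> Dict[Tuple[str, str], float]:
    """Toy Hebbian link strengths from successive short-term-importance snapshots.

    Each snapshot maps an item name to its (hypothetical) short-term importance.
    An item counts as "in focus" when its importance reaches the threshold.
    Strength(a, b) = (#snapshots with both in focus) / (#snapshots with either in focus).
    """
    in_focus = Counter()
    both_in_focus = Counter()
    for snapshot in snapshots:
        focused = sorted(item for item, sti in snapshot.items()
                         if sti >= focus_threshold)
        in_focus.update(focused)
        both_in_focus.update(combinations(focused, 2))  # pairs in sorted order
    strengths = {}
    for (a, b), both in both_in_focus.items():
        either = in_focus[a] + in_focus[b] - both
        strengths[(a, b)] = both / either
    return strengths


# Hypothetical importance values at three moments in time.
example_snapshots = [
    {"ball": 0.9, "fetch": 0.8, "food": 0.1},
    {"ball": 0.7, "fetch": 0.9, "food": 0.2},
    {"ball": 0.1, "fetch": 0.6, "food": 0.9},
]
links = hebbian_strengths(example_snapshots, focus_threshold=0.5)
```

Here "ball" and "fetch" end up more strongly linked than "fetch" and "food", because they share the focus of attention more often.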
ECAN has deep interactions with other cognitive mechanisms as well, which are essential
to its efficient operation; for instance, PLN inference may be used to help ECAN extrapolate
conclusions about what is worth paying attention to, and MOSES may be used to recognize
subtle attentional patterns. ECAN also handles "assignment of credit", the figuring-out of the
causes of an instance of successful goal-achievement, drawing on PLN and MOSES as needed
when the causal inference involved here becomes difficult.
The synergies between CogPrime's cognitive processes are summarized below in
a 16x16 matrix covering a host of interprocess interactions generic to CST.
One key aspect of how CogPrime implements cognitive synergy is PLN's sophisticated management
of the confidence of judgments. This ties in with the way OpenCogPrime's PLN inference
framework represents truth values in terms of multiple components (as opposed to the
single probability values used in many probabilistic inference systems and formalisms): each
item in OpenCogPrime's declarative memory has a confidence value associated with it, which
tells how much weight the system places on its knowledge about that memory item. This assists
with cognitive synergy as follows: A learning mechanism may consider itself "stuck", generally
speaking, when it has no high-confidence estimates about the next step it should take.
Without reasonably accurate confidence assessment to guide it, inter-component interaction
could easily lead to increased rather than decreased combinatorial explosion. And of course
there is an added recursion here, in that confidence assessment is carried out partly via PLN
inference, which in itself relies upon these same synergies for its effective operation.
To illustrate this point further, consider one of the synergetic aspects described in ?? below:
the role cognitive synergy plays in deductive inference. Deductive inference is a hard problem
in general - but what is hard about it is not carrying out inference steps, but rather "inference
control" (i.e., choosing which inference steps to carry out). Specifically, what must happen for
deduction to succeed in CogPrime is:
1. the system must recognize when its deductive inference process is "stuck", i.e. when the
PLN inference control mechanism carrying out deduction has no clear idea regarding which
inference step(s) to take next, even after considering all the domain knowledge at its disposal
2. in this case, the system must defer to another learning mechanism to gather more informa-
tion about the different choices available - and the other learning mechanism chosen must,
a reasonable percentage of the time, actually provide useful information that helps PLN to
get "unstuck" and continue the deductive process
For instance, deduction might defer to the "attentional knowledge" subsystem, and make
a judgment as to which of the many possible next deductive steps are most associated with
the goal of inference and the inference steps taken so far, according to the HebbianLinks
constructed by the attention allocation subsystem, based on observed associations. Or, if this fails,
deduction might ask MOSES (running in supervised categorization mode) to learn predicates
characterizing some of the terms involving the possible next inference steps. Once MOSES
provides these new predicates, deduction can then attempt to incorporate these into its inference
process, hopefully (though not necessarily) arriving at a higher-confidence next step.
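The "recognize that you are stuck, then ask for help" pattern described above can be sketched in Python. This is a schematic illustration rather than CogPrime's inference control: the confidence threshold, the helper interface and the two stand-in helpers (named after the attention allocation and MOSES fallbacks in the text) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Candidates = Dict[str, float]  # candidate next step -> confidence in [0, 1]
Helper = Callable[[Candidates], Candidates]


def is_stuck(candidates: Candidates, threshold: float) -> bool:
    """A mechanism is 'stuck' when no candidate step has high enough confidence."""
    return not candidates or max(candidates.values()) < threshold


def choose_step(candidates: Candidates,
                helpers: List[Tuple[str, Helper]],
                threshold: float = 0.7) -> Tuple[Optional[str], List[str]]:
    """Consult helpers in order, only while stuck; return (step, helpers consulted)."""
    current = dict(candidates)
    consulted: List[str] = []
    for name, helper in helpers:
        if not is_stuck(current, threshold):
            break
        current = helper(current)
        consulted.append(name)
    if is_stuck(current, threshold):
        return None, consulted  # still stuck: no confident next step
    return max(current, key=current.get), consulted


# Hypothetical helpers that raise confidence using extra information.
def attention_hint(candidates: Candidates) -> Candidates:
    association = {"step_a": 0.1}  # made-up association strengths
    return {s: min(1.0, c + association.get(s, 0.0)) for s, c in candidates.items()}


def learned_predicates(candidates: Candidates) -> Candidates:
    evidence = {"step_b": 0.5}  # made-up evidence from newly learned predicates
    return {s: min(1.0, c + evidence.get(s, 0.0)) for s, c in candidates.items()}


step, used = choose_step(
    {"step_a": 0.3, "step_b": 0.35},
    [("attention", attention_hint), ("moses", learned_predicates)],
)
```

In this run the attention hint alone is not enough, so the second helper is consulted as well. A mechanism that already has a confident candidate never calls a helper, which is what keeps the interaction from adding to the combinatorial explosion.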
8.4 Some Critical Synergies
Referring back to Figure ??, and summarizing many of the ideas in the previous section, Table
?? enumerates a number of specific ways in which the cognitive processes mentioned in the
Figure may synergize with one another, potentially achieving dramatically greater efficiency
than would be possible on their own.
Of course, realizing these synergies on the practical algorithmic level requires significant
inventiveness and may be approached in many different ways. The specifics of how CogPrime
manifests these synergies are discussed in many following chapters.
Fig. 8.3: This table, and the following ones, show some of the synergies between the primary
cognitive processes explicitly used in CogPrime.
8.5 The Cognitive Schematic
Now we return to the "cognitive schematic" notion, according to which various cognitive pro-
cesses involved in intelligence may be understood to work together via the implication
Context $\wedge$ Procedure $\rightarrow$ Goal $\langle p \rangle$
(summarized $C \wedge P \rightarrow G$). Semi-formally, this implication may be interpreted to mean: "If the
context C appears to hold currently, then if I enact the procedure P, I can expect to achieve
the goal G with certainty p."
The cognitive schematic leads to a conceptualization of the internal action of an intelligent
system as involving two key categories of learning:
• Analysis: Estimating the probability p of a posited $C \wedge P \rightarrow G$ relationship
• Synthesis: Filling in one or two of the variables in the cognitive schematic, given as-
sumptions regarding the remaining variables, and directed by the goal of maximizing the
probability of the cognitive schematic
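A minimal Python sketch of the two categories follows. It replaces CogPrime's actual machinery with simple frequency counting over remembered episodes, so it should be read as an illustration of the analysis/synthesis distinction only; the Episode record, the function names and the virtual-dog data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Episode:
    """One remembered attempt: procedure P tried in context C toward goal G."""
    context: str
    procedure: str
    goal: str
    achieved: bool


def estimate_p(episodes: List[Episode], context: str,
               procedure: str, goal: str) -> Optional[float]:
    """Analysis: estimate p for C and P -> G as the observed success frequency."""
    matches = [e for e in episodes
               if (e.context, e.procedure, e.goal) == (context, procedure, goal)]
    if not matches:
        return None  # no evidence either way
    return sum(e.achieved for e in matches) / len(matches)


def synthesize_procedure(episodes: List[Episode], context: str, goal: str,
                         procedures: List[str]) -> Optional[str]:
    """Synthesis: with C and G fixed, fill in the P with the highest estimated p."""
    best, best_p = None, -1.0
    for procedure in procedures:
        p = estimate_p(episodes, context, procedure, goal)
        if p is not None and p > best_p:
            best, best_p = procedure, p
    return best


# Hypothetical memory of a virtual dog trying to get food from Bob.
memory = (
    [Episode("with Bob", "beg", "get food", ok) for ok in (True, True, True, False)]
    + [Episode("with Bob", "bark", "get food", ok) for ok in (True, False, False)]
)
p_beg = estimate_p(memory, "with Bob", "beg", "get food")
choice = synthesize_procedure(memory, "with Bob", "get food", ["beg", "bark", "dig"])
```

The same lookup could be turned around to fill in C or G instead of P, which corresponds to the other synthesis cases.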
More specifically, where synthesis is concerned, some key examples are:
• The MOSES probabilistic evolutionary program learning algorithm is applied to find P,
given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation
embodying C and seeing which P lead to the simulated achievement of G.
- Example: A virtual dog learns a procedure P to please its owner (the goal G) in the
context C where there is a ball or stick present and the owner is saying "fetch".
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and
G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also
be used for this purpose.
- Example: A virtual dog wants to achieve the goal G of getting food, and it knows that
the procedure P of begging has been successful at this before, so it seeks a context C
where begging can be expected to get it food. Probably this will be a context involving a
friendly person.
• PLN-based goal refinement is used to create new subgoals G to sit on the right hand side
of instances of the cognitive schematic.
- Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of
following other dogs, due to observing that other dogs are often heading toward their
food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but
especially for choosing C (via providing new candidates for C). They are also used for
choosing P, via a process called "predicate schematization" that turns logical predicates
(declarative knowledge) into procedures.
- Example: At first a virtual dog may have a hard time predicting which other dogs are
going to be mean to it. But it may eventually observe common features among a number
of mean dogs, and thus form its own concept of "pit bull," without anyone ever teaching
it this concept explicitly.
Where analysis is concerned:
• PLN inference, acting on declarative knowledge, is used for estimating the probability of
the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge
is also used in this regard, via enabling estimation of the probability via simple similarity
matching against past experience. Simulation is also used: multiple simulations may be
run, and statistics may be captured therefrom.
- Example: To estimate the degree to which asking Bob for food (the procedure P is "asking
for food", the context C is "being with Bob") will achieve the goal G of getting food, the
virtual dog may study its memory to see what happened on previous occasions where it
or other dogs asked Bob for food or other things, and then integrate the evidence from
these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN
inference, can be useful for estimating the probability of the implication $C \wedge P \rightarrow G$, in cases
where the probability of $C \wedge P_1 \rightarrow G$ is known for some $P_1$ related to $P$.
- Example: knowledge of the internal similarity between the procedure of asking for food
and the procedure of asking for toys, allows the virtual dog to reason that if asking Bob
for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the
probability of the implication $C \wedge P \rightarrow G$, in cases where the probability of $C_1 \wedge P \rightarrow G$ is
known for some $C_1$ related to $C$.
- Example: if Bob and Jim have a lot of features in common, and Bob often responds
positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication $C \wedge P \rightarrow G$,
in cases where the probability of $C \wedge P \rightarrow G_1$ is known for some $G_1$ related to $G$. Concept
creation can be useful indirectly in calculating these probability estimates, via providing
new concepts that can be used to make useful inference trails more compact and hence
easier to construct.
- Example: The dog may reason that because Jack likes to play, and Jack and Jill are both
children, maybe Jill likes to play too. It can carry out this reasoning only if its concept
creation process has invented the concept of "child" via analysis of observed data.
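As a rough illustration of the similarity-matching style of analysis described above, the following Python sketch estimates the probability of the implication from a small episodic memory. It is not CogPrime code: the `Episode` record, the Jaccard measure of context similarity and the `proc_similarity` function are all assumptions made for this example, and the weighted frequency it computes is far cruder than PLN's treatment of uncertain truth values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    context: frozenset   # features of the situation, e.g. {"with_bob"}
    procedure: str       # what the agent did
    goal_achieved: bool  # whether the goal G was reached


def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def estimate_success(memory, context, procedure, proc_similarity):
    """Similarity-weighted frequency estimate of P(G | C, P); None if no evidence."""
    weighted_hits = 0.0
    total_weight = 0.0
    for ep in memory:
        weight = jaccard(ep.context, context) * proc_similarity(ep.procedure, procedure)
        total_weight += weight
        if ep.goal_achieved:
            weighted_hits += weight
    if total_weight == 0.0:
        return None
    return weighted_hits / total_weight


def proc_similarity(p, q):
    """Hypothetical similarity: identical procedures count fully, other 'ask' procedures half."""
    if p == q:
        return 1.0
    return 0.5 if p.startswith("ask_") and q.startswith("ask_") else 0.0


memory = [
    Episode(frozenset({"with_bob"}), "ask_for_food", True),
    Episode(frozenset({"with_bob"}), "ask_for_food", False),
    Episode(frozenset({"with_bob"}), "ask_for_toys", True),
    Episode(frozenset({"with_jim"}), "ask_for_food", False),
]
estimate = estimate_success(memory, frozenset({"with_bob"}), "ask_for_food", proc_similarity)
print(estimate)
```

In this toy memory the successful request for toys counts as half an episode of evidence about asking Bob for food, which mirrors the reasoning from a related procedure $P_1$ described above. The episode with Jim gets zero weight only because the two contexts share no features in this encoding.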
In these examples we have focused on cases where two terms in the cognitive schematic are
fixed and the third must be filled in; but just as often, the situation is that only one of the
terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively
learn C and P. This requires either a procedure learning method that works interactively with a
declarative-knowledge-focused concept learning or reasoning method; or a declarative learning
method that works interactively with a procedure learning method. That is, it requires the sort
of cognitive synergy built into the CogPrime design.
8.6 Cognitive Synergy for Procedural and Declarative Learning
We now present a little more algorithmic detail regarding the operation and synergetic in-
teraction of CogPrime's two most sophisticated components: the MOSES procedure learning
algorithm (see Chapter 33), and the PLN uncertain inference framework (see Chapter 34). The
treatment is necessarily quite compact, since we have not yet reviewed the details of either
MOSES or PLN; but as well as illustrating the notion of cognitive synergy more concretely,
perhaps the high-level discussion here will make clearer how MOSES and PLN fit into the big
picture of CogPrime.
8.6.1 Cognitive Synergy in MOSES
MOSES, CogPrime's primary algorithm for learning procedural knowledge, has been tested on
a variety of application problems including standard GP test problems, virtual agent control,
biological data analysis and text classification [Loo06]. It represents procedures internally as
program trees. Each node in a MOSES program tree is supplied with a "knob," comprising a
set of values that may potentially be chosen to replace the data item or operator at that node.
So for instance a node containing the number 7 may be supplied with a knob that can take
on any integer value. A node containing a while loop may be supplied with a knob that can
take on various possible control flow operators including conditionals or the identity. A node
containing a procedure representing a particular robot movement, may be supplied with a knob
that can take on values corresponding to multiple possible movements. Following a metaphor
suggested by Douglas Hofstadter, MOSES learning covers both "knob twiddling" (setting
the values of knobs) and "knob creation."
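To make the knob metaphor concrete, here is a minimal Python sketch. The `Knob` class and `enumerate_programs` function are hypothetical names invented for this illustration, not part of MOSES itself. The sketch also assumes a small finite range for each knob, whereas the text allows a numeric knob to take any integer value.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Knob:
    name: str
    values: tuple     # the settings this knob may take (finite here, by assumption)
    setting: int = 0  # index of the currently chosen value

    def twiddle(self, index):
        """'Knob twiddling': choose a different value for this node."""
        self.setting = index % len(self.values)

    @property
    def value(self):
        return self.values[self.setting]


def enumerate_programs(knobs):
    """Yield every combination of knob values, i.e. the space the knobs span."""
    names = [k.name for k in knobs]
    for combo in product(*(k.values for k in knobs)):
        yield dict(zip(names, combo))


knobs = [
    Knob("constant", (6, 7, 8), setting=1),           # a node holding the number 7
    Knob("control", ("while", "if", "identity")),     # a control-flow node
    Knob("movement", ("step_forward", "turn_left")),  # a robot-movement node
]
print(len(list(enumerate_programs(knobs))))
```

"Knob creation" corresponds to appending a further `Knob` to the list, which multiplies the number of reachable programs by the number of values the new knob offers.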
MOSES is invoked within CogPrime in a number of ways, but most commonly for finding a
procedure P satisfying a probabilistic implication $C \wedge P \rightarrow G$ as described above, where C is an
observed context and G is a system goal. In this case the probability value of the implication
provides the "scoring function" that MOSES uses to assess the quality of candidate procedures.
Fig. 8.4: High-Level Control Flow of MOSES Algorithm (figure not reproduced; legible stage labels: Representation-Building, Random Sampling, Scoring, Optimization)
For example, suppose a CogPrime-controlled robot is trying to learn to play the game
of "tag." (I.e. a multi-agent game in which one agent is specially labeled "it", and runs after
the other player agents, trying to touch them. Once another agent is touched, it becomes the
new "it" and the previous "it" becomes just another player agent.) Then its context C is that
others are trying to play a game they call "tag" with it; and we may assume its goals are to
please them and itself, and that it has figured out that in order to achieve this goal it should
learn some procedure to follow when interacting with others who have said they are playing
"tag." In this case a potential tag-playing procedure might contain nodes for physical actions
like step_forward(speed s), as well as control flow nodes containing operators like if else
(for instance, there would probably be a conditional telling the robot to do something different
depending on whether someone seems to be chasing it). Each of these program tree nodes would
have an appropriate knob assigned to it. And the scoring function would evaluate a procedure
P in terms of how successfully the robot played tag when controlling its behaviors according to
P (noting that it may also be using other control procedures concurrently with P). It's worth
noting here that evaluating the scoring function in this case involves some inference already,
because in order to tell if it is playing tag successfully, in a real-world context, it must watch
and understand the behavior of the other players.
MOSES follows the high-level control flow depicted in Figure 8.4, which corresponds to the
following process for evolving a metapopulation of "demes" of programs (each deme being a set
of relatively similar programs, forming a sort of island in program space):
1. Construct an initial set of knobs based on some prior (e.g., based on an empty program;
or more interestingly, using prior knowledge supplied by PLN inference based on the
system's memory) and use it to generate an initial random sampling of programs. Add this
deme to the metapopulation.
2. Select a deme from the metapopulation and update its sample, as follows:
a. Select some promising programs from the deme's existing sample to use for modeling,
according to the scoring function.
b. Considering the promising programs as collections of knob settings, generate new collec-
tions of knob settings by applying some (competent) optimization algorithm. For best
performance on difficult problems, it is important to use an optimization algorithm that
makes use of the system's memory in its choices, consulting PLN inference to help
estimate which collections of knob settings will work best.
c. Convert the new collections of knob settings into their corresponding programs,
reduce the programs to normal form, evaluate their scores, and integrate them into the
deme's sample, replacing less promising programs. In the case that scoring is expensive,
score evaluation may be preceded by score estimation, which may use PLN inference,
enaction of procedures in an internal simulation environment, and/or similarity
matching against episodic memory.
3. For each new program that meets the criterion for creating a new deme, if any:
a. Construct a new set of knobs (a process called "representation-building") to define a
region centered around the program (the deme's exemplar), and use it to generate a
new random sampling of programs, producing a new deme.
b. Integrate the new deme into the metapopulation, possibly displacing less promising
demes.
4. Repeat from step 2.
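The control flow of steps 1 to 4 can be sketched in a few lines of Python. This is a toy under strong assumptions, not the MOSES implementation. Programs are reduced to tuples of integer knob settings, random local perturbation stands in for the "competent" optimization algorithm, and there is no reduction to normal form and no guidance from PLN, simulation or episodic memory. The name `moses_sketch` and all of its parameters are hypothetical.

```python
import random


def moses_sketch(score, n_knobs, knob_range, generations=30,
                 sample_size=20, max_demes=3, seed=0):
    """Toy deme-based search over tuples of integer knob settings."""
    rng = random.Random(seed)
    rank = lambda program: (score(program), program)  # deterministic tie-breaking

    def sample_around(exemplar, radius, count):
        """Stand-in for representation-building plus random sampling."""
        return [tuple(min(knob_range - 1, max(0, v + rng.randint(-radius, radius)))
                      for v in exemplar)
                for _ in range(count)]

    # Step 1: build an initial deme around the "empty" program (all knobs at 0).
    start = (0,) * n_knobs
    metapopulation = [{"exemplar": start,
                       "sample": sample_around(start, 2, sample_size)}]
    best = max(metapopulation[0]["sample"], key=rank)

    for _ in range(generations):
        # Step 2: select the deme whose sample currently holds the best program.
        deme = max(metapopulation, key=lambda d: max(map(rank, d["sample"])))
        # 2a: pick promising programs; 2b: perturb their knob settings.
        promising = sorted(deme["sample"], key=rank, reverse=True)[:sample_size // 4]
        candidates = [c for p in promising for c in sample_around(p, 1, 4)]
        # 2c: score the candidates and keep the best, dropping weaker programs.
        merged = set(deme["sample"]) | set(candidates)
        deme["sample"] = sorted(merged, key=rank, reverse=True)[:sample_size]
        top = deme["sample"][0]
        # Step 3: a program that beats the best so far seeds a new deme.
        if score(top) > score(best):
            best = top
            metapopulation.append({"exemplar": top,
                                   "sample": sample_around(top, 2, sample_size)})
            metapopulation.sort(key=lambda d: rank(d["exemplar"]), reverse=True)
            del metapopulation[max_demes:]
        # Step 4: repeat from step 2.
    return best


target = (3, 5, 1)  # hypothetical ideal knob settings, unknown to the search
score = lambda program: -sum((v - t) ** 2 for v, t in zip(program, target))
best = moses_sketch(score, n_knobs=3, knob_range=8)
print(best, score(best))
```

The points at which the text calls on other cognitive processes correspond here to the choice of `start`, the body of `sample_around` and the calls to `score`. Each is a place where prior knowledge or a cheaper score estimate could be substituted for blind sampling and exact evaluation.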
MOSES is a complex algorithm and each part plays its role; if any one part is removed the
performance suffers significantly [Loo06]. However, the main point we want to highlight here is
the role played by synergetic interactions between MOSES and other cognitive components such
as PLN, simulation and episodic memory, as indicated in the above pseudocode wherever it
refers to PLN inference, internal simulation or episodic memory.
MOSES is a powerful procedure learning algorithm, but used on its own it runs into scalability
problems like any other such algorithm; the reason we feel it has potential to play a major role
in a human-level AI system is its capacity for productive interoperation with other cognitive
components.
Continuing the "tag" example, the power of MOSES's integration with other cognitive pro-
cesses would come into play if, before learning to play tag, the robot has already played simpler
games involving chasing. If the robot already has experience chasing and being chased by other
agents, then its episodic and declarative memory will contain knowledge about how to pursue
and avoid other agents in the context of running around an environment full of objects, and this
knowledge will be deployable within the appropriate parts of MOSES's Steps 1 and 2. Cross-
process and cross-memory-type integration make it tractable for MOSES to act as a "transfer
learning" algorithm, not just a task-specific machine-learning algorithm.
8.6.2 Cognitive Synergy in PLN
While MOSES handles much of CogPrime's procedural learning, and OpenCogPrime's internal
simulation engine handles most episodic knowledge, CogPrime's primary tool for handling
declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks
(PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIH08], and
here we will eschew most details and focus mainly on pointing out how PLN seeks to achieve
efficient inference control via integration with other cognitive processes.
As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard
predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities
called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate
through its logic rules, so that uncertain premises give rise to conclusions with reasonably
accurately estimated uncertainty values. This careful management of uncertainty is critical for
the application of logical inference in the robotics context, where most knowledge is abstracted
from experience and is hence highly uncertain.
PLN can be used in either forward or backward chaining mode; and in the language intro-
duced above, it can be used for either analysis or synthesis. As an example, we will consider
backward chaining analysis, exemplified by the problem of a robot preschool-student trying to
determine whether a new playmate "Bob" is likely to be a regular visitor to its preschool or not
(evaluating the truth value of the implication $\mathrm{Bob} \rightarrow \mathrm{regular\_visitor}$). The basic backward
chaining process for PLN analysis looks like:
1. Given an implication $L \equiv A \rightarrow B$ whose truth value must be estimated (for instance
$L \equiv C \wedge P \rightarrow G$ as discussed above), create a list $(A_1, \ldots, A_n)$ of (inference rule, stored
knowledge) pairs that might be used to produce $L$
2. Using analogical reasoning to prior inferences, assign each $A_i$ a probability of success
• If some of the $A_i$ are estimated to have reasonable probability of success at generating
reasonably confident estimates of $L$'s truth value, then invoke Step 1 with $A_i$ in place
of $L$ (at this point the inference process becomes recursive)
• If none of the $A_i$ looks sufficiently likely to succeed, then inference has "gotten stuck"
and another cognitive process should be invoked, e.g.
- Concept creation may be used to infer new concepts related to $A$ and $B$, and then
Step 1 may be revisited, in the hope of finding a new, more promising $A_i$ involving
one of the new concepts
- MOSES may be invoked with one of several special goals, e.g. the goal of finding
a procedure $P$ so that $P(X)$ predicts whether $X \rightarrow B$. If MOSES finds such a
procedure $P$ then this can be converted to declarative knowledge understandable
by PLN and Step 1 may be revisited....
- Simulations may be run in CogPrime's internal simulation engine, so as to observe
the truth value of $A \rightarrow B$ in the simulations; and then Step 1 may be revisited....
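The control structure just outlined, recursion over promising inference steps with deferral to other processes when stuck, can be sketched in Python as follows. Every name here is hypothetical. The single deduction rule, which multiplies strengths, is a toy and not a PLN formula. The share of premises already known stands in for analogical reasoning about prior inferences, and `learned_rule_fallback` merely pretends that a MOSES run has returned a rule.

```python
from math import prod


def estimate(target, kb, propose, promise, fallbacks, threshold=0.3, depth=3):
    """Estimate the truth value of `target`, or return None if inference is stuck."""
    if target in kb:
        return kb[target]
    if depth == 0:
        return None
    # Step 1: list candidate (premises, combine) steps that could produce the target.
    steps = propose(target, kb)
    # Step 2: keep the steps judged likely to succeed, most promising first.
    ranked = sorted((s for s in steps if promise(s, kb) >= threshold),
                    key=lambda s: promise(s, kb), reverse=True)
    for premises, combine in ranked:
        values = [estimate(p, kb, propose, promise, fallbacks, threshold, depth - 1)
                  for p in premises]
        if all(v is not None for v in values):
            kb[target] = combine(values)
            return kb[target]
    # Stuck: defer to another cognitive process, then retry once without fallbacks.
    for fallback in fallbacks:
        if fallback(target, kb):
            return estimate(target, kb, propose, promise, [], threshold, depth)
    return None


def propose_deduction(target, kb):
    """Toy rule: A->B from A->X and X->B, combined by multiplying strengths."""
    a, b = target
    return [([(a, x), (x, b)], prod) for (a2, x) in list(kb) if a2 == a and x != b]


def promise_known_fraction(step, kb):
    """Toy stand-in for analogy to prior inferences: share of premises already known."""
    premises, _ = step
    return sum(p in kb for p in premises) / len(premises)


def learned_rule_fallback(target, kb):
    """Stand-in for MOSES: pretend a learned rule about tie-wearers was found."""
    if target == ("wears_tie", "regular_visitor"):
        kb[target] = 0.2
        return True
    return False


kb = {("Bob", "wears_tie"): 0.9}
result = estimate(("Bob", "regular_visitor"), kb, propose_deduction,
                  promise_known_fraction, [learned_rule_fallback])
print(result)
```

With an empty list of fallbacks the same query returns `None`, because nothing in the knowledge base links tie-wearing to regular visiting. The fallback is what turns a stuck inference into an answer, which is the sense in which inference control is rescued by other processes.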
The combinatorial explosion of inference control is combatted by the capability to defer to
other cognitive processes when the inference control procedure is unable to make a sufficiently
confident choice of which inference steps to take next. Note that just as MOSES may rely
on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create
complex knowledge about the terms in its logical implications. This is just one example of the
multiple ways in which the different cognitive processes in CogPrime interact synergetically; a
more thorough treatment of these interactions is given in Chapter 49.
In the "new playmate" example, the interesting case is where the robot initially seems not
to know enough about Bob to make a solid inferential judgment (so that none of the $A_i$ seem
particularly promising). For instance, it might carry out a number of possible inferences and not
come to any reasonably confident conclusion, so that the reason none of the $A_i$ seem promising
is that all the decent-looking ones have been tried already. So it might then resort to MOSES,
simulation or concept creation.
For instance, the PLN controller could make a list of everyone who has been a regular
visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure
for distinguishing these two categories. This procedure could then be used directly to make the
needed assessment, or else be translated into logical rules to be used within PLN inference. For
example, perhaps MOSES would discover that older males wearing ties tend not to become
regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable.
But if the current playmate is wearing a tuxedo, then PLN may be helpful via reasoning that
even though a tuxedo is not a tie, it's a similar form of fancy dress - so PLN may extend the
MOSES-learned rule to the present case and infer that the new playmate is not likely to be a
regular visitor.
8.7 Is Cognitive Synergy Tricky?
In this section we use the notion of cognitive synergy to explore a question that arises
frequently in the AGI community: the well-known difficulty of measuring intermediate progress
toward human-level AGI. We explore some potential reasons underlying this, via extending the
notion of cognitive synergy to a more refined notion of "tricky cognitive synergy." These ideas
are particularly relevant to the problem of creating a roadmap toward AGI, as we'll explore in
Chapter 17 below.
8.7.1 The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?
It's not entirely straightforward to create tests to measure the final achievement of human-level
AGI, but there are some fairly obvious candidates here. There's the Turing Test (fooling judges
into believing you're human, in a text chat), the video Turing Test, the Robot College Student
test (passing university, via being judged exactly the same way a human student would), etc.
There's certainly no agreement on which is the most meaningful such goal to strive for, but
there's broad agreement that a number of goals of this nature basically make sense.
On the other hand, how does one measure whether one is, say, 50 percent of the way to
human-level AGI? Or, say, 75 or 25 percent?
It's possible to pose many "practical tests" of incremental progress toward human-level AGI,
with the property that if a proto-AGI system passes the test using a certain sort of architecture
and/or dynamics, then this implies a certain amount of progress toward human-level AGI based
on particular theoretical assumptions about AGI. However, in each case of such a practical test,
it seems intuitively likely to a significant percentage of AGI researchers that there is some way
to "game" the test via designing a system specifically oriented toward passing that test, and
which doesn't constitute dramatic progress toward AGI.
(Footnote to Section 8.7: This section was co-authored with Jared Wigmore.)
Some examples of practical tests of this nature would be:
• The Wozniak "coffee test": go into an average American house and figure out how to make
coffee, including identifying the coffee machine, figuring out what the buttons do, finding
the coffee in the cabinet, etc.
• Story understanding - reading a story, or watching it on video, and then answering questions
about what happened (including questions at various levels of abstraction)
• Graduating (virtual-world or robotic) preschool
• Passing the elementary school reading curriculum (which involves reading and answering
questions about some picture books as well as purely textual ones)
• Learning to play an arbitrary video game based on experience only, or based on experience
plus reading instructions
One interesting point about tests like this is that each of them seems to some AGI researchers
to encapsulate the crux of the AGI problem, and be unsolvable by any system not far along
the path to human-level AGI - yet seems to other AGI researchers, with different conceptual
perspectives, to be something probably game-able by narrow-AI methods. And of course, given
the current state of science, there's no way to tell which of these practical tests really can be
solved via a narrow-AI approach, except by having a lot of people try really hard over a long
period of time.
A question raised by these observations is whether there is some fundamental reason why
it's hard to make an objective, theory-independent measure of intermediate progress toward
advanced AGI. Is it just that we haven't been smart enough to figure out the right test - or is
there some conceptual reason why the very notion of such a test is problematic?
We don't claim to know for sure - but in the rest of this section we'll outline one possible
reason why the latter might be the case.
8.7.2 A Possible Answer: Cognitive Synergy is Tricky!
Why might a solid, objective empirical test for intermediate progress toward AGI be an in-
feasible notion? One possible reason, we suggest, is precisely cognitive synergy, as discussed
above.
The cognitive synergy hypothesis, in its simplest form, states that human-level AGI in-
trinsically depends on the synergetic interaction of multiple components (for instance, as in
CogPrime, multiple memory systems each supplied with its own learning process). In this hy-
pothesis, for instance, it might be that there are 10 critical components required for a human-
level AGI system. Having all 10 of them in place results in human-level AGI, but having only
8 of them in place results in having a dramatically impaired system - and maybe having only
6 or 7 of them in place results in a system that can hardly do anything at all.
Of course, the reality is almost surely not as strict as the simplified example in the above
paragraph suggests. No AGI theorist has really posited a list of 10 crisply-defined subsystems
and claimed them necessary and sufficient for AGI. We suspect there are many different routes
to AGI, involving integration of different sorts of subsystems. However, if the cognitive synergy
hypothesis is correct, then human-level AGI behaves roughly like the simplistic example in the
prior paragraph suggests. Perhaps instead of using the 10 components, you could achieve human-
level AGI with 7 components, but having only 5 of these 7 would yield drastically impaired
functionality - etc. Or the point could be made without any decomposition into a finite set
of components, using continuous probability distributions. To mathematically formalize the
cognitive synergy hypothesis becomes complex, but here we're only aiming for a qualitative
argument. So for illustrative purposes, we'll stick with the "10 components" example, just for
communicative simplicity.
Next, let's suppose that for any given task, there are ways to achieve this task using a system
that is much simpler than any subset of size 6 drawn from the set of 10 components needed
for human-level AGI, but works much better for the task than this subset of 6 components
(assuming the latter are used as a set of only 6 components, without the other 4 components).
Note that this supposition is a good bit stronger than mere cognitive synergy. For lack of
a better name, we'll call it tricky cognitive synergy. The tricky cognitive synergy hypothesis
would be true if, for example, the following possibilities were true:
• creating components to serve as parts of a synergetic AGI is harder than creating compo-
nents intended to serve as parts of simpler AI systems without synergetic dynamics
• components capable of serving as parts of a synergetic AGI are necessarily more complicated
than components intended to serve as parts of simpler AGI systems.
These certainly seem reasonable possibilities, since to serve as a component of a synergetic AGI
system, a component must have the internal flexibility to usefully handle interactions with a lot
of other components as well as to solve the problems that come its way. In a CogPrime context,
these possibilities ring true, in the sense that tailoring an AI process for tight integration with
other AI processes within CogPrime tends to require more work than preparing a conceptually
similar AI process for use on its own or in a more task-specific narrow AI system.
It seems fairly obvious that, if tricky cognitive synergy really holds up as a property of
human-level general intelligence, the difficulty of formulating tests for intermediate progress
toward human-level AGI follows as a consequence. Because, according to the tricky cognitive
synergy hypothesis, any test is going to be more easily solved by some simpler narrow AI process
than by a partially complete human-level AGI system.
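The shape of this argument can be illustrated with a deliberately artificial numerical toy. The logistic curve, its midpoint and steepness, and the fixed score assigned to the narrow system are all invented for illustration and carry no empirical weight. They simply encode the "10 components" story in which capability collapses when a few components are missing.

```python
from math import exp


def partial_agi_score(k, midpoint=8.5, steepness=1.5):
    """Hypothetical task score of a system holding k of the 10 synergetic components."""
    return 1.0 / (1.0 + exp(-steepness * (k - midpoint)))


NARROW_SCORE = 0.6  # hypothetical score of a purpose-built narrow system on one test

for k in (6, 8, 10):
    s = partial_agi_score(k)
    verdict = "beats" if s > NARROW_SCORE else "loses to"
    print(f"{k}/10 components: score {s:.2f}, {verdict} the narrow system")
```

Under these assumed numbers the systems with 6 and 8 components both lose to the narrow system on the test, and only the complete system wins. A test of this kind would therefore rank a project that is most of the way to human-level AGI below a narrow system built only to pass the test.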
8.7.3 Conclusion
We haven't proved anything here, only made some qualitative arguments. However, these argu-
ments do seem to give a plausible explanation for the empirical observation that positing tests
for intermediate progress toward human-level AGI is a very difficult prospect. If the theoretical
notions sketched here are correct, then this difficulty is not due to incompetence or lack
of imagination on the part of the AGI community, nor due to the primitive state of the AGI
field, but is rather intrinsic to the subject matter. And if these notions are correct, then quite
likely the future rigorous science of AGI will contain formal theorems echoing and improving
the qualitative observations and conjectures we've made here.
If the ideas sketched here are true, then the practical consequence for AGI development
is, very simply, that one shouldn't worry a lot about producing intermediary results that are
compelling to skeptical observers. Just as 2/3 of a human brain may not be of much use,
similarly, 2/3 of an AGI system may not be much use. Lack of impressive intermediary results
may not imply one is on a wrong development path; and comparison with narrow AI systems on
specific tasks may be badly misleading as a gauge of incremental progress toward human-level
AGI.
Hopefully it's clear that the motivation behind the line of thinking presented here is a desire
to understand the nature of general intelligence and its pursuit - not a desire to avoid testing our
AGI software! Really, as AGI engineers, we would love to have a sensible rigorous way to test our
intermediary progress toward AGI, so as to be able to pass convincing arguments to skeptics,
funding sources, potential collaborators and so forth. Our motivation here is not a desire to
avoid having the intermediate progress of our efforts measured, but rather a desire to explain
the frustrating (but by now rather well-established) difficulty of creating such intermediate
goals for human-level AGI in a meaningful way.
If we or someone else figures out a compelling way to measure partial progress toward AGI,
we will celebrate the occasion. But it seems worth seriously considering the possibility that the
difficulty in finding such a measure reflects fundamental properties of general intelligence.
From a practical CogPrime perspective, we are interested in a variety of evaluation and
testing methods, including the "virtual preschool" approach mentioned briefly above and more
extensively in later chapters. However, our focus will be on evaluation methods that give us
meaningful information about CogPrime's progress, given our knowledge of how CogPrime
works and our understanding of the underlying theory. We are unlikely to focus on the achieve-
ment of intermediate test results capable of convincing skeptics of the reality of our partial
progress, because we have not yet seen any credible tests of this nature, and because we suspect
the reasons for this lack may be rooted in deep properties of feasible general intelligence, such
as tricky cognitive synergy.
Chapter 9
General Intelligence in the Everyday Human World
9.1 Introduction
Intelligence is not just about what happens inside a system, but also about what happens outside
that system, and how the system interacts with its environment. Real-world general intelligence
is about intelligence relative to some particular class of environments, and human-like general
intelligence is about intelligence relative to the particular class of environments that humans
evolved in (which in recent millennia has included environments humans have created using
their intelligence). In Chapter 2, we reviewed some specific capabilities characterizing human-
like general intelligence; to connect these with the general theory of general intelligence from the
last few chapters, we need to explain what aspects of human-relevant environments correspond
to these human-like intelligent capabilities. We begin with aspects of the environment related
to communication, which turn out to tie in closely with cognitive synergy. Then we turn to
physical aspects of the environment, which we suspect also connect closely with various human
cognitive capabilities. Finally we turn to physical aspects of the human body and their relevance
to the human mind. In the following chapter we present a deeper, more abstract theoretical
framework encompassing these ideas.
These ideas are of theoretical importance, and they're also of practical importance when one
turns to the critical area of AGI environment design. If one is going to do anything besides
release one's young AGI into the "wilds" of everyday human life, then one has to put some
thought into what kind of environment it will be raised in. This may be a virtual world or it
may be a robot preschool or some other kind of physical environment, but in any case some
specific choices must be made about what to include. Specific choices must also be made about
what kind of body to give one's AGI system - what sensors and actuators, and so forth. In
Chapter 16 we will present some specific suggestions regarding choices of embodiment and
environment that we find to be ideal for AGI development - virtual and robot preschools - but
the material in this chapter is of more general import, beyond any such particularities. If one
has an intuitive idea of what properties of body and world human intelligence is biased for,
then one can make practical choices about embodiment and environment in a principled rather
than purely ad hoc or opportunistic way.
9.2 Some Broad Properties of the Everyday World That Help Structure Intelligence
The properties of the everyday world that help structure intelligence are diverse and span
multiple levels of abstraction. Most of this chapter will focus on fairly concrete patterns of this
nature, such as are involved in inter-agent communication and naive physics; however, it's also
worth noting the potential importance of more abstract patterns distinguishing the everyday
world from arbitrary, mathematical environments.
The propensity to search for hierarchical patterns is one huge potential example of an ab-
stract everyday-world property. We strongly suspect the reason that searching for hierarchical
patterns works so well, in so many everyday-world contexts, lies in the particular structure of
the everyday world - it's not something that would be true across all possible environments
(even if one weights the space of possible environments in some clever way, say using program-
length according to some standard computational model). However, this sort of assertion is of
course highly "philosophical," and becomes complex to formulate and defend convincingly given
the current state of science and mathematics.
Going one step further, we recall from Chapter 3 a structure called the "dual network", which
consists of superposed hierarchical and heterarchical networks: basically a hierarchy in which
the distance between two nodes in the hierarchy is correlated with the distance between the
nodes in some metric space. Another high level property of the everyday world may be that dual
network structures are prevalent. This would imply that minds biased to represent the world in
terms of dual network structure are likely to be intelligent with respect to the everyday world.
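One informal way to check for the dual network property, under assumptions of our own choosing, is to measure how well path length in the hierarchy tracks distance in the metric space. The Python sketch below does this for a tiny hand-built taxonomy. The taxonomy, the two-dimensional coordinates and the use of Pearson correlation as the measure are all illustrative choices, not definitions taken from the text.

```python
from itertools import combinations
from math import dist, sqrt


def tree_distance(parent, a, b):
    """Number of edges on the path between nodes a and b in a rooted tree."""
    def ancestors(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    path_a, path_b = ancestors(a), ancestors(b)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for j, node in enumerate(path_b):
        if node in steps_from_a:  # first common ancestor
            return steps_from_a[node] + j
    raise ValueError("nodes are not in the same tree")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical hierarchy (child -> parent) and hypothetical feature-space positions.
parent = {"dog": "mammal", "cat": "mammal", "sparrow": "bird", "crow": "bird",
          "mammal": "animal", "bird": "animal"}
position = {"dog": (0, 0), "cat": (1, 0), "sparrow": (5, 5), "crow": (6, 5)}

pairs = list(combinations(position, 2))
hierarchy_d = [tree_distance(parent, a, b) for a, b in pairs]
metric_d = [dist(position[a], position[b]) for a, b in pairs]
print(pearson(hierarchy_d, metric_d))
```

A correlation close to 1, as in this contrived example, is what one would expect when the hierarchy and the metric structure are aligned. A value near 0 would indicate a hierarchy that carries no information about nearness in the metric space.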
In a different direction, the extreme commonality of symmetry groups in the (everyday and
otherwise) physical world is another example: they occur so often that minds oriented toward
recognizing patterns involving symmetry groups are likely to be intelligent with respect to the
real world.
We suspect that the number of cognitively-relevant properties of the everyday world is huge
... and that the essence of everyday-world intelligence lies in the list of varyingly abstract and
concrete properties, which must be embedded implicitly or explicitly in the structure of a natural
or artificial intelligence for that system to have everyday-world intelligence.
Apart from these particular yet abstract properties of the everyday world, intelligence is just
about "finding patterns in which actions tend to achieve which goals in which situations" ... but,
the simple meta-algorithm needed to accomplish this universally is, we suggest, only a small
percentage of what it takes to make a mind.
You might say that a sufficiently generally intelligent system should be able to infer the
various cognitively-relevant properties of the environment from looking at data about the ev-
eryday world. We agree in principle, and in fact Ben Kuipers and his colleagues have done
some interesting work in this direction, showing that learning algorithms can infer some basics
about the structure of space and time from experience [MK07]. But we suggest that doing this
really thoroughly would require a massively greater amount of processing power than an AGI
that embodies and hence automatically utilizes these principles. It may be that the problem of
inferring these properties is so hard as to require a wildly infeasible AIXItl / Gödel Machine
type system.
9.3 Embodied Communication
Next we turn to the potential cognitive implications of seeking to achieve goals in an environ-
ment in which multimodal communication with other agents plays a prominent role.
Consider a community of embodied agents living in a shared world, and suppose that the
agents can communicate with each other via a set of mechanisms including:
• Linguistic communication, in a language whose semantics is largely (not necessarily
wholly) interpretable based on the mutually experienced world
• Indicative communication, in which e.g. one agent points to some part of the world or
delimits some interval of time, and another agent is able to interpret the meaning
• Demonstrative communication, in which an agent carries out a set of actions in the
world, and the other agent is able to imitate these actions, or instruct another agent as to
how to imitate these actions
• Depictive communication, in which an agent creates some sort of (visual, auditory, etc.)
construction to show another agent, with a goal of causing the other agent to experience
phenomena similar to what they would experience upon experiencing some particular entity
in the shared environment
• Intentional communication, in which an agent explicitly communicates to another agent
what its goal is in a certain situation¹
It is clear that ordinary everyday communication between humans possesses all these aspects.
We define the Embodied Communication Prior (ECP) as the probability distribution in
which the probability of an entity (e.g. a goal or environment) decreases with the difficulty of
describing that entity, for a typical member of the community in question, using a particular set
of communication mechanisms including the above five modes. We will sometimes refer to the
prior probability of an entity under this distribution as its "simplicity" under the distribution.
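A rough way to picture such a prior is the following minimal sketch. It assumes, in line with calling this probability the entity's "simplicity", that an entity's weight falls as its description difficulty rises. The exponential form, the choice of the easiest mode as the overall difficulty, the function name `ecp_prior` and all numeric difficulties are illustrative assumptions, not part of the definition above.

```python
MODES = ("linguistic", "indicative", "demonstrative", "depictive", "intentional")

def ecp_prior(difficulty):
    """Map {entity: {mode: difficulty}} to a normalized prior.

    Assumption: an entity's overall difficulty is that of its easiest
    listed mode, and its weight decays as 2 ** (-difficulty).
    """
    weights = {
        entity: 2.0 ** (-min(costs[m] for m in MODES if m in costs))
        for entity, costs in difficulty.items()
    }
    total = sum(weights.values())
    return {entity: w / total for entity, w in weights.items()}

# Hypothetical difficulties (arbitrary units) for two goals.
example = {
    "fetch the red ball": {"linguistic": 2.0, "indicative": 1.0},
    "sort pebbles by prime count": {"linguistic": 5.0, "demonstrative": 4.0},
}
prior = ecp_prior(example)
```

Under these made-up numbers the goal that can simply be pointed at receives most of the probability mass, which is the qualitative behavior the prior is meant to capture.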
Next, to further specialize the Embodied Communication Prior, we will assume that for
each of these modes of communication, there are some aspects of the world that are much
more easily communicable using that mode than the other modes. For instance, in the human
everyday world:
• Abstract (declarative) statements spanning large classes of situations are generally much
easier to communicate linguistically
• Complex, multi-part procedures are much easier to communicate either demonstratively, or
using a combination of demonstration with other modes
• Sensory or episodic data is often much easier to communicate depictively
• The current value of attending to some portion of the shared environment is often much
easier to communicate indicatively
• Information about what goals to follow in a certain situation is often much easier to com-
municate intentionally, i.e. via explicitly indicating what one's own goal is
These simple observations have significant implications for the nature of the Embodied Com-
munication Prior. For one thing they let us define multiple forms of knowledge:
• Isolatedly declarative knowledge is that which is much more easily communicable lin-
guistically
¹ In Appendix ?? we recount some interesting recent results showing that mirror neurons fire in response to
some cases of intentional communication as thus defined.
• Isolatedly procedural knowledge is that which is much more easily communicable
demonstratively
• Isolatedly sensory knowledge is that which is much more easily communicable depic-
tively
• Isolatedly attentive knowledge is that which is much more easily communicable indica-
tively
• Isolatedly intentional knowledge is that which is much more easily communicable in-
tentionally
This categorization of knowledge types resembles many ideas from the cognitive theory of
memory [TC05], although the distinctions drawn here are a little crisper than any classification
currently derivable from available neurological or psychological data.
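The "isolated" categories can be read as a simple decision rule. The sketch below is one possible reading, assuming each knowledge item comes with a hypothetical difficulty score per communication mode and that "much more easily" means cheaper by some fixed factor. The factor `margin`, the function name and the scores are assumptions for illustration only.

```python
MODE_TO_KNOWLEDGE = {
    "linguistic": "declarative",
    "demonstrative": "procedural",
    "depictive": "sensory",
    "indicative": "attentive",
    "intentional": "intentional",
}

def isolated_category(costs, margin=3.0):
    """Return the isolated knowledge type of an item, or "mixed".

    costs maps each communication mode to a (hypothetical) difficulty.
    The item counts as "isolatedly X" only if its cheapest mode is at
    least `margin` times cheaper than the second-cheapest mode.
    """
    ranked = sorted(costs, key=costs.get)
    best, runner_up = ranked[0], ranked[1]
    if costs[runner_up] >= margin * costs[best]:
        return MODE_TO_KNOWLEDGE[best]
    return "mixed"

abstract_rule = {"linguistic": 1, "demonstrative": 10, "depictive": 9,
                 "indicative": 8, "intentional": 12}
lab_protocol = {"linguistic": 2, "demonstrative": 1, "depictive": 6,
                "indicative": 7, "intentional": 9}
```

Here `isolated_category(abstract_rule)` yields "declarative", while `lab_protocol` has two nearly equally cheap modes and falls through to "mixed".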
Of course there may be much knowledge, of relevance to systems seeking intelligence accord-
ing to the ECP, that does not fall into any of these categories and constitutes "mixed knowledge."
There are some very important specific subclasses of mixed knowledge. For instance, episodic
knowledge (knowledge about specific real or hypothetical sets of events) will most easily be
communicated via a combination of declarative, sensory, and (in some cases) procedural com-
munication. Scientific and mathematical knowledge are generally mixed knowledge, as is most
everyday commonsense knowledge.
Some cases of mixed knowledge are reasonably well decomposable, in the sense that they
decompose into knowledge items that individually fall into some specific knowledge type. For
instance, an experimental chemistry procedure may be much more easily communicable pro-
cedurally, whereas an allied piece of knowledge from theoretical chemistry may be much more
easily communicable declaratively; but in order to fully communicate either the experimental
procedure or the abstract piece of knowledge, one may ultimately need to communicate both
aspects.
Also, even when the best way to communicate something is mixed-mode, it may be possible
to identify one mode that carries the most important part of the communication. An example
would be a chemistry experiment that is best communicated via a practical demonstration
together with a running narrative. It may be that the demonstration without the narrative
would be vastly more valuable than the narrative without the demonstration. To cover such
cases we may make less restrictive definitions such as
• Interactively declarative knowledge is that which is much more easily communicable
in a manner dominated by linguistic communication
and so forth. We call these "interactive knowledge categories," by contrast to the "isolated
knowledge categories" introduced earlier.
9.3.0.1 Naturalness of Knowledge Categories
Next we introduce an assumption we call NKC, for Naturalness of Knowledge Categories.
The NKC assumption states that the knowledge in each of the above isolated and interac-
tive communication-modality-focused categories forms a "natural category," in the sense that
for each of these categories, there are many different properties shared by a large percentage of
the knowledge in the category, but not by a large percentage of the knowledge in the other cat-
egories. This means that, for instance, procedural knowledge systematically (and statistically)
has different characteristics than the other kinds of knowledge.
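To make the notion of a "natural category" slightly more concrete, the following sketch tests whether a category has several properties that are common inside it and rare in every other category. The thresholds, the minimum number of distinctive properties and the toy property sets are all assumptions chosen for illustration; the NKC assumption itself is stated only qualitatively.

```python
def distinctive_properties(items_by_category, category, high=0.7, low=0.3):
    """Properties common inside `category` but rare in every other one.

    items_by_category maps a category name to a list of property sets.
    The thresholds `high` and `low` are illustrative assumptions.
    """
    def share(cat, prop):
        items = items_by_category[cat]
        return sum(prop in item for item in items) / len(items)

    candidates = set().union(*items_by_category[category])
    others = [c for c in items_by_category if c != category]
    return {
        p for p in candidates
        if share(category, p) >= high and all(share(o, p) <= low for o in others)
    }

def is_natural(items_by_category, category, min_properties=2):
    """A category is "natural" here if it has enough distinctive properties."""
    return len(distinctive_properties(items_by_category, category)) >= min_properties

# Toy data: each knowledge item is described by a set of made-up properties.
toy = {
    "procedural": [{"sequential", "motor", "timed"},
                   {"sequential", "motor"},
                   {"sequential", "motor", "timed"}],
    "declarative": [{"abstract", "compositional"},
                    {"abstract", "timed"},
                    {"abstract", "compositional"}],
}
```

With this toy data the procedural items share two distinctive properties and pass the test, while the declarative items share only one and do not, which shows how the verdict depends on the chosen thresholds.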
The NKC assumption seems commonsensically to hold true for human everyday knowledge,
and it has fairly dramatic implications for general intelligence. Suppose we conceive general
intelligence as the ability to achieve goals in the environment shared by the communicating
agents underlying the Embodied Communication Prior. Then, NKC suggests that the best way
to achieve general intelligence according to the Embodied Communication Prior is going to
involve
• specialized methods for handling declarative, procedural, sensory and attentional knowledge
(due to the naturalness of the isolated knowledge categories)
• specialized methods for handling interactions between different types of knowledge, includ-
ing methods focused on the case where one type of knowledge is primary and the others are
supporting (the latter due to the naturalness of the interactive knowledge categories)
9.3.0.2 Cognitive Completeness
Suppose we conceive an AI system as consisting of a set of learning capabilities, each one
characterized by three features:
• One or more knowledge types that it is competent to deal with, in the sense of the two
key learning problems mentioned above
• At least one learning type: either analysis, or synthesis, or both
• At least one interaction type, for each (knowledge type, learning type) pair it handles:
"isolated" (meaning it deals mainly with that knowledge type in isolation), or "interactive"
(meaning it focuses on that knowledge type but in a way that explicitly incorporates other
knowledge types into its process), or "fully mixed" (meaning that when it deals with the
knowledge type in question, no particular knowledge type tends to dominate the learning
process).
Then, intuitively, it seems to follow from the ECP with NKC that systems with a high degree of efficient
general intelligence should have the following properties, which collectively we'll call cognitive
completeness:
• For each (knowledge type, learning type, interaction type) triple, there should be a learning
capability corresponding to that triple.
• Furthermore the capabilities corresponding to different (knowledge type, interaction type)
pairs should have distinct characteristics (since according to the NKC the isolated knowledge
corresponding to a knowledge type is a natural category, as is the dominant knowledge
corresponding to a knowledge type)
• For each (knowledge type, learning type) pair (K,L), and each other knowledge type K1
distinct from K, there should be a distinctive capability with interaction type "interactive"
and dealing with knowledge that is interactively K but also includes aspects of K1
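The first of these properties amounts to a coverage check over all triples. The sketch below illustrates only that first property, assuming the five knowledge types of the isolated categories and representing each capability as a dictionary of sets; the representation and the function name are hypothetical.

```python
from itertools import product

KNOWLEDGE_TYPES = ("declarative", "procedural", "sensory", "attentive", "intentional")
LEARNING_TYPES = ("analysis", "synthesis")
INTERACTION_TYPES = ("isolated", "interactive", "fully mixed")

def missing_triples(capabilities):
    """List the (knowledge, learning, interaction) triples no capability covers.

    Each capability is a dict with sets under the keys "knowledge",
    "learning" and "interaction". Simplifying assumption: a capability
    covers every combination of the values it lists.
    """
    covered = set()
    for cap in capabilities:
        covered |= set(product(cap["knowledge"], cap["learning"], cap["interaction"]))
    required = set(product(KNOWLEDGE_TYPES, LEARNING_TYPES, INTERACTION_TYPES))
    return sorted(required - covered)

# A hypothetical system with a single narrow capability.
narrow = [{"knowledge": {"declarative"}, "learning": {"analysis"},
           "interaction": {"isolated"}}]
# A hypothetical system whose one capability claims to cover everything.
broad = [{"knowledge": set(KNOWLEDGE_TYPES), "learning": set(LEARNING_TYPES),
          "interaction": set(INTERACTION_TYPES)}]
```

The narrow system leaves 29 of the 30 triples uncovered. Passing this check says nothing about the second and third properties, which concern how distinctive the capabilities are rather than whether they exist.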
Furthermore, it seems intuitively sensible that according to the ECP with NKC, if the ca-
pabilities mentioned in the above points are reasonably able, then the system possessing the
capabilities will display general intelligence relative to the ECP. Thus we arrive at the hypothesis
that
Under the assumption of the Embodied Communication Prior (with the Natural
Knowledge Categories assumption), the property above called "cognitive complete-
ness" is necessary and sufficient for efficient general intelligence at the level of an
intelligent adult human (e.g. at the Piagetian formal level [Pia53]).
Of course, the above considerations are very far from a rigorous mathematical proof (or
even precise formulation) of this hypothesis. But we are presenting this here as a conceptual
hypothesis, in order to qualitatively guide our practical AGI R&D and also to motivate further,
more rigorous theoretical work.
9.3.1 Generalizing the Embodied Communication Prior
One interesting direction for further research would be to broaden the scope of the inquiry, in
a manner suggested above: instead of just looking at the ECP, look at simplicity measures in
general, and attack the question of how a mind must be structured in order to display efficient
general intelligence relative to a specified simplicity measure. This problem seems unapproach-
able in general, but some special cases may be more tractable.
For instance, suppose one has
• a simplicity measure that (like the ECP) is approximately decomposable into a set of fairly
distinct components, plus their interactions
• an assumption similar to NKC, which states that the entities displaying simplicity according
to each of the distinct components, are roughly clustered together in entity-space
Then one should be able to say that, to achieve efficient general intelligence relative to
this decomposable simplicity measure, a system should have distinct capabilities corresponding
to each of the components of the simplicity measure, and interactions between these capabilities
corresponding to the interaction terms in the simplicity measure.
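One way to write down such a decomposable simplicity measure, purely as an illustrative assumption (the text does not commit to an additive form or to pairwise interactions only), is:

```latex
% Assumed additive decomposition of a simplicity measure S over entities x.
% S_i     : simplicity according to component i (e.g. one communication mode)
% S_{ij}  : interaction term between components i and j
S(x) \;\approx\; \sum_{i=1}^{n} S_i(x) \;+\; \sum_{1 \le i < j \le n} S_{ij}(x)
```

Under this reading, the suggestion is that a system would have one capability for each component term S_i and one coupling between capabilities i and j for each interaction term S_{ij} that is not negligible. For the ECP, n would be 5, one component per communication mode.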
With copious additional work, these simple observations could potentially serve as the seed for
a novel sort of theory of general intelligence - a theory of how the structure of a system depends
on the structure of the simplicity measure with which it achieves efficient general intelligence.
Cognitive Synergy Theory would then emerge as a special case of this more abstract theory.
9.4 Naive Physics
Multimodal communication is an important aspect of the environment for which human in-
telligence evolved - but not the only one. It seems likely that our human intelligence is also
closely adapted to various aspects of our physical environment - a matter that is worth carefully
attending to as we design environments for our robotically or virtually embodied AGI systems to
operate in.
One interesting guide to the most cognitively relevant aspects of human environments is the
subfield of AI known as "naive physics" [Hay85] - a term that refers to the theories about the
physical world that human beings implicitly develop and utilize during their lives. For instance,
when you figure out that you need to press the knife slightly harder when spreading peanut
butter rather than jelly, you're not making this judgment using Newtonian physics or the
Navier-Stokes equations of fluid dynamics; you're using heuristic patterns that you figured out
through experience. Maybe you figured out these patterns through experience spreading peanut
butter and jelly in particular. Or maybe you figured these heuristic patterns out before you ever
tried to spread peanut butter or jelly specifically, via just touching peanut butter and jelly to
see what they feel like, and then carrying out inference based on your experience manipulating
similar tools in the context of similar substances.
Other examples of similar "naive physics" patterns are easy to come by, e.g.
1. What goes up must come down.
2. A dropped object falls straight down.
3. A vacuum sucks things towards it.
4. Centrifugal force throws rotating things outwards.
5. An object is either at rest or moving, in an absolute sense.
6. Two events are simultaneous or they are not.
7. When running downhill, one must lift one's knees up high.
8. When looking at something that you just barely can't discern accurately, squint.
Attempts to axiomatically formulate naive physics have historically come up short, and we
doubt this is a promising direction for AGI. However, we do think the naive physics literature
does a good job of identifying the various phenomena that the human mind's naive physics deals
with. So, from the point of view of AGI environment design, naive physics is a useful source
of requirements. Ideally, we would like an AGI's environment to support all the fundamental
phenomena that naive physics deals with.
We now describe some key aspects of naive physics in a more systematic manner. Naive
physics has many different formulations; in this section we draw heavily on [SC94], who divide
naive physics phenomena into 5 categories. Here we review these categories and identify a
number of important things that humanlike intelligent agents must be able to do relative to
each of them.
9.4.1 Objects, Natural Units and Natural Kinds
One key aspect of naive physics involves recognition of various aspects of objects, such as:
1. Recognition of objects amidst noisy perceptual data
2. Recognition of surfaces and interiors of objects
3. Recognition of objects as manipulable units
4. Recognition of objects as potential subjects of fragmentation (splitting, cutting) and of
unification (gluing, bonding)
5. Recognition of the agent's body as an object, and of parts of the agent's body as objects
6. Division of universe of perceived objects into "natural kinds", each containing typical and
atypical instances
9.4.2 Events, Processes and Causality
Specific aspects of naive physics related to temporality and causality are:
1. Distinguishing roughly-subjectively-instantaneous events from extended processes
2. Identifying beginnings, endings and crossings of processes
3. Identifying and distinguishing internal and external changes
4. Identifying and distinguishing internal and external changes relative to one's own body
5. Interrelating body-changes with changes in external entities
Notably, these aspects of naive physics involve different processes occurring on a variety of
different time scales, intersecting in complex patterns, and involving processes inside the agent's
body, outside the agent's body, and crossing the boundary of the agent's body.
9.4.3 Stuffs, States of Matter, Qualities
Regarding the various states of matter, some important aspects of naive physics are:
1. Perceiving gaps between objects: holes, media, illusions like rainbows, mirages and holo-
grams
2. Distinguishing the manners in which different sorts of entities (e.g. smells, sounds, light) fill
space
3. Distinguishing properties such as smoothness, roughness, graininess, stickiness, runniness,
etc.
4. Distinguishing degrees of elasticity and fragility
5. Assessing separability of aggregates
9.4.4 Surfaces, Limits, Boundaries, Media
Gibson [Gib77, Gib79] has argued that naive physics is not mainly about objects but rather
mainly about surfaces. Surfaces have a variety of aspects and relationships that are important
for naive physics, such as:
1. Perceiving and reasoning about surfaces as two-sided or one-sided interfaces
2. Inference of the various ecological laws of surfaces
3. Perception of various media in the world as separated by surfaces
4. Recognition of the textures of surfaces
5. Recognition of medium/surface layout relationships such as: ground, open environment,
enclosure, detached object, attached object, hollow object, place, sheet, fissure, stick, fibre,
dihedral, etc.
Fig. 9.1: One of Sloman's example test domains for real-world inference. Left: a number of pins
and a rubber band to be stretched around them. Right: use of the pins and rubber band to
make a letter T.
As a concrete, evocative "toy" example of naive everyday knowledge about surfaces and
boundaries, consider Sloman's [Slo08a] example scenario, depicted in Figure 9.1 and drawn
largely from [SS74] (see also related discussion in [Slo08b]), in which "A child can be given one
or more rubber bands and a pile of pins, and asked to use the pins to hold the band in place to
form a particular shape... For example, things to be learnt could include":
1. There is an area inside the band and an area outside the band.
2. The possible effects of moving a pin that is inside the band towards or further away
from other pins inside the band. (The effects can depend on whether the band is already
stretched.)
3. The possible effects of moving a pin that is outside the band towards or further away from
other pins inside the band.
4. The possible effects of adding a new pin, inside or outside the band, with or without pushing
the band sideways with the pin first.
5. The possible effects of removing a pin, from a position inside or outside the band.
6. Patterns of motion/change that can occur and how they affect local and global shape
(e.g. introducing a concavity or convexity, introducing or removing symmetry, increasing or
decreasing the area enclosed).
7. The possibility of causing the band to cross over itself. (NB: Is an odd number of crosses
possible?)
8. How adding a second, or third band can enrich the space of structures, processes and effects
of processes.
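Several of these regularities can be explored computationally under a strong idealization: if a single taut band encloses all the pins without crossing itself, its shape is the convex hull of the pin positions. The sketch below uses that idealization (our assumption, not part of Sloman's scenario) with Andrew's monotone chain algorithm and the shoelace area formula; pin coordinates are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def band_shape(pins):
    """Convex hull of the pins (Andrew's monotone chain), counter-clockwise."""
    pts = sorted(set(pins))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def enclosed_area(shape):
    """Shoelace formula for the area inside the band."""
    n = len(shape)
    return abs(sum(shape[i][0] * shape[(i + 1) % n][1]
                   - shape[(i + 1) % n][0] * shape[i][1] for i in range(n))) / 2

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
with_inner_pin = square + [(2, 2)]   # new pin inside the band
with_outer_pin = square + [(6, 2)]   # new pin outside the band
```

In this idealization, adding or removing a pin strictly inside the band leaves the shape and the enclosed area unchanged, while a pin placed outside stretches the band and increases the area (from 16 to 20 in the example). Concavities and self-crossings, which the list also mentions, lie outside what a convex hull can represent.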
9.4.5 What Kind of Physics Is Needed to Foster Human-like
Intelligence?
We stated above that we would like an AGI's environment to support all the fundamental phe-
nomena that naive physics deals with; and we have now reviewed a number of these specific
phenomena. But it's not entirely clear what the "fundamental" aspects underlying these phe-
nomena are. One important question in the environment-design context is how close an AGI
environment needs to stick to the particulars of real-world naive physics. Is it important that a
young AGI can play with the specific differences between spreading peanut butter versus jelly?
Or is it enough that it can play with spreading and smearing various substances of different
consistencies? How close does the analogy between an AGI environment's naive physics and
real-world naive physics need to be? This is a question to which we have no scientific answer at
present. Our own working hypothesis is that the analogy does not need to be extremely close,
and with this in mind in Chapter 16 we propose a virtual environment BlocksNBeadsWorld
that encompasses all the basic conceptual phenomena of real-world naive physics, but does not
attempt to emulate their details.
Framed in terms of human psychology rather than environment design, the question be-
comes: At what level of detail must one model the physical world to understand the ways in
which human intelligence has adapted to the physical world? Our suspicion, which underlies
our BlocksNBeadsWorld design, is that it's approximately enough to have
• Newtonian physics, or some close approximation
• Matter in multiple phases and forms vaguely similar to the ones we see in the real world:
solid, liquid, gas, paste, goo, etc.
• Ability to transform some instances of matter from one form to another
• Ability to flexibly manipulate matter in various forms with various solid tools
• Ability to combine instances of matter into new ones in a fairly rich way: e.g. glue or tie
solids together, mix liquids together, etc.
• Ability to position instances of matter with respect to each other in a rich way: e.g. put
liquid in a solid cavity, cover something with a lid or a piece of fabric, etc.
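As a minimal sketch of what "multiple forms", "transformation" and "combination" could look like in an environment's data model, consider the following. This is not the BlocksNBeadsWorld design or API; the class names, the transformation table and the combination rules are all hypothetical and deliberately crude.

```python
from dataclasses import dataclass
from enum import Enum

class Phase(Enum):
    SOLID = "solid"
    LIQUID = "liquid"
    GAS = "gas"
    PASTE = "paste"

# Hypothetical transformation table: (phase, action) -> new phase.
TRANSFORMS = {
    (Phase.SOLID, "heat"): Phase.LIQUID,
    (Phase.LIQUID, "heat"): Phase.GAS,
    (Phase.LIQUID, "cool"): Phase.SOLID,
    (Phase.GAS, "cool"): Phase.LIQUID,
}

@dataclass(frozen=True)
class Matter:
    name: str
    phase: Phase

    def transform(self, action):
        """Change form if the table allows it; otherwise stay unchanged."""
        new_phase = TRANSFORMS.get((self.phase, action), self.phase)
        return Matter(self.name, new_phase)

def combine(a, b):
    """Compose two instances of matter into a new one.

    Illustrative rules only: liquids mix into a liquid, a solid plus a
    liquid gives a paste, and two solids are glued into one solid.
    """
    if a.phase is Phase.LIQUID and b.phase is Phase.LIQUID:
        phase = Phase.LIQUID
    elif {a.phase, b.phase} == {Phase.SOLID, Phase.LIQUID}:
        phase = Phase.PASTE
    elif a.phase is Phase.SOLID and b.phase is Phase.SOLID:
        phase = Phase.SOLID
    else:
        raise ValueError("no combination rule for these phases")
    return Matter(f"{a.name}+{b.name}", phase)

dough = combine(Matter("flour", Phase.SOLID), Matter("water", Phase.LIQUID))
```

Even a table this small gives an agent transformations and compositions to discover, and a richer environment would replace the lookup tables with continuous physical parameters.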
It seems to us that if the above are present in an environment, then an AGI seeking to
achieve appropriate goals in that environment will be likely to form an appropriate "human-
like physical-world intuition." We doubt that the specifics of the naive physics of different
forms of matter are critical to human-like intelligence. But, we suspect that a great amount
of unconscious human metaphorical thinking is conditioned on the fact that humans evolved
around matter that takes a variety of forms, can be changed from one form to another, and can
be fairly easily arranged and composited to form new instances from prior ones. Without many
diverse instances of matter transformation, arrangement and composition in its experience, an
AGI is unlikely to form an internal "metaphor-base" even vaguely similar to the human one -
so that, even if it's highly intelligent, its thinking will be radically non-human-like in character.
Naturally this is all somewhat speculative and must be explored via experimentation. Maybe
an elaborate blocks-world with only solid objects will be sufficient to create human-level, roughly
human-like AGI with rich spatiotemporal and manipulative intuition. Or maybe human intel-
ligence is more closely adapted to the specifics of our physical world - with water and dirt and
plants and hair and so forth - than we currently realize. One thing that is very clear is that, as
we proceed with embodying, situating and educating our AGI systems, we need to pay careful
attention to the way their intelligence is conditioned by their environment.
9.5 Folk Psychology
Related to naive physics is the notion of "naive psychology" or "folk psychology" [Rav04], which
includes for instance the following aspects:
1. Mental simulation of other agents
2. Mental theory regarding other agents
3. Attribution of beliefs, desires and intentions (BDI) to other agents via theory or simulation
4. Recognition of emotions in other agents via their physical embodiment
5. Recognition of desires and intentions in other agents via their physical embodiment
6. Analogical and contextual inferences between self and other, regarding BDI and other as-
pects
7. Attribution of causes and meanings to other agents' behaviors
8. Anthropomorphization of non-human entities, including inanimate objects
The main special requirement placed on an AGI's embodiment by the above aspects pertains
to the ability of agents to express their emotions and intentions to each other. Humans do this
via facial expressions, gestures and language.
9.5.1 Motivation, Requiredness, Value
Related to folk psychology, Gestalt [Koh38] and ecological [Gib77, Gib79] psychology suggest
that humans perceive the world substantially in terms of the affordances it provides them for
goal-directed action. This suggests that, to support human-like intelligence, an AGI must be
capable of:
1. Perception of entities in the world as differentially associated with goal-relevant value
2. Perception of entities in the world in terms of the potential actions they afford the agent,
or other agents
The key point is that entities in the world need to provide a wide variety of ways for agents
to interact with them, enabling richly complex perception of affordances.
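The two capabilities above can be pictured with a small data structure in which each entity lists the actions it affords and the goals those actions can serve. The sketch below is a hypothetical representation, with made-up entities and goal weights, and is not drawn from any particular affordance formalism.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    # Maps an action the entity affords to the set of goals that action can serve.
    affordances: dict = field(default_factory=dict)

def goal_value(entity, goal_weights):
    """Score an entity by the largest goal weight any of its affordances serves."""
    return max(
        (goal_weights.get(goal, 0.0)
         for goals in entity.affordances.values() for goal in goals),
        default=0.0,
    )

def afforded_actions(entity, goal):
    """Actions through which the entity can serve the given goal, sorted by name."""
    return sorted(a for a, goals in entity.affordances.items() if goal in goals)

cup = Entity("cup", {"grasp": {"carry"}, "fill": {"drink", "carry"}, "tip": {"drink"}})
rock = Entity("rock")
thirsty = {"drink": 0.9, "carry": 0.2}   # hypothetical current goal weights
```

For a thirsty agent the cup scores 0.9 and the bare rock 0.0, and the cup serves drinking through two distinct actions. An environment whose entities afford only one or two actions each would leave little for this kind of perception to work with.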
9.6 Body and Mind
The above discussion has focused on the world external to the body of the AGI agent embodied
and embedded in the world, but the issue of the AGI's body also merits consideration. There
seems little doubt that a human's intelligence is highly conditioned by the particularities of the
human body.
9.6.1 The Human Sensorium
Here the requirements seem fairly simple: while surely not strictly necessary, it would certainly
be preferable to provide an AGI with fairly rich analogues of the human senses of touch, sight,
sound, kinesthesia, taste and smell. Each of these senses provides different sorts of cognitive
stimulation to the human mind; and while similar cognitive stimulation could doubtless be
achieved without analogous senses, the provision of such seems the most straightforward ap-
proach. It's hard to know how much of human intelligence is specifically biased to the sorts of
outputs provided by human senses.
As vision already is accorded such a prominent role in the AI and cognitive science literature
- and is discussed in moderate depth in Chapter 26 of Part 2 - we won't take time elaborating
on the importance of vision processing for humanlike cognition. The key thing an AGI requires
to support humanlike "visual intelligence" is an environment containing a sufficiently robust
collection of materials that object and event recognition and identification become interesting
problems.
Audition is cognitively valuable for many reasons, one of which is that it gives a very rich
and precise method of sensing the world that is different from vision. The fact that humans can
display normal intelligence while totally blind or totally deaf is an indication that, in a sense,
vision and audition are redundant for understanding the everyday world. However, it may be
important that the brain has evolved to account for both of these senses, because this forced it
to account for the presence of two very rich and precise methods of sensing the world - which
may have forced it to develop more abstract representation mechanisms than would have been
necessary with only one such method.
Touch is a sense that is, in our view, generally badly underappreciated within the AI commu-
nity. In particular the cognitive robotics community seems to worry too little about the terribly
impoverished sense of touch possessed by most current robots (though fortunately there are
recent technologies that may help improve robots in this regard; see e.g. [Nan08]). Touch is how
the human infant learns to distinguish self from other, and in this way it is the most essential
sense for the establishment of an internal self-model. Touching others' bodies is a key method
for developing a sense of the emotional reality and responsiveness of others, and is hence key to
the development of theory of mind and social understanding in humans. For this reason, among
others, human children lacking sufficient tactile stimulation will generally wind up badly im-
paired in multiple ways. A good-quality embodiment should supply an AI agent with a body
that possesses skin, which has varying levels of sensitivity on different parts of the skin (so that
it can effectively distinguish between reality and its perception thereof in a tactile context);
and also varying types of touch sensors (e.g. temperature versus friction), so that it experiences
textures as multidimensional entities.
Related to touch, kinesthesia refers to direct sensation of phenomena happening inside the
body. Rarely mentioned in AI, this sense seems quite critical to cognition, as it underpins many
of the analogies between self and other that guide cognition. Again, it's not important that an
AGI's virtual body have the same internal body parts as a human body. But it seems valuable
to have the AGI's virtual body display some vaguely human-body-like properties, such as feeling
internal strain of various sorts after getting exercise, feeling discomfort in certain places when
running out of energy, feeling internally different when satisfied versus unsatisfied, etc.
Next, taste is a cognitively interesting sense in that it involves the interplay between the
internal and external world; it involves the evaluation of which entities from the external world
are worthy of placing inside the body. And smell is cognitively interesting in large part because
of its relationship with taste. A smell is, among other things, a long-distance indicator of what
a certain entity might taste like. So, the combination of taste and smell provides means for
conceptualizing relationships between self, world and distance.
9.6.2 The Human Body's Multiple Intelligences
The most unique aspects of human intelligence are rooted in what one might call the "cognitive
cortex" - the portions of the brain dealing with self-reflection and abstract thought. But the
cognitive cortex does its work in close coordination with the body's various more specialized
intelligent subsystems, including those associated with the gut, the heart, the liver, the immune
and endocrine systems, and the perceptual and motor cortices.
In the perspective underlying this book, the human cognitive cortex - or the core cognitive
network of any roughly human-like AGI system - should be viewed as a highly flexible, self-
organizing network. Such a cognitive network is modelable e.g. as a recurrent neural net with
general topology, or a weighted labeled hypergraph, and is centrally concerned with recognizing
patterns in its environment and itself, especially patterns regarding the achievement of the
system's goals in various appropriate contexts. Here we augment this perspective, noting that
the human brain's cognitive network is closely coupled with a variety of simpler and more
specialized intelligent "body-system networks" which provide it with structural and dynamical
inductive biasing. We then discuss the implications of this observation for practical AGI design.
One recalls Pascal's famous quote "The heart has its reasons, of which reason knows not."
As we now know, the intuitive sense that Pascal and so many others have expressed, that the
heart and other body systems have their own reasons, is grounded in the fact that they actually
do carry out simple forms of reasoning (i.e. intelligent, adaptive dynamics), in close, sometimes
cognitively valuable, coordination with the central cognitive network.
9.6.2.1 Some of the Human Body's Specialized Intelligent Subsystems
The human body contains multiple specialized intelligences apart from the cognitive cortex.
Here we review some of the most critical.
Hierarchies of Visual and Auditory Perception
The hierarchical structure of visual and auditory cortex has been taken by some researchers
[Kur12, HB06] as the generic structure of cognition. While we suspect this is overstated, we
agree it is important that these cortices nudge large portions of the cognitive cortex to assume
an approximately hierarchical structure.
Olfactory Attractors
The process of recognizing a familiar smell is grounded in a neural process similar to
convergence to an attractor in a nonlinear dynamical system [Fre95]. There is evidence that the
mammalian cognitive cortex evolved in close coordination with the olfactory cortex [Row11],
and much of abstract cognition reflects a similar dynamic of gradually coming to a conclusion
based on what initially "smells right."
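The notion of "convergence to an attractor" can be illustrated with a very small Hopfield-style network. This is a generic textbook stand-in chosen for brevity, not the olfactory model referred to above; the pattern values and the function names `train`, `step` and `settle` are assumptions made for this sketch. A stored pattern plays the role of a familiar smell, and a corrupted version of it settles back onto the stored pattern.

```python
def train(patterns):
    """Hebbian weights: W[i][j] accumulates p[i] * p[j], with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def step(w, state):
    """One synchronous update: each unit takes the sign of its weighted input."""
    return [1 if sum(wij * s for wij, s in zip(row, state)) >= 0 else -1 for row in w]


def settle(w, state, max_steps=20):
    """Iterate until the state stops changing (a fixed point) or steps run out."""
    for _ in range(max_steps):
        nxt = step(w, state)
        if nxt == state:
            break
        state = nxt
    return state


familiar = [1, -1, 1, 1, -1, -1, 1, -1]    # stored "smell" pattern (made up)
noisy = list(familiar)
noisy[0], noisy[3] = -noisy[0], -noisy[3]  # corrupt two components
recalled = settle(train([familiar]), noisy)
```

Here the noisy input is pulled back to the stored pattern, which is the sense in which recognition can be viewed as settling into an attractor rather than as an explicit lookup.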
Physical and Cognitive Action
The cerebellum, a specially structured brain subsystem which controls motor movements,
has for some time been understood to also have involvement in attention, executive control,
language, working memory, learning, pain, emotion, and addiction IPS1:09].
The Second Brain
The gastrointestinal neural net contains millions of neurons and is capable of operating
independently of the brain. It modulates stress response and other aspects of emotion and motivation
based on experience - resulting in so-called "gut feelings" IGer091.
The Heart's Neural Network
The heart has its own neural network, which modulates stress response, energy level and
relaxation/excitement (factors key to motivation and emotion) based on experience lArm0-11.
Pattern Recognition and Memory in the Liver
The liver is a complex pattern recognition system, adapting via experience to better identify
toxins [CB06]. Like the heart, it seems to store some episodic memories as well, resulting in liver
transplant recipients sometimes acquiring the tastes in music or sports of the donor EENICP21.
Immune Intelligence
The immune network is a highly complex, adaptive self-organizing system, which ongoingly
solves the learning problem of identifying antigens and distinguishing them from the body
system IIT861. As immune function is highly energetically costly, stress response involves subtle
modulation of the energy allocation to immune function, which involves communication between
neural and immune networks.
The Endocrine System: A Key Bridge Between Mind and Body
The endocrine (hormonal) system regulates (and is regulated by) emotion, thus guiding all
aspects of intelligence (due to the close connection of emotion and motivation) [PH12].
Breathing Guides Thinking
As oxygenation of the brain plays a key role in the spread of neural activity, the flow of breath
is a key driver of cognition. Forced alternate nostril breathing has been shown to significantly
affect cognition via balancing activity of the two brain hemispheres [SKBB91].
Much remains unknown, and the totality of feedback loops between the human cognitive
cortex and the various specialized intelligences operative throughout the human body has not
yet been thoroughly charted.
9.6.2.2 Implications for AGI
What lesson should the AGI developer draw from all this? The particularities of the human
mind/body should not be taken as general requirements for general intelligence. However, it
is worth remembering just how difficult is the computational problem of learning, based on
experiential feedback alone, the right way to achieve the complex goal of controlling a system
with general intelligence at the human level or beyond. To solve this problem without some sort
of strong inductive biasing may require massively more experience than young humans obtain.
Appropriate inductive bias may be embedded in an AGI system in many different ways.
Some AGI designers have sought to embed it very explicitly, e.g. with hand-coded declarative
knowledge as in Cyc, SOAR and other "GOFAI" type systems. On the other hand, the human
brain receives its inductive bias much more subtly and implicitly, both via the specifics of the
initial structure of the cognitive cortex, and via ongoing coupling of the cognitive cortex with
other systems possessing more focused types of intelligence and more specific structures and/or
dynamics.
In building an AGI system, one has four choices, very broadly speaking:
1. Create a flexible mind-network, as unbiased as feasible, and attempt to have it learn how
to achieve its goals via experience
2. Closely emulate key aspects of the human body along with the human mind
3. Imitate the human mind-body, conceptually if not in detail, and create a number of structurally
and dynamically simpler intelligent systems, closely and appropriately coupled to
the abstract cognitive mind-network, so as to provide useful inductive bias.
4. Find some other, creative way to guide and probabilistically constrain one's AGI system's
mind-network, providing inductive bias appropriate to the tasks at hand, without emulating
even conceptually the way the human mind-brain receives its inductive bias via coupling
with simpler intelligent systems.
Our suspicion is that the first option will not be viable. On the other hand, to do the second
option would require more knowledge of the human body than biology currently possesses. This
leaves the third and fourth options, both of which seem viable to us.
CogPrime incorporates a combination of the third and fourth options. CogPrime's generic
dynamic knowledge store, the Atomspace, is coupled with specialized hierarchical networks
(DeSTIN) for vision and audition, somewhat mirroring the human cortex. An artificial en-
docrine system for OpenCog is also under development, speculatively, as part of a project using
OpenCog to control humanoid robots. On the other hand, OpenCog has neither a gastrointestinal nor a
cardiological nervous system; the stress-response-based guidance provided to the human
brain by a combination of the heart, gut, immune system and other body systems is achieved
in CogPrime in a more explicit way, using the OpenPsi model of motivated cognition and its
integration with the system's attention allocation dynamics.
Likely there is no single correct way to incorporate the lessons of intelligent human body-system
networks into AGI designs. But these are aspects of human cognition that all AGI
researchers should be aware of.
9.7 The Extended Mind and Body
Finally, Hutchins [Hut95], Logan [Log07] and others have promoted a view of human intelligence
in which the human mind extends beyond the individual body, incorporating social
interactions and also interactions with non-human entities, such as tools, plants and animals.
This leads to a number of requirements for a humanlike AGI's environment:
1. The ability to create a variety of different tools for interacting with various aspects of the
world in various different ways, including tools for making tools and ultimately machinery
2. The existence of other mobile, virtual life-forms in the world, including simpler and less
intelligent ones, and ones that interact with each other and with the AGI
3. The existence of organic growing structures in the world, with which the AGI can interact
in various ways, including halting their growth or modifying their growth pattern
How necessary these requirements are is hard to say - but it is clear that these things have
played a major role in the evolution of human intelligence.
9.8 Conclusion
Happily, this diverse chapter supports a simple, albeit tentative conclusion. Our suggestion is
that, if an AGI is
• placed in an environment capable of roughly supporting multimodal communication and
vaguely (but not necessarily precisely) real-world-ish naive physics
• surrounded with other intelligent agents of varying levels of complexity, and other complex,
dynamic structures to interface with
• given a body that can perceive this environment through some forms of sight, sound and
touch; and perceive itself via some form of kinesthesia
• given a motivational system that encourages it to make rich use of these aspects of its
environment
then the AGI is likely to have an experience-base reinforcing the key inductive biases provided
by the everyday world for the guidance of humanlike intelligence.
Chapter 10
A Mind-World Correspondence Principle
10.1 Introduction
Real-world minds are always adapted to certain classes of environments and goals. The ideas
of the previous chapter, regarding the connection between a human-like intelligence's internals
and its environment, result from exploring the implications of this adaptation in the context
of the cognitive synergy concept. In this chapter we explore the mind-world connection in a
broader and more abstract way - making a more ambitious attempt to move toward a "general
theory of general intelligence."
One basic premise here, as in the preceding chapters, is: Even a system of vast general
intelligence, subject to real-world space and time constraints, will necessarily be more efficient
at some kinds of learning than others. Thus, one approach to formulating a general theory of
general intelligence is to look at the relationship between minds and worlds - where a "world"
is conceived as an environment and a set of goals defined in terms of that environment.
In this spirit, we here formulate a broad principle binding together worlds and the minds that
are intelligent in these worlds. The ideas of the previous chapter constitute specific, concrete
instantiations of this general principle. A careful statement of the principle requires introduction
of a number of technical concepts, and will be given later on in the chapter. A crude, informal
version of the principle would be:
For a mind to work intelligently toward certain goals in a certain world, there should be a
nice mapping from goal-directed sequences of world-states into sequences of mind-states, where
"nice" means that a world-state-sequence W composed of two parts W1 and W2, gets mapped
into a mind-state-sequence M composed of two corresponding parts M1 and M2.
What's nice about this principle is that it relates the decomposition of the world into parts,
to the decomposition of the mind into parts.
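One way to read the informal statement above is that the mapping from world-state sequences to mind-state sequences should respect concatenation: mapping W1 followed by W2 should give the same result as mapping each part and joining the results. The following sketch checks this property for two toy mappings. It illustrates only the crude, informal version of the principle; the state names and the helper functions `lift` and `respects_split` are hypothetical, introduced here for the example.

```python
def lift(state_map):
    """Lift a per-state map to sequences; the result respects concatenation by construction."""
    return lambda world_seq: [state_map[w] for w in world_seq]


def respects_split(f, w1, w2):
    """Check f(W1 + W2) == f(W1) + f(W2) for one particular decomposition."""
    return f(w1 + w2) == f(w1) + f(w2)


# Hypothetical world-states and the mind-states taken to represent them.
state_map = {"see_ball": "percept_ball", "reach": "plan_reach", "grasp": "plan_grasp"}
nice = lift(state_map)


def not_nice(world_seq):
    """A mapping that reverses order, so parts of the world no longer line up with parts of the mind."""
    return [state_map[w] for w in reversed(world_seq)]


w1, w2 = ["see_ball"], ["reach", "grasp"]
nice_ok = respects_split(nice, w1, w2)           # parts map to corresponding parts
scrambled_ok = respects_split(not_nice, w1, w2)  # decomposition is not preserved
```

The first mapping sends each part of the world-state sequence to a corresponding part of the mind-state sequence; the second does not, even though it uses the same state-to-state correspondences.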
10.2 What Might a General Theory of General Intelligence Look Like?
It's not clear, at this point, what a real "general theory of general intelligence" would look like
- but one tantalizing possibility is that it might confront the two questions:
• How does one design a world to foster the development of a certain sort of mind?
• How does one design a mind to match the particular challenges posed by a certain sort of
world?
One way to achieve this would be to create a theory that, given a description of an environment
and some associated goals, would output a description of the structure and dynamics that a
system should possess to be intelligent in that environment relative to those goals, using limited
computational resources.
Such a theory would serve a different purpose from the mathematical theory of "universal
intelligence" developed by Marcus Hutter [Hut05] and others. For all its beauty and theoretical
power, that approach currently gives useful conclusions only about general intelligences
with infinite or infeasibly massive computational resources. On the other hand, the approach
suggested here is aimed toward creation of a theory of real-world general intelligences utilizing
realistic amounts of computational power, but still possessing general intelligence comparable
to human beings or greater.
T
[truncated]