“If you don't control your mind, someone else will.” — John Allston
Recently, my microwave oven developed agency. It started making annoying noises in the middle of the night, and it always seemed to know when I was there: like the observer effect in quantum mechanics, it would pull its socks up and go quiet whenever I was physically in the room! I set up a camera to study this new creature. I wanted to discover what it wanted, what its goals were, and perhaps even whether it was sentient.
This is the accompanying article to our recent show on agency on MLST.
Caveat emptor: agency doesn't imply non-determinism, and this article takes no position on the free will question.
Language models have no agency, duh
Warning: “galaxy brain” thoughts coming your way.
Every time you use an LLM, you are sequestering your agency. Language models don’t have any agency; you do. Language models are not things in the world, though they can be seen as virtual things under our (forthcoming) definition of nonphysical agents, but they are much more sclerotic than memes. Language models only become embedded in cognitive processes when you use them.
Is GPT “understanding-promoting” or “understanding-procrastinating”?
ChatGPT allows you to do slightly more work, faster than you otherwise could, without understanding, but as soon as you need to take the task seriously, you have to make up the ground you lost by using GPT in the first place. Ultimately, there is no substitute for understanding. The debt must always be paid, if not by you, then by someone else.
One of the reasons highly technical people get so much out of GPT models is that they already understand technical topics deeply, and can quickly discard incoherent confabulations and drive into detail. The “de novo” story is a bit different. A beginner doesn’t know what they don’t know, and hence wouldn’t know how to query ChatGPT. I use the word query intentionally, because it is akin to querying a database. I like the paintbrush analogy from Gary Marcus too, but the database analogy is better, for reasons I might write about in the future.
Do you ever find yourself in the situation where you have “written” a significant amount of code, it compiles, and you only vaguely know what it is supposed to do — you have no earthly clue how it works because you’ve been tapping the generate button on GPT-4 like a mindless moron?
It is just like the phenomenon of change blindness / selective attention: when you are so focussed on GPT garbage, the part of your brain which understands, plans and acts is impaired. You try to switch between modes of cognition dynamically, much like passing a metaphorical baton. It’s an inefficient process, and the worst form of context switching.
(Video source from YouTube). This famous gorilla experiment demonstrates the concept of selective attention in humans.
There is a good chance that GenAI will dumb down software engineering in much the same way that “doomscrolling” on TikTok is making our kids dumber. TikTok might be a bad example though: you could argue it’s an entropy-maximizer, whereas GitHub Copilot is clearly an entropy-minimizer. TikTok is quite similar to Kenneth Stanley’s Picbreeder!
Have you ever heard yourself say to your colleagues “this is blowing up in complexity, guys!” — and they frown and assume you didn’t understand it? Part of understanding is constructing and linking together reference frames of knowledge to create a high-fidelity shared world model. Generally speaking, if your model is complex, you don’t understand. That’s why we don’t understand intelligence, by the way: everyone has a different model of it.
When you are in the middle of the Mandelbrot set you’re okay. Everything is nice and predictable. You’re in a warm fuzzy place. This is where GPT lives. GPT is trained to be entropy-minimizing. In GPT-land you will only ever see boilerplate-city, a sea of mundane repetition. It’s the job of the prompter to get the model as close to the entropic boundary as possible.
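To make “entropy-minimizing” slightly more concrete, here is a minimal sketch (my own toy illustration with made-up logits, not anything from an actual model or API) of the Shannon entropy of a next-token distribution, and how sampling temperature drags it towards or away from the boilerplate attractor:

```python
import numpy as np

def next_token_entropy(logits, temperature=1.0):
    """Shannon entropy (in bits) of the softmax distribution over next tokens."""
    z = np.array(logits, dtype=float) / temperature
    z -= z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()   # softmax
    return float(-(p * np.log2(p + 1e-12)).sum())

# A made-up distribution over a tiny vocabulary: one dominant "boilerplate" token.
logits = [8.0, 2.0, 1.5, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    print(f"temperature {t}: entropy ~ {next_token_entropy(logits, t):.2f} bits")
```

At low temperature the distribution collapses onto the boilerplate continuation (entropy near zero); turning the temperature up, like aggressive prompting, is one crude way of dragging the model back towards the entropic boundary.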
When you “automate” a language model on a schedule, it’s no different to using any other piece of code running on a schedule. The magic disappears, doesn’t it? That’s because when you use the language model, it’s embedded into your cognitive nexus of planning, understanding and creativity. As soon as it runs autonomously, it discombobulates, it dissolves, it decoheres. Look at this example of me repeatedly driving RunwayML on our MLST pineapple logo. See what happens? When this is an interactive process, such as with ChatGPT, we push towards the entropic chaos on the boundaries, but it’s physically grounded every step of the way. That’s why we can coax more entropy out of GPT, by pushing it again and again towards the boundaries.
There is nothing but complexity on the boundaries. It represents the tyrannical sea of edge cases. We can traverse to the edges in an infinite number of directions. The curious thing is that we maintain coherence. Autonomous AIs decohere — and we don’t, because we have agency, and AIs don’t. This is quite similar to the “wall of fire” in battle royale games, which closes in when time is running out in a match. AI models don’t like high entropy/information — you have to push them as hard as you can towards the boundaries to complexify their output with clever prompting (i.e. entropy smuggling), and if you are not there to do this interactively, they will “mode collapse” (the wall of fire) into a sea of nothingness.
Does a scriptwriter for a film have a goal? Does the director have a goal? What is the director’s goal? What is the editor’s goal? Is what emerges in the end the same thing as what the scriptwriter imagined?
When I edit videos, I know that the direction of the video emerges from the material, which emerges from the situation in which the material was filmed, because it’s a materialisation of a physical cognitive process. It’s impossible for me to even think about the film without the material present, because as Andy Clark argued in Supersizing the Mind — the material not only extends my mind but also forms the basis of my agency.
Maxwell Ramstead’s “Precis” on the Free Energy Principle.
The free energy principle is a mathematical principle that allows us to describe the time evolution of coupled random dynamical systems. The perspective opened up by the FEP starts from the question of what it means to be an observable thing in our physical universe
…at least, according to Dr. Maxwell Ramstead, who recently published “The free energy principle - a precis”.
I really recommend reading it; it’s only a five-minute read, and it’s enlightening.
I highlighted a few things from it last night, which I am recapitulating here to help set the frame for what’s coming next on agency.
He said that the FEP allows us to partition a system of interest into “particles” or “things” which can be distinguished from other things.
The free energy principle rests on sparse coupling, that is, the idea that “things” can be defined in terms of the absence of direct influence between subsets of a system.
Subsets of the system are the things, if they exhibit certain properties.
Markov blankets formalize a specific kind of sparseness or separation between things in the system, which help us identify the things from the non-things.
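For readers who want the formal gloss, this is the standard statement (nothing beyond the textbook formulation): the blanket states b, usually split into sensory and active states, render internal states μ and external states η conditionally independent.

```latex
% Markov blanket condition: internal (mu) and external (eta) states are
% conditionally independent given the blanket states b = (s, a)
p(\mu, \eta \mid b) \;=\; p(\mu \mid b)\, p(\eta \mid b)
\qquad\Longleftrightarrow\qquad
\mu \,\perp\!\!\!\perp\, \eta \mid b
```

It is this conditional independence, rather than any literal physical membrane, that does the work of separating things from non-things.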
Ramstead notes that this self-similar pattern repeats at every scale at which things can be observed—from rocks to rockstars, and beyond.
He said that the free energy principle tells us something deep about what it means for things to be coupled to each other, but also distinct from one another. It implies that any thing which exists physically (in the sense that it can be reliably re-identified over time as “the same thing”) will necessarily look as if it “tracks” the things to which it is coupled.
Abductive inference in active inference
He said that this “tracking” behaviour is formalised as a kind of abductive inference (which is realised through approximate Bayesian inference).
Abductive inference, often referred to as inference to the best explanation, is a form of reasoning that starts with an observation and then seeks out or disambiguates the simplest and most likely explanation for it. It differs from deductive and inductive reasoning by focusing on generating and filtering hypotheses which could plausibly account for the observed data, rather than proving something with certainty (deduction) or predicting future instances based on past occurrences (induction). It's a kind of logical reasoning that brings us to a provisional, best-guess conclusion based on the evidence at hand. In my opinion, it’s the holy grail of AI and something which seems computationally intractable.
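As a toy illustration of “inference to the best explanation” cast in Bayesian terms (my own sketch; the hypotheses, priors and likelihoods are invented for the example), you can score candidate explanations by prior times likelihood and keep the best-scoring one as a provisional conclusion:

```python
# Toy abductive inference: pick the hypothesis with the highest posterior
# probability given an observation. All numbers are invented for illustration.

observation = "wet grass"

hypotheses = {
    # hypothesis: (prior, likelihood of observing "wet grass" if it is true)
    "it rained overnight": (0.30, 0.90),
    "the sprinkler ran":   (0.20, 0.80),
    "a water main burst":  (0.01, 0.95),
}

# Unnormalised posteriors: prior * likelihood, then normalise
scores = {h: prior * lik for h, (prior, lik) in hypotheses.items()}
total = sum(scores.values())
posteriors = {h: s / total for h, s in scores.items()}

best = max(posteriors, key=posteriors.get)
print(f"best explanation for '{observation}': {best} (posterior ~ {posteriors[best]:.2f})")
```

The conclusion stays provisional: adding a new observation (or a new hypothesis) would re-rank everything, which is part of what distinguishes abduction from deduction.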
Maxwell is emphasising the idea that systems appear to engage in a process that looks like they're making inferences about their environment.
Active inference frames this behaviour in terms of Bayesian statistics. Systems are considered to have generative models—internal representations of how they think the world works—and they update these models based on sensory input in a way that is consistent with Bayesian updating. This updating process aims to minimise surprise by making the internal model more predictive of the sensory inputs the system receives from its environment. Of course, this is just an interpretation. The main point of contention in my opinion is whether the representations are indeed internal, or external, which is to say, diffused. Maxwell has previously argued for an integrated account of cognition in his paper Multiscale integration: beyond internalism and externalism but proponents of the free energy principle have wildly diverging interpretations of this as we will discuss more shortly.
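The usual way this is made precise in the active inference literature (stated here only as background) is via variational free energy F, which upper-bounds surprise:

```latex
F(q, o) \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
        \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big]}_{\geq\, 0} \;-\; \ln p(o)
        \;\geq\; -\ln p(o)
```

Minimising F with respect to the approximate posterior q(s) therefore does two things at once: it makes q(s) a better approximation of the true posterior (the “as if” Bayesian inference), and it tightens a bound on the surprise of the sensory input o.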
There is a broader discussion about whether the Bayesian brain hypothesis is true, whether the brain itself performs an intractable computation, and indeed whether approximating it would be good enough when building an “AGI” system. This is certainly the case that VERSES are making, having recently invoked the “AGI clause” in the OpenAI charter, but more about that soon.
By characterising this process as a kind of abductive inference, Maxwell is not saying that the system is consciously reasoning or making deliberate logical inferences. Instead, he is using abductive inference as a metaphor for the kind of behaviour which emerges from the system following the FEP. It is, in a sense, a shorthand way of explaining the Bayesian brain hypothesis underlying active inference, which models systems as if they were a superposition of agents trying to figure out the most likely state of the world given their perception, and behaving in ways that would reduce their uncertainty or surprise.
Dynamics are simply formal descriptions of behavior.
Mechanics are mathematical theories that are developed to explain the functional form or “shape” of said dynamics, which is to say they explain why the dynamics are the way that we observe them to be.
The free energy principle is based on the idea that sparseness is key to thingness. The main idea is that thingness is defined in terms of what is not connected to what. Think of a box containing a yellow gas, as opposed to a box containing a rock. In a gas, we have strongly mixing dynamics: any molecule in the gas could find itself arbitrarily connected to any other, such that no persistent “thing” can be identified in the box. By contrast, a rock does not mix with or dissipate into its environment. In other words, the rock exists as a rock because it is disconnected from the rest of the system in a specific way.
Maxwell concluded that the free energy principle stands in contrast to approaches which would split the sphere of mind or life from the sphere of physical phenomena, such as some versions of the autopoietic enactive approach. He said that the FEP eschews all such distinctions and embraces a physics of thingness that ranges from subatomic particles to galaxy clusters—and every kind of thing in between.
So, when he says the free energy principle "eschews all such distinctions" he means it proposes a monistic view where everything is encompassed within the same physicalist framework, without a need for a separate ontological category for mental or life-like processes. The thing is though, as I understand it, autopoietic enactivists are still monists.
Autopoietic enactivism typically embraces a form of biological monism, where cognition and life are deeply entwined, and the organisational properties of living systems are central. However, it does not necessarily prioritise physical explanations in the same way as the FEP does.
Varela and Maturana suggested that living systems are self-producing (or autopoietic) and maintain their identities through continuous interactions that enact or "bring forth" their world. This kind of monism could be seen as having an idealist flavour because it emphasizes the primacy of the living organism's experiential world and the constitutive role of its actions in shaping its reality.
But basically, enactivist frameworks tend to ascribe a special status to biological processes that are distinct from mere physical processes. This could read as a form of dual-aspect monism, which accepts only one substance (hence monism) but insists on fundamentally different aspects or properties, such as the physical and the experiential (though still resisting the complete mind-body dualism of Descartes).
Finally, he also argued that this philosophical perspective is not physicalist reductionism (which is to say a reduction of causal efficacy to “mere” physics) but rather a deep commitment to anti-reductionism, emphasising the causal contributions of things at every scale to the overall dynamics of the nested system.
Maxwell's statement could seem paradoxical at first glance—asserting that the Free Energy Principle (FEP) is not a form of physicalist reductionism despite being materialistic and placing everything within the framework of physics. However, there's a nuanced distinction to be made here between different flavours of materialism, and the type of reductionism he’s referring to.
Physicalist reductionism is often associated with the idea that all complex phenomena can and should be understood entirely in terms of their simplest physical components. This view suggests that the behaviour of higher-level systems (like organisms or societies) can be fully explained by reducing them to the interactions of their most basic physical parts (like atoms and molecules)—essentially a "bottom-up" approach which can overlook emergent properties that are not apparent when looking solely at these components in isolation.
In contrast, when Maxwell discusses the FEP as being committed to "anti-reductionism" — he appears to be aligning with a materialist view that recognises and seeks to account for the complex structures and behaviours of systems at all levels—acknowledging that these cannot simply be reduced to or fully predicted by their constituent parts. This aligns with the idea of emergent properties whereby higher levels of organisation (such as life, consciousness, or social structures) have characteristics that, although they arise from physical processes, are not readily explained by the laws governing lower levels of organisation.
So all of this makes sense so far, but the killer question is the relationship between things and agents. Are all things agents?
Realist or instrumental agency
The free energy principle (FEP) accommodates a degree of "agent-ness" as it applies to entities by considering to what extent they appear to be making inferences about their existence or manifesting an 'as if' behaviour that suggests goal-directedness and adaptivity.
The FEP does not strictly demand that a system's physical content be intrinsically linked to agency. Systems can be described as if they are performing inference, a notion echoed in concepts like 'organism-centered fictionalism or instrumentalism', where the agent is modelled as if it were coupled bidirectionally with its environment, while the modelling itself remains a fictionalist account.
Representations figure prominently in several human affairs. Human beings routinely use representational artifacts like maps to navigate their environments. Maps represent the terrain to be traversed, for an agent capable of reading it and of leveraging the information that it contains to guide their behavior. It is quite uncontroversial to claim that human beings consciously and deliberately engage in intellectual tasks, such as theorizing about causes and effects—which entails the ability to mentally think about situations and states of affairs. Most of us can see in our mind’s eye situations past, possible, and fictive, via the imagination and mental imagery, which are traditionally cast in terms of representational abilities.
In the cognitive sciences, neurosciences, and the philosophy of mind, the concept of representation has been used to try and explain naturalistically how the fundamental property of ‘aboutness’ or ‘intentionality’ emerges in living systems [1]. Indeed, living creatures must interact with the world in which they are embedded, and must distinguish environmental features and other organisms that are relevant for their survival, from those that are not. Living creatures act as if they had beliefs about the world, about its structure and its denizens, which guide their decision-making processes, especially with respect to the generation of adaptive action. This property of aboutness is thus a foundational one for any system that must make probabilistic inferences to support their decision-making in an uncertain world, which are central to the special issue to which this paper contributes.
(From Ramstead - Is the Free-Energy Principle a Formal Theory of Semantics? From Variational Density Dynamics to Neural and Phenotypic Representations)
Ramstead proposed a deflationary account of mental representation, according to which the explanatorily relevant contents of neural representations are mathematical, rather than cognitive, and a fictionalist or instrumentalist account, according to which representations are scientifically useful fictions that serve explanatory (and other) aims.
An important distinction highlighted here is that while systems might not be intentionally maximising or minimising their free energy (a concept associated with sentience or consciousness), they can be understood to behave as if they were under the modelling framework provided by maximum entropy—a principle originally introduced to inject probability into statistical physics, regardless of intentional behaviour.
Consequently, the "agent-ness" and "thing-ness" under the FEP are not necessarily disjoint concepts but rather different vantage points of an overarching framework that seeks to explain the behaviour of systems and agents in a consistent manner.
The interplay between being an agent and being a mere thing in the system lies in the degree to which the behaviour of the system is passive or active, inferred or imposed, and modelled or mechanistic within the FEP's theoretical landscape.
We will be getting my favourite AGI “doom-monger” Connor Leahy back on MLST soon. I naively thought that deconstructing goals as instrumental fictions would be a good way to disarm the Bostrom-esque “we are all going to die” type arguments. He assured me that almost all X-risk people already agreed with this position, and it wouldn’t move the needle!
Agents and active states
Riddhi said to me that one takeaway she had from the dialogue with Friston was the distinction of agents as systems having active states.
In free energy principle parlance, when we talk about agents and their active states, we're digging into the practical machinery of how systems—be they stones or sentient beings—navigate and respond to changes in their surroundings.
The term 'active states' refers to components of a system's Markov blanket, which is a boundary of sorts between an agent and its environment. Through its active states, the agent takes action that can modify its milieu. In more technical terms, the active states are the parts of the system's dynamics that allow the system to exert some influence on environmental states, potentially changing them.
For an agent or system to have active states essentially means it's capable of interacting with the environment in a non-passive manner. It can do something that affects its external reality, however slight that might be. A stone basking in sunshine, while exceedingly simple, with perhaps a single internal degree of freedom tracking its temperature, has a trivial active state—it radiates heat. It's an inert system, lacking adaptivity, but even in its banal existence, the principle applies.
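Here is a minimal toy sketch (my own illustration, not drawn from Friston or Ramstead) of the four-way partition into external, sensory, active and internal states, where the only route from the agent’s “beliefs” back to the world is via the active state:

```python
# Toy agent/environment loop over a Markov-blanket-style partition.
# external state (eta): true environmental temperature
# sensory state (s):   noisy reading that crosses the blanket inwards
# internal state (mu): the agent's estimate of eta
# active state (a):    the agent's push on the environment, outwards

import random

preferred = 20.0   # the agent's prior preference (an assumed setpoint)
eta = 35.0         # external state
mu = 25.0          # internal state

for _ in range(50):
    s = eta + random.gauss(0.0, 0.5)      # sensation: external -> blanket
    mu += 0.5 * (s - mu)                  # perception: internal states track sensation
    a = 0.3 * (preferred - mu)            # action: driven by beliefs, not by eta directly
    eta += a + random.gauss(0.0, 0.1)     # the environment responds to the active state

print(f"environment ended near {eta:.1f}, internal estimate near {mu:.1f}")
```

Even in this trivial loop, the “agent” only ever touches the world through its active state and only ever sees the world through its sensory state; everything else is inference on the inside of the blanket.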
So, active states are rooted in how various systems—adaptive or otherwise—employ their capacities to either preserve their internal structure in the face of environmental dynamism or change in response to it.
So it's about recognising and delineating the precise ways in which systems maintain their very structure, from moment to moment and interaction to interaction, with the environment which envelops them.
Agential density and nonphysical/virtual agents
A system's 'agential density' might be analogised to the extent to which it actively regulates its internal states and interacts with the environment to fulfil its generative model—essentially acting upon its predictions and updating them in light of new sensory information. Systems with more complex and numerous active states that aim to minimise free energy through direct interaction with the environment could be thought of as having higher 'agential density' compared to simpler systems whose behaviour might be more reactive or less directed towards minimising free energy.
In the show we discussed the idea of virtual or nonphysical agents, and we cited examples of culture, memes, language or even evolution. Professor Friston was a little bit wary of this idea, preferring to stick with the physical interpretation.
We can think of virtual agents as having active states and acting on the world, but in a much more diffused and complex way.
A revised approach to 'agential density' for such entities would necessarily focus less on direct physical interaction with the environment and more on the influence these entities exert over behaviours and structures. Here, 'agential density' could be reinterpreted as the degree to which these nonphysical entities exhibit coherent, goal-directed behaviours and maintain their structure within a social or economic space.
Influence: The reach or impact a nonphysical agent has on the behaviour of other agents.
Cohesion: The internal consistency of the agent over time and across contexts. For a market, this might mean stable dynamics and patterns of trade; for cultural practices, it could refer to the integrity and resilience of those practices in the face of external changes.
Adaptability: The ability of a nonphysical agent to adjust or evolve in response to changes within the social or economic environment in which it exists. This might manifest as cultural innovation in response to technological change, or market adaptation to new regulatory landscapes.
Since culture and markets lack physical Markov blankets, their 'boundaries' can instead be seen in terms of the information and practices that define them. These boundaries are more fluid and permeable than physical ones, and 'active states' in these contexts may refer to decision-making processes, the dissemination of cultural artefacts, or market signals that influence economic behaviour.
In framing 'agency' in observer-relative terms for these nonphysical phenomena, an instrumentalist perspective emerges which would view markets and cultures as 'agents' insofar as they effectively coordinate individual actions and information processing towards collective outcomes. Going back to the precis piece from Ramstead earlier, he made it clear that newer formulations of the free energy principle already support this type of configuration, i.e. “dynamic dependences … not physical boundaries”.
The interesting part is that these virtual agents can influence and diffuse the behaviour and agency of the physical agents via top-down causation.
Judging from some of the early feedback on the YouTube show, I might have stepped on several landmines by suggesting that agency is more diffused than we think. Funny, isn’t it, that even among devout adherents of the free energy principle you get a significant divergence in how people understand it.
Strange bedfellows
Prof. Mark Bishop spoke with us on an MLST interview in 2022. He is an autopoietic enactivist. The main difference between his position and the FEP folks is the focus on biology, autopoiesis (biological autonomy through self-producing and self-maintaining systems) and, importantly, phenomenology (the importance of subjective experience in understanding cognition). Mark discusses 4E cognition and, interestingly, drew a distinction between that enquiry and the extended version from Clark and Chalmers, which he described as representationalism and functionalism “via the back door”.
I’ve noticed some political polarisation in FEP folks. Many seem to be web3.0 / libertarians, and a subset of those are singularitarians. The small core of intersecting autopoietic enactivists presumably lean far-left. Because far-left people tend to be bad capitalists, you can guess which ones are in charge! Being “politically non-binary” myself, well, alright then, a “centrist” - I seem to be the odd one out.
Anyway. Why is this relevant?
In my opinion, the biggest shift in psychology between the political right and left is ascribing agency to the individual (right) vs the collective (left). Or in FEP terms, it means an “internalist-leaning” vs “externalist-leaning” account of cognition. Libertarians worry about the government stealing their agency because they think agency is pointillistic and malleable; people on the left think that we don’t have any agency anyway and that governance should focus on causes rather than effects.
I got significant pushback, because there is still shedloads of “wiggle room” for how we interpret agency, even between FEP adherents.
Enactivists are generally phenomenologists, which is anathema to “we don’t need biology” physicalism. There is a significant tension between folks who think intelligence is grown vs designed or something in-between.
But the point is that having a position on agency is interpreted politically (it wasn’t intended in this way, and I genuinely didn’t think about it before it was pointed out in the YouTube comments).
AGI clause on OpenAI
VERSES recently invoked OpenAI's "assist" clause.
OpenAI’s charter says that "if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project."
OpenAI's definition of AGI is "highly autonomous systems that outperform humans at most economically valuable work".
VERSES did this without publishing any results, paid for an advert placement in The New York Times, and it wasn’t even an “apples to apples” comparison, as their version of AI is quite different. In my opinion, this was at best a cynical PR stunt, and at worst AGI grifting.
I am on board with this idea of a cyberphysical ecosystem which builds on top of our natural one, but the result isn’t going to be particularly Star Trek in my opinion, at least not in the way singularitarians think of it.
I feel this is where we should make a clear distinction between what researchers at VERSES call "AGI" and what OpenAI calls "AGI". What I have gleaned from several interviews with the VERSES folks is that they are describing a "natural general intelligence", which is to say, a cyberphysical ecosystem. The "physical" part is critical in my opinion as it implies the system is self-limiting and situational, i.e. not omniscient. OpenAI believes in the "Nick Bostrom school of intelligence", i.e. that it can all run on a computer (probably with tractable computation), and at a very abstract level (where you have clean, disentangled values, goals, actions etc. not only in the fictionalist way we humans understand the system, but in how the system actually operates and is designed, divorced from high-resolution physical dynamics in the real world).
Bostromites tend to believe these systems can "recursively self-improve" and will almost certainly want to kill us all through this theory of "instrumental convergence" which means that systems with any end goal such as paperclip maximisation would likely share a common intermediate goal — and, surprise surprise, killing humans would be such a goal.
These are two wildly different conceptions of AGI, and invoking the assist clause on OpenAI's charter cynically conflates them together. Singularitarians, doomers etc tend to believe in the OpenAI or Bostrom conception of intelligence, as you might expect.
I recently had a chance to sit down with the CEO of VERSES (Gabriel René), see below.
Agents don’t actually plan
A lie told often enough becomes the truth — Vladimir Lenin
The main argument I was making in the MLST show was an instrumentalist one — instead of individual agents independently and explicitly calculating all possible future trajectories, the process of planning and decision-making could be understood as a distributed computation which occurs within the broader system. This system encompasses not just the agents themselves but also their interactions and the mutual information shared among them. This distributed nature of computation implies that the cognitive load of planning is not solely borne by individual agents. Rather, it is a collective process that emerges from the dynamic interplay of various elements within the system.
To put it another way, the agents are not solely and independently responsible for all the intricate calculations involved in planning. They are part of a larger network where information and potential outcomes are processed in a more distributed and interconnected manner. This approach to understanding agency and planning challenges the traditional view of these processes as being wholly centralised within individual agents. It opens up a more systemic perspective, where the complex dynamics of the entire system contribute to the emergence of what we perceive as goal-directed behaviour and future-oriented planning.
Professor Friston explains this in technical terms at 10:00 (generalised synchronisation of chaos in biological self-organisation), which he also exquisitely exposited in "Life as we know it" (Friston, 2013).
There is no intentionality, and there are no goals, in any physical process (in the way we humans understand them). Evolution serves as a wonderful example, because most people would agree with Daniel Dennett's argument that it's a blind, algorithmic process that operates through natural selection and only appears to be teleological/directed. Guess what: humans are no different. Goals are how we think, not what is.
Consider the phenomenon of carcinization, the evolutionary process that has led unrelated species to converge on a crab-like form. Is this a mere quirk of nature, or does it signify a deeper behavioural or mechanistic pattern?
Carcinization is a form of "morphologically convergent" evolution, which is fascinating. It makes it appear as if there is planning in the process, when in fact there isn’t. For some reason it’s obvious to us that evolution doesn’t plan, yet many cognitive scientists are convinced that we humans do. I see this as being closely related to Kenneth Stanley’s ideas on why greatness cannot be planned; he’s basically saying goals are epistemic roadblocks. Perhaps Kenneth’s words are still ringing in my ears; he’s one of the most perceptive AI researchers I have ever encountered. Just like Kenneth Stanley’s Picbreeder, the real source of entropy is humans and physical processes, or processes which supervene on physical processes. Humans going about their daily lives have access to rich, diverse sources of entropy and help preserve it. To create any divergent and creative process in AI, you probably need to farm entropy from humans.
Why does this matter? Given what we have discussed so far, none of this should be controversial. So why does it trigger people?
AI researchers design with goals
There are several elephants in the room here. If goals are only an anthropocentric instrumental fiction, why do so many AI researchers, VERSES included, think that explicitly modelling them would lead to AGI? Almost all AI researchers "think in goals", whether they are arguing that humans will become extinct, or when designing or constraining what they believe to be proto-"AGI" systems.
Why do so many researchers agree that the "Reward is Enough" paper was teleologically misguided, yet think explicit goals are fair game? Rich Sutton was perhaps onto something with his "Bitter Lesson" essay, in which he warned against anthropocentrism in AI system design; ironically, though, he fell foul of his own advice.
Why assume that adding goal-directed constraints to an emergentist, self-organising principle would yield the same types of intelligent behaviour as in the natural world?
AIs which explicitly plan, like AlphaGo, are not doing it the way it actually happens in physical reality. You could just as easily argue that it is "planning" which explains carcinization. The reality is much more mundane (or complex, depending on how you look at it!), but we use the lens of planning to explain the world.
It’s already obvious to me that adding a search process with a predefined goal on top of an LLM won’t create new knowledge. It’s preposterous. The mistake is to confuse recombinations with new knowledge. New knowledge is paradigmatically new, it’s inventive. Current AIs only search through a relatively tiny predefined space, and there are strict computational limitations on the space and the search process. The miracle of human cognition is that we can apparently overcome exponential complexity both in how we understand the world, and invent new things.
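To put a number on those “strict computational limitations”: for Go, commonly cited figures of roughly b ≈ 250 legal moves per position and games of around d ≈ 150 moves give a game tree on the order of

```latex
b^{d} \;\approx\; 250^{150} \;\approx\; 10^{360}
```

trajectories. Even AlphaGo only ever samples a vanishingly small, value-guided sliver of a space that is itself fixed in advance by the rules of the game; searching that space more cleverly is not the same thing as stepping outside it.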
Many representationalist/internalist AI researchers do think that humans and perhaps animals do explicit planning with goals. Many would concede it’s more diffused “in the real world” but would still happily use “internalist” cognitive models for their AI work.
As discussed earlier, active inference frames this behaviour in terms of Bayesian statistics: systems are considered to have generative models of how they think the world works, and they update these models based on sensory input in a way that is consistent with Bayesian updating. We will leave the structure learning part out for now, but that makes the job even harder to do in a “plausibly natural” way. But given this instrumental vs realist confusion, does the magic trick still work when explicitly realised?
My main argument against the possibility of AGI is that intelligence is at least a very complex physical process, or a process which supervenes on a physical process. I think it would be possible to achieve AGI as it exists in nature only with a very high-resolution simulation of a physical process, which in my opinion would be computationally intractable; as Wolfram says when he talks about computational irreducibility, there are no shortcuts!
This is a video of Professor Kristinn R. Thorisson suggesting that he thinks goal-orientedness should be baked into our conception of intelligence.
We could go philosophical on whether goals ontologically "exist". The philosopher Philip Goff argues for an account of pan-agentialism in his new book “Why? The Purpose of the Universe”, which is to say, perhaps agentialism is knitted into the universe itself, and all material has degrees of it, or perhaps even arises from it. I highly recommend this interview with Philip on the Waking Cosmos podcast, and below is a clip from an upcoming interview I did with him.
From a nativist and psychology perspective, I believe goals exist in us and shape how we think about things (for example, Daniel Dennett's intentional stance). But I think they are largely instrumental, not realist. Which is to say, they exist only as knowledge and they shape how we think. Explicitly imputing them into AI isn’t a fool’s errand, but it will produce systems with truncated agency.
Any software we program with explicit goals will never be an AGI, because there are no shortcuts to AGI. But VERSES are proposing to build a natural general intelligence, which is a simple extension of our real physical intelligence here on Earth, and that is definitely worth pursuing — but don’t worry about Skynet any time soon.
But Tim! What about “understanding”? You have gone to great lengths to defend John Searle’s Chinese room argument. Couldn’t it be said that understanding is also a fictionalist and instrumental account, an imagined property of functional dynamics? This is exactly what Francois Chollet said when we interviewed him last.
I must admit, this is a conundrum for me and goes to the core of the “phenomenal resentment” I depicted in the Venn diagram above. The reason Enactivists and statistical mechanics folks are strange bedfellows rests on the philosophy of phenomenology. This is the bright line between the simple physics interpretation of understanding and the ontologically subjective one which Searle advocates for.
Rich Sutton argues that many things, intelligence included, are anthropocentric, instrumental fictions or epiphenomena (if they express no top-down causation). I agree with him. I just wish I understood why he can simultaneously understand this and still think we need to design AI models explicitly with the epiphenomena, i.e. “reward”. This seems like confusing the map and the territory, just as X-risk people think of their model of intelligence (MDPs) as actually being the same thing as intelligence (it isn’t).
Goals are a (human) cognitive primitive
In cognitive psychology, there appears to be a widespread consensus that goals play a foundational role in cognition. This perspective is supported by the work of Elizabeth Spelke, who, in her seminal paper on core knowledge, highlights the role of goals as a primitive of cognitive processes. According to Spelke, one of the core knowledge systems humans possess is focused on agents and their goal-directed actions. Goals serve as a crucial element which undergirds the way human infants interpret the behavior of other beings as intentional and meaningful.
Spelke and her colleague Katherine D. Kinzler argue that core knowledge systems guide and shape mental lives, and among these systems is one which is attuned to agents and their actions, fundamentally characterised by the concept of goals. They posit that the recognition of goal-directed behavior forms the basis of how we understand agentive behavior.
The emphasis on goal-directed actions signals that even at a very early stage of development, human infants possess an inherent ability to perceive and interpret the actions of others in a manner that assumes an underpinning goal or intent.
The idea that an ability to recognise goals is an evolutionary and developmental precursor to more complex cognitive processes echoes the instrumental role that goals play in implementations of active inference models. It suggests that the way humans and potentially other animals navigate their environments and make decisions is deeply informed by an intrinsic understanding of goal-oriented behavior. Goals seem to be the very substance that fuels the engine of cognition in humans, but are they just an instrumental fiction, or do they reflect some deeper ontological reality? If you are from a psychology-influenced strand of AI, such as expert systems, you would probably believe the latter. The design vs emerge debate is one of the biggest in AI and is the premise of Rich Sutton’s “Bitter Lesson” essay.
X-risk ideology hinges on goals
Orthogonality Thesis: highly intelligent AI systems could be aligned with arbitrary, trivial, or even harmful goals, and their level of intelligence doesn't inherently make them aligned with human values or ethics.
Instrumental Convergence: an AI designed with the sole objective of manufacturing paper clips (for example) might seek to acquire resources, protect itself from being shut down, and improve its own capabilities, and perhaps kill all humans because these sub-goals are instrumentally useful for producing more paper clips, and these sub-goals would likely intersect for many end goals.
If goals are seen merely as instrumental constructs created by humans to explain and guide behaviour, then Bostrom’s philosophy loses some of its force in my opinion. Bostrom assumes that AI's intelligence and goals are entirely independent, universal even, when in fact, as we have described in detail already — goals and intelligence emerge from functional dynamics of physical material (or extremely high resolution simulations of such) and are likely to be entangled in an extremely complex fashion. From this perspective, intelligence in AI is not a matter of fulfilling predefined goals or teleological goals. In the instrumentalist view, behaviour is not dictated solely by individual planning but emerges from the interplay within a system.
Bostrom's notion of instrumental convergence relies on the assumption that certain sub-goals are universally useful - and we probably see this anthropomorphically too, for example - human motivations for power, resource acquisition, and self-preservation. It's not clear that AIs would "naturally" converge on these sub-goals.
The externalist cognitive perspective demands that agents are embedded within dynamics, embodied and specialised / situational. This could imply that the risks sketched out by Bostrom are less imminent due to the computational complexity inherent in creating such advanced systems or simulations.
Dr. Tim Scarfe - MLST
"Many representationalist AI researchers do think that agents have internal representation of the world and do explicit planning." -- most people I know who do cognitive modelling (who are more or less aligned with neurosymbolic AI) would not have a problem with the world model / generative model / representation being somewhat external or distributed, as long as it can be *interrogated*. It must be able to be used as a model, to run simulations, otherwise what is the point of having it at all?
Other points of feedback if you are planning to use this as springboard for further: it might be worth having a good, lay, watertight definition or explanation of what entropy is somewhere. Also, the wall of fire para is pretty speculative and vague. Could be a really nice idea if developed a bit?
Are you going to keep up the Newsletter?