Why Do AIs Lie?

Mar 15, 2023

Zeroth Principles can clarify many issues in the ML/AI domain.

As discussed in a previous post, Epistemology is normally an armchair discipline, like the rest of Philosophy. It has only lately become accessible to experiments because we can use various Machine Learning models to test our hypotheses.

I would like to introduce three statements in Epistemology that are (I claim) pretty hard to argue with:

Omniscience is unavailable

We don’t even have eyes in the back of our heads. Complex and chaotic systems cannot be predicted over the long term. Nobody, not even an AI, can track everything that happens. To always be correct about everything, we would need to know everything. To perfectly predict the weather we would need to track every water molecule in the ocean.
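The claim about chaotic systems can be made concrete with a toy example. The sketch below is an illustration, not something taken from the post: it assumes the logistic map with r = 4 as a stand-in for a chaotic system, and the function names, the starting value, and the 0.1 threshold are all arbitrary choices. Two runs that start a mere 1e-10 apart agree for a while and then part ways completely, which is why a tiny gap in what we know eventually ruins a long-term forecast.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), a standard toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def first_divergence(a, b, threshold=0.1):
    """Return the first step at which two trajectories differ by more than threshold."""
    for step, (xa, xb) in enumerate(zip(a, b)):
        if abs(xa - xb) > threshold:
            return step
    return None  # the trajectories never separated within the simulated horizon


# "The world" and our slightly imperfect measurement of it (hypothetical values).
exact = logistic_trajectory(0.2, 100)
measured = logistic_trajectory(0.2 + 1e-10, 100)

step = first_divergence(exact, measured)
print("Early on the two runs still agree:", abs(exact[5] - measured[5]))
print("They differ by more than 0.1 from step:", step)
```

Making the measurement more precise only postpones the divergence by a few steps. It never removes it, which is the practical meaning of “omniscience is unavailable”.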

Some very hardline Reductionists have argued that we can have omniscience. They are clearly not expecting AI to appear in their lifetime. The better bet is to switch to a Holistic Stance.

All corpora are incomplete

AI is now Machine Learning. ChatGPT and its ilk are raised on a learning curriculum, a “corpus”, of text. (In my writing here on Substack, LLMs of all kinds, and future systems that may be designed very differently, are all lumped under the term “AI”.) Even a small corpus may lead to decent performance on common tasks, but larger corpora can cover more corner cases and provide more opportunities to learn from semi-related problem domains. Today, lacking better measures, we may treat either the size of the language model or the size of its corpus as a rough estimate of a new system’s capabilities.

It seems our machines are still too small for truly useful results. ChatGPT-3.5, to take a concrete example, learned a lot about language (several languages, in fact), but there were likely not enough resources for it to learn useful competences in Math, Physics, or Civics, to name just a few things it was largely ignorant of.

At some point, with more effective algorithms and even larger cloud-based learners, our AIs will, for all practical purposes and for a majority of people, stop lying and become trusted assistants of various kinds. They will tell us when they do not know enough to answer, and on the flip side, we will learn not to bully them into lying.

All intelligences are fallible

This follows from the previous two statements.

Ignorance is one of the four major failure modes for all intelligences. The others are Illusion (incorrect sensory input and preprocessing), Misunderstanding (it was learned wrong, possibly from incorrect or conflicting corpora), and Confusion (more than one interpretation was possible, even at inference time or runtime).

Humans and AIs are both limited by these Epistemological constraints. We have to accept this and be happy if we can get something useful and halfway reliable out of either kind of agent.

Confabulation

Confabulation is the technical term for AIs lying when producing text. When they are producing images, some like to call it “hallucination”.

Since all intelligences are fallible, all intelligences are, technically, confabulating every time they emit a communication of any kind, because they could easily be ignorant, confused, or mistaken. We note that confabulation does not have to be malicious. Children who have learned some language will tell fantastical tales about how they see and interpret the world.

Currently, an AI may tell you that it is just an ignorant language model, or something equivalent. But if the user insists or tricks it, it will confabulate several paragraphs out of whatever it has learned about the prompted topic. And since its world model only provides it with a “Shallow and Hollow Pseudo-Understanding”, there will be many opportunities to issue some very confusing statements.

Superhuman AIs

Note that I am not saying that superhuman intelligences are impossible. Not at all. I just wish to point out that there are hard limits to intelligence, and that getting closer to those limits will become a battle of diminishing returns.

I have not been following the “AI as Existential Risk” debate lately, and there are many aspects to this, but last time I looked, nobody was discussing these limits to intelligence. IMO, AI improvements will arrive at manageable rates, much like iOS releases. I have discussed some of this in a blog post.

Examining this more closely, we notice that the limits to intelligence are not just technological. They are largely set by the complexity of the world.

And adding AIs to the world will make it even more complex.

Zeroth Principles of AI
