Altman Upsetting Investors?
Term Vectors Are Overrated
Sam Altman, CEO of OpenAI (the company that made ChatGPT), has recently been saying that he is about to make a major announcement that will upset many investors.
Some people speculate this will be about OpenAI open-sourcing something. Some think he will ask for more regulation. Some think he will announce a long delay before GPT5. In fact, OpenAI hasn’t even started training it yet.
My really wild speculation: "We don't actually need GPUs"
The GPT5 delay could have been brought about by research at OpenAI leading to the discovery of much cheaper GPU-free LLM algorithms, and by those algorithms not yet being quite ready for prime time.
The “disappointing investors” warning would be because cloud services involving GPUs would not be needed for language Understanding henceforth; they would still be critical for images, video, speech, sound, scientific applications, etc. This would upset many budgets and companies selling GPUs.
These GPU-free Natural Language Understanding (NLU) algorithms exist. I have been researching LLMs in my company Syntience Inc since 2001. Our product is a smaller, faster, cheaper kind of LLM that we now call an SSM – a Small Syntax Model. We can create a “useful” SSM on a laptop in under five minutes using a mere 5MB of corpus, and without using a GPU. We have a UM1 demo server in the cloud that loads a small SSM learned for just a few hours. Code to test this demo server is posted on GitHub.
Almost a year ago I posted, on my main publishing site, a summary of how my SSMs are created and how they are used. Chapter 8 discusses the “OL” learning algorithm and Chapter 9 the cloud-based “UM1” runtime service. Note that these year-old chapters do not use the term “SSM”, since I only started using it recently.
How did we get here?
I have a “just-so” story that I made up from whole cloth because I wasn’t in the room when it happened. Consider this fictitious scenario:
Sometime between 2006 and 2014, people like Geoff Hinton get Deep Learning (DL) working well for Understanding images.
By that time, and probably independently, some NLP researcher(s) invent term vectors and word2vec. These ideas provide the functionality for the famous equation KING - MAN + WOMAN = QUEEN by allowing Linear Algebra to work in a high-dimensional semantic concept space.
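The KING - MAN + WOMAN = QUEEN arithmetic can be illustrated with a minimal sketch. The three-dimensional vectors below are hand-made assumptions chosen so the example is easy to follow; real word2vec vectors are learned from a corpus and have hundreds of dimensions. The names `VECTORS`, `cosine`, and `analogy` are hypothetical and exist only for this illustration.

```python
import math

# Hand-made toy vectors; the dimensions loosely stand for
# (royalty, maleness, femaleness). These values are assumptions,
# not output from word2vec.
VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


print(analogy("king", "man", "woman"))
```

Even in this toy setting, every comparison is a handful of floating point multiplications plus square roots per candidate word, which hints at why the cost grows quickly with vocabulary size and vector width.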
It is now a natural step for DL researchers to attempt to Understand human language by converting the input text to a strange kind of “image” using term vector lookup for the translation from, well, terms to vectors. And then to use the Image Understanding algorithms they had already developed to Understand text.
And this worked really well, and was the basis for many years of rapid improvement in DNN-based NLU.
But my theory (in this fictitious story) is that they got too lucky too early.
They went with term vectors because it worked. And never bothered searching for a cheaper alternative.
So these algorithms are starting from the semantics (of terms at the word level) imported from the outside (as gathered by word2vec) and they then attempt to learn the syntax of the language from the main learning corpus. I call these Semantics-First algorithms.
When learning syntax, they will be schlepping around these term vectors. Which is very expensive. Which is why they need to run on powerful and expensive GPUs.
The most important algorithm in a Deep Neural Network stack is Convolution. This is used for correlation discovery. In images, correlation discovery requires that multiple passes be made over the whole image, performing various matrix operations using Linear Algebra.
In text, all possible correlations are in the (linear) past text that has already been read and they can be found using indexing methods such as those used for web search. A more effective indexing method capable of preserving much more context is a neural network using discrete neurons. This is what we use, and is discussed in Chapter 9.
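To make the indexing idea concrete, here is a minimal sketch of a plain positional index of the kind used for web search. It is not the discrete-neuron network described in Chapter 9; it only shows that earlier occurrences of a token, and what followed them, can be found by lookup rather than by search. The function names `build_index` and `followers` are hypothetical.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each token to the list of positions where it has occurred."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def followers(tokens, index, word):
    """Look up what came right after each earlier occurrence of `word`."""
    return [tokens[p + 1] for p in index.get(word, []) if p + 1 < len(tokens)]


text = "the cat sat on the mat and the cat slept".split()
index = build_index(text)

print(followers(text, index, "the"))  # ['cat', 'mat', 'cat']
print(followers(text, index, "cat"))  # ['sat', 'slept']
```

Building the index is a single pass over the text, and each lookup touches only the positions where the token actually occurred.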
So according to my just-so story, the ML community turned a 1-Dimensional indexed correlation lookup into a 2D convolution that required searching for these correlations. And there’s more: the convolution must be done repeatedly before it converges, because adjusting weights partially invalidates previous efforts.
And these DL algorithms operate in a Euclidean space, which means distance measurements involve squares of hundreds of floating point numbers and square roots. In contrast, my SSMs use Jaccard distance in an even higher-dimensional boolean space. Most of my algorithms are based on set theory.
These are the reasons it costs OpenAI on the order of billions of dollars to train their LLMs. GPUs are expensive.
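The difference between the two distance measures can be sketched as follows. This is a generic illustration of Euclidean and Jaccard distance, not the SSM implementation; representing a boolean vector as a Python set of active feature ids is an assumption made for this example, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Square every coordinate difference, sum, then take a square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets of active feature ids."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Dense floating point vectors, as used with term vectors.
print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0

# Sparse boolean vectors, stored as the ids of the features that are "on".
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The Jaccard version needs only set intersection, set union, and one division, and its cost depends on how many features are active rather than on the total number of dimensions.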
Learning language directly, character by character, is easily a million times faster than using term vectors. It produces SSMs instead of LLMs, because it didn’t start with semantics. We know SSMs can handle classification. Can they handle dialog?
Are term vectors really necessary for dialog?
Or perhaps OpenAI knows.
My algorithm, Organic Learning, has been working since 2017, but I don’t have a machine that is big enough to learn beyond what we needed for classification. We are using 10-year-old Apple Mac Pro (Late 2013) machines for all our research.
OpenAI certainly has the funding, compute, and talent they would need in order to switch to Syntax-First algorithms like mine. There may well be others working on similar ideas, and I predict we will see more research activity in this area now that we know it’s possible to at least get this far.
My company needs a 4TB RAM server with about 220 threads for learning various release versions of classifiers in multiple languages and for experiments aimed at learning enough to be able to conduct a dialog in the style of ChatGPT on a millionth of their budget.
We are self-funded and cannot afford such experiments.