
SIX STEPS TO AGI

april 2 2024

"I am not a madman for saying that it is likely that the code for artificial general intelligence is going to be tens of thousands of line of code not millions of lines of code. This is code that conceivably one individual could write, unlike writing a new web browser or operating system and, based on the progress that AI as machine learning had made in the recent decade, it's likely that the important things that we don't know are relatively simple. There's probably a handful of things and my bet is I think there's less than six key insights that need to be made. Each one of them can probably be written on the back of an envelope." - John Carmack


Here's what I feel the six key insights are:


1. Verifiable environment

In more detail, we need an environment in which the system can check its own work. How else would it know if it is doing something correctly? Humans exist in a 3D verifiable environment: we can perform actions on it and immediately get rich feedback. Where can we find an easy-to-verify environment for a reasoning system? In math and coding. In math, we can construct a series of operations and quickly check the result against a known answer. In coding, we can write some code and then verify whether it does what we intended (through unit tests).
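Here's a minimal sketch of what "verifiable" means in practice. Both checkers and the toy test format are hypothetical, not from any particular framework:

```python
def verify_math(proposed: str, expected: str) -> bool:
    """A math answer is verified by direct comparison to a known answer."""
    return proposed.strip() == expected.strip()

def verify_code(source: str, tests: list, func_name: str) -> bool:
    """Code is verified by executing it and running unit tests against it."""
    namespace: dict = {}
    try:
        exec(source, namespace)               # run the candidate code
        func = namespace[func_name]
        return all(func(*args) == want for args, want in tests)
    except Exception:
        return False                          # crashing code fails verification

print(verify_math("4", "4"))                                    # True
print(verify_code("def add(a, b):\n    return a + b",
                  [((2, 2), 4), ((1, 3), 4)], "add"))           # True
```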


2. Thought generator

To be honest, the thought generator has pretty much been finished. ChatGPT was the first system to actually complete it, and all the LLMs after it have only improved its capabilities. However, just like humans, thought generators aren't always right on the first try. So what we must do is collect these thoughts, judge them, and perform actions on top of them until we get to a solution.
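As a rough sketch of "collect, then judge": sample several candidate thoughts and keep only the ones that survive judgment. Here `generate` and `judge` are hypothetical stand-ins for an LLM call and a verifier:

```python
import random

def generate(prompt: str, n: int = 5) -> list[str]:
    # stand-in for sampling n thoughts from an LLM
    return [f"thought {i} about: {prompt}" for i in range(n)]

def judge(thought: str) -> bool:
    # stand-in for checking a thought against the verifiable environment
    return random.random() > 0.5

def usable_thoughts(prompt: str) -> list[str]:
    # keep only the thoughts that survive judgment
    return [t for t in generate(prompt) if judge(t)]

print(usable_thoughts("what is 2 + 2?"))
```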


3. Agent

This is pretty obvious, but we need something that actually performs actions on the environment. In reinforcement learning this is known as an agent. I think an LLM can act as a pretty good agent: just prompt it to think of a good next action, and then have some sort of environment in which that action gets performed. Once again, this can be done in math or code.
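The agent loop itself is simple. Here's a sketch with hypothetical stand-ins for the LLM prompt and the environment:

```python
def propose_action(observation: str) -> str:
    # stand-in for prompting an LLM: "given this observation, what next?"
    return f"act on: {observation}"

def step(state: dict, action: str) -> tuple[dict, str, bool]:
    # stand-in environment: apply the action, return state, feedback, done
    state["history"].append(action)
    done = len(state["history"]) >= 3        # toy stopping condition
    return state, f"result of {action}", done

state, obs, done = {"history": []}, "start", False
while not done:
    action = propose_action(obs)             # agent picks an action
    state, obs, done = step(state, action)   # environment performs it
print(state["history"])
```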


4. Critic

This is similar to a verifiable environment, but differs in that it can tell exactly what is wrong with a solution, and how to fix it. So you say 'hey critic, where did we go wrong on this path?' and the critic helps you out. This is actually a pretty important piece, because with a bad critic the system won't know how to improve. OpenAI suggests that you need a large, powerful critic, and that the agent in most scenarios isn't as important. This essentially solves the credit-assignment problem.
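A sketch of a step-level critic, in the spirit of a process reward model: score each step of a solution path and report the first one that looks wrong. `score_step` is a hypothetical stand-in for a learned critic model:

```python
def score_step(problem: str, steps: list[str], i: int) -> float:
    # stand-in: a real critic model would return P(step i is correct)
    return 0.2 if "mistake" in steps[i] else 0.9

def first_error(problem: str, steps: list[str], threshold: float = 0.5):
    # 'hey critic, where did we go wrong on this path?'
    for i in range(len(steps)):
        if score_step(problem, steps, i) < threshold:
            return i        # blame lands on this step: credit assignment
    return None

steps = ["set up the equation", "isolate x", "sign flip mistake"]
print(first_error("solve for x", steps))   # -> 2
```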


5. Meta learner

So how is this any different from an RL agent? It's because we have a meta learner that essentially learns how to construct trees. So what should it look like? You construct a tree of thoughts in your verifiable environment, and then find a solution. Then, with this solution, have a pruner that gets rid of branches that have *never* been helpful. You don't want to prune branches that you have used in the past, at least not in the recent past (you can eventually prune/forget some branches after a long period of time). Then you have a completed tree. This meta learner is essentially a carrier of a bunch of trees, and learns which trees are useful in the long term. We as humans do trial and error, and when we get a solution we remember how to get there. The tree helps this system remember how to get there. This learner is also doing active/continuous learning, in that you can just add a new tree to the system.
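Here's a minimal sketch of that bookkeeping: credit nodes that sat on a solving path, prune branches that have never helped, and store finished trees for recall. All the structures here are hypothetical:

```python
class Node:
    def __init__(self, thought: str):
        self.thought = thought
        self.children: list["Node"] = []
        self.helpful_count = 0        # times this node was on a solving path

def mark_solution_path(path: list[Node]) -> None:
    # when a solution is found, credit every node on the path to it
    for node in path:
        node.helpful_count += 1

def prune(node: Node) -> None:
    # drop branches that have *never* helped; keep anything used before
    node.children = [c for c in node.children if c.helpful_count > 0]
    for c in node.children:
        prune(c)

tree_library: dict[str, Node] = {}    # the meta learner: a carrier of trees

def remember(task: str, root: Node) -> None:
    # continual learning: prune the finished tree and just add it
    prune(root)
    tree_library[task] = root

root, good, bad = Node("start"), Node("useful step"), Node("dead end")
root.children = [good, bad]
mark_solution_path([root, good])
remember("example task", root)
print([c.thought for c in tree_library["example task"].children])  # ['useful step']
```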


6. Truths

I need a better name for this, but you need to have some facts about the world that are *always* true. For example, we all know that 2+2=4. It's a given. Most people can't formally prove why, but it is. What you can do is push a bunch of static facts into the context window so the system can always refer to them. It helps the system be efficient, and humans use this key as well. For example, say you are trying to solve a tricky math problem on an exam. If you have a fair teacher, then you probably have all the tools at your disposal and just have to find the right permutation of these tools to get to a solution. If it's physics, then you know that F=ma. You take other known equations, manipulate them based on the question, and arrive at a solution. This key helps the solution converge to a correct answer, bypassing the infinite number of steps that a brute-force agent would try.
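Mechanically this is the simplest piece: prepend the static facts to every prompt. A sketch, with a hypothetical prompt format and fact list:

```python
TRUTHS = [
    "2 + 2 = 4",
    "F = m * a",
    "the sum of angles in a triangle is 180 degrees",
]

def build_prompt(question: str) -> str:
    # the system can always refer to these facts instead of rederiving them
    facts = "\n".join(f"- {t}" for t in TRUTHS)
    return f"Known facts (always true):\n{facts}\n\nQuestion: {question}"

print(build_prompt("A 3 kg mass accelerates at 2 m/s^2. What force acts on it?"))
```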


So there you have it: the six key steps to a reasoning system. Now that we have a solid understanding of the six puzzle pieces, we need to finesse them together in exactly the right way. You now have to do trial and error. You have to enter Thomas Edison mode: fail 2774 times and try 2775 different permutations. With the exact right one, the reasoning system's light bulb flickers on.




"If we rely purely on generative methods and extrapolate from current trends, we will require an exorbitant parameter count to achieve even moderate performance" - OpenAI Lets verify step-by-step


There are hundreds of billions of dollars being spent each year trying to build a perfect thought generator. Today, this thought generator is best known as ChatGPT. Why do we expect a thought generator to be 100% perfect on the first try? This is where the brunt of effort and focus is going, but it needs to go elsewhere. What we instead need to do to create a meta learner is collect a series of these thoughts, verify each as correct, and then act on those thoughts. Humans have thoughts; some are true, but most are nonsense. Yet even with false thoughts we have the capability to construct immensely logical and complex solutions and products. So what is really important is to create a self-improving feedback loop that learns from a tree of thoughts. Then, when it encounters a similar task, it can recall this tree and reconstruct it.
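To make that loop concrete, here's a hypothetical outer loop wiring the pieces together: generate a thought, act on it, verify against the environment, ask the critic where a failed path went wrong, and remember the finished tree (the truths would live in the prompt that `generate` sees). Every function here is a stand-in for one of the pieces above, not a real implementation:

```python
def solve(task, generate, verify, critique, remember, max_tries: int = 10):
    path: list[str] = []
    for _ in range(max_tries):
        thought = generate(task, path)     # 2. thought generator proposes
        path.append(thought)               # 3. agent acts on the environment
        if verify(task, path):             # 1. verifiable environment checks
            remember(task, path)           # 5. meta learner stores the tree
            return path
        bad_step = critique(task, path)    # 4. critic says where it went wrong
        if bad_step is not None:
            path = path[:bad_step]         # back up past the bad step, retry
    return None

path = solve(
    "reach the word 'done'",
    generate=lambda task, p: "done" if len(p) >= 2 else f"step {len(p)}",
    verify=lambda task, p: p[-1] == "done",
    critique=lambda task, p: None,
    remember=lambda task, p: None,
)
print(path)   # ['step 0', 'step 1', 'done']
```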