Andrej Karpathy – It will take a decade to get agents to work


The Andrej Karpathy episode.

Andrej explains why reinforcement learning is terrible (but everything else is much worse), why model collapse prevents LLMs from learning the way humans do, why AGI will just blend into the previous ~2.5 centuries of 2% GDP growth, why self driving took so long to crack, and what he sees as the future of education.

Watch on YouTube; listen on Apple Podcasts or Spotify.

Labelbox helps you get data that is more detailed, more accurate, and higher signal than you could get by default, no matter your domain or training paradigm. Reach out today at labelbox.com/dwarkesh

Mercury helps you run your business better. It’s the banking platform we use for the podcast — we love that we can see our accounts, cash flows, AR, and AP all in one place. Apply online in minutes at mercury.com

Google’s Veo 3.1 update is a notable improvement to an already great model. Veo 3.1’s generations are more coherent and the audio is even higher-quality. If you have a Google AI Pro or Ultra plan, you can try it in Gemini today by visiting https://gemini.google

00:00:00 – AGI is still a decade away

Dwarkesh Patel 00:00:00

Today I’m speaking with Andrej Karpathy. Andrej, why do you say that this will be the decade of agents and not the year of agents?

Andrej Karpathy 00:00:07

First of all, thank you for having me here. I’m excited to be here.

The quote you’ve just mentioned, “It’s the decade of agents,” is actually a reaction to a pre-existing quote. I’m not actually sure who said this but they were alluding to this being the year of agents with respect to LLMs and how they were going to evolve. I was triggered by that because there’s some over-prediction going on in the industry. In my mind, this is more accurately described as the decade of agents.

We have some very early agents that are extremely impressive and that I use daily—Claude and Codex and so on—but I still feel there’s so much work to be done. My reaction is we’ll be working with these things for a decade. They’re going to get better, and it’s going to be wonderful. I was just reacting to the implied timelines.

Dwarkesh Patel 00:00:58

What do you think will take a decade to accomplish? What are the bottlenecks?

Andrej Karpathy 00:01:02

Actually making it work. When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?

Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.

They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.

Dwarkesh Patel 00:01:44

Interesting. As a professional podcaster and a viewer of AI from afar, it’s easy for me to identify what’s lacking: continual learning is lacking, or multimodality is lacking. But I don’t really have a good way of trying to put a timeline on it. If somebody asks how long continual learning will take, I have no prior about whether this is a project that should take 5 years, 10 years, or 50 years. Why a decade? Why not one year? Why not 50 years?

Andrej Karpathy 00:02:16

This is where you get into a bit of my own intuition, doing a bit of an extrapolation with respect to my own experience in the field. I’ve been in AI for almost two decades; it’s going to be 15 years or so, not that long. You had Richard Sutton here, who was around for much longer. I do have about 15 years of experience of seeing people make predictions and seeing how they turned out. I’ve also spent time in research and worked in industry for a while. I have a general intuition left over from that.

I feel like the problems are tractable, they’re surmountable, but they’re still difficult. If I just average it out, it just feels like a decade to me.

Dwarkesh Patel 00:02:57

This is quite interesting. I want to hear not only the history, but what people in the room felt was about to happen at various different breakthrough moments. What were the ways in which their feelings were either overly pessimistic or overly optimistic? Should we just go through each of them one by one?

Andrej Karpathy 00:03:16

That’s a giant question because you’re talking about 15 years of stuff that happened. AI is so wonderful because there have been a number of seismic shifts where the entire field has suddenly looked a different way. I’ve maybe lived through two or three of those. I still think there will continue to be some because they come with almost surprising regularity.

When my career began, when I started to work on deep learning and became interested in it, it was by the chance of being right next to Geoff Hinton at the University of Toronto. Geoff Hinton, of course, is the godfather figure of AI. He was training all these neural networks. I thought it was incredible and interesting. This was far from the main thing everyone in AI was doing; it was a niche little subject on the side. That’s maybe the first dramatic seismic shift, which came with AlexNet and so on.

AlexNet reoriented everyone, and everyone started to train neural networks, but it was still very much per-task: maybe I have an image classifier, or I have a neural machine translator, or something like that. Very slowly, people became interested in agents. People started to think, “Okay, maybe we have a check mark next to the visual cortex or something like that, but what about the other parts of the brain, and how can we get a full agent or a full entity that can interact in the world?”

The Atari deep reinforcement learning shift in 2013 or so was part of that early effort of agents, in my mind, because it was an attempt to try to get agents that not just perceive the world, but also take actions and interact and get rewards from environments. At the time, this was Atari games.

I feel that was a misstep. It was a misstep that even the early OpenAI I was a part of adopted, because at that time the zeitgeist was reinforcement learning environments: games, game playing, beating games, getting lots of different types of games, and OpenAI was doing a lot of that. That was another prominent part of AI where, for maybe two or three or four years, everyone was doing reinforcement learning on games. That was all a bit of a misstep.

At OpenAI, I was always a bit suspicious of games as being the thing that would lead to AGI, because in my mind you want something like an accountant, something that’s interacting with the real world. I just didn’t see how games add up to that. My project at OpenAI, for example, was within the scope of the Universe project: an agent that was using keyboard and mouse to operate web pages. I really wanted to have something that interacts with the actual digital world and can do knowledge work.

It just so turns out that this was extremely early, way too early, so early that we shouldn’t have been working on that. Because if you’re just stumbling your way around and keyboard mashing and mouse clicking and trying to get rewards in these environments, your reward is too sparse and you just won’t learn. You’re going to burn a forest of compute, and you’re never going to get something off the ground. What you’re missing is the power of representation in the neural network.

For example, today people are training those computer-using agents, but they’re doing it on top of a large language model. You have to get the language model first, you have to get the representations first, and you have to do that by all the pre-training and all the LLM stuff.

Loosely speaking, I feel people kept trying to get the full thing too early; a few times people really tried to go after agents too early, I would say. That was Atari and Universe and even my own experience. You actually have to do some things first before you get to those agents. Now the agents are a lot more competent, but maybe we’re still missing some parts of that stack.

I would say those are the three major buckets of what people were doing: training neural nets per-task, trying the first round of agents, and then the LLMs, seeking the representation power of the neural networks before you tack everything else on top.

Dwarkesh Patel 00:07:02

Interesting. If I were to steelman the Sutton perspective, it would be that humans can just take on everything at once, or even animals can take on everything at once. Animals are maybe a better example because they don’t even have the scaffold of language. They just get thrown out into the world, and they just have to make sense of everything without any labels.

The vision for AGI then should just be something which looks at sensory data, looks at the computer screen, and it just figures out what’s going on from scratch. If a human were put in a similar situation and had to be trained from scratch… This is like a human growing up or an animal growing up. Why shouldn’t that be the vision for AI, rather than this thing where we’re doing millions of years of training?

Andrej Karpathy 00:07:41

That’s a really good question. Sutton was on your podcast and I saw the podcast and I had a write-up about that podcast that gets into a bit of how I see things. I’m very careful to make analogies to animals because they came about by a very different optimization process. Animals are evolved, and they come with a huge amount of hardware that’s built in.

For example, my example in the post was the zebra. A zebra gets born, and a few minutes later it’s running around and following its mother. That’s an extremely complicated thing to do. That’s not reinforcement learning. That’s something that’s baked in. Evolution obviously has some way of encoding the weights of our neural nets in ATCGs, and I have no idea how that works, but it apparently works.

Brains just came from a very different process, and I’m very hesitant to take inspiration from it because we’re not actually running that process. In my post, I said we’re not building animals. We’re building ghosts or spirits or whatever people want to call it, because we’re not doing training by evolution. We’re doing training by imitation of humans and the data that they’ve put on the Internet.

You end up with these ethereal spirit entities because they’re fully digital and they’re mimicking humans. It’s a different kind of intelligence. If you imagine a space of intelligences, we’re starting off at a different point almost. We’re not really building animals. But it’s also possible to make them a bit more animal-like over time, and I think we should be doing that.

One more point. I do feel Sutton has a very... His framework is, “We want to build animals.” I think that would be wonderful if we can get that to work. That would be amazing. If there were a single algorithm that you can just run on the Internet and it learns everything, that would be incredible. I’m not sure that it exists and that’s certainly not what animals do, because animals have this outer loop of evolution.

A lot of what looks like learning is more like maturation of the brain. I think there’s very little reinforcement learning for animals. A lot of the reinforcement learning is more like motor tasks; it’s not intelligence tasks. So I actually kind of think humans don’t really use RL, roughly speaking.

Dwarkesh Patel 00:09:52

Can you repeat the last sentence? A lot of that intelligence is not motor task…it’s what, sorry?

Andrej Karpathy 00:09:54

A lot of the reinforcement learning, in my perspective, would be things that are a lot more motor-like, simple tasks like throwing a hoop. But I don’t think that humans use reinforcement learning for a lot of intelligence tasks like problem-solving and so on. That doesn’t mean we shouldn’t do that for research, but I just feel like that’s what animals do or don’t.

Dwarkesh Patel 00:10:17

I’m going to take a second to digest that because there are a lot of different ideas. Here’s one clarifying question I can ask to understand the perspective. You suggest that evolution is doing the kind of thing that pre-training does in the sense of building something which can then understand the world.

The difference is that evolution has to be titrated in the case of humans through three gigabytes of DNA. That’s very unlike the weights of a model. Literally, the weights of the model are a brain, which obviously does not exist in the sperm and the egg. So it has to be grown. Also, the information for every single synapse in the brain simply cannot exist in the three gigabytes that exist in the DNA.

Evolution seems closer to finding the algorithm which then does the lifetime learning. Now, maybe the lifetime learning is not analogous to RL, to your point. Is that compatible with the thing you were saying, or would you disagree with that?

Andrej Karpathy 00:11:17

I think so. I would agree with you that there’s some miraculous compression going on because obviously, the weights of the neural net are not stored in ATCGs. There’s some dramatic compression. There are some learning algorithms encoded that take over and do some of the learning online. I definitely agree with you on that. I would say I’m a lot more practically minded. I don’t come at it from the perspective of, let’s build animals. I come from it from the perspective of, let’s build useful things. I have a hard hat on, and I’m just observing that we’re not going to do evolution, because I don’t know how to do that.

But it does turn out we can build these ghosts, spirit-like entities, by imitating internet documents. This works. It’s a way to bring you up to something that has a lot of built-in knowledge and intelligence in some way, similar to maybe what evolution has done. That’s why I call pre-training this crappy evolution. It’s the practically possible version with our technology and what we have available to us to get to a starting point where we can do things like reinforcement learning and so on.

Dwarkesh Patel 00:12:15

Just to steelman the other perspective, after doing this Sutton interview and thinking about it a bit, he has an important point here. Evolution does not give us the knowledge, really. It gives us the algorithm to find the knowledge, and that seems different from pre-training.

Perhaps the perspective is that pre-training helps build the kind of entity which can learn better. It teaches meta-learning, and therefore it is similar to finding an algorithm. But if it’s “Evolution gives us knowledge, pre-training gives us knowledge,” that analogy seems to break down.

Andrej Karpathy 00:12:42

It’s subtle and I think you’re right to push back on it, but basically the thing that pre-training is doing, you’re getting the next-token predictor over the internet, and you’re training that into a neural net. It’s doing two things that are unrelated. Number one, it’s picking up all this knowledge, as I call it. Number two, it’s actually becoming intelligent.

By observing the algorithmic patterns in the internet, it boots up all these little circuits and algorithms inside the neural net to do things like in-context learning and all this stuff. You don’t need or want the knowledge. I think that’s probably holding back the neural networks overall because it’s getting them to rely on the knowledge a little too much sometimes.

For example, I feel agents, one thing they’re not very good at, is going off the data manifold of what exists on the internet. If they had less knowledge or less memory, maybe they would be better. What I think we have to do going forward—and this would be part of the research paradigms—is figure out ways to remove some of the knowledge and to keep what I call this cognitive core. It’s this intelligent entity that is stripped from knowledge but contains the algorithms and contains the magic of intelligence and problem-solving and the strategies of it and all this stuff.

Dwarkesh Patel 00:13:50

There’s so much interesting stuff there. Let’s start with in-context learning. This is an obvious point, but I think it’s worth saying explicitly and meditating on. The situation in which these models seem the most intelligent—in which I talk to them and I’m like, “Wow, there’s really something on the other end that’s responding to me, thinking about things”—is when it makes a mistake and it’s like, “Oh wait, that’s the wrong way to think about it. I’m backing up.” All of that is happening in context. That’s where I feel the real intelligence is, the part you can visibly see.

That in-context learning process is developed by gradient descent on pre-training. It spontaneously meta-learns in-context learning, but the in-context learning itself is not gradient descent, in the same way that our lifetime intelligence as humans to be able to do things is conditioned by evolution but our learning during our lifetime is happening through some other process.

Andrej Karpathy 00:14:42

I don’t fully agree with that, but you should continue your thought.

Dwarkesh Patel 00:14:44

Well, I’m very curious to understand how that analogy breaks down.

Andrej Karpathy 00:14:48

I’m hesitant to say that in-context learning is not doing gradient descent. It’s not doing explicit gradient descent. In-context learning is pattern completion within a token window. It just turns out that there’s a huge amount of patterns on the internet. You’re right, the model learns to complete the pattern, and that’s inside the weights. The weights of the neural network are trying to discover patterns and complete the pattern. There’s some adaptation that happens inside the neural network, which is magical and just falls out from the internet just because there’s a lot of patterns.

I will say that there have been some papers that I thought were interesting that look at the mechanisms behind in-context learning. I do think it’s possible that in-context learning runs a small gradient descent loop internally in the layers of the neural network. I recall one paper in particular where they were doing linear regression using in-context learning. Your inputs into the neural network are XY pairs, XY, XY, XY that happen to be on the line. Then you do X and you expect Y. The neural network, when you train it in this way, does linear regression.

Normally when you would run linear regression, you have a small gradient descent optimizer that looks at XY, looks at an error, calculates the gradient of the weights and does the update a few times. It just turns out that when they looked at the weights of that in-context learning algorithm, they found some analogies to gradient descent mechanics. In fact, I think the paper was even stronger because they hardcoded the weights of a neural network to do gradient descent through attention and all the internals of the neural network.

That’s just my only pushback. Who knows how in-context learning works, but I think it’s probably doing a bit of some funky gradient descent internally. I think that’s possible. I was only pushing back on your saying that it’s not doing gradient descent. Who knows what it’s actually doing, but it’s probably doing something similar to it; we don’t know.
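
To make the setup from that line of work concrete, here is a minimal sketch, under assumptions I’m adding (this is not the paper’s exact protocol): the “prompt” is a sequence of (x, y) pairs lying on a random line, the query is a new x, and the explicit gradient-descent learner below is the kind of baseline the transformer’s in-context predictions get compared against.

```python
# Minimal sketch of the in-context linear-regression task described above.
import numpy as np

rng = np.random.default_rng(0)

def make_prompt(n_pairs=8):
    w, b = rng.normal(), rng.normal()          # hidden line y = w*x + b
    xs = rng.uniform(-1, 1, size=n_pairs + 1)
    ys = w * xs + b
    context = list(zip(xs[:-1], ys[:-1]))      # (x, y) pairs shown in-context
    query_x, target_y = xs[-1], ys[-1]
    return context, query_x, target_y

def gd_baseline(context, query_x, steps=200, lr=0.1):
    # Explicit gradient descent on the same pairs: the reference that the
    # transformer's in-context behavior is compared to.
    w_hat, b_hat = 0.0, 0.0
    xs = np.array([x for x, _ in context])
    ys = np.array([y for _, y in context])
    for _ in range(steps):
        err = w_hat * xs + b_hat - ys
        w_hat -= lr * (err * xs).mean()
        b_hat -= lr * err.mean()
    return w_hat * query_x + b_hat

context, qx, ty = make_prompt()
print(f"gd prediction {gd_baseline(context, qx):.3f} vs true {ty:.3f}")
```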

Dwarkesh Patel 00:16:39

So then it’s worth thinking okay, if in-context learning and pre-training are both implementing something like gradient descent, why does it feel like with in-context learning we’re getting to this continual learning, real intelligence-like thing? Whereas you don’t get the analogous feeling just from pre-training. You could argue that.

If it’s the same algorithm, what could be different? One way you could think about it is, how much information does the model store per information it receives from training? If you look at pre-training, if you look at Llama 3 for example, I think it’s trained on 15 trillion tokens. If you look at the 70B model, that would be the equivalent of 0.07 bits per token that it sees in pre-training, in terms of the information in the weights of the model compared to the tokens it reads. Whereas if you look at the KV cache and how it grows per additional token in in-context learning, it’s like 320 kilobytes. So that’s a 35 million-fold difference in how much information per token is assimilated by the model. I wonder if that’s relevant at all.
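
The rough arithmetic behind those numbers, under assumptions added here for illustration: treat each bf16 weight as 16 bits of capacity, and use Llama-3-70B-style KV-cache dimensions (80 layers, 8 KV heads, head dimension 128, fp16 keys and values).

```python
# Weight capacity per pre-training token vs. KV cache per context token.
params = 70e9
tokens = 15e12
bits_in_weights = params * 16
bits_per_pretraining_token = bits_in_weights / tokens
print(f"{bits_per_pretraining_token:.3f} bits of weight capacity per token")  # ~0.07

layers, kv_heads, head_dim, bytes_per_val = 80, 8, 128, 2
kv_bytes_per_token = layers * kv_heads * head_dim * 2 * bytes_per_val  # K and V
print(f"{kv_bytes_per_token / 1024:.0f} KB of KV cache per context token")  # ~320

ratio = (kv_bytes_per_token * 8) / bits_per_pretraining_token
print(f"~{ratio / 1e6:.0f} million-fold difference")  # ~35
```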

Andrej Karpathy 00:17:46

I kind of agree. The way I usually put this is that for anything that happens during the training of the neural network, the knowledge is only a hazy recollection of what happened at training time. That’s because the compression is dramatic. You’re taking 15 trillion tokens and compressing them into your final neural network of a few billion parameters. Obviously there’s a massive amount of compression going on. So I refer to it as a hazy recollection of the internet documents.

Whereas anything that happens in the context window of the neural network—you’re plugging in all the tokens and building up all those KV cache representations—is very directly accessible to the neural net. So I compare the KV cache and the stuff that happens at test time to more like a working memory. All the stuff that’s in the context window is very directly accessible to the neural net.

There’s always these almost surprising analogies between LLMs and humans. I find them surprising because we’re not trying to build a human brain directly. We’re just finding that this works and we’re doing it. But I do think that anything that’s in the weights, it’s a hazy recollection of what you read a year ago. Anything that you give it as a context at test time is directly in the working memory. That’s a very powerful analogy to think through things.

When you, for example, go to an LLM and you ask it about some book and what happened in it, like Nick Lane’s book or something like that, the LLM will often give you some stuff which is roughly correct. But if you give it the full chapter and ask it questions, you’re going to get much better results because it’s now loaded in the working memory of the model. So a very long way of saying I agree and that’s why.

Dwarkesh Patel 00:19:11

Stepping back, what is the part about human intelligence that we have most failed to replicate with these models?

Andrej Karpathy 00:19:20

Just a lot of it. So maybe one way to think about it, I don’t know if this is the best way, but I almost feel like — again, making these analogies imperfect as they are — we’ve stumbled by with the transformer neural network, which is extremely powerful, very general. You can train transformers on audio, or video, or text, or whatever you want, and it just learns patterns and they’re very powerful, and it works really well. That to me almost indicates that this is some piece of cortical tissue. It’s something like that, because the cortex is famously very plastic as well. You can rewire parts of brains. There were the slightly gruesome experiments with rewiring the visual cortex to the auditory cortex, and this animal learned fine, et cetera.

So I think that this is cortical tissue. I think when we’re doing reasoning and planning inside the neural networks, doing reasoning traces for thinking models, that’s kind of like the prefrontal cortex. Maybe those are like little checkmarks, but I still think there are many brain parts and nuclei that are not explored. For example, there’s a basal ganglia doing a bit of reinforcement learning when we fine-tune the models on reinforcement learning. But where’s the hippocampus? Not obvious what that would be. Some parts are probably not important. Maybe the cerebellum is not that important to cognition and thought, so maybe we can skip some of it. But there’s still, for example, the amygdala, all the emotions and instincts. There’s probably a bunch of other nuclei in the brain that are very ancient that I don’t think we’ve really replicated.

I don’t know that we should be pursuing the building of an analog of a human brain. I’m an engineer mostly at heart. Maybe another way to answer the question is that you’re not going to hire this thing as an intern. It’s missing a lot of it because it comes with a lot of these cognitive deficits that we all intuitively feel when we talk to the models. So it’s not fully there yet. You can look at it as not all the brain parts are checked off yet.

Dwarkesh Patel 00:21:16

This is maybe relevant to the question of thinking about how fast these issues will be solved. Sometimes people will say about continual learning, “Look, you could easily replicate this capability. Just as in-context learning emerged spontaneously as a result of pre-training, continual learning over longer horizons will emerge spontaneously if the model is incentivized to recollect information over longer horizons, or horizons longer than one session.” So if there’s some outer loop RL which has many sessions within that outer loop, then this continual learning where it fine-tunes itself, or it writes to an external memory or something, will just emerge spontaneously. Do you think things like that are plausible? I just don’t have a prior over how plausible that is. How likely is that to happen?

Andrej Karpathy 00:22:07

I don’t know that I fully resonate with that. These models, when you boot them up and they have zero tokens in the window, they’re always restarting from scratch where they were. So I don’t know in that worldview what it looks like. Maybe making some analogies to humans—just because I think it’s roughly concrete and interesting to think through—I feel like when I’m awake, I’m building up a context window of stuff that’s happening during the day. But when I go to sleep, something magical happens where I don’t think that context window stays around. There’s some process of distillation into the weights of my brain. This happens during sleep and all this stuff.

We don’t have an equivalent of that in large language models. That’s to me more adjacent to when you talk about continual learning and so on as absent. These models don’t really have a distillation phase of taking what happened, analyzing it obsessively, thinking through it, doing some synthetic data generation process and distilling it back into the weights, and maybe having a specific neural net per person. Maybe it’s a LoRA. It’s not a full-weight neural network. It’s just some small sparse subset of the weights that are changed.

But we do want ways of creating these individuals that have very long context. It can’t all just remain in the context window, because the context windows grow very, very long; maybe we have some very elaborate, sparse attention over it. But I still think that humans obviously have some process for distilling some of that knowledge into the weights, and we’re missing it. I do also think that humans have some very elaborate, sparse attention scheme, which we’re starting to see some early hints of. DeepSeek v3.2 just came out, and I saw that they have sparse attention, as an example; this is one way to have very, very long context windows. So I feel like we are redoing a lot of the cognitive tricks that evolution came up with, through a very different process, but we’re going to converge on a similar architecture cognitively.

Dwarkesh Patel 00:24:02

In 10 years, do you think it’ll still be something like a transformer, but with much more modified attention and more sparse MLPs and so forth?

Andrej Karpathy 00:24:10

The way I like to think about it is translation invariance in time. So 10 years ago, where were we? 2015. In 2015, we had convolutional neural networks primarily; residual networks had just come out. So remarkably similar, I guess, but quite a bit different still. The transformer was not around. All these more modern tweaks on the transformer were not around. Maybe some of the things we can bet on in 10 years, by that same translation invariance, is that we’re still training giant neural networks with a forward-backward pass and updates through gradient descent, but maybe it looks a bit different, and everything is much bigger.

A few years ago, I went back all the way to 1989, which was a fun exercise for me, because I was reproducing Yann LeCun’s 1989 convolutional network, the first modern-style neural network I’m aware of that was trained via gradient descent, on digit recognition. I was just interested in how I could modernize this. How much of this is algorithms? How much of this is data? How much of this progress is compute and systems? I was able to very quickly halve the error just by time traveling by 33 years.

So if I time travel by algorithms 33 years, I could adjust what Yann LeCun did in 1989, and I could halve the error. But to get further gains, I had to add a lot more data, I had to 10x the training set, and then I had to add more computational optimizations. I had to train for much longer with dropout and other regularization techniques.
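
For flavor, a rough modern-PyTorch sketch of a tiny 1989-style convnet for digit recognition, in the spirit of the reproduction described above (an approximation with assumed layer sizes, not Karpathy’s exact re-implementation of LeCun 1989):

```python
# Tiny 1989-style convnet: 16x16 grayscale digits in, 10 classes out.
import torch
import torch.nn as nn

class TinyConvNet1989(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2), nn.Tanh(),
            nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2), nn.Tanh(),
            nn.Flatten(),
            nn.Linear(12 * 4 * 4, 30), nn.Tanh(),
            nn.Linear(30, 10),
        )

    def forward(self, x):  # x: (batch, 1, 16, 16)
        return self.net(x)

model = TinyConvNet1989()
print(sum(p.numel() for p in model.parameters()), "parameters")
```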

So all these things have to improve simultaneously. We’re probably going to have a lot more data, we’re probably going to have a lot better hardware, probably going to have a lot better kernels and software, we’re probably going to have better algorithms. All of those, it’s almost like no one of them is winning too much. All of them are surprisingly equal. This has been the trend for a while.

So to answer your question, I expect differences algorithmically to what’s happening today. But I do also expect that some of the things that have stuck around for a very long time will probably still be there. It’s probably still a giant neural network trained with gradient descent. That would be my guess.

Dwarkesh Patel 00:26:16

It’s surprising that all of those things together only halved the error, 30 years of progress…. Maybe half is a lot. Because if you halve the error, that actually means that…

Andrej Karpathy 00:26:30

Half is a lot. But I guess what was shocking to me is everything needs to improve across the board: architecture, optimizer, loss function. It also has improved across the board forever. So I expect all those changes to be alive and well.

Dwarkesh Patel 00:26:43

Yeah. I was about to ask you a very similar question about nanochat. Since you just coded it up recently, every single step in the process of building a chatbot is fresh in your RAM. I’m curious if you had similar thoughts about, “Oh, there was no one thing that was relevant to going from GPT-2 to nanochat.” What are some surprising takeaways from the experience?

Andrej Karpathy 00:27:08

Of building nanochat? So nanochat is a repository I released. Was it yesterday or the day before? I can’t remember.

Dwarkesh Patel 00:27:15

We can see the sleep deprivation that went into the…

Andrej Karpathy 00:27:18

It’s trying to be the simplest complete repository that covers the whole pipeline end-to-end of building a ChatGPT clone. So you have all of the steps, not just any individual step, and there are a bunch of them. I worked on the individual steps in the past and released small pieces of code that show how each is done in an algorithmic sense, in simple code. But this handles the entire pipeline. In terms of learning, I don’t know that I necessarily found something new from it. I already had in my mind how you build it. This was just the process of mechanically building it and making it clean enough so that people can learn from it and find it useful.

Dwarkesh Patel 00:28:04

What is the best way for somebody to learn from it? Is it to just delete all the code and try to reimplement from scratch, try to add modifications to it?

Andrej Karpathy 00:28:10

That’s a great question. Basically, it’s about 8,000 lines of code that take you through the entire pipeline. I would probably put it on the right monitor; if you have two monitors, you put it on the right. You build it from scratch, from the start. You’re allowed to reference it, but you’re not allowed to copy-paste. Maybe that’s how I would do it.

But I also think the repository by itself is a pretty large beast. When you write this code, you don’t go from top to bottom, you go from chunks and you grow the chunks, and that information is absent. You wouldn’t know where to start. So it’s not just a final repository that’s needed, it’s the building of the repository, which is a complicated chunk-growing process. So that part is not there yet. I would love to add that probably later this week. It’s probably a video or something like that. Roughly speaking, that’s what I would try to do. Build the stuff yourself, but don’t allow yourself copy-paste.

I do think there are two types of knowledge, almost. There’s the high-level surface knowledge, but when you build something from scratch, you’re forced to come to terms with what you don’t understand, and with what you don’t even know that you don’t understand.

It always leads to a deeper understanding. It’s the only way to build. If I can’t build it, I don’t understand it. That’s a Feynman quote, I believe. I 100% have always believed this very strongly, because there are all these micro things that are just not properly arranged and you don’t really have the knowledge. You just think you have the knowledge. So don’t write blog posts, don’t do slides, don’t do any of that. Build the code, arrange it, get it to work. It’s the only way to go. Otherwise, you’re missing knowledge.

00:29:45 – LLM cognitive deficits

Dwarkesh Patel 00:29:45

You tweeted out that coding models were of very little help to you in assembling this repository. I’m curious why that was.

Andrej Karpathy 00:29:53

I guess I built the repository over a period of a bit more than a month. I would say there are three major classes of how people interact with code right now. Some people completely reject LLMs altogether and just write everything from scratch. This is probably not the right thing to do anymore.

The intermediate part, which is where I am, is you still write a lot of things from scratch, but you use the autocomplete that’s available now from these models. So when you start writing out a little piece of it, it will autocomplete for you and you can just tap through. Most of the time it’s correct, sometimes it’s not, and you edit it. But you’re still very much the architect of what you’re writing. Then there’s the vibe coding: “Hi, please implement this or that,” enter, and then let the model do it. That’s the agents.

I do feel like the agents work in very specific settings, and I would use them in specific settings. But these are all tools available to you and you have to learn what they’re good at, what they’re not good at, and when to use them. So the agents are pretty good, for example, if you’re doing boilerplate stuff. Boilerplate code that’s just copy-paste stuff, they’re very good at that. They’re very good at stuff that occurs very often on the Internet because there are lots of examples of it in the training sets of these models. There are certain kinds of things where the models will do very well.

I would say nanochat is not an example of those because it’s a fairly unique repository. There’s not that much code in the way that I’ve structured it. It’s not boilerplate code. It’s intellectually intense code almost, and everything has to be very precisely arranged. The models have so many cognitive deficits. For example, they kept misunderstanding the code because they have too much memory of all the typical ways of doing things on the Internet that I just wasn’t adopting. The models—I don’t know if I want to get into the full details—kept thinking I was writing normal code, and I’m not.

Dwarkesh Patel 00:31:49

Maybe one example?

Andrej Karpathy 00:31:51

You have eight GPUs that are all doing forward and backward passes. The usual way to synchronize gradients between them is to use PyTorch’s DistributedDataParallel container, which, as you’re doing the backward pass, automatically starts communicating and synchronizing gradients. I didn’t use DDP because I didn’t want to, because it’s not necessary. I threw it out and wrote my own synchronization routine inside the optimizer step. The models kept trying to get me to use the DDP container. They were very concerned. This gets way too technical, but I wasn’t using that container because I don’t need it and I have a custom implementation of something like it.
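
A minimal sketch of that general idea, not nanochat’s actual code: skip the DDP wrapper, run the backward pass on a plain model, and average gradients across ranks yourself right before the optimizer step. It assumes the torch.distributed process group is already initialized and each rank has already called loss.backward().

```python
import torch
import torch.distributed as dist

def step(model, optimizer):
    # Manual gradient synchronization in place of DistributedDataParallel.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)  # average gradients across the GPUs
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```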

Dwarkesh Patel 00:32:26

They just couldn’t internalize that you had your own.

Andrej Karpathy 00:32:28

They couldn’t get past that. They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.

I also feel like it’s annoying to have to type out what I want in English because it’s too much typing. If I just navigate to the part of the code that I want, and I go where I know the code has to appear and I start typing out the first few letters, autocomplete gets it and just gives you the code. This is a very high information bandwidth to specify what you want. You point to the code where you want it, you type out the first few pieces, and the model will complete it.

So what I mean is, these models are good in certain parts of the stack. There are two examples where I use the models that I think are illustrative. One was when I generated the report. That’s more boilerplate-y, so I partially vibe-coded some of that stuff. That was fine because it’s not mission-critical stuff, and it works fine.

The other part is when I was rewriting the tokenizer in Rust. I’m not as good at Rust because I’m fairly new to Rust. So there’s a bit of vibe coding going on when I was writing some of the Rust code. But I had a Python implementation that I fully understand, and I’m just making sure I’m making a more efficient version of it, and I have tests so I feel safer doing that stuff. They increase accessibility to languages or paradigms that you might not be as familiar with. I think they’re very helpful there as well. There’s a ton of Rust code out there, the models are pretty good at it. I happen to not know that much about it, so the models are very useful there.

Dwarkesh Patel 00:34:23

The reason this question is so interesting is because the main story people have about AI exploding and getting to superintelligence pretty rapidly is AI automating AI engineering and AI research. They’ll look at the fact that you can have Claude Code and make entire applications, CRUD applications, from scratch and think, “If you had this same capability inside of OpenAI and DeepMind and everything, just imagine a thousand of you or a million of you in parallel, finding little architectural tweaks.”

It’s quite interesting to hear you say that this is the thing they’re asymmetrically worse at. It’s quite relevant to forecasting whether the AI 2027-type explosion is likely to happen anytime soon.

Andrej Karpathy 00:35:05

That’s a good way of putting it, and you’re getting at why my timelines are a bit longer. You’re right. They’re not very good at code that has never been written before, maybe it’s one way to put it, which is what we’re trying to achieve when we’re building these models.

Dwarkesh Patel 00:35:19

Very naive question, but the architectural tweaks you’re adding to nanochat, they’re in a paper somewhere, right? They might even be in a repo somewhere. Is it surprising that the models aren’t able to integrate that? Whenever you’re like, “Add RoPE embeddings” or something, they do it in the wrong way?

Andrej Karpathy 00:35:42

It’s tough. They know, but they don’t fully know. They don’t know how to fully integrate it into the repo and your style and your code and your place, and some of the custom things that you’re doing and how it fits with all the assumptions of the repository. They do have some knowledge, but they haven’t gotten to the place where they can integrate it and make sense of it.

A lot of the stuff continues to improve. Currently, the state-of-the-art model that I go to is the GPT-5 Pro, and that’s a very powerful model. If I have 20 minutes, I will copy-paste my entire repo and I go to GPT-5 Pro, the oracle, for some questions. Often it’s not too bad and surprisingly good compared to what existed a year ago.

Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.

Dwarkesh Patel 00:36:53

Here’s another reason this is really interesting. Through the history of programming, there have been many productivity improvements—compilers, linting, better programming languages—which have increased programmer productivity but have not led to an explosion. That sounds very much like the autocomplete tab, and this other category is just automation of the programmer. It’s interesting you’re seeing more in the category of the historical analogies of better compilers or something.

Andrej Karpathy 00:37:26

Maybe this gets to one other thought. I have a hard time differentiating where AI begins and ends, because I see AI as an extension of computing in a pretty fundamental way. I see a continuum of this recursive self-improvement, or speeding up of programmers, all the way from the beginning: code editors, syntax highlighting, even type checking—all these tools that we’ve built for each other.

Even search engines. Why aren’t search engines part of AI? Ranking is AI. At some point, Google, even early on, was thinking of themselves as an AI company doing Google Search engine, which is totally fair.

I see it as a lot more of a continuum than other people do, and it’s hard for me to draw the line. I feel like we’re now getting a much better autocomplete, and now we’re also getting some agents which are these loopy things, but they go off-rails sometimes. What’s going on is that the human is progressively doing a bit less and less of the low-level stuff. We’re not writing the assembly code because we have compilers. Compilers will take my high-level language in C and write the assembly code.

We’re abstracting ourselves away very, very slowly. There’s what I call an “autonomy slider,” where more and more stuff is automated—of the stuff that can be automated at any point in time—and we’re doing less and less, raising ourselves up the layer of abstraction over the automation.

00:40:05 – RL is terrible

Dwarkesh Patel 00:40:05

Let’s talk about RL a bit. You tweeted some very interesting things about this. Conceptually, how should we think about the way that humans are able to build a rich world model just from interacting with our environment, and in ways that seem almost irrespective of the final reward at the end of the episode?

If somebody is starting a business, and at the end of 10 years, she finds out whether the business succeeded or failed, we say that she’s earned a bunch of wisdom and experience. But it’s not because the log probs of every single thing that happened over the last 10 years are up-weighted or down-weighted. Something much more deliberate and rich is happening. What is the ML analogy, and how does that compare to what we’re doing with LLMs right now?

Andrej Karpathy 00:40:47

Maybe the way I would put it is that humans don’t use reinforcement learning, as I said. I think they do something different. Reinforcement learning is a lot worse than I think the average person thinks. Reinforcement learning is terrible. It just so happens that everything that we had before it is much worse because previously we were just imitating people, so it has all these issues.

In reinforcement learning, say you’re solving a math problem, because it’s very simple. You’re given a math problem and you’re trying to find the solution. In reinforcement learning, you will try lots of things in parallel first. You’re given a problem, you try hundreds of different attempts. These attempts can be complex. They can be like, “Oh, let me try this, let me try that, this didn’t work, that didn’t work,” etc. Then maybe you get an answer. Now you check the back of the book and you see, “Okay, the correct answer is this.” You can see that this one, this one, and that one got the correct answer, but these other 97 of them didn’t. Literally what reinforcement learning does is it goes to the ones that worked really well and every single thing you did along the way, every single token gets upweighted like, “Do more of this.”

The problem with that is what people will call high variance in your estimator; put simply, it’s just noisy. It almost assumes that every single little piece of the solution that arrived at the right answer was the correct thing to do, which is not true. You may have gone down wrong alleys until you arrived at the right solution. Every single one of those incorrect things you did, as long as you got to the correct solution, will be upweighted as, “Do more of this.” It’s terrible. It’s noise.

You’ve done all this work only to find, at the end, you get a single number of like, “Oh, you did correct.” Based on that, you weigh that entire trajectory as like, upweight or downweight. The way I like to put it is you’re sucking supervision through a straw. You’ve done all this work that could be a minute of rollout, and you’re sucking the bits of supervision of the final reward signal through a straw and you’re broadcasting that across the entire trajectory and using that to upweight or downweight that trajectory. It’s just stupid and crazy.
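
For concreteness, a toy sketch of what that broadcasting looks like in a REINFORCE-style loss, with placeholder tensors rather than any particular library’s training loop:

```python
# One scalar outcome reward per rollout gets broadcast to every token.
import torch

def outcome_reward_loss(logprobs, rewards):
    # logprobs: (num_rollouts, num_tokens) log-probs of the sampled tokens
    # rewards:  (num_rollouts,) 1.0 if the final answer matched, else 0.0
    advantages = rewards - rewards.mean()          # simple baseline
    # Every token in a rollout is pushed up or down by the same number,
    # regardless of whether that particular step was a good idea.
    per_token = -advantages[:, None] * logprobs
    return per_token.mean()
```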

A human would never do this. Number one, a human would never do hundreds of rollouts. Number two, when a person finds a solution, they will have a pretty complicated process of review of, “Okay, I think these parts I did well, these parts I did not do that well. I should probably do this or that.” They think through things. There’s nothing in current LLMs that does this. There’s no equivalent of it. But I do see papers popping out that are trying to do this because it’s obvious to everyone in the field.

The first imitation learning, by the way, was extremely surprising and miraculous and amazing, that we can fine-tune by imitation on humans. That was incredible. Because in the beginning, all we had was base models. Base models are autocomplete. It wasn’t obvious to me at the time, and I had to learn this. The paper that blew my mind was InstructGPT, because it pointed out that you can take the pretrained model, which is autocomplete, and if you just fine-tune it on text that looks like conversations, the model will very rapidly adapt to become very conversational, and it keeps all the knowledge from pre-training. This blew my mind because I didn’t understand that stylistically, it can adjust so quickly and become an assistant to a user through just a few loops of fine-tuning on that kind of data. It was very miraculous to me that that worked. So incredible. That was two to three years of work.

Now came RL. And RL allows you to do a bit better than just imitation learning because you can have these reward functions and you can hill-climb on the reward functions. Some problems have just correct answers, you can hill-climb on that without getting expert trajectories to imitate. So that’s amazing. The model can also discover solutions that a human might never come up with. This is incredible. Yet, it’s still stupid.

We need more. I saw a paper from Google yesterday that has this reflect-and-review idea in mind. Was it the memory bank paper or something? I don’t know. I’ve seen a few papers along these lines. So I expect some major update to how we do algorithms for LLMs coming in that realm. I think we need three or four or five more of those, something like that.

Dwarkesh Patel 00:44:54

You’re so good at coming up with evocative phrases. “Sucking supervision through a straw.” It’s so good.

You’re saying the problem with outcome-based reward is that you have this huge trajectory, and then at the end, you’re trying to learn every single possible thing about what you should do and what you should learn about the world from that one final bit. Given the fact that this is obvious, why hasn’t process-based supervision as an alternative been a successful way to make models more capable? What has been preventing us from using this alternative paradigm?

Andrej Karpathy 00:45:29

Process-based supervision just refers to the fact that we’re not going to have a reward function only at the very end. I’m not going to wait until after you’ve done 10 minutes of work to tell you whether you did well or not; I’m going to tell you at every single step of the way how well you’re doing. The reason we don’t have that is that it’s tricky to do properly. You have partial solutions and you don’t know how to assign credit. When you check a final answer, it’s just an equality match to the answer, which is very simple to implement. If you’re doing process supervision, how do you assign partial credit in an automatable way? It’s not obvious how you do it.

Lots of labs are trying to do it with these LLM judges. You get LLMs to try to do it. You prompt an LLM, “Hey, look at a partial solution of a student. How well do you think they’re doing if the answer is this?” and they try to tune the prompt.

The reason that this is tricky is quite subtle. It’s the fact that anytime you use an LLM to assign a reward, those LLMs are giant things with billions of parameters, and they’re gameable. If you’re reinforcement learning with respect to them, you will find adversarial examples for your LLM judges, almost guaranteed. So you can’t do this for too long. You do maybe 10 steps or 20 steps, and maybe it will work, but you can’t do 100 or 1,000. I understand it’s not obvious, but basically the model will find little cracks. It will find all these spurious things in the nooks and crannies of the giant model and find a way to cheat it.
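
A hedged sketch of that LLM-judge style of process reward; `call_judge_model` is a placeholder for whatever judge model you would call, not a real API.

```python
def process_reward(problem: str, partial_solution: str, call_judge_model) -> float:
    # Ask a judge model to score a partial solution between 0 and 1.
    prompt = (
        "You are grading a student's partial solution.\n"
        f"Problem: {problem}\n"
        f"Partial solution so far: {partial_solution}\n"
        "On a scale of 0 to 1, how likely is this to lead to a correct answer? "
        "Reply with just the number."
    )
    try:
        return float(call_judge_model(prompt).strip())
    except ValueError:
        return 0.0

# The catch, as described above: the judge is itself a huge neural net, so a
# policy optimized against this reward will eventually find adversarial
# strings that score highly while being nonsense.
```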

One example that’s prominently in my mind, this was probably public, if you’re using an LLM judge for a reward, you just give it a solution from a student and ask it if the student did well or not. We were training with reinforcement learning against that reward function, and it worked really well. Then, suddenly, the reward became extremely large. It was a massive jump, and it did perfect. You’re looking at it like, “Wow, this means the student is perfect in all these problems. It’s fully solved math.”

But when you look at the completions that you’re getting from the model, they are complete nonsense. They start out okay, and then they change to “dhdhdhdh.” It’s just like, “Oh, okay, let’s take two plus three and we do this and this, and then dhdhdhdh.” You’re looking at it, and it’s like, this is crazy. How is it getting a reward of one or 100%? You look at the LLM judge, and it turns out that “dhdhdhdh” is an adversarial example for the model, and it assigns 100% probability to it.

It’s just because this is an out-of-sample example for the LLM. It’s never seen it during training, so you’re in pure generalization land, and in pure generalization land you can find these examples that break it.

Dwarkesh Patel 00:47:52

You’re basically training the LLM to be a prompt injection model.

Andrej Karpathy 00:47:56

Not even that. Prompt injection is way too fancy. You’re finding adversarial examples, as they’re called. These are nonsensical solutions that are obviously wrong, but the model thinks they are amazing.

Dwarkesh Patel 00:48:07

To the extent you think this is the bottleneck to making RL more functional, then that will require making LLMs better judges, if you want to do this in an automated way. Is it just going to be some sort of GAN-like approach where you have to train models to be more robust?

Andrej Karpathy 00:48:22

The labs are probably doing all that. The obvious thing is, “dhdhdhdh” should not get 100% reward. Okay, well, take “dhdhdhdh,” put it in the training set of the LLM judge, and say this is not 100%, this is 0%. You can do this, but every time you do this, you get a new LLM, and it still has adversarial examples. There’s an infinity of adversarial examples.

Probably if you iterate this a few times, it’ll probably be harder and harder to find adversarial examples, but I’m not 100% sure because this thing has a trillion parameters or whatnot. I bet you the labs are trying. I still think we need other ideas.

Dwarkesh Patel 00:48:57

Interesting. Do you have some shape of what the other idea could be?

Andrej Karpathy 00:49:02

This idea of reviewing a solution and generating synthetic examples such that when you train on them you get better, and meta-learning that in some way. There are some papers along these lines that I’m starting to see pop out. I’m only at the stage of reading abstracts, because a lot of these papers are just ideas. Someone has to make it work at frontier-LLM-lab scale in full generality, because these papers pop up and it’s just a bit noisy. They’re cool ideas, but I haven’t seen anyone convincingly show that this is possible. That said, the LLM labs are fairly closed, so who knows what they’re doing now.

00:49:38 – How do humans learn?

Dwarkesh Patel 00:49:38

I can conceptualize how you would be able to train on synthetic examples or synthetic problems that you have made for yourself. But there seems to be another thing humans do—maybe sleep is this, maybe daydreaming is this—which is not necessarily to come up with fake problems, but just to reflect.

I’m not sure what the ML analogy is for daydreaming or sleeping, or just reflecting. I haven’t come up with a new problem. Obviously, the very basic analogy would just be fine-tuning on reflection bits, but I feel like in practice that probably wouldn’t work that well. Do you have some take on what the analogy of this thing is?

Andrej Karpathy 00:50:17

I do think that we’re missing some aspects there. As an example, let’s take reading a book. Currently when LLMs are reading a book, what that means is we stretch out the sequence of text, and the model is predicting the next token, and it’s getting some knowledge from that. That’s not really what humans do. When you’re reading a book, I don’t even feel like the book is exposition I’m supposed to be attending to and training on. The book is a set of prompts for me to do synthetic data generation on, or for you to go to a book club and talk about it with your friends. It’s by manipulating that information that you actually gain the knowledge. We have no equivalent of that with LLMs. They don’t really do that. I’d love to see, during pre-training, some stage where the model thinks through the material and tries to reconcile it with what it already knows, thinks through it for some amount of time, and gets that to work. There’s no equivalent of any of this. This is all research.

There are some subtle reasons—very subtle, and I think very hard to understand—why it’s not trivial. If I can just describe one: why can’t we just synthetically generate and train on it? If I just have the model synthetically generate its thinking about a book, you look at any one example and you’re like, “This looks great. Why can’t I train on it?” You could try, but the model will get much worse if you continue trying. That’s because all of the samples you get from models are silently collapsed. Silently, because it is not obvious from any individual example; they occupy a very tiny manifold of the possible space of thoughts about the content. The LLMs are what we call “collapsed.” They have a collapsed data distribution. One easy way to see it is to go to ChatGPT and ask it, “Tell me a joke.” It only has like three jokes. It’s not giving you the whole breadth of possible jokes. It knows like three jokes. They’re silently collapsed.

You’re not getting the richness and the diversity and the entropy from these models as you would get from humans. Humans are a lot noisier, but at least they’re not biased, in a statistical sense. They’re not silently collapsed. They maintain a huge amount of entropy. So how do you get synthetic data generation to work despite the collapse and while maintaining the entropy? That’s a research problem.
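
One way to make the "three jokes" observation concrete is to sample the same prompt many times and estimate the empirical entropy of the outputs. The sketch below uses two toy samplers rather than real models, purely to show the measurement.

```python
# Toy sketch: sample the same prompt many times and estimate the empirical
# entropy of the outputs. The two "models" are stand-ins, not real LLMs.
import math
import random
from collections import Counter

def collapsed_model():
    return random.choice(["joke A", "joke B", "joke C"])   # knows ~3 jokes

def diverse_model():
    return f"joke {random.randint(0, 9999)}"               # much wider support

def empirical_entropy(samples):
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for name, model in [("collapsed", collapsed_model), ("diverse", diverse_model)]:
    outputs = [model() for _ in range(1000)]
    print(name, "entropy (bits):", round(empirical_entropy(outputs), 2))
```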

Dwarkesh Patel 00:52:20

Just to make sure I understood, the reason that the collapse is relevant to synthetic data generation is because you want to be able to come up with synthetic problems or reflections which are not already in your data distribution?

Andrej Karpathy 00:52:32

I guess what I’m saying is, say we have a chapter of a book and I ask an LLM to think about it. It will give you something that looks very reasonable. But if I ask it 10 times, you’ll notice that all 10 answers are basically the same.

Dwarkesh Patel 00:52:44

You can’t just keep scaling “reflection” on the same amount of prompt information and then get returns from that.

Andrej Karpathy 00:52:54

Any individual sample will look okay, but the distribution of it is quite terrible. It’s quite terrible in such a way that if you continue training on too much of your own stuff, you actually collapse.

I think that there’s possibly no fundamental solution to this. I also think humans collapse over time. These analogies are surprisingly good. Humans collapse during the course of their lives. This is why children, who haven’t overfit yet, will say stuff that will shock you. You can see where they’re coming from, but it’s just not the thing people say, because they’re not yet collapsed. But we’re collapsed. We end up revisiting the same thoughts, we end up saying more and more of the same stuff, the learning rates go down, the collapse continues to get worse, and then everything deteriorates.
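
A common toy illustration of this, under the assumption that each generation simply refits a model to a finite sample of its own outputs, is a Gaussian repeatedly trained on its own samples: the fitted spread tends to shrink over rounds, which is a cartoon of the entropy loss being described.

```python
# Toy simulation of recursive self-training: each generation refits a Gaussian
# to a finite sample of its own outputs. Over many rounds the fitted spread
# tends to shrink, a cartoon of collapse.
import random
import statistics

mean, std = 0.0, 1.0
for gen in range(41):
    samples = [random.gauss(mean, std) for _ in range(25)]   # "train on your own outputs"
    mean, std = statistics.fmean(samples), statistics.pstdev(samples)
    if gen % 10 == 0:
        print(f"generation {gen}: fitted std = {std:.3f}")
```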

Dwarkesh Patel 00:53:39

Have you seen this super interesting paper arguing that dreaming is a way of preventing this kind of overfitting and collapse? The idea is that dreaming is evolutionarily adaptive because it puts you in weird situations that are very unlike your day-to-day reality, precisely to prevent this kind of overfitting.

Andrej Karpathy 00:53:55

It’s an interesting idea. I do think that when you’re generating things in your head and then attending to them, you’re training on your own samples, on your own synthetic data. If you do it for too long, you go off the rails and collapse way too much. You always have to seek entropy in your life. Talking to other people is a great source of entropy, and things like that. So maybe the brain has also built in some internal mechanisms for increasing the amount of entropy in that process. That’s an interesting idea.

Dwarkesh Patel 00:54:25

This is a very ill-formed thought, so I’ll just put it out and let you react to it. The best learners we’re aware of, which are children, are extremely bad at recollecting information. In fact, at the very earliest stages of childhood, you forget everything; you’re an amnesiac about everything that happens before a certain age. But you’re extremely good at picking up new languages and learning from the world. Maybe there’s some element of being able to see the forest for the trees.

Whereas if you compare it to the opposite end of the spectrum, you have LLM pre-training, where these models will literally be able to regurgitate word-for-word what is the next thing in a Wikipedia page. But their ability to learn abstract concepts really quickly, the way a child can, is much more limited. Then adults are somewhere in between, where they don’t have the flexibility of childhood learning, but they can memorize facts and information in a way that is harder for kids. I don’t know if there’s something interesting about that spectrum.

Andrej Karpathy 00:55:19

I think there’s something very interesting about that, 100%. I do think that humans have a lot more of an element, compared to LLMs, of seeing the forest for the trees. We’re not actually that good at memorization, which is actually a feature. Because we’re not that good at memorization, we’re forced to find patterns in a more general sense.

LLMs in comparison are extremely good at memorization. They will recite passages from all these training sources. You can give them completely nonsensical data, hash some amount of text or something like that so you get a completely random sequence, and if you train on it, even just for an iteration or two, the model can suddenly regurgitate the entire thing. It will memorize it. There’s no way a person could read a single sequence of random numbers and recite it back to you.

That’s a feature, not a bug, because it forces you to only learn the generalizable components. Whereas LLMs are distracted by all the memory that they have of the pre-training documents, and it’s probably very distracting to them in a certain sense. So that’s why when I talk about the cognitive core, I want to remove the memory, which is what we talked about. I’d love to have them have less memory so that they have to look things up, and they only maintain the algorithms for thought, and the idea of an experiment, and all this cognitive glue of acting.
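
As a rough illustration of the memorization point, a tiny next-token model (PyTorch assumed) can be trained on a single random sequence and then checked for verbatim regurgitation. A toy model needs more steps than a frontier LLM would, but the behavior is the same in kind.

```python
# Toy illustration: train a tiny next-token model on one random sequence,
# then check whether it can regurgitate it verbatim.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, length = 100, 32
seq = torch.randint(0, vocab, (length,))          # a "hashed", totally random sequence

# Given the previous token and its position, predict the next token.
emb_tok = nn.Embedding(vocab, 32)
emb_pos = nn.Embedding(length, 32)
head = nn.Linear(32, vocab)
params = list(emb_tok.parameters()) + list(emb_pos.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

inputs, targets = seq[:-1], seq[1:]
positions = torch.arange(length - 1)
for _ in range(200):
    logits = head(emb_tok(inputs) + emb_pos(positions))
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad(); loss.backward(); opt.step()

# Greedy check: does the model reproduce the random sequence exactly?
pred = head(emb_tok(inputs) + emb_pos(positions)).argmax(dim=-1)
print("memorized verbatim:", bool((pred == targets).all()))
```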

Dwarkesh Patel 00:56:36

And this is also relevant to preventing model collapse?

Andrej Karpathy 00:56:41

Let me think. I’m not sure. It’s almost like a separate axis. The models are way too good at memorization, and somehow we should remove that. People are much worse at memorizing, but that’s a good thing.

Dwarkesh Patel 00:56:57

What is a solution to model collapse? There are very naive things you could attempt. The distribution over logits should be wider or something. There are many naive things you could try. What ends up being the problem with the naive approaches?

Andrej Karpathy 00:57:11

That’s a great question. You can imagine having a regularization for entropy and things like that. I guess they just don’t work as well empirically because right now the models are collapsed. But I will say most of the tasks that we want from them don’t actually demand diversity. That’s probably the answer to what’s going on.

The frontier labs are trying to make the models useful. I feel like the diversity of the outputs is not so much... Number one, it’s much harder to work with and evaluate and all this stuff, but maybe it’s not what’s capturing most of the value.
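
The "regularization for entropy" idea mentioned above can be sketched as an entropy bonus added to the usual next-token cross-entropy. The beta weight below is an arbitrary illustration, not a tuned or recommended value.

```python
# Toy sketch of an entropy bonus on the next-token loss.
import torch
import torch.nn.functional as F

def loss_with_entropy_bonus(logits, targets, beta=0.01):
    # logits: (batch, vocab), targets: (batch,)
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - beta * entropy        # higher output entropy lowers the loss a bit

# Usage with random tensors, just to show the call shape.
logits = torch.randn(8, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (8,))
loss = loss_with_entropy_bonus(logits, targets)
loss.backward()
print(float(loss))
```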

Dwarkesh Patel 00:57:42

In fact, it’s actively penalized. If you’re super creative in RL, it’s not good.

Andrej Karpathy 00:57:48

Yeah. Or if you’re getting a lot of help from LLMs with writing and stuff like that, it’s probably bad, because the models will silently give you all the same stuff. They won’t explore lots of different ways of answering a question.

Maybe not that many applications need this diversity, so the models don’t have it. But then it’s a problem at synthetic data generation time, et cetera. So we’re shooting ourselves in the foot by not allowing this entropy to be maintained in the model. Possibly the labs should…
