
If you’re going to cover LLMs, you ought to know how they work and what they are (and are not) capable of.

Brad DeLong, responding to a tweet by Timothy B. Lee about Google’s Gemini:

‘My guess—and at this point it’s only a guess—is that we’re starting to see diminishing returns to scaling up conventional LLMs. Further progress may require significant changes to enable them to better handle long, complex inputs and to reason more effectively about abstract concepts…’

This last phrase is utter bilge.

There is no way that conventional LLMs can “reason more effectively about abstract concepts” because they do not and cannot reason about abstract concepts at all.

They make one-word-ahead predictions.

They make them very plausibly.

DeLong is spot on here.

It continues to shock (but not surprise) me that the tech journalists who are covering this stuff are so sloppy in how they talk about how large language models work and what these systems are capable of.

I heard this same pattern on an episode of Ezra Klein’s podcast a few weeks ago. He was talking to Kevin Roose and Casey Newton about LLM progress over the course of the last year, and at one point, they got to talking about whether systems like GPT-4 and Bard will be able to do stuff beyond helping write emails and documents. Klein said he is “interested in programs and the models that can create things that don’t exist,” like curing diseases and finding new energy sources.

Newton responded:

Well, to get there, you need systems that can reason. And right now the systems that we have just aren’t very good at reasoning. I think that over the past year we have seen them move a little away from the way that I was thinking of them a year ago, which was a sort of fancy autocomplete. It’s making a prediction about what the next word will be. It’s still true that they do it that way. But it is able to create a kind of facsimile of thought that can be interesting in some ways.

But you just can’t get to where you’re going, Ezra, with a facsimile of thought. You need something that has improved reasoning capabilities. So maybe that comes with the next generation of frontier models. But until then, I think you’ll be disappointed.

Let’s be clear: these systems do not reason. It’s not that they “just aren’t very good” at reasoning; they do nothing like reasoning. Newton sort of alludes to that in the second part of his answer, but only as an afterthought, and it is the closest anyone in the entire conversation comes to addressing the limits of these tools.
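What Newton calls “fancy autocomplete” really is the shape of the computation: pick the next word from a probability distribution, append it, repeat. Here is a deliberately toy sketch of that loop. The bigram table and word-level vocabulary are my own illustrative stand-ins; a real LLM uses a neural network over subword tokens and a context of thousands of tokens, but the autoregressive loop is the same.

```python
import random

# Illustrative only: a tiny hand-written bigram table standing in
# for a trained model's next-word probability distribution.
BIGRAM_PROBS = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt: str, max_words: int = 5, seed: int = 0) -> str:
    """Autoregressive generation: sample one word at a time."""
    random.seed(seed)
    words = prompt.split()
    for _ in range(max_words):
        dist = BIGRAM_PROBS.get(words[-1])
        if dist is None:  # no known continuation: stop
            break
        choices, weights = zip(*dist.items())
        # Sample the next word in proportion to its probability,
        # then feed it back in as context for the following step.
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
```

Nothing in that loop consults a model of the world, forms a goal, or checks a conclusion against premises. Scale the table up to billions of parameters and the output becomes strikingly plausible, but the operation stays the same: predict the next token.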

The reason I get so worked up about these conversations is that journalists like Roose and Newton (and Klein himself) are the main source through which most of the public, people who are not in the weeds of large language models, comes to view and understand these systems’ capabilities.

Actually, that’s not quite right. I think most of the public’s understanding of large language models and the tools they power (like ChatGPT) is almost entirely informed by what they have seen in science fiction—HAL 9000, the droids in Star Wars, the ship computer in Star Trek. That’s what people think of when they hear the phrase “artificial intelligence” bandied about in the news or read about it online. They imagine creatures that think and reason the way humans do, except with brains of silicon and metal and plastic. You ask it a question or tell it to do a task, and you expect it to go off, think about it, and then come back to you with an answer or a result.

That is totally understandable, but also totally wrong. The distinction matters because thinking of ChatGPT as more sophisticated than it is, as something that can or soon will be able to reason, is a category error. It leads to all sorts of confusion about what we should do with these systems and what we should do about them.1

And given that this is how most people approach the conversation, I feel that journalists like Roose and Newton (and Timothy B. Lee, whose quote opened this post) have a duty to be clear about this stuff when they talk and write about it. My worry is that even these guys don’t really seem to understand how these systems work, and that they tend to be much more interested in the business drama surrounding these tools and the companies that build them.

  1. That’s before we even get to the question of the financial incentives that drive the companies that build this stuff, as opposed to the sci-fi versions, where the robots all seem to be free. But that is a topic for a whole other post, I fear. ↩︎
