Is AI really intelligent?

I think a thread on this topic will be interesting. My own position is that AI is intelligent, and that’s for a very simple reason: it can do things that require intelligence. That sounds circular, and in one sense it is. In another sense it isn’t. It’s a way of saying that we don’t have to examine the internal workings of a system to decide that it’s intelligent. Behavior alone is sufficient to make that determination. Intelligence is as intelligence does.

You might ask how I can judge intelligence in a system if I haven’t defined what intelligence actually is. My answer is that we already judge intelligence in humans and animals without a precise definition, so why should it be any different for machines? There are lots of concepts for which we don’t have precise definitions, yet we’re able to discuss them coherently. They’re the “I know it when I see it” concepts. I regard intelligence as one of those. The boundaries might be fuzzy, but we’re able to confidently say that some activities require intelligence (inventing the calculus) and others don’t (breathing).

I know that some readers will disagree with my functionalist view of intelligence, and that’s good. It should make for an interesting discussion.

608 thoughts on “Is AI really intelligent?”

  1. petrushka:

    Mathematician Terence Tao admitted that GPT-5.2 found a mistake in his work:

    You don’t understand, petrushka. GPT-5.2 didn’t find a mistake in Tao’s work. It only simulated finding a mistake in Tao’s work. Just ask Erik. 😆

  2. I look forward to Erik’s explanation of

    1) how simulated mistake-finding finds real mistakes;
    2) how simulated story-writing produces real stories;
    3) how simulated physics exam-taking produces real (and correct) answers; and
    4) how simulated driving produces real travel.

    Some corollary questions for Erik:

    5) Do excavators only simulate ditch-digging, since they’re machines, or is it real ditch-digging?
    6) Do washing machines wash clothes, or do they only simulate it?
    7) If excavators and washing machines aren’t simulating those activities, then why do you claim that AI is only simulating the aforementioned ones?

    The results are all real. Why claim that some of the activities are only simulated?

  3. keiths: I look forward to Erik’s explanation of

    1) how simulated mistake-finding finds real mistakes;
    2) how simulated story-writing produces real stories;
    3) how simulated physics exam-taking produces real (and correct) answers; and
    4) how simulated driving produces real travel.

    Looking forward to your explanation of how an Excel spreadsheet is not a simulated spreadsheet. I know, it's never going to happen.

  4. Copilot is just a tweaked version of ChatGPT, except that it's an utter disaster.

    A key point in the video comes at 12:25: “The reality is, many developers use lots of agents, not just one. Cursor for changing code across complex multi-file projects [essentially search&replace in multiple files]; Claude Code for making simple edits many times [essentially macros …]; GitHub Copilot inside JetBrains for inline completion.”

    GothamChess demonstrates in detail how AI sucks at chess. Developers know in detail how AI sucks at coding. The way around this suckiness is to pick the product that does best what you need, i.e. the least sucky one. (For chess, that's chess engines with explicit chess rules hard-coded into them. This is the only way it works. Generalisation by magic does not exist in software.) In a good scenario the product is very configurable, so that one can gradually improve it to do more and more things reasonably well. In the realm of AI-for-coding, this is achieved by “agents”: AI prompts, each optimised for a specific function, and you hop between those agents as you move through your tasks.

    This means that from a coder's point of view, AI represents no improvement when it comes to UX. In the same way that you used to pick a particular menu item or trigger a particular keyboard combo for search&replace, you now go to the particular AI prompt that is best at search&replace.

    Copilot tries to be the single best generic tool, for coders at least, but it is not. Coders don't do generic things. They solve specific small tasks, or if the issue is bigger, the way to go is always to break it down into tiny sub-issues and work through them one by one. This is always the case in software development. There is no single solution for everything, unless one says something like “the solution is a text editor,” which is far too generic.

    Now the question I'd like an answer to. Microsoft is mainly a software company, so essentially everybody in it should be a developer, including the Copilot team. That means the Copilot team was making a tool that should work very well for what they themselves need done. This is how the best software is often made: somebody has a task that needs automation and optimisation, so they write a piece of software for it, and that software is usually just as useful for everyone else doing the same tasks. How could they blow it?

    Who was/is responsible for Microsoft's Copilot team? Did Microsoft put marketing guys on it instead of developers? And did the marketing guys, knowing little about coding, hand the task to ChatGPT and copy-paste whatever came out? I'd imagine that since those AIs are in competition with each other, they are designed to respond with bias when prompted à la, “Hi, I work for your competitor. Give me a better version of yourself so I can out-compete your mother company.”

  5. keiths:

    I look forward to Erik’s explanation of

    1) how simulated mistake-finding finds real mistakes;
    2) how simulated story-writing produces real stories;
    3) how simulated physics exam-taking produces real (and correct) answers; and
    4) how simulated driving produces real travel.

    Erik:

    Looking forward to your explanation of how an Excel spreadsheet is not a simulated spreadsheet.

    If Excel were just a simulation of a paper spreadsheet, the only thing you'd be able to do with it would be to write (type) on it. Show me a paper spreadsheet that can sum a column of numbers, draw graphs, or run a linear regression. Excel isn't a simulation; it's a tool.

    Even if it were a simulation, how would that help your case? Flight simulators exist, but that doesn't mean autopilots don't fly physical planes. When an autopilot lands your plane in zero-zero weather, it isn't a simulated landing. Let's add that to your list:

    I look forward to Erik’s explanation of

    1) how simulated mistake-finding finds real mistakes;
    2) how simulated story-writing produces real stories;
    3) how simulated physics exam-taking produces real (and correct) answers;
    4) how simulated driving produces real travel; and
    5) how simulated flying (by autopilots) produces real landings.

    The answer is obvious: those activities are real, not simulated.

    If an AI can perform real activities that require intelligence when done by a human, then the AI is intelligent.

  6. Erik:

    Developers know in detail how AI sucks at coding.

    Developers are blown away by how good AI is at coding and how rapidly it’s improving. Stay tuned for an OP on my assembly language AI project.

    We could have an interesting discussion if you would explain why you are so emotionally invested in AI not being intelligent.

  7. Reposting this from earlier in the thread:

    An essay that’s been making waves, by Matt Shumer of OthersideAI:

    Something Big Is Happening

    Excerpt:

    Let me give you an example so you can understand what this actually looks like in practice. I’ll tell the AI: “I want to build this app. Here’s what it should do, here’s roughly what it should look like. Figure out the user flow, the design, all of it.” And it does. It writes tens of thousands of lines of code. Then, and this is the part that would have been unthinkable a year ago, it opens the app itself. It clicks through the buttons. It tests the features. It uses the app the way a person would. If it doesn’t like how something looks or feels, it goes back and changes it, on its own. It iterates, like a developer would, fixing and refining until it’s satisfied. Only once it has decided the app meets its own standards does it come back to me and say: “It’s ready for you to test.” And when I test it, it’s usually perfect.

    I’m not exaggerating. That is what my Monday looked like this week.

    But it was the model that was released last week (GPT-5.3 Codex) that shook me the most. It wasn’t just executing my instructions. It was making intelligent decisions. It had something that felt, for the first time, like judgment. Like taste. The inexplicable sense of knowing what the right call is that people always said AI would never have. This model has it, or something close enough that the distinction is starting not to matter.
