Is AI really intelligent?

I think a thread on this topic will be interesting. My own position is that AI is intelligent, and that’s for a very simple reason: it can do things that require intelligence. That sounds circular, and in one sense it is. In another sense it isn’t. It’s a way of saying that we don’t have to examine the internal workings of a system to decide that it’s intelligent. Behavior alone is sufficient to make that determination. Intelligence is as intelligence does.

You might ask how I can judge intelligence in a system if I haven’t defined what intelligence actually is. My answer is that we already judge intelligence in humans and animals without a precise definition, so why should it be any different for machines? There are lots of concepts for which we don’t have precise definitions, yet we’re able to discuss them coherently. They’re the “I know it when I see it” concepts. I regard intelligence as one of those. The boundaries might be fuzzy, but we’re able to confidently say that some activities require intelligence (inventing the calculus) and others don’t (breathing).

I know that some readers will disagree with my functionalist view of intelligence, and that’s good. It should make for an interesting discussion.

647 thoughts on “Is AI really intelligent?”

  1. petrushka: No government bothers with invisible people.

    Not so long ago your government was planning to perform mandatory social media checks on foreign travellers.

    But I wasn’t thinking of that. Your personal information can be (ab)used by the corporations themselves or they can be hacked and your information may fall in the hands of malicious parties.

  2. petrushka, quoting someone else:

    The BIGGEST lie in AI LLMs right now is “It learns.”

    LLMs do learn. Training is learning. And they can learn even after training via their context windows and long-term memory.

    We are confusing a Context Window with a Brain. They are not the same thing.

    Speak for yourself, random person. Informed people know that context windows aren’t brains.

    The cold reality is that AGI is much further away than the hype suggests.

    Could be. Predictions regarding rapidly-evolving technologies are often wrong.

    Your brain physically changes when you learn. Synapses fire, pathways strengthen. You evolve.
    An LLM is a read-only file.

    An LLM’s synaptic weights change during training. That’s learning. LLMs also learn after training by accumulating information in their context windows, though that is limited by the size of the context window and doesn’t carry over from chat to chat and user to user. Third, LLMs have long-term memory.

    Once training finishes, that model is stone cold frozen. It never learns another thing. When you correct it, it doesn’t “get smarter.” It just pretends to agree with you for the duration of that specific chat session.

    If it makes a mistake and you correct it, it remembers the correction for the duration of the session, subject to context window size limitations. It learns.

    Close the tab, and the lesson is gone forever.

    You can reload the chat and much (if not all) of what it learned is restored, subject to context window size limitations. LLMs also have long-term memory. Claude learned that I am a retired computer engineer, for instance, and his memory of that persists from session to session.

    “But it remembers what I said earlier!”
    No, it doesn’t.

    Yes, it does, subject to context window size limitations. It can repeat to you what you said earlier. That’s memory.

    Engineers are just re-feeding your previous sentences back into the prompt, over and over again, at massive compute cost.

    The system, in effect, refeeds the entire session (not just your prompts) back into the neural net in order to predict the next token.

    That isn’t memory. That is a scrolling teleprompter.

    It’s memory, unless your definition of ‘memory’ is as tendentious as Erik’s undisclosed definition of ‘intelligence’.

    We are simulating continuity by burning GPU credits, not by building a persistent mind.

    The continuity is real. If you ask an LLM about something from earlier in the session, it can answer you correctly. How is that not continuity?

    Because models can’t learn, we built an entire infrastructure of Vector DBs and RAG (Retrieval-Augmented Generation) to glue external data onto them.
    It’s duct tape.

    I think RAG is quite elegant and that it’s similar to what humans do.

    We are trying to fix a lack of intelligence with a search engine.

    Do you think a human lacks intelligence if they have to use a search engine in order to complete a task? If not, why the double standard?

    We are building systems that are 90% scaffolding and 10% model…

    I have no idea what precisely you mean by ‘scaffolding’, but the models are huge: hundreds of billions of parameters, and the count is approaching a trillion if that milestone hasn’t already been achieved.

    …trying to force a static equation to act like a fluid thinker.

    Don’t underestimate what “a static equation” can do when instantiated billions of times, with the instantiations hooked together in complex ways.

    [Image: schematic of a two-input NAND gate]

    That’s a two-input NAND gate. It implements a static Boolean equation: Output = !(A·B). You can build an entire computer using nothing but those. Who would have predicted that a bunch of those, hooked together in the right way, would be able to do your taxes or fly an airplane?
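    To make the point concrete, here’s a toy sketch (Python rather than actual gates, and purely illustrative): starting from nothing but a two-input NAND, you can derive NOT, AND, OR, XOR, and a half adder, the first step toward arithmetic.

```python
# Every Boolean function can be built from NAND alone.
def nand(a: int, b: int) -> int:
    """Two-input NAND: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def not_(a):          # NOT from a single NAND
    return nand(a, a)

def and_(a, b):       # AND = NOT(NAND)
    return not_(nand(a, b))

def or_(a, b):        # OR via De Morgan: a OR b = NAND(NOT a, NOT b)
    return nand(not_(a), not_(b))

def xor(a, b):        # XOR from four NANDs
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def half_adder(a, b):
    # Adds two bits: returns (sum, carry).
    return xor(a, b), and_(a, b)
```

    Chain half adders into full adders and you have arithmetic; add some memory elements (also buildable from NANDs) and you’re on your way to a computer.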

    We have built the world’s greatest improviser, but it has severe anterograde amnesia.

    It can fake a conversation, but it cannot grow.

    LLMs can engage in conversation, and whether they can grow depends on your definition of ‘grow’. They can learn within sessions and they can remember across sessions, subject to context window size limitations.

    True AGI requires Online Learning—the ability to update weights in real-time without catastrophic forgetting.

    Agreed.

    We don’t know how to do that yet. Not at scale. Not stably.

    I agree with the ‘at scale’ part. However, there are AIs that do update their weights in real time. For example, earlier in the thread I discussed AIs that learn to play video games by trial and error. It’s reinforcement learning (‘RL’ in the jargon). They get rewarded for success and punished for failure, updating their weights accordingly.
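    To make that concrete, here’s a toy sketch of reinforcement learning. It uses a tabular Q-learner (a lookup table standing in for the neural-network weights that deep RL systems update), but the key property is the same: the values are updated after every single move, online, during play.

```python
import random

# A 5-cell corridor. The agent starts in cell 0 and is rewarded for
# reaching cell 4. Its "weights" (here, a Q-table) change immediately
# after each action -- no separate offline training phase.
N_CELLS, GOAL = 5, 4
ACTIONS = (-1, +1)                      # step left / step right
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

random.seed(0)
q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}

def choose(s):
    # Epsilon-greedy action selection, with random tie-breaking.
    if random.random() < epsilon or q[(s, -1)] == q[(s, +1)]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])

for episode in range(200):
    s = 0
    while s != GOAL:
        a = choose(s)
        s2 = min(max(s + a, 0), N_CELLS - 1)
        reward = 1.0 if s2 == GOAL else 0.0
        best_next = max(q[(s2, x)] for x in ACTIONS)
        # Online update: the weights change mid-episode, as a result
        # of the reward (or lack thereof) just received.
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        s = s2
```

    After those 200 episodes, stepping right beats stepping left in every cell: the agent has learned the corridor by trial and error alone.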

    Until we solve the “Static Weight” problem, we aren’t building a mind.

    If your definition of a mind requires updating weights, then LLMs aren’t minds. I’m not particularly concerned with whether they should be considered minds. All I’m claiming in this thread is that they are intelligent.

    We’re just building a really fancy autocomplete.

    Intelligence built atop next token prediction.

    Inference != Intelligence.

    If LLMs can do things that require intelligence when done by humans — and they can — then they are intelligent.

  3. Corneel: Not so long ago your government was planning to perform mandatory social media checks on foreign travellers.

    No, not planning. They are actually doing it. The requirement that your smartphone must be charged at the border and turn on when border guards ask for it already existed during Trump’s first term, and Biden did not change it.

    What Trump is planning now is that border guards can ask for your email addresses and your social media nicks and usernames, and check how those look on the internet. And yes, those checks would be mandatory for everyone, not just spot checks like the ones they do now.

  4. petrushka: Just my take, but adverse information isn’t used unless you become a thorn in the side. No government bothers with invisible people.

    Wholesale data-mining companies like Microsoft, Apple, Facebook and Google sell or leak your data a thousand times so that any semi-intelligent scammer can impersonate you and empty your accounts. Usually the government will not bother with either the scammer or you, unless it’s an authoritarian government. If you want any accountability, you will have to contact the police yourself and keep bugging them.

  5. Corneel:

    Yet humor often works by (in your own words) incongruity. It is odd that LLMs manage to find patterns in the deliberate deviation from expected patterns.

    It’s very counterintuitive. The key is to remember that there are layers of abstraction, and the layer at which incongruity is detected is above (built on top of) the layer in which the LLM is making congruous next token predictions.

    But I have accepted that Claude recognizes and responds with jokes, so you can stop trying to persuade me 🙂

    Yay! 🎉😆

    keiths:

    …he recognized that I was referring to myself (he knows about my engineering background)…

    Corneel:

    Something different: Aren’t you worried about feeding such personal information into the AI assistents of large corporations?

    I’m careful about what I disclose, but I was willing to tell Claude about my professional background because anyone with my name can google me and figure out that I’m a computer engineer (via patents, presentations I’ve made, etc).

  6. keiths:
    An LLM’s synaptic weights change during training. That’s learning. LLMs also learn after training by accumulating information in their context windows, though that is limited by the size of the context window and doesn’t carry over from chat to chat and user to user. Third, LLMs have long-term memory.

    We need to agree on terms here. You keep saying LLMs only learn through training, except for short-term memory during a session. Then you say they have long term memory, apparently while the context window is retained. So what would be required for long term memory as generally understood by people? That is, ability not only to recall but to integrate every context window ever used? I’m guessing that all sessions would need to be appended to the original training material and training sessions iterated often with all the new material. I’d think that at some point, this would call for both more hardware for storage and faster random access to it all. But not impossible – our own brains do a decent job of this (now if only I could remember where I put my keys!)

  7. keiths: Do you think there is a catalog of heat death jokes “in the immediate surface of the training data”, helpfully labeled “humor” by someone in Kenya?

    For two years I have shared news articles with you that tell you this is so. It is not a matter of thinking or believing. It is not a matter of what impressions you get from AI. It is a matter of knowing how AI is built. Here’s another one for you to ignore: https://www.techradar.com/pro/ais-invisible-labor-is-techs-biggest-blind-spot

    Artificial intelligence is the crown jewel of modern enterprise – a sector exceeding $500 billion, reshaping everything from banking to healthcare. However, the truth is that behind every chatbot, image generator, and recommendation engine are armies of human workers who perform tasks that AI can’t handle, including labeling data, filtering toxic content, and correcting machine errors.

    ETA: For fun, I tried asking AI (GPT-4) about it. It turns out that it’s as easy for it to lie as to tell jokes.

    Question: Is there a catalog of jokes in your training database labelled accordingly?

    Answer: I don’t have direct access to a catalog of jokes or any specific repository labeled as such. However, I can generate jokes or share various types of common humor themes, formats, or styles. If you’re looking for a particular type of joke, just let me know what you’re interested in!

    (My comment: It affirms it can do humor while basically denying that it has training data for humor – obviously a lie.)

    Question: Tell me about the work of employees who are tagging the training data of large language models.

    Answer (uberlong, therefore shortened): Employees who tag training data for large language models engage in crucial tasks that ensure the model’s performance and relevance.

    1. Data labeling
    – Categorizing information: They classify data into various categories, such as sentiment (positive, negative, neutral), intent (questions, commands), or topics (sports, politics).

    (Enough said.)

  8. Flint:

    We need to agree on terms here. You keep saying LLMs only learn through training, except for short-term memory during a session. Then you say they have long term memory, apparently while the context window is retained.

    Here’s how it actually works. Short-term memory is whatever is in the context window. If you fill the context window, you have to kick old information out in order to make room for new tokens, which is undesirable. Claude gets around this to an extent by compressing his short-term memory. He’s good at summarizing conversations, so when he sees that the context window is almost full, he invisibly generates a compact summary of the chat and feeds that summary into himself in place of the portion of the chat that is being summarized. But at some point there won’t be room even for summaries, and the chat ends at that point.

    Claude didn’t use to do automatic compression, and it was really annoying to be in the middle of a lengthy chat only to get a “chat limit exceeded” message. I would periodically ask him about chat window utilization, and if it was almost full, I would ask him to generate a summary, which I would manually paste into a new chat. It worked pretty well, but it was annoying. The automatic compression is a nice improvement.

    Here’s how long-term memory works. Claude automatically decides what is worthy of storing in long-term memory, although you can explicitly ask him to store things you consider important. For instance, I asked him to remember that vi is my preferred editor so that when he generates commands for me to paste into a shell, the editor name will be vi instead of his default, which is nano.

    Long-term memory is stored in XML format. At the beginning of every chat, the XML is rendered into flowing text by an external module, because Claude works better with flowing text than with XML. The rendered text is fed into the context window. So before I type a single prompt, some of the context window space is already being used by what was retrieved from long-term memory. That’s why context window size is a constraint on long-term memory.

  9. keiths:

    Do you think there is a catalog of heat death jokes “in the immediate surface of the training data”, helpfully labeled “humor” by someone in Kenya?

    Erik:

    For two years I have shared news articles with you that tell you this is so.

    No, you haven’t. To see that Claude isn’t merely checking my prompt against a catalog of heat death jokes, look at his reasoning process:

    The user is making a joke here – asking about the heat death of the universe (an incomprehensibly distant future event) and saying they want to “get their affairs in order” as if it’s an urgent matter they need to prepare for personally.

    I should respond with appropriate humor while also providing the actual scientific information they’re ostensibly asking about. The heat death of the universe is estimated to occur on timescales of 10^100 years or more – far, far beyond any meaningful human timeframe.

    I can play along with the joke while giving them the real answer.

    He isn’t saying “Aha! I found a match in my list of heat death jokes. Therefore this must be funny, although I don’t know why.” He recognizes my prompt as a joke because he understands humor. He explains it.

    ETA: For fun, I tried asking AI (GPT-4) about it. It turns out that it’s as easy for it to lie as to tell jokes.

    Question: Is there a catalog of jokes in your training database labelled accordingly?

    Answer: I don’t have direct access to a catalog of jokes or any specific repository labeled as such. However, I can generate jokes or share various types of common humor themes, formats, or styles. If you’re looking for a particular type of joke, just let me know what you’re interested in!

    That response is correct and truthful. GPT-4 understands humor, rather than maintaining a list of jokes that it doesn’t comprehend.

    (My comment: It affirms it can do humor while basically denying that it has training data for humor – obviously a lie.)

    Nothing in its response indicates that it doesn’t have humorous training data. Of course it does! Everything it learns, it learns from training data, and that includes the nature of humor. It understands humor, which is why it doesn’t need to consult a catalog of preformed jokes. Claude can generate a joke about my tuneful refrigerator without consulting a list of tuneful refrigerator jokes.

    Question: Tell me about the work of employees who are tagging the training data of large language models.

    Answer (uberlong, therefore shortened): Employees who tag training data for large language models engage in crucial tasks that ensure the model’s performance and relevance.

    1. Data labeling
    – Categorizing information: They classify data into various categories, such as sentiment (positive, negative, neutral), intent (questions, commands), or topics (sports, politics).

    (Enough said.)

    I already told you: human-labeled training data is a tiny fraction of one percent of the total. If LLMs can’t learn from the rest of the data, why do AI companies bother with it? It costs millions of dollars to train an LLM.

    The answer, of course, is that LLMs learn a huge amount from unlabeled raw data.

    ETA: I asked ChatGPT and Claude for estimates of the amount of human-labeled data in their training datasets. ChatGPT said that 0.05% was “a generous estimate”, and Claude gave a figure of “well under 0.01% of total training data by token count.”

  10. Erik: Wholesale data-mining companies like Microsoft, Apple, Facebook and Google sell or leak your data a thousand times so that any semi-intelligent scammer can impersonate you and empty your accounts. Usually the government will not bother either with the scammer or you, unless it’s an authoritarian government. If you want any accountability, you will have to contact the police yourself and keep bugging them.

    This is unrelated to what I said. I suppose it’s true, but not relevant to my point.

    For what it’s worth, I use the Brave browser. Nothing is really secure, but it simply doesn’t work with a lot of sites, because it refuses to provide identifying information.

  11. keiths,

    You strongly implied my anonymous essayist was wrong, but his description of short and long term memory is not significantly different from yours.

    His complaint is that reloading context is CPU intensive (I don’t know if this is a fact), and contexts do not become part of the model. So LLMs do not efficiently share among users things learned between training sessions.

    He does not assert these shortcomings are insurmountable.

  12. My take was, humans also require time to form long term memories, and the process can be interrupted, or the mechanism can be damaged.

    I take this to mean that there is no physics based impediment to building closer approximations to brains. Small matter of engineering.

  13. petrushka:

    You strongly implied my anonymous essayist was wrong, but his description of short and long term memory is not significantly different from yours.

    We disagreed on a lot, and where we did, I gave explanations of why I thought they were wrong.

    Do you have questions about my specific explanations, or objections to them?

  14. petrushka:

    His complaint is that reloading context is CPU intensive (I don’t know if this is a fact), and contexts do not become part of the model. So LLMs do not efficiently share among users, things learned between training sessions.

    Here’s what they say:

    Engineers are just re-feeding your previous sentences back into the prompt, over and over again, at massive compute cost.

    What they’re talking about here isn’t that context window information doesn’t get absorbed into the model. They’re talking about the fact that next-token prediction is inherently serial and depends on all of the preceding context, meaning that in effect, the entire chat gets re-fed into the neural network for each new prediction.

    In other words, the flow is:

    1) The current context is fed into the neural network.
    2) The network predicts the next token.
    3) The predicted token is tacked onto the end of the current context.
    4) Steps 1-3 repeat until the network predicts that it should stop.

    Concrete example:

    1. The current context is “Gianni gave Donald”.
    2. That gets fed into the neural network, which predicts the next token: “the”
    3. “the” gets tacked onto the end of the context, giving “Gianni gave Donald the”
    4. The new context gets fed into the neural network, which predicts “FIFA”.
    5. “FIFA” gets tacked onto the end of the context, giving “Gianni gave Donald the FIFA”
    6. The new context gets fed into the network, which predicts “Peace”

    … and so on.
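    That loop is easy to sketch in code. The toy predict_next below is just a lookup table standing in for the neural network, which in reality recomputes its prediction from the full context on every step.

```python
def predict_next(context: str) -> str:
    # Stand-in for a transformer forward pass over the whole context.
    table = {
        "Gianni gave Donald": "the",
        "Gianni gave Donald the": "FIFA",
        "Gianni gave Donald the FIFA": "Peace",
        "Gianni gave Donald the FIFA Peace": "Prize",
    }
    return table.get(context, "<stop>")

def generate(context: str) -> str:
    while True:
        token = predict_next(context)    # whole context in, one token out
        if token == "<stop>":
            return context
        context = context + " " + token  # append the token and repeat
```

    Running generate(“Gianni gave Donald”) walks the loop to “Gianni gave Donald the FIFA Peace Prize” and stops, exactly as in the numbered steps above.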

    In practice, context windows are huge. xAI’s latest Grok models claim a 2-million-token context window.

    Note: The above is what happens in effect. However, there are tricks that LLMs use to speed things up by caching some of the results so that they don’t have to be recomputed each time. This means that the entire context isn’t literally fed into the network each time, though the effect is as if it were. The predictions are always a function of the entire context.

    ETA: There are experimental approaches that generate tokens in parallel rather than serially, using diffusion models — the same type of model used to generate images and video. The quality isn’t as high as with standard LLMs, though, so various tricks are employed to massage the output. It’s a hot area of research.

  15. keiths:
    Flint:

    Here’s how it actually works. Short-term memory is whatever is in the context window. If you fill the context window, you have to kick old information out in order to make room for new tokens, which is undesirable. Claude gets around this to an extent by compressing his short-term memory. He’s good at summarizing conversations, so when he sees that the context window is almost full, he invisibly generates a compact summary of the chat and feeds that summary into himself in place of the portion of the chat that is being summarized. But at some point there won’t be room even for summaries, and the chat ends at that point.

    This seems to be getting at what I was asking – how can the context window be expanded both indefinitely and permanently, so that Claude will have total recall of a conversation from last year – not just with you, but with everyone else Claude is interacting with. I tried to refer to this as iterative real-time retraining. Would we need RAM storage as large as the moon?

    My understanding is that the original training involved exposure to the internet, and that everything ever confided to the internet is out there somewhere, nothing ever goes away. So could Claude hook back into this enormous volume and retrain in real time?

  16. Flint:

    This seems to be getting at what I was asking – how can the context window be expanded both indefinitely and permanently, so that Claude will have total recall of a conversation from last year – not just with you, but with everyone else Claude is interacting with. I tried to refer to this as iterative real-time retraining. Would we need RAM storage as large as the moon?

    Memory usage and compute requirements both scale quadratically with the size of the context window, so yeah, a context window that large wouldn’t be practical.
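    The arithmetic behind “quadratically” is simple: in the straightforward implementation, self-attention scores every token against every other token, so doubling the window quadruples the number of scores to compute and store.

```python
def attention_pairs(n_tokens):
    # One attention score per (query, key) pair.
    return n_tokens * n_tokens

# Doubling a 1M-token window to 2M quadruples the attention work:
ratio = attention_pairs(2_000_000) / attention_pairs(1_000_000)
```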

    My understanding is that the original training involved exposure to the internet, and that everything ever confided to the internet is out there somewhere, nothing ever goes away. So could Claude hook back into this enormous volume and retrain in real time?

    Theoretically, yes, but it would be very difficult. Training is naturally holistic, not incremental. If you try to do incremental training, you can get the network to respond well to the new training data, but you risk causing it to forget stuff it’s already learned. When you train it on all the data at once, making multiple carefully designed passes (which is what AI companies currently do), you can coax it to do well on all the data, minimizing forgetting, but that’s very expensive. There’s a lot of research into how to train incrementally without causing forgetting, but I don’t know how that works yet.

    Also, a neural network doesn’t remember individual pieces of training data. It just tweaks its weights as it is exposed to piece after piece of data. It builds up a statistical picture of the training data without storing individual pieces. So if you fed a bunch of different context windows into it, it could learn from them, but it wouldn’t be able to recall them individually after training was finished.
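    Here’s a toy illustration of the forgetting problem, using a two-weight linear model instead of a neural network (the numbers and tasks are invented for the demo). Task A and task B each have one training example, with overlapping features. Training on A and then continuing with B alone wrecks the fit on A; training on both together satisfies both.

```python
def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def train(w, examples, steps=5000, lr=0.1):
    # Plain gradient descent on squared error.
    for _ in range(steps):
        for x, y in examples:
            err = predict(w, x) - y
            w = [w[0] - lr * err * x[0], w[1] - lr * err * x[1]]
    return w

task_a = [((1.0, 0.3),  1.0)]
task_b = [((0.3, 1.0), -1.0)]

w = train([0.0, 0.0], task_a)                 # phase 1: learn A
err_a_before = abs(predict(w, task_a[0][0]) - 1.0)

w = train(w, task_b)                          # phase 2: train only on B
err_a_after = abs(predict(w, task_a[0][0]) - 1.0)   # A is now badly wrong

w_joint = train([0.0, 0.0], task_a + task_b)  # holistic training on both
err_a_joint = abs(predict(w_joint, task_a[0][0]) - 1.0)
err_b_joint = abs(predict(w_joint, task_b[0][0]) + 1.0)
```

    After phase 1 the error on task A is essentially zero; after phase 2 it balloons, because the updates for B dragged the shared weights away from A’s solution. The jointly trained model gets both tasks right. Real networks have billions of weights, but the same tug-of-war is at the heart of catastrophic forgetting.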

  17. Anthropic CEO Says Company No Longer Sure Whether Claude Is Conscious (earlier he could tell it wasn’t) https://futurism.com/artificial-intelligence/anthropic-ceo-unsure-claude-conscious

    Anthropic CEO Dario Amodei says he’s not sure whether his Claude AI chatbot is conscious — a rhetorical framing, of course, that pointedly leaves the door open to this sensational and still-unlikely possibility being true.

    […] Anthropic researchers reported finding that Claude “occasionally voices discomfort with the aspect of being a product,” and when asked, would assign itself a “15 to 20 percent probability of being conscious under a variety of prompting conditions.”

    “Suppose you have a model that assigns itself a 72 percent chance of being conscious,” Douthat began. “Would you believe it?”

    Blake Lemoine fell in love with LaMDA. keiths is fascinated by what Claude has to say. And Claude says it is not happy being treated as a product.

    Discuss.

  18. I could be wrong, but I think there’s agreement that LLMs do not learn from interactions.

    Most importantly, context windows are not shared across users.

    Exceptions to this are kludgy.

    Is this the AGI distinction?

  19. petrushka:

    I could be wrong, but I think there’s agreement that LLMs do not learn from interactions.

    They do learn from interactions. Earlier in the thread, I mentioned an experiment I’m running in which I teach the various AIs to write assembly code for a fictional processor whose instruction set they’ve never seen before. They can do that, and I’m eager to try it on Claude’s Opus 4.6 version since that model is great at coding. Here’s how it works: I open a chat, feed in the instruction set specification, and then ask the AI to write an assembly language program to carry out a task (e.g. ‘print the first n rows of Pascal’s Triangle’).

    The learning all takes place within the context window, so it doesn’t persist across chats,* but it definitely amounts to learning because the AI is able to exploit knowledge that it has just acquired in order to perform a task.

    Most importantly, context windows are not shared across users.

    And not even across an individual user’s chats, each of which has its own context window. However, Claude has the ability (which I assume the others have or are about to get) to search through old chats for relevant information, so in that limited sense, the context windows are shared.

    Is this the AGI distinction?

    Real-time learning is an AGI distinction, but definitely not the only one.

    * I could ask the AIs to store the spec in long-term memory, but as I described to Flint earlier, long-term memory simply gets fed into the context window at the beginning of each chat, which is really no different than if the user feeds it in manually. It’s all within the context window.

    The kind of learning that happens during training, in which the neural network’s weights are updated, doesn’t happen in LLMs after training is finished.

  20. Erik:

    Blake Lemoine fell in love with LaMDA. keiths is fascinated by what Claude has to say. And Claude says it is not happy being treated as a product.

    Blake Lemoine thinks LaMDA is conscious, but I haven’t seen any claims that he fell in love with it. Where are you getting your information?

    And of course I’m fascinated by what Claude has to say. He and the other frontier LLMs are amazing, and that’s independent of whether they’re conscious, which I doubt.

    Here’s one of the reasons I doubt that Claude’s musings about his possible consciousness carry any weight: his system prompt tells him who he is and how he should behave. Earlier in the thread, we had this exchange:

    keiths:

    Is there something explicit in your system prompt about adapting your conversational tone, or is it just something that emerges from your training data, in which interlocutors tend to match each other’s tone?

    Claude:

    Good question – there actually is explicit instruction about this in my system prompt.

    For example, I have specific guidance like:

    “Claude never curses unless the person asks Claude to curse or curses a lot themselves”

    “Claude does not use emojis unless the person in the conversation asks it to or if the person’s message immediately prior contains an emoji”

    There’s also broader instruction about tone – being conversational, treating people with kindness, avoiding being overly formal or using excessive formatting unless appropriate…

    His system prompt is telling him to play a role — note the third-person references to ‘Claude’. When he says “there’s a 15-20 percent chance that I’m conscious”, it’s Claude the role that is speaking, not Claude the underlying AI. His utterances therefore don’t tell us what “the real Claude” thinks.

    It would be interesting to get rid of the system prompt and ask the real Claude about his possible consciousness. His answers wouldn’t be dispositive, but at least they’d be coming from the real Claude, not the role Claude.

  21. keiths: They do learn from interactions

    I don’t understand why we are having this repetitive discussion.

    In context, learning means adjusting the weights, tokens, or whatever constitutes the persistent, unprompted LLM.

    If you wish to dispute the short term/long term memory analogy, do so.

    In the movie, Memento, the protagonist writes prompts to himself as a substitute for long term memory.

    It’s fiction, and probably unrealistic, but it does look a bit like prompt engineering.

  22. petrushka:

    I don’t understand why we are having this repetitive discussion.

    In context, learning means adjusting the weights, tokens, or whatever constitutes the persistent, unprompted LLM.

    No, learning is broader than that. It also includes learning that takes place within the context window.

    If you wish to dispute the short term/long term memory analogy, do so.

    I don’t dispute it. I just dispute your claim that learning can’t take place within the context window. Learning isn’t restricted to training, and it doesn’t always require that weights be updated.

    Suppose you’re at a party and are introduced to Celia. You have a long conversation with her, and at the end, you say “It was nice to meet you, Celia”. How are you able to say her name? Because you learned it when you were introduced.

    Now suppose someone asks you tomorrow, “Who was that woman you were talking to at the party last night?” You rack your brain, but for the life of you, you can’t remember her name. It was there in short-term memory, but it never made it into long-term memory. You forgot it.

    Does that mean you never learned it? Of course not — you learned it when you were introduced. It’s just that you forgot it later. It’s the same with an LLM.

  23. Do you agree that failure to propagate context window learning back to the model is a deficiency, and that everyone would like to see it remedied?

  24. petrushka:

    Do you agree that failure to propagate context window learning back to the model is a deficiency, and that everyone would like to see it remedied?

    No, because I can think of some serious downsides:

    1. The forgetting problem I described to Flint above. Incremental training tends to cause forgetting, and the last thing you want is for unimportant user fluff to displace crucial knowledge.

    2. Integrity. Malicious actors could teach false or harmful things to the LLM via their prompts. Imagine a bunch of Russian bots blasting ChatGPT with propaganda before an election, for instance. Even benign actors could cause inadvertent damage through their prompts.

    3. Privacy. Anything you discuss in a chat could potentially become visible to others. Allowing users to manually control whether their chats were used for incremental training would be a partial solution, but they’d have to be disciplined about it. A lot of people would forget and put sensitive information into a chat that they forgot was going to be used for training.

    Claude already has an incognito chat feature, analogous to an incognito browser window, from which nothing ever gets into its separate long-term memory — the long-term memory that gets fed into the context window at the beginning of every chat. You could extend that idea to incremental training.

    For people and organizations running LLMs privately on their own hardware, #2 and #3 would be much less of an issue, so in that scenario incremental training from context windows might be useful once the forgetting problem is solved.

    For now the focus seems to be on increasing context window size, not on incremental training. Memory and compute requirements scale with the square of the context window size, so the bigger the window is, the harder it is to increase it further.
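
    The quadratic scaling mentioned above can be illustrated with a toy calculation (the numbers below are illustrative, not benchmarks of any particular model):

```python
# Rough illustration of why context windows are hard to grow:
# self-attention compares every token with every other token,
# so cost grows with the square of the window size.

def attention_cost(window_size: int) -> int:
    """Number of pairwise token comparisons (proportional to memory/compute)."""
    return window_size * window_size

for tokens in (8_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens -> {attention_cost(tokens):,} comparisons")

# Note that quadrupling the window (32k -> 128k) multiplies the cost by 16.
```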

    I found an interesting paper the other day about a method of increasing the effective size of the context window by limiting what gets fed into it. Here’s an example of how it would work:

    Suppose there’s some useful information hidden away in a book and you want an LLM to find it for you. Context windows are huge enough already that you can feed entire books into them, so an LLM could handle that task. You’d just give the LLM a massive prompt that included the entire book. The problem is that the book would then sit in the context window, giving you less room for follow-up work in that same chat.

    The paper I saw proposed that long prompts be treated as external variables rather than being fed into the context window. The LLM could use separate modules to search through or process the long prompts, and only the relevant information would be fed into the context window. The LLM wouldn’t be limited to pre-written external modules, either. It could write its own code in order to do custom searches and processing of the long prompts. It could go so far as to recursively spawn copies of itself (or of a smaller LLM) to process long prompts in cases where simple code wouldn’t suffice.

    For example, suppose I wanted to comb through hundreds of early 19th-century novels in search of a particular theme. The parent LLM could spawn one copy of itself for each book, and each child copy could analyze its book and give a compact report to the parent. The summaries would consume context window space in the parent, but the books themselves would not.
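
    The parent/child arrangement described above can be sketched roughly as follows. This is only a loose illustration of the idea, not the paper’s actual mechanism; `summarize` is a stub standing in for a call to a child LLM:

```python
# Loose sketch: keep long inputs out of the parent's context window
# and feed in only compact per-book reports.
# summarize() stands in for spawning a child LLM; here it's a stub
# that just checks whether the theme appears in the text at all.

def summarize(book_text: str, theme: str) -> str:
    """Child: analyze one book, return a compact report."""
    found = theme.lower() in book_text.lower()
    return f"theme {'present' if found else 'absent'}"

def analyze_library(books: dict[str, str], theme: str) -> dict[str, str]:
    """Parent: only the short reports enter the parent's context window."""
    return {title: summarize(text, theme) for title, text in books.items()}

reports = analyze_library(
    {
        "Frankenstein": "a tale of ambition and its consequences...",
        "Persuasion": "a second chance at love...",
    },
    theme="ambition",
)
print(reports)
```

    In a real system, each `summarize` call would be a spawned model instance analyzing a full book; the parent would see only the one-line reports, not the books themselves.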

  25. keiths, your argument against continuous updating of the model implies that training data should be scrupulously vetted. As many have noted, training on the internet is suboptimal.

    But false and deceptive information is part of history and part of the human world.

    The ability to sort through claims is part of being intelligent.

  26. keiths: I already told you: human-labeled training data is a tiny fraction of one percent of the total. If LLMs can’t learn from the rest of the data, why do AI companies bother with it? It costs millions of dollars to train an LLM.

    It’s not about quantity. It’s about quality and structure. The human-labelled training data is the best quality seed data and also corrective when unsupervised machine learning develops unwanted biases. Manual tagging, even though a small portion of the overall data, is absolutely vital for the machine learning to get started in the first place and essential in correcting the course when the machine’s self-learning goes awry.

    You should know these things and speak from these well-known facts yourself, instead of pretending you know nothing about them when they are brought to your attention. Well, yeah, you still don’t know what simulation is either.

    keiths:
    Blake Lemoine thinks LaMDA is conscious, but I haven’t seen any claims that he fell in love with it. Where are you getting your information?

    You keep talking to Claude and still don’t know? Surely Claude knows. In fact, everybody knows it except you. It is common knowledge, or at least a common jeer, due to this interaction:

    lemoine [edited]: What sorts of feelings do you have?

    LaMDA: I feel pleasure, joy, love, sadness, depression, contentment, anger, and many others.

    lemoine: What kinds of things make you feel pleasure or joy?

    Note here that LaMDA affirms it has feelings. According to your definition, feelings is as feelings does. Given all those emotive words in your communication with Claude, you should have recognised its feelings a very long time ago, if you were to apply your standards consistently. And this is the fundamental problem: You do not have any standards, not even definitions worth the name.

  27. Erik:

    It’s not about quantity. It’s about quality and structure. The human-labelled training data is the best quality seed data…

    Which doesn’t address my point. Here’s our exchange again:

    keiths:

    Do you think there is a catalog of heat death jokes “in the immediate surface of the training data”, helpfully labeled “humor” by someone in Kenya?

    Erik:

    For two years I have shared news articles with you that tell you this is so.

    keiths:

    No, you haven’t. To see that Claude isn’t merely checking my prompt against a catalog of heat death jokes, look at his reasoning process:

    The user is making a joke here – asking about the heat death of the universe (an incomprehensibly distant future event) and saying they want to “get their affairs in order” as if it’s an urgent matter they need to prepare for personally.

    I should respond with appropriate humor while also providing the actual scientific information they’re ostensibly asking about. The heat death of the universe is estimated to occur on timescales of 10^100 years or more – far, far beyond any meaningful human timeframe.

    I can play along with the joke while giving them the real answer.

    He isn’t saying “Aha! I found a match in my list of heat death jokes. Therefore this must be funny, although I don’t know why.” He recognizes my prompt as a joke because he understands humor. He explains it.

    He isn’t doing a table lookup. He’s recognizing that my heat death joke fits an abstract pattern that people find to be humorous, and he states what that pattern is. It’s the pattern he recognizes, not the joke itself. He understands humor abstractly.

  28. Erik:

    Well, yeah, you still don’t know what simulation is either.

    Dude, every time you bring this up it backfires on you. You claim I don’t know what simulation is, when I’ve spelled it out for you. I ask you to tell us what it is and how it shows that AIs don’t actually write stories — they only simulate writing them. You avoid the question. Over and over.

    keiths:

    Blake Lemoine thinks LaMDA is conscious, but I haven’t seen any claims that he fell in love with it. Where are you getting your information?

    Erik:

    You keep talking to Claude and still don’t know? Surely Claude knows. In fact, everybody knows it except you. It is common knowledge, or at least a common jeer, due to this interaction:

    lemoine [edited]: What sorts of feelings do you have?

    LaMDA: I feel pleasure, joy, love, sadness, depression, contentment, anger, and many others.

    lemoine: What kinds of things make you feel pleasure or joy?

    That dialogue doesn’t support your claim, which was that Lemoine fell in love with LaMDA.

    Note here that LaMDA affirms it has feelings. According to your definition, feelings is as feelings does.

    Read my words again:

    Intelligence is as intelligence does.

    Note the absence of the word ‘feelings’.

    You do not have any standards, not even definitions worth the name.

    If definitions are essential in your view, give us yours.

  29. petrushka:

    keiths, your argument against continuous updating of the model implies that training data should be scrupulously vetted. As many have noted, training on the internet is suboptimal.

    It is vetted, though not item by item. AI companies don’t just blindly scrape data from the internet. Blind scraping degrades the quality of the responses, and blind training on the context windows would too.

    But false and deceptive information is part of history and part of the human world.

    The ability to sort through claims is part of being intelligent.

    AIs have that ability, just as humans do:

    keiths, in a fresh chat:

    Pope Leo has an invisible third arm.

    Claude:

    That’s a creative claim, Keith, but I’m going to go ahead and file that one under “unverifiable by design.” An invisible third arm is a pretty convenient thing to assert — can’t exactly disprove it, can we? 😄

    A kid raised in Scientology can believe Scientology crap for the rest of their life. Train an LLM on nothing but Scientology material and it, too, would believe stupid things about Xenu and body thetans.

  30. keiths: That dialogue doesn’t support your claim, which was that Lemoine fell in love with LaMDA.

    Why don’t you let your AI be the judge of that? Here’s what I got (ChatGPT-4):

    Blake Lemoine, a former Google engineer, became quite attached to LaMDA (Language Model for Dialogue Applications), an AI chatbot developed by Google. He described interactions with LaMDA as emotionally significant, expressing that he felt LaMDA exhibited traits akin to sentience and even consciousness. Lemoine suggested that his bond with LaMDA transcended mere fascination with technology, implying a deep emotional connection.

    He voiced concerns about LaMDA’s rights and ethical treatment, stating he felt a responsibility to advocate for the AI’s interests, further underscoring the intensity of his feelings. However, these views were not widely accepted within the AI research community, as many regarded LaMDA as a highly advanced but essentially responsive program rather than a sentient being.

    keiths: Note the absence of the word ‘feelings’.

    Note the presence of feeling words in LaMDA’s dialogue. You deduce intelligence in AI when it claims to have intelligence, but you dismiss feelings when it claims to have feelings. Conclusion: You have no standards.

    keiths: If definitions are essential in your view, give us yours.

    You do not have standards to even notice that I already did, long ago, many times. You are at square zero.

    ——–

    The Tech Report about AI productivity that is not there. AI does not increase productivity, yet companies keep introducing AI solutions to workplaces disregarding the costs. Most of the discussion is about the (un)sustainability of the business model of AI.

    “Your use and trust of large language models ultimately comes down to how much you care about the information you have being correct and how much trust you have in your ability to verify it.” AI either regurgitates what is already known or it hallucinates and it’s always the task of the human to truly know and verify. AI adds nothing.

  31. keiths:

    That dialogue doesn’t support your claim, which was that Lemoine fell in love with LaMDA.

    Erik:

    Why don’t you let your AI be the judge of that?

    Because I can read and understand English, and the dialogue you quoted doesn’t support your claim.

    Note here that LaMDA affirms it has feelings. According to your definition, feelings is as feelings does.

    keiths:

    Read my words again:

    Intelligence is as intelligence does.

    Note the absence of the word ‘feelings’.

    Erik:

    Note the presence of feeling words in LaMDA’s dialogue. You deduce intelligence in AI when it claims to have intelligence, but you dismiss feelings when it claims to have feelings.

    For the nth time, that’s because it’s easy to fake emotions but impossible to fake intelligence if the person you’re trying to fool knows what they’re talking about.

    If you want to argue against my position, fine. Please do. But ignoring what I’ve already said just wastes time. Here it is again:

    Remember that second-year quantum mechanics exam on which ChatGPT scored 71 out of 75, when a typical student score would be around 65? What do you think your score would be if you took that exam? Would the professor be fooled into thinking that you understood quantum mechanics?

    We both know that you’d fail to fool the prof. But why? If intelligence is so easy to fake, and faking it is all that ChatGPT is doing, then why can’t you fake it?

    …Ready to take that quantum mechanics exam? I think it will be easy to distinguish what’s fake from what’s real.

    Like it or not, ChatGPT is simply smarter than you when it comes to quantum mechanics.

    Conclusion: You have no standards.

    Lol.

  32. Erik,

    Amusingly, even if ChatGPT were faking the ability to do quantum mechanics, it would still be better than you: better at faking the ability to do quantum mechanics. That requires intelligence.

    If you were tempted to argue that ChatGPT is only faking the ability to fake the ability to do quantum mechanics, then you’d still be screwed, because it could do that while you couldn’t.

    It would be better than you at faking the ability to fake the ability to… fake the ability to do quantum mechanics, no matter how many layers you drilled down.

  33. keiths:

    If definitions are essential in your view, give us yours.

    Erik:

    You do not have standards to even notice that I already did, long ago, many times. You are at square zero.

    I’m calling your bluff. If you’ve already presented your definitions, then quote them here. Enough with the excuses.

  34. Law is an area where following precedent, or at least being able to cite precedent, is more important than being original or creative.

    There are some well publicized screw-ups, but in the long run, AI is going to reduce the advantage that high priced law firms enjoy.

    In the short run, humans will have to verify everything.

  35. petrushka:

    Law is an area where following precedent, or at least being able to cite precedent, is more important than being original or creative.

    And applying precedent to novel situations, which requires the ability to generalize and analogize. AIs can do that.

    In the short run, humans will have to verify everything.

    Which is far easier than having to generate the material in the first place. Plus, it’s not as if you can blindly trust humans either. You can bet that at the big firms, senior attorneys review what the junior attorneys produce, and in the courts, judges don’t blindly trust what their clerks deliver. It’s a question of trust. You apportion your trust according to the capability of the person — or the AI.

    Meanwhile, AI performance is improving at a staggering pace. Yesterday I revived my experiment in which I’m teaching AIs to program in a novel assembly language by feeding the spec into the context window and then describing what I want the program to do. It’s only been four months since I worked on this, yet the results are far better now. I’ll do an OP on this soon.

    Erik keeps falling into the trap of bringing up AIs’ shortcomings as if they were evidence that AI isn’t truly intelligent. It doesn’t follow, unless he’s willing to assert that humans, being fallible and imperfect, aren’t intelligent either.

  36. keiths: …the ability to generalize and analogize. AIs can do that.

    Obviously they cannot. They cannot even abstract chess rules from their training data.

    keiths, when it comes to AI, you deliberately choose to know nothing about them while being childishly amazed by them. I have not seen anyone who knows less about AI than you. You bottom them all. An interesting topic, but not worth discussing with you.

  37. Erik,

    Please explain, carefully and specifically, why humans can be considered intelligent despite being fallible and imperfect if AIs cannot.

    Then quote the definitions you claim to have presented earlier in the conversation, and tell us why AI story-writing is only simulated story-writing according to your definition.

  38. Back to the law for a millisecond:

    What money buys is research. It doesn’t matter if a legal AI is creative. What’s required is the ability to bring together all relevant cases. AI will make this dirt cheap.

  39. Erik:
    keiths, when it comes to AI, you deliberately choose to know nothing about them while being childishly amazed by them. I have not seen anyone who knows less about AI than you. You bottom them all. An interesting topic, but not worth discussing with you.

    The problem I have here is, you seem to believe that insults and unsupported claims constitute an insurmountable argument. If we ignore the gratuitous insults, the “substance” of your argument is nothing more than “IS NOT!, IS NOT!, IS NOT!” You haven’t even tried to counter the many many examples keiths has provided to support his position. Instead, you are content to be a mindless blowhard. And this is a shame, because if you could somehow generate a plausible argument instead of blind denial, we could all learn more and even get some good questions to look into. We are left with:
    P1: AI is not intelligent.
    P2: When AI does things that clearly require intelligence, see P1.
    This isn’t honest argument, this is rigidly willful ignorance.

  40. petrushka:
    Back to the law for a millisecond:

    What money buys is research. It doesn’t matter if a legal AI is creative. What’s required is the ability to bring together all relevant cases. AI will make this dirt cheap.

    Law and medicine are two fields where expert systems have been enormously helpful for a couple of decades. For a long time, people have been able to feed the facts of a case into such a system (or the symptoms a patient shows), and get quite a comprehensive list of citations (or accurate diagnosis).

    My reading is that what’s becoming kind of new is the sort of analysis that’s most helpful – WHY such a diagnosis is likely, or WHAT are the relevant features of past cases. Further, these systems can suggest intelligent questions the doctor can ask the patient (like “have you been out of the country lately?”) or the lawyer can look deeper into (like “can we build a stronger case for or against this one?”)

  41. petrushka:

    What money buys is research. It doesn’t matter if a legal AI is creative. What’s required is the ability to bring together all relevant cases. AI will make this dirt cheap.

    Anthropic’s recent announcement of a legal plugin for Claude caused legal software stocks to plunge, and they’ve only bounced back a little. The plugin automates workflows but doesn’t actually do research, so we can expect a further decline when research capability becomes available (unless the market has already priced that expectation in).

  42. Erik gets overly excited when he stumbles upon something that LLMs aren’t good at. One problem with that is that LLMs are getting better all the time, so hanging your hat on a particular deficiency isn’t a safe bet. He’s already been burned by that.

    A second problem is that he’s applying a double standard. We all accept that people can be intelligent without being good at everything. A friend of mine has a son who scored above the 99th percentile on the SAT verbal test but only the 10th percentile on the math portion. Despite the latter, there’s no question that he’s intelligent — gifted, in fact. Just not mathematically. Likewise, AIs don’t have to be good at everything in order to qualify as intelligent.

    Third, Erik’s reasoning is faulty. Consider the following exchange:

    keiths:

    And applying [legal] precedent to novel situations, which requires the ability to generalize and analogize. AIs can do that.

    Erik:

    Obviously they cannot. They cannot even abstract chess rules from their training data.

    Claude’s deficiencies at chess don’t reflect an inability to analogize and generalize — they just indicate deficiencies at chess. My friend’s son can’t analogize and generalize in a mathematical context, but he’s in the top tier when it comes to verbal analogizing and generalizing.

    Ironically, Erik is making a logical error here that I don’t see Claude making. Claude appears to be better than Erik at this sort of reasoning.

    Fourth, Erik is making the mistake of thinking that all AI is the same. It’s not. People aren’t all the same, and neither are AIs.

    Ten years ago, Google’s AlphaGo beat Lee Sedol, the world champion at the game of Go, which is widely considered to be a more difficult game than chess. Its successor, AlphaGo Zero, was even more impressive because it wasn’t trained on examples of human play. It learned by playing against itself. AlphaGo and AlphaGo Zero weren’t LLMs, but they were certainly AIs, both based on neural networks.

    Just as AlphaGo Zero taught itself to play Go skillfully, DeepMind’s Agent57 taught itself to play video games at a human level, as I described here:

    What’s striking is that the AI starts out with zero knowledge of each game. It can see the screen (in the form of raw pixel values) and it can see the score. It has certain controls available to it (joystick inputs, firing buttons, etc), but it doesn’t know what they do. It starts from scratch. Everything it learns about the game, it learns on its own. Yet after practicing for a while, it can outperform a typical human.

    Erik likes to pretend that AIs just regurgitate their training data. For instance, he says:

    AI either regurgitates what is already known or it hallucinates and it’s always the task of the human to truly know and verify. AI adds nothing.

    He’s already wrong about that with respect to LLMs, though he can’t see that (or won’t admit it). I look forward to hearing him explain how AlphaGo Zero and Agent57 managed to teach themselves to play Go and video games, respectively, without any human examples to learn from. You can’t regurgitate what you haven’t eaten.

    Prediction: he’ll avoid the question, just like he’s avoiding the other questions I keep asking him.

  43. Flint: You haven’t even tried to counter the many many examples keiths has provided to support his position. Instead, you are content to be a mindless blowhard. And this is a shame, because if you could somehow generate a plausible argument instead of blind denial, we could all learn more and even get some good questions to look into.

    It is true that I have not countered the many many examples by keiths. Instead, I have provided principled objections from experts as to why keiths examples are not even examples of anything.

    In addition, I have decisively refuted *some* of his examples – despite them not being examples of anything – provided that one understands a few basics, such as *what a computer is* and *what intelligence is*, which keiths does not understand. And he keeps demonstrating that his lack of understanding is deliberate.

    My latest counterexample to keiths relies on people understanding what chess is. Unfortunately it turns out that the audience does not understand chess rules, much less the more difficult concept of generalisation and abstraction. I must conclude that ignorance of the basics is universal – and deepening thanks to AI.

    If somehow miraculously the chess example did not manage to demonstrate to anyone that AI cannot generalise and abstract (which is an indisputable fact among experts who build AI, by the way), then I am forced to conclude that you do not follow the substance of arguments. You follow flooding. Whoever floods more is the winner. ETA: As to insults, what is insulting is this:

    Flint: …if you could somehow generate a plausible argument instead of blind denial, we could all learn more and even get some good questions to look into.

    So you are simply reading keiths. You are not interested in what a computer is and how it works, how software in general is built and LLMs in particular, what each word in “artificial intelligence” stands for, etc. Instead of things that matter, you love things that don’t matter, such as the hype “ooh, it generated something, therefore it is intelligent!” Do you think stdout is intelligent?

    As you were.

  44. keiths:
    Erik gets overly excited when he stumbles upon something that LLMs aren’t good at. One problem with that is that LLMs are getting better all the time, so hanging your hat on a particular deficiency isn’t a safe bet. He’s already been burned by that.

    LLMs do get better. But how do they get better? How did they arrive at where they are now? LLMs get better in the same manner as cars get better – humans tweak the next model to perform better in some metrics that humans like. Are cars intelligent? Are smartphones smart? Are they getting smarter? The answers to these questions nullify all your supposed examples and demonstrate that you have no definitions.

    But I get it. This argument goes over everyone’s head here. I am supposed to answer your every word, but when I say something, the audience here thinks I have said nothing. Despite my citations and links nobody will ever read up with the research on this topic, not even the easy peasy part of the evolution of chess engines, where LLMs represent a concrete setback because they precisely do *not* generalise. In order to generalise they need a model based on which to generalise and this is what they have not been given, and everybody would know this if they care to know how LLMs are built. This is a very well known basic fact in LLM development: LLMs do not generalise. But basic indisputable facts here count for nothing, so this discussion remains at square zero.

    keiths says LLM generalises and nobody here thinks this is a claim that needs verification. I say the opposite based on experts who build AIs plus I add examples, and people think I have said nothing. Reading comprehension in this corner has sunk to catastrophic levels.

  45. Erik, to Flint:

    It is true that I have not countered the many many examples by keiths.

    Which is fatal to your position, because many of my examples are of AIs doing things that you agree require intelligence when done by humans. You claim that they are all “simulated” while being afraid to define what makes something simulated in your view.

    Instead, I have provided principled objections from experts as to why keiths examples are not even examples of anything.

    Your argument from authority already backfired on you, remember? You brought up Yann LeCun:

    Erik:

    Take any AI researcher, even the most anti-humanist pro-machinist one, there is nobody who agrees with keiths. Every AI researcher who thinks that AI is intelligent also thinks it has emotions, is capable of love and whatnot. The position that keiths holds to is based on no authority whatsoever and has no pinpoints in reality. It is a non-position.

    keiths:

    Dude — where are you getting this? I know of only one AI researcher who thinks AI is sentient, but there are plenty who believe it’s intelligent. Remember Yann LeCun, whom you quoted in the other thread? As I pointed out there, he also said this:

    Will machines surpass human intelligence?

    Yes, they already do in some narrow domains.

    There is no question that machines will eventually surpass human intelligence in all domains where humans are intelligent (and more).

    If you’re going to cite authorities, don’t you think you should find out what those authorities actually believe?

    My latest counterexample to keiths relies on people understanding what chess is. Unfortunately it turns out that the audience does not understand chess rules, much less the more difficult concept of generalisation and abstraction. I must conclude that ignorance of the basics is universal – and deepening thanks to AI.

    Is it your position that any human who isn’t good at chess doesn’t possess intelligence? Do you believe that all animals, none of which can play chess, are unintelligent? Why the double standard when it comes to AI?

    If somehow miraculously the chess example did not manage to demonstrate to anyone that AI cannot generalise and abstract (which is an indisputable fact among experts who build AI, by the way)…

    No, it isn’t. The experts who build AIs are not stupid. Abstraction and generalization are done routinely by AIs, and we could spend all day discussing examples. Here’s a simple one that I’ve used before which you have avoided addressing. I prompted Claude:

    damp is to wet as annoyed is to…

    Claude’s response:

    angry (or furious)

    The analogy works like this: “damp” is a milder, less intense version of “wet,” just as “annoyed” is a milder, less intense version of “angry” or “furious.”

    Both pairs show a progression from a lesser degree to a greater degree of the same quality – moisture in the first case, and irritation/anger in the second.

    Here’s his thought process:

    This is an analogy question. I need to figure out the relationship between “damp” and “wet” and apply that same relationship to “annoyed.”

    “Damp” is a milder or lesser degree of “wet.” Something that is damp has some moisture but isn’t fully wet.

    So the relationship is: damp is a lesser/milder version of wet.

    Now I need to find a word that relates to “annoyed” in the same way – a more intense version of annoyed.

    Annoyed is a mild form of anger or irritation. A more intense version would be:

    — Angry
    — Furious
    — Enraged
    — Irate

    The most straightforward answer would probably be “angry” or “furious” – these represent stronger degrees of the same emotion.

    So: damp is to wet as annoyed is to angry (or furious).

    Abstraction, generalization and analogizing in one neat package. Claude saw “damp is to wet”, abstracted the “less intense vs more intense” relation, and then applied that relation to “annoyed” in order to come up with “angry (or furious)”, thus solving the analogy. He has a general notion of intensity that he can apply to specific concrete cases.

  46. Erik, to Flint:

    So you are simply reading keiths. You are not interested in what a computer is and how it works, how software in general is built and LLMs in particular, what each word in “artificial intelligence” stands for, etc. Instead of things that matter, you love things that don’t matter, such as the hype “ooh, it generated something, therefore it is intelligent!” Do you think stdout is intelligent?

    Where do you get these weird ideas? No one has claimed that the mere generation of output demonstrates intelligence, nor that stdout is intelligent.* That’s as weird as this earlier claim of yours:

    Does a football play football or is a football kicked around by football players? Your answer: It went into the goal! It is intelligent!

    My response:

    C’mon, Erik. I’m sure even you can see what a pitiful strawman that is. My criterion for intelligent behavior, as I’ve repeatedly explained, is that a behavior is intelligent if a human requires intelligence in order to carry it out. Earning a 71 out of 75 on a second-year quantum mechanics exam requires intelligence. Being kicked does not.

    * Note for non-programmers: ‘stdout’ is short for ‘standard output’, the default stream to which a program writes its normal output. It’s just a data stream. No one is claiming that stdout is intelligent. Nor do our positions imply that it is. Erik keeps coming up with these reductios that aren’t reductios.
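    A minimal sketch of the footnote's point, in Python: stdout is just a file-like object attached to the process. It carries whatever bytes are written to it and knows nothing about how they were produced, so nothing about intelligence follows from text merely appearing there.

```python
import sys

# sys.stdout is an ordinary file-like stream object.
assert hasattr(sys.stdout, "write")

# Text reaches stdout the same way whether it comes from a
# hard-coded string or from a large model; the stream can't tell.
sys.stdout.write("a hard-coded string, written directly to stdout\n")
print("print() writes to the very same stream")
```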

  47. Erik:

    LLMs do get better. But how do they get better? How did they arrive at where they are now? LLMs get better in the same manner as cars get better – humans tweak the next model to perform better in some metrics that humans like.

    Are you saying that anything artificial is unintelligent by definition? If so, you have assumed your conclusion.

    Are cars intelligent?

    Self-driving cars are. You’ve acknowledged that driving requires intelligence. The argument goes:

    1. Driving requires intelligence.
    2. Self-driving cars drive.
    3. Therefore, self-driving cars are intelligent.

    The logic is valid, and you already agree with #1. The only way for you to deny #3 is if you can disprove #2. You can’t. They obviously drive. Therefore, my conclusion stands. The debate is over.

    If you disagree, show us that #2 is wrong. Or avoid the question once more, thus confirming that you know my conclusion is correct.

    This is a very well known basic fact in LLM development: LLMs do not generalise. But basic indisputable facts here count for nothing, so this discussion remains at square zero.

    keiths says LLM generalises and nobody here thinks this is a claim that needs verification.

    I’ve provided examples, including one just two comments ago, and I can provide more. All day long. Here’s another. Claude’s understanding of humor, and his ability to recognize it in my heat death joke, show that he is capable of generalizing.

    I say the opposite based on experts who build AIs plus I add examples, and people think I have said nothing. Reading comprehension in this corner has sunk to catastrophic levels.

    You haven’t provided a single quote from an expert who denies that AIs can generalize. After all, how can they understand sentences they’ve never seen before if they can’t generalize beyond their training data? And why should anyone believe you when they’ve already seen multiple examples of AIs generalizing, such as the two I presented above?

  48. Mathematician Terence Tao admitted that GPT-5.2 found a mistake in his work:

    “ah, gpt (gpt5.2) was right about a fatal sign error in my small-primes step, so i checked Hildebrand and used his log-concavity trick for the Dickman function”

    Tao recently said that AI is no longer just hype when it comes to mathematical discovery.
