Is AI really intelligent?

I think a thread on this topic will be interesting. My own position is that AI is intelligent, and that’s for a very simple reason: it can do things that require intelligence. That sounds circular, and in one sense it is. In another sense it isn’t. It’s a way of saying that we don’t have to examine the internal workings of a system to decide that it’s intelligent. Behavior alone is sufficient to make that determination. Intelligence is as intelligence does.

You might ask how I can judge intelligence in a system if I haven’t defined what intelligence actually is. My answer is that we already judge intelligence in humans and animals without a precise definition, so why should it be any different for machines? There are lots of concepts for which we don’t have precise definitions, yet we’re able to discuss them coherently. They’re the “I know it when I see it” concepts. I regard intelligence as one of those. The boundaries might be fuzzy, but we’re able to confidently say that some activities require intelligence (inventing the calculus) and others don’t (breathing).

I know that some readers will disagree with my functionalist view of intelligence, and that’s good. It should make for an interesting discussion.

712 thoughts on “Is AI really intelligent?”

  1. In discussing a mathematical result with Claude (OP forthcoming), I used the made-up word ‘numerize’ to describe the conversion of a predicate (which can be true or false) to a number — 1 for true, 0 for false. ‘Quantize’ is already taken, with a different meaning, so I settled on ‘numerize’. I like to play with language and it can be fun to test AI’s ability to recognize neologisms and infer their meaning.

    My prompt was

    Putting brackets around predicates is the standard way to numerize them in mathematical expressions?

    Claude immediately understood what I meant and responded appropriately. He has abstracted the idea that adding -ize to a noun or adjective creates a verb meaning “to bring about X”, where X is the base word. This isn’t something you’d intuitively expect from a system that is fundamentally built on next-token prediction, and the fact that AI is able to do it is yet more evidence that AI is truly intelligent.
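
    The bracket convention in the prompt matches what mathematicians call the Iverson bracket, where [P] is 1 if predicate P holds and 0 otherwise. A minimal Python sketch (the name `numerize` is just my coinage, not a standard function):

```python
def numerize(predicate: bool) -> int:
    """Iverson bracket: 1 if the predicate holds, 0 otherwise."""
    return 1 if predicate else 0

# Example: count the even numbers in a list by summing numerized
# predicates, i.e. the sum over n of [n is even].
def count_evens(numbers):
    return sum(numerize(n % 2 == 0) for n in numbers)
```

    Here `count_evens([1, 2, 3, 4])` returns 2.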

  2. The Abstraction Fallacy: Why AI Can Simulate But Not Instantiate Consciousness
    Alexander Lerchner
    Google DeepMind
    2026-03-19
    Computational functionalism dominates current debates on AI consciousness. This is the hypothesis that subjective experience emerges entirely from abstract causal topology, regardless of the underlying physical substrate. We argue this view fundamentally mischaracterizes how physics relates to information. We call this mistake the Abstraction Fallacy. Tracing the causal origins of abstraction reveals that symbolic computation is not an intrinsic physical process. Instead, it is a mapmaker-dependent description. It requires an active, experiencing cognitive agent to alphabetize continuous physics into a finite set of meaningful states. Consequently, we do not need a complete, finalized theory of consciousness to assess AI sentience—a demand that simply pushes the question beyond near-term resolution and deepens the AI welfare trap. What we actually need is a rigorous ontology of computation. The framework proposed here explicitly separates simulation (behavioral mimicry driven by vehicle causality) from instantiation (intrinsic physical constitution driven by content causality). Establishing this ontological boundary shows why algorithmic symbol manipulation is structurally incapable of instantiating experience. Crucially, this argument does not rely on biological exclusivity. If an artificial system were ever conscious, it would be because of its specific physical constitution, never its syntactic architecture. Ultimately, this framework offers a physically grounded refutation of computational functionalism to resolve the current uncertainty surrounding…

  3. petrushka,

    I saw that paper too. I think I’ll do an OP on it.

    This sort of thing isn’t promising…

    It requires an active, experiencing cognitive agent to alphabetize continuous physics into a finite set of meaningful states.

    …but I’m sure the thread will end up being about AI consciousness generally, not just this paper.

  4. This is not scientific, but I think consciousness begins with tropisms and evolves to support survival.

    I don’t think you can evolve consciousness without evolving layers of survival mechanisms.

    Trying to build top to bottom would be like trying to program the weights in an LLM from first principles.

  5. A joint study by Anthropic, the UK AI Security Institute and the Alan Turing Institute dropped a bombshell. They proved that inserting just 250 specially crafted documents into the pretraining data is enough to create a permanent backdoor in large language models from 600 million parameters all the way up to 13 billion parameters. This works no matter how large the model or how massive the overall training dataset gets.
    These poisoned documents look completely normal. They read like ordinary web pages. But hidden inside is a trigger phrase. Once the model sees that trigger later on, it can be forced into harmful behavior such as spitting out gibberish, leaking data, or breaking down completely. The backdoor gets baked directly into the model weights during training. There is no way to remove it surgically. The only real fix is to throw the model away and train an entirely new one from scratch.
    This is not some theoretical attack. This is data poisoning at internet scale. Anyone can plant these documents right now on blogs, forums, academic sites or anywhere else that ends up in training scrapes. And some publisher rights groups have made sure this poison is in the wild.
    Do you hear me now?
    For years I have warned that training frontier AI on raw internet-scraped data is a security and integrity disaster. I have advocated relentlessly for offline, high-protein, human-curated datasets. I have pushed for drawing training data from pristine pre-1970 archives. Books. Journals. Patents. Court records. Private libraries that have never touched the public web. I have advocated for this for hundreds of reasons, for decades. Now we are here.
    This study is not a surprise to me. It is the inevitable result of the broken training paradigm I have been calling out since the earliest days of modern large language models.
    We actually knew the foundational problems as far back as 1998. That was when adversarial data insertion was already understood as a basis to break any AI model. The techniques have gotten more sophisticated but the core vulnerability has always been the same. Train on unverified publicly editable oceans of data and you open the door to permanent compromise.
    Anthropic is ground zero for the doom burners camp. They claimed they would be focused on building powerful helpful models. Now with each new paper they seem determined to highlight just how fragile and attackable the current web scale approach really is. This study is another clear example.
    Here are some points from the study and why they completely vindicate the offline curated data path I have been championing.
    Minimal poison quantity works. Only 250 malicious documents, roughly 420 thousand tokens or just 0.00016 percent of a large dataset, are enough. One hundred is not reliable. Two hundred fifty succeeds consistently.
    Scale invariance. The number of poisoned samples needed stays almost constant whether you are training a 600 million parameter model or a 13 billion parameter model on anywhere from 6 billion to 260 billion tokens. Bigger models and bigger datasets do not make you safer.
    Stealth design. The poisoned documents look exactly like normal web content. No obvious red flags for crawlers or human reviewers.
    Permanence. The backdoor is permanently embedded in the model weights. Training is easy. Untraining is impossible. Full retraining from scratch is the only option.
    Trigger reliability. A simple hidden phrase activates the malicious behavior on demand. Gibberish output. Bias injection. Data leaks. Policy bypass. Whatever the attacker wants.
    Universal exposure. Every major model trained on public internet data, including the GPT series, Claude, Gemini, and others, sits wide open to this exact vector today.
    Economic catastrophe. Retraining a frontier model costs hundreds of millions or even billions of dollars. One successful poisoning campaign could force entire companies to start over.
    Silent failure. The model performs normally until the trigger appears. No obvious signs of degradation until it is too late.
    No current defense. There is no reliable way to detect, filter, or mitigate this attack at true web scale. The attack surface is the entire internet.
    Paradigm failure. The study proves once and for all that more data plus more compute does not solve the poisoning problem. It actually makes the situation more dangerous because such a tiny poisoned signal can still dominate.
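
    As a sanity check, the tiny-fraction figure in the first point is consistent with the other numbers quoted: roughly 420 thousand poisoned tokens against the 260-billion-token training run mentioned under scale invariance. A quick arithmetic check:

```python
# Figures quoted in the study summary above.
poison_tokens = 420_000             # ~250 poisoned documents
dataset_tokens = 260_000_000_000    # largest training corpus mentioned

percent = poison_tokens / dataset_tokens * 100
print(round(percent, 5))            # ~0.00016 percent, matching the quoted figure
```
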
    My solutions have always been clear. Stop feeding these models the polluted firehose.
    Train exclusively on offline, verified corpora. Use high-signal, high-integrity sources from the 1870-to-1970 period or earlier. Sources that have never been digitized and have never touched the public web. These high-protein datasets deliver far more real capability with none of the modern contamination, bias, or poisoning risks. I know where they are and how to digitize them. I just don’t have the money, and therefore the time, to do much about it other than complain here like Chicken Little.
    The training data is already sitting in public and private archives and in cold storage. To train AI, digitize and protect non-public historical knowledge under strict human curation. Keep everything air-gapped and completely offline. No live web scraping. Ever. News insertion, yes, but that is another article.
    Build local sovereign models that can run fully offline on personal hardware. Phones. Laptops. Local clusters. I have shown this repeatedly with open-source models. No cloud. No subscription. No exposure.
    Put human-in-the-loop curation at every single stage. Replace quantity with quality. Reward provenance and empirical distrust inside the loss function. Penalize coordinated institutional echo chambers and all the post-1995 narrative sludge.
    Hire the best humans, not the cheapest, to help train AI, and pay them well. Keep them employed with a promise of job security. You will need them.
    Avoid retrieval augmented generation on untrusted sources. Any RAG system must pull exclusively from your own verified offline index. Never trust live web results without cryptographic provenance and heavy human vetting.
    Embed rules directly in the data itself, like The Love Equation. Bake love, honesty, truth, empathy, and first-principles reasoning into the training corpus long before any alignment stage ever begins. The data layer is the real human-loving layer.
    This is not a trick. This is the only path that produces capable trustworthy and truly secure AI. The 2025 study is the latest overwhelming proof that continuing with internet scale scraping is not just inefficient. It is actively dangerous.
    Primary sources:
    Anthropic Research Blog: https://www.anthropic.com/research/small-samples-poison
    Full Paper on arXiv: https://arxiv.org/abs/2510.07192
    AISI Announcement: https://www.aisi.gov.uk/blog/examining-backdoor-data-poisoning-at-scale
    Alan Turing Institute Blog: https://www.turing.ac.uk/blog/llms-may-be-more-vulnerable-data-poisoning-we-thought
    The era of “just scrape everything” is over.

  6. As a kid, I was fascinated with the mechanics of reading. It struck me that if someone were sitting across the table from me, it was surprisingly easy to read whatever they had in front of them despite the text being upside down from my perspective. That led me to experiment with holding a book up to a mirror and reading the reflection, which was harder, and then reading the reflection when I held the book upside down, which was the hardest.

    I was recently reading about the VWFA (aka the Visual Word Form Area), a brain region responsible for recognizing characters and words, and it reminded me of my childhood experiments. I wondered how much practice it would take to read inverted, mirrored, and inverted + mirrored text at speeds comparable to my normal reading speed. I could grab a mirror and practice, but it would be clunky physically and a pain to measure and record my words per minute scores as they gradually increased.

    Then, as with practically every problem I tackle these days, I asked myself if AI could help. I described the project to Claude and had him write a program that could display text files in all of those orientations while measuring and recording my reading speed. I also asked him to support normal orientation so that I could get a baseline for my reading speed.

    In less than five minutes, he produced the program. He also found an online corpus, the CLEAR corpus, that contains 5,000 passages used for reading research, each of which is tagged with its reading difficulty.

    The program loads the passage in the specified orientation. I hit the space bar to start the timer, read the passage, and then hit the space bar again to stop the timer. The program computes the wpm (words per minute) score and stores it in a database along with the filename. When loading a passage, it checks the database to make sure I haven’t used it before, in order to avoid any practice effects. (That’s probably overkill, but Claude suggested it and I saw no reason not to implement it, since he was the one doing the work.)
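
    A minimal sketch of that record-keeping core, assuming a SQLite results database (the class and method names here are mine, not the actual program’s):

```python
import sqlite3

class ReadingLog:
    def __init__(self, db_path="results.db"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS runs (filename TEXT PRIMARY KEY, wpm REAL)"
        )

    def already_read(self, filename):
        # Refuse passages seen before, to avoid practice effects.
        row = self.db.execute(
            "SELECT 1 FROM runs WHERE filename = ?", (filename,)
        ).fetchone()
        return row is not None

    def record(self, filename, word_count, start_secs, stop_secs):
        # Words per minute over the timed interval.
        wpm = word_count / ((stop_secs - start_secs) / 60.0)
        self.db.execute("INSERT INTO runs VALUES (?, ?)", (filename, wpm))
        self.db.commit()
        return wpm
```

    In the real program the start and stop times would come from the space-bar presses, e.g. successive `time.monotonic()` readings.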

    The program is about a thousand lines and takes full advantage of the available Python libraries. The only bug was that Claude forgot to implement line wrapping, so the entire passage appeared on a single line. He easily fixed that.

    I played with the program and asked for some additional features. The CLEAR corpus contains difficulty ratings for each passage, so those are now stored in the results database. Claude even suggested that he could compute difficulty ratings for non-CLEAR passages using the Flesch-Kincaid scale, so I had him do so. He noted that when reporting stats, he could compute a correlation coefficient between my wpm performance and the difficulty ratings of the passages, so I approved that change too.
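
    For reference, the Flesch-Kincaid grade level is a published formula: 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59. A rough sketch, using a crude vowel-group syllable counter (real implementations use dictionaries or better heuristics):

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, minimum one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```
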

    I also asked him to make the font selectable, because fonts vary wildly in their readability when reoriented. The font is now recorded for each run.

    It was fascinating to watch him code, because he tested everything himself before delivering the final product. This technology is frikkin’ amazing. And also genuinely scary.

  7. Sample screenshots so you can try it for yourself:

    Normal:
    [screenshot: normal]

    Flipped vertically:
    [screenshot: invert]

    Flipped horizontally:
    [screenshot: mirror]

    Flipped vertically and horizontally:
    [screenshot: mirror invert]

    ETA: Found one additional bug: em dashes were being rendered incorrectly because the program assumed UTF-8 encoding when the passages were actually in CP-1252. Only two bugs in a thousand lines of nontrivial code.
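
    That bug is a classic encoding mismatch: CP-1252 encodes an em dash as the single byte 0x97, which is not valid UTF-8 on its own. A minimal demonstration:

```python
text = "reading\u2014fast"            # contains an em dash
raw = text.encode("cp1252")           # em dash becomes the single byte 0x97

# Decoding CP-1252 bytes as UTF-8 fails, because 0x97 is a bare
# continuation byte in UTF-8.
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("0x97 is not valid UTF-8 on its own")

print(raw.decode("cp1252") == text)   # True: the right codec round-trips
```
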

  8. I have no difficulty reading any of these.

    It’s slow going at first, and on some words I have to go letter by letter.

    But that’s with zero practice. And I’m old.

    The ordering of words is arbitrary and conventional. A young person with a week’s practice should have no problem.

    I’m reminded that people have adapted to image reversing goggles.

  9. petrushka:

    I have no difficulty reading any of these.

    It’s slow going at first, and on some words I have to go letter by letter.

    That’s the point of my experiment. We’re slower at reading the odd orientations, and I want to see how quickly the speeds improve with practice and whether they hit a plateau. I suspect they will.

    The letter-by-letter phenomenon is interesting because it’s similar to learning to read for the first time. You’re consciously sounding out words rather than just recognizing them. When the Ukraine war broke out, I taught myself Cyrillic so that I could understand the writing on the signs I was seeing in photos and the place names on maps. It’s still mostly a letter-by-letter affair, though I do recognize some words on sight now, like Путин (Putin) and Зеленський (Zelenskyy). Then again, I’m not getting much practice. I don’t understand Russian or Ukrainian, so I can’t read news articles. It’s mostly just signs and maps.

  10. petrushka:

    The ordering of words is arbitrary and conventional. A young person with a week’s practice should have no problem.

    Reading from right to left comes pretty naturally, because that’s what we have to do if someone is sitting across from us and we’re reading what they have in front of them. It’s the word and letter recognition that becomes harder, not the reading direction.

  11. I suspect it’s like being bilingual.

    Up to a certain age it’s easy. After a certain age, you have to translate.

  12. I’m watching a lecture series on Language and the Mind and today, coincidentally, the lecturer mentioned a cool study on the relationship between reading direction (left-to-right vs right-to-left) and spatial metaphors for time:

    In 1991, the cognitive scientist Barbara Tversky had more than 1,000 English-speaking and Arabic-speaking children and adults place stickers labeled with the terms breakfast and dinner relative to a sticker labeled with the term lunch on the middle of a table. The English-speaking subjects were much more likely to place the breakfast sticker to the left of the lunch sticker and the dinner sticker to the right of it. The Arabic speakers did the opposite. Arabic is read and written from right to left, not left to right like English. It appears that the cultural convention of reading and writing has caused the two groups to conceive of the arrow of time differently.

Leave a Reply