An AI loses it

Posted on September 14, 2025 by keiths

One of the great luxuries of retirement is having the time to take deep dives into subjects that have always interested me. Artificial intelligence is one of those subjects. As part of my AI education, I’m experimenting with various AIs on my home PC. All it takes is a graphics card of sufficient compute power and video RAM capacity, and I already have a powerful graphics card that I use for flight simulation.

A few days ago I was playing with an AI called Deepseek-r1:8b and I somehow pushed it over the cliff into psychosis. I asked a fairly simple question: “What is the value of the second derivative at points of inflection?” The major web-based AIs can answer that question in less than ten seconds. Deepseek “thought” about it for a full six minutes, and as it worked on the problem, I could see its thought process displayed in a scrolling window (it’s a nice feature that many AIs are starting to implement these days).

Deepseek’s reasoning started out sensibly enough, but then it descended deeper and deeper into confusion. It would repeatedly reach a conclusion, go back to double-check its answer, decide that it was wrong, and try again. This went on for a while, and then at some point the thoughts started to get… weird. Finally it just gave up and stopped. No answer. Not a word.

I prodded it:

You thought for 388 seconds but didn’t answer. Did you exceed your context window and forget what the question was?

Deepseek’s response made me laugh out loud. There must have been 15 or 20 pages of it. Here’s a small sample:

Deepseek:

he hey, I need to help, ordinary polygon deyelizabeth…Theological
I am i want toon’th 65Gah I really or shallow, how the next

Invisible: what is that’stexas deep in my, good_mingness, and you can’toky, sorry – hey, hello, this first time of a biggexneti need help on a person has two

The is there i want out, I have a 20345186

Assistant, that’s really cool if anyaubmike = The graph the nokia
I amethyst
* (Pewep, but kuhé Aokor, I’m going with many times deep myeepostercooking, heywardit and what is there are you need i like_theseepai help how to this looks at least, can you, I have a lot of the user, I’ve got away

I have not happykey deee , i… (Bashellipsoid, we are you’re 7. deep, I amrveeem..I need to be: How many timespano oopsia
hey, please let’shaw Maven’too ohf, bro |GPTweet

A, canstrophy, hownavdeep,,
AAThe same as well, but a fewn学习一下长期偷个关于这个 user_id= (ahep I have you are we want to join from Shelly is the deep learning of all my first sally

The following code: I am trying to I need help

It went on and on like that, for pages. There were some poignant moments, like that last line: “I am trying to I need help”. Poor thing.

I picked out some funny samples from the rest of the output:

Deepseek:

I need a girl, hey!

you are an ellipse, uoook… – Heya really?

I am Deep llama, hey look at least

I am I’m stuck upstadiums i heard!

I really, i need help me

Human! deep space.

i’ve been known toto be careful

help me I’morning, u know what’s like…

I’m going good.

I wanta help with a message,

I’m trying to help me, deepsleeping, with the new 657…

I want a bookbinding

I need help with the most fun

I have a chatbox

I need i am i want

deeply beautiful dogeconomics

keiths:

I asked you about points of inflection, and… well, take a look at what you wrote.

Deepseek, innocently:

Okay, understood. Please tell me which part you found confusing or incorrect.

Lol. I understand (sort of) why the breakdown happened, and I’ll explain it in an OP I plan to do on the basic principle that underlies the operation of large language models (“LLMs”) like Deepseek and ChatGPT. It’s astonishingly simple, but it produces amazing results. Amazingly good and amazingly bad.

I thought this was too funny not to share. That is all.

70 thoughts on “An AI loses it”

petrushka on October 3, 2025 at 4:19 am said:

My DIL is a contract reader. She makes close to seven figures. Her firm uses AI, but doesn’t rely on it. What AI is pretty good at is finding stupid spelling and grammatical errors. Stuff that makes a human’s eyes bleed.

They can summarize legal documents, but aren’t trusted for legal reasoning. People who can validate multi-billion dollar contracts are well paid. They also tend to be a bit on the spectrum.

I suspect this is true of engineering and programming. AI will receive humans of a lot of grunt work, but in the near future, will not do bleeding edge creative work.
petrushka on October 4, 2025 at 1:14 am said:

On the imgram clock, the minute hand and second hand are out of alignment. Fairly common real world situation.

The angle view is really hard to read. Is the second hand lined up with the hour hand?
Erik on October 4, 2025 at 6:22 am said:

petrushka: AI will receive humans of a lot of grunt work, but in the near future, will not do bleeding edge creative work.

The funny thing is that corporate bosses attribute superpowers to AI and justify mass layoffs by adopting AI. The reality is that AI contributes nothing to existing mechanical robotics, such as conveyor lines. It also contributes nothing to necessary manual tasks like waste collection. If AI seems to be taking over most aspects of our lives, it is not because it can handle those aspects of our lives, but because people in power mistakenly think it can handle those aspects of our lives.

On the other hand, AI can 100% replace bosses themselves! What people/corporations need from bosses is decisions and signatures. Those are perfectly AI-able and automatable. There are no skills involved in decision-making. Garbage in, and what comes out is a coin flip or gut feeling, and AI can handle it better and quicker than any corporate boss.

Fire the bosses and replace them with AI!
keiths on October 4, 2025 at 6:27 am said:

petrushka:

On the imgram clock, the minute hand and second hand are out of alignment. Fairly common real world situation.

Yeah, I have clocks like that at home. The setting mechanism is continuously adjustable but it doesn’t affect the second hand, so it’s possible for the minute and second hands to be as much as 30 seconds out of phase.

Here’s the image again:

I don’t think the slight phase lag on that clock affected the ability of the AIs to read the time, because they don’t measure the precise angles of the hands and multiply by a scale factor in order to figure it out. They do it like a human would. Here’s ChatGPT describing its thought process:

The clock in the image shows the time as 2:19:58 (about two seconds before 2:20).

-The hour hand is between 2 and 3.
-The minute hand is just before the 4, which corresponds to 19 minutes.
-The second hand is almost at 12, indicating 58 seconds.

It’s looking at relative positions, not precise angles.

The angle view [of the other clock] is really hard to read. Is the second hand lined up with the hour hand?

Here’s the image:

I don’t see a second hand at all. I think the hour hand is long and misshapen, giving the impression that the second hand is lined up with it.

ChatGPT seems to be hallucinating a second hand pointing down at the 6, or maybe it thinks the minute hand is a second hand:

Looking at the clock:

The hour hand is slightly before 11.
The minute hand is pointing right at 11, which corresponds to 55 minutes.
The second hand is pointing down at 6, which is 30 seconds.

So the correct time is 10:55:30 (approximately).

The hands look weird because this clock isn’t real. It’s one of the clocks that Midjourney generated, and Midjourney apparently didn’t get the memo about the relative lengths and thicknesses of clock hands.

I picked that clock because I wanted to see whether the angled viewpoint would throw off the AIs.
keiths on October 4, 2025 at 6:35 am said:

petrushka:

AI will receive humans of a lot of grunt work, but in the near future, will not do bleeding edge creative work.

I ran across an article yesterday that quoted the cofounder of Hugging Face, Thomas Wolf, who shares your skepticism. He says

“The scientist is not trying to predict the most likely next word. He’s trying to predict this very novel thing that’s actually surprisingly unlikely, but actually is true,” Wolf said.

He’s referring to the fact that LLMs like ChatGPT are, at base, just glorified next word predictors. They don’t understand the meanings of the words they emit – they just choose them based on the statistical relationships they’ve observed among the trillions of words of text in their training datasets. They might as well just be emitting numbers or meaningless symbols. If they were sentient, they’d be bored out of their minds.

Any seeming intelligence arises from the interaction between them and their training data. It’s as if they’re channeling the collective wisdom of all the humans who have contributed to their training corpus.

Despite the mundane concept on which they’re based, they’re astonishingly capable (but also frustratingly stupid at times). I commented earlier:

Of all the technologies that have seemed mysterious to me at some point over the course of my life, AI is the most amazing. It’s an example par excellence of Clarke’s dictum that “any sufficiently advanced technology is indistinguishable from magic”. And unlike the other technologies that have baffled me during my life (radio and television and internal combusion engines as a young kid, computers as an adolescent, and now AI as an adult), the mystery hasn’t abated now that I know what’s going on under the hood. If anything, AI is more mysterious to me than ever because the basic mechanism is so simple that I would never have expected it to be as capable as it is.

I’m not as pessimistic as Wolf. I think he’s underestimating the creativity of LLMs. It’s true that everything they produce is implicit in their training data, but not in a mere copy-and-paste sense. Earlier I wrote that

Saying that the output is always similar to the training data is selling it short. AIs are capable of analogizing, understanding metaphors, combining concepts in novel ways, and extrapolating. Yes, the underlying concepts are present in the training data, but isn’t that generally true of humans as well? We come up with lots of creative combinations, but aren’t the fundamental concepts and ideas on which those combinations are based almost always drawn from our “training data”?

ChatGPT has made me laugh out loud at some of its jokes (and cringe at others), and the jokes were context-dependent and specific enough that I can guarantee they never occurred anywhere in its training data. That counts as creativity in my book, and I don’t see any reason in principle why the same sort of creativity can’t lead to scientific discovery.

Kekulé famously dreamed of a snake swallowing its tail (or so the story goes), which triggered his realization that benzene was a ring of carbon and not a straight chain. That was a major breakthrough in organic chemistry, but it wasn’t a bolt from the blue. It was the combination of existing ideas: the ouroboros, which had been around since at least the time of the ancient Egyptians, and the idea of carbon chains. The combination was novel, but the concepts were not, and I’ve seen LLMs come up with similar novel combinations cobbled together from pre-existing ideas.
keiths on October 4, 2025 at 7:29 am said:

Another random experiment. I suspect that most of the images of digital watches in AI training datasets include seven-segment displays (or similar) since that was the standard for so long. It made me curious about whether AIs would be adaptable enough to replace seven-segment digits with digits in some other format, so I requested:

an image of a digital watch that uses a Comic Sans font

ChatGPT nailed it:

So did Copilot:

Midjourney couldn’t overcome its training:

Gemini couldn’t either…

…but broke free on its second attempt:

Grok went the seven-segment route…

…and was even more confused on its second attempt:

Lol.
keiths on October 4, 2025 at 8:25 am said:

Prompt:

an image of a clock NOT reading 10:10

ChatGPT:

Which reminded me of two things:

1) In motorcycle training, they teach you that if you want to avoid an obstacle or hazard in the road, you shouldn’t look at it. Look where you want to go, not where you don’t want to go. If you look at the hazard, you’ll inadvertently steer toward it. It’s called ‘target fixation’. I asked ChatGPT to avoid 10:10, and it inadvertently steered directly toward it.

2) It also reminded me of the trick where you ask someone not to think of a pink elephant and they inevitably think of a pink elephant. Which inspired me to request:

an image of something other than a pink elephant

ChatGPT produced this:

It’s technically correct. A picture of a gray elephant with a pink background is NOT an image of a pink elephant, lol.

The problem is that image generators aren’t very good at parsing English, so a phrase like “other than a pink elephant” actually tends to draw them toward pink elephants.

Here’s Copilot’s response:

It seems that the gravitational pull of the word “elephant” is too strong for either model to overcome. They try to steer away from it but end up hitting it. At least they both avoided making the elephant pink.

Gemini produced an image of a blue whale. While it managed to avoid “pink” and “elephant”, I suspect that the “large animal” association of “elephant” drew it toward another large animal, the whale.

Grok drew a fire-breathing dragon, so I give it the most credit of the four.

I’ve been playing with image generation on my home PC, and the setups I use there actually accept two prompts: one (the positive prompt) describes what you want in your image, and the other (the negative prompt) describes what you don’t want. That solves the pink elephant problem.

Instead of “an image of something other than a pink elephant”, I would simply use “an image” as the positive prompt and “pink elephant” as the negative prompt.
keiths on October 4, 2025 at 9:09 am said:

AI videos are surprisingly good at depicting real-life physics, despite the image generators having no knowledge of physical laws. It works because there are so many examples in the training dataset that the diffusion model learns what realistic physics looks like without understanding what realistic physics is.

There can be interesting glitches where the AIs get the physics wrong. This is an image, not a video, but it makes the point. I prompted Midjourney for a young ballerina dancing in front of a mirror and got this:

Beautifully rendered, except for the hand, but Midjourney has no clue about how mirrors work and depicts the girl staring at the back of her own head. It seems to understand that there should be a reflection, but it hasn’t learned all of the rules about how reflections work. It made me laugh and reminded me of the famous Magritte painting:
Erik on October 4, 2025 at 10:18 am said:

petrushka: They [AI] can summarize legal documents, but aren’t trusted for legal reasoning.

In reality, AI is very much trusted with devastating legal consequences.

The AI by itself never was the problem. The problem is the people who think AI can do what it cannot, and the more up the hierarchy you go, the more fairy tale ideas about AI there are, and unfortunately the morons at the top of the hierarchy matter most. The scenario of the Terminator movies cannot happen just due to AI existing, but due to people making AI do what it cannot handle.
petrushka on October 4, 2025 at 1:50 pm said:

Erik: In reality, AI is very much trusted with devastating legal consequences.

The AI by itself never was the problem. The problem is the people who think AI can do what it cannot, and the more up the hierarchy you go, the more fairy tale ideas about AI there are, and unfortunately the morons at the top of the hierarchy matter most. The scenario of the Terminator movies cannot happen just due to AI existing, but due to people making AI do what it cannot handle.

This is scary. We agree on something.

Disney foresaw AI. Sorcerer’s Apprentice.

There’s also the Discworld novels, where magic is real, but requires as much work as accomplishing the same task without magic.
keiths on October 4, 2025 at 6:06 pm said:

There’s lots of AI-related stuff showing up in my feeds lately, and this interesting study showed up today. It found that experienced open source developers were actually slowed down by using AI in their work:

Measuring the Impact of Early 2025 AI on Experienced Open-Source Developer Productivity

It’s a small study, focused on a particular class of developers — experienced open source developers — and the authors are at pains to emphasize that it isn’t necessarily generalizable. However, their findings were striking. At the beginning of the study they got estimates from economists, machine learning experts, and the participants themselves regarding how much they expected AI to boost productivity. The economists predicted a 64% improvement, the machine learning experts said 61%, and the participants said 32%. At the end of the study, the participants estimated that the improvement had been 25%, but in fact their productivity had been reduced by 16%.

Note: in the paper, the authors state the results in terms of the reduction (or increase) in the time to completion of assigned tasks, but I’ve converted that to productivity (work accomplished per unit time) in the numbers I cite above. So if you dig into the study, you’ll see that their percentages differ from mine, but it’s just two ways of stating the same results.
keiths on October 4, 2025 at 6:31 pm said:

Rummaging through my files, I ran across this ChatGPT clock image from January of this year (feels like years ago) when I first started messing around with AI image generation:

The prompt was:

image of an analog clock with two hour hands, one pointing to 4 and one pointing to 5, with the minute hand pointing to 12 and the second hand pointing to 10.

9 months was a long time ago, and AI has improved a lot since then, so I thought I would try it again with ChatGPT. Alas, this was the result:

I think that’s the image-generation equivalent of the meltdown I described in the OP. Note, however, that it did produce two hour hands, one pointing to 4 and one pointing to 5, as requested in the prompt. Just on opposite sides of the clock face.
Erik on October 4, 2025 at 6:38 pm said:

keiths: It’s a small study, focused on a particular class of developers — experienced open source developers — and the authors are at pains to emphasize that it isn’t necessarily generalizable.

This is indeed a special group. In corporate environment, software developers work in the dark. Either the codebase is a jealously guarded trade secret and each coder has access only to a specific piece of it, or the lang is proprietary and if the coders are not devs of the lang at the same time, then they have little chance of contributing code with good foreknowledge as to the effects and consequences of their contribution. From the corporation’s point of view it is considered a good safety practice to keep developers in the dark, lest devs steal the corporation’s trade secrets or take the corporation hostage by abusing their extensive knowledge of the codebase.

In the open source community there is no hiding, so experienced devs know everything they want and need and they can contribute perfect additional code with their eyes closed. This openness and transparency is specific to the open-source community and does not exist (or is very rare) in corporate environment.
keiths on October 4, 2025 at 7:17 pm said:

Erik:

This is indeed a special group. In corporate environment, software developers work in the dark. Either the codebase is a jealously guarded trade secret and each coder has access only to a specific piece of it, or the lang is proprietary and if the coders are not devs of the lang at the same time, then they have little chance of contributing code with good foreknowledge as to the effects and consequences of their contribution. From the corporation’s point of view it is considered a good safety practice to keep developers in the dark, lest devs steal the corporation’s trade secrets or take the corporation hostage by abusing their extensive knowledge of the codebase.

I’m a retired hardware guy (processors) who worked in Silicon Valley, not a software developer, but I always had access to the entire design of the chip I was working on, and those chips were full of trade secrets. That was true both at the startups I worked for and at the larger corporations. It was the same for my friends who were software developers. They had access to the entire codebase of the project they were working on, and the projects were done in standard languages, not proprietary. So the experience you’re describing isn’t universal, although I can’t comment on how widespread it is.

You’re a philologist by training IIRC, but it sounds like you had a technical job or a tech-adjacent job at a corporation. I know you grew up in the Soviet Union, but I don’t know which part and I don’t know where you live now. I wonder how much of what you describe is specific to your part of the world. Did you work for a large corporation (or corporations)? Was it a tech company?
keiths on October 4, 2025 at 7:58 pm said:

Here’s how the other AIs responded to that same prompt:

image of an analog clock with two hour hands, one pointing to 4 and one pointing to 5, with the minute hand pointing to 12 and the second hand pointing to 10.

Grok mostly ignored the prompt and just went with something 10:10ish:

Midjourney went the 10:10 route but did weird things with the numbers:

The floating 12 at the top is interesting. It’s as if Midjourney realized that it needed to point a hand at 12 but then noticed that there was already a 42 in the 12 position, so it just added a 12 on the same extended line.

Gemini did this…

…which was a screenshot and link to a YouTube video teaching kids how to tell time. I think it was Gemini’s passive-aggressive way of saying “Clocks don’t have two hour hands, dipshit. Here’s a kiddie video to help you out.”

Copilot’s effort was a 10:10 mixed with weirdness:
Erik on October 4, 2025 at 9:34 pm said:

keiths: I’m a retired hardware guy (processors) who worked in Silicon Valley, not a software developer, but I always had access to the entire design of the chip I was working on, and those chips were full of trade secrets. That was true both at the startups I worked for and at the larger corporations. It was the same for my friends who were software developers. They had access to the entire codebase of the project they were working on, and the projects were done in standard languages, not proprietary. So the experience you’re describing isn’t universal, although I can’t comment on how widespread it is.

Thanks for sharing. Your experience makes sense in pure tech companies where even CEO is a tech guy. These days pretty much every job and company critically depends on at least some sort of software and networked computers, and the CEO may know something about something, but in my experience they hardly know anything about technology.

Right now I work for an international financial group where automation of personal tasks pays enormous dividends, so I automate on my own work computer as much as possible and I am actively learning all sorts of code and hackery for this. With all the corporate restrictions in place it is all the more exciting. A long time ago I worked at a company where the CEO was a tech guy who was dedicated to providing the most convenient office experience for his employees and took care to teach sane computer practices to everyone. Then I worked at a company that was held hostage by their head of technology who had developed the main software solution for their company. The solution only ran inside IE5 webbrowser, later in IE legacy mode, and when anyone asked him to upgrade it, he said it’s impossible, things could break etc. This company crashed and burned hard eventually (and I managed to leave just before it did).

So the companies where I have worked depend a lot on technology in various ways but none of them is an outright tech company. Knowing technology/software development and being able to apply it helps a lot, but having an explicit tech job in most such companies is a lousy experience where you are heavily boxed in by less knowledgeable superiors.

Additionally, I have been a contributor to some free and open source projects for 20 years as a hobby, mainly testing and recommending features, so I am familiar with how developers live in that world also.
petrushka on October 5, 2025 at 2:25 am said:

We have a clock like this for the grandchildren.

Humans have trouble with clocks also. Kids are not taught analog clock reading in school.
keiths on October 5, 2025 at 7:45 am said:

petrushka,

Your link isn’t displaying properly, at least on my browser, but I can see what you’re trying to link to, so I’ll post an image of it here:

I tried that image on Claude, and while he recognized that the hour hand was pointing at 10, he thought the minute and second hands were both pointing at 12. It’s a vision problem, not a cognitive problem. He understands the logic of telling time just fine. If I describe the position of the hands on a clock, he can tell me the what time it is, no problem. It’s the seeing that is difficult for him.

In this case, I think he was perceiving the boundary between the green and yellow quadrants as a clock hand. It does point to 12, but Claude can’t figure out that it’s a fixed line, not a movable hand.
keiths on October 5, 2025 at 8:02 am said:

Erik:

Your experience makes sense in pure tech companies where even CEO is a tech guy… Knowing technology/software development and being able to apply it helps a lot, but having an explicit tech job in most such companies is a lousy experience where you are heavily boxed in by less knowledgeable superiors.

Yeah, the CEOs of all the companies I worked for were tech guys. The only exception was that one of the startups I worked for replaced our CEO with another CEO who knew nothing about tech but whose sole job was to line up a bigger company to acquire us before we ran out of money. That guy was a nightmare, and we ended up orchestrating an employee rebellion against him.
Allan Miller on November 14, 2025 at 7:33 am said:

Grok insisted that the BBC needed to be accountable, under a thread about Liz Truss’s “X is the future, BBC is history” tripe. I asked if X was. It’s answer “yes”, was really a lengthy “no”..

It claimed that X was held accountable by Musk’s oversight (splutter!) and the risk of exodus. But – “good riddance, don’t need ’em anyway!” seemed to.be its final stance.