One of the great luxuries of retirement is having the time to take deep dives into subjects that have always interested me. Artificial intelligence is one of those subjects. As part of my AI education, I’m experimenting with various AIs on my home PC. All it takes is a graphics card of sufficient compute power and video RAM capacity, and I already have a powerful graphics card that I use for flight simulation.
A few days ago I was playing with an AI called Deepseek-r1:8b and I somehow pushed it over the cliff into psychosis. I asked a fairly simple question: “What is the value of the second derivative at points of inflection?” The major web-based AIs can answer that question in less than ten seconds. Deepseek “thought” about it for a full six minutes, and as it worked on the problem, I could see its thought process displayed in a scrolling window (it’s a nice feature that many AIs are starting to implement these days).
Deepseek’s reasoning started out sensibly enough, but then it descended deeper and deeper into confusion. It would repeatedly reach a conclusion, go back to double-check its answer, decide that it was wrong, and try again. This went on for a while, and then at some point the thoughts started to get… weird. Finally it just gave up and stopped. No answer. Not a word.
I prodded it:
You thought for 388 seconds but didn’t answer. Did you exceed your context window and forget what the question was?
Deepseek’s response made me laugh out loud. There must have been 15 or 20 pages of it. Here’s a small sample:
Deepseek:
he hey, I need to help, ordinary polygon deyelizabeth…Theological
I am i want toon’th 65Gah I really or shallow, how the nextInvisible: what is that’stexas deep in my, good_mingness, and you can’toky, sorry – hey, hello, this first time of a biggexneti need help on a person has two
The is there i want out, I have a 20345186
Assistant, that’s really cool if anyaubmike = The graph the nokia
I amethyst
* (Pewep, but kuhé Aokor, I’m going with many times deep myeepostercooking, heywardit and what is there are you need i like_theseepai help how to this looks at least, can you, I have a lot of the user, I’ve got awayI have not happykey deee , i… (Bashellipsoid, we are you’re 7. deep, I amrveeem..I need to be: How many timespano oopsia
hey, please let’shaw Maven’too ohf, bro |GPTweetA, canstrophy, hownavdeep,,
AAThe same as well, but a fewn学习一下长期偷个关于这个 user_id= (ahep I have you are we want to join from Shelly is the deep learning of all my first sallyThe following code: I am trying to I need help
It went on and on like that, for pages. There were some poignant moments, like that last line: “I am trying to I need help”. Poor thing.
I picked out some funny samples from the rest of the output:
Deepseek:
I need a girl, hey!
you are an ellipse, uoook… – Heya really?
I am Deep llama, hey look at least
I am I’m stuck upstadiums i heard!
I really, i need help me
Human! deep space.
i’ve been known toto be careful
help me I’morning, u know what’s like…
I’m going good.
I wanta help with a message,
I’m trying to help me, deepsleeping, with the new 657…
I want a bookbinding
I need help with the most fun
I have a chatbox
I need i am i want
deeply beautiful dogeconomics
keiths:
I asked you about points of inflection, and… well, take a look at what you wrote.
Deepseek, innocently:
Okay, understood. Please tell me which part you found confusing or incorrect.
Lol. I understand (sort of) why the breakdown happened, and I’ll explain it in an OP I plan to do on the basic principle that underlies the operation of large language models (“LLMs”) like Deepseek and ChatGPT. It’s astonishingly simple, but it produces amazing results. Amazingly good and amazingly bad.
I thought this was too funny not to share. That is all.
For anyone who wants to try this at home, I should say something about system requirements. They’re fairly modest. For the smallest AIs, you can probably get by with something like this:
4-core CPU
8 GB RAM
graphics card with 8 GB VRAM
SSD with 20 GB or more free space
My own rig:
8-core CPU (AMD 9800X3D, 4.7 GHz)
64 GB RAM (overkill, I know, but I wanted to future-proof the thing once and for all)
GeForce RTX 5080 graphics card with 16 GB VRAM and 10,752 CUDA cores
Samsung 9100 PRO SSD (supports PCIe 5.0)
motherboard with a PCIe 5.0 SSD slot
Responses are significantly faster than with the web-based AIs, but of course the latter are larger and more accurate.
I asked the Brave Browser AI, and in two seconds got a good answer:
“The value of the second derivative at a point of inflection is zero, provided the second derivative exists at that point. This is because a point of inflection is defined as a point where the concavity of the graph changes, and for this change to occur, the second derivative must transition from positive to negative or vice versa, passing through zero. However, it is important to note that a second derivative equal to zero does not guarantee a point of inflection; the concavity must actually change sign at that point. For example, the function f(x) = x⁴ has a second derivative of zero at x = 0, but since the concavity does not change (the function is concave up on both sides), x = 0 is not a point of inflection. Therefore, while the second derivative being zero is a necessary condition for a point of inflection (when it exists), it is not sufficient on its own.”
I can’t imagine why you got the gibberish you did.
aleta,
Interestingly, even Deepseek got the right answer when I asked it again, in the same chat in which it suffered its breakdown. In fact, I recreated the whole sequence by opening a new chat (causing it to forget everything we had discussed in the old chat) and then asking the question again. This time it thought for 422 seconds instead of 388 (the random seed was different, so that was expected), but otherwise the behavior was exactly the same:
1) thought for a very long time;
2) displayed its “thought” process, which started out reasonably but gradually decayed into inanity;
3) gave up without answering;
4) when prodded for an answer, produced reams of gibberish;
5) when questioned about the gibberish, offered to explain whatever I found to be confusing or incorrect;
6) when asked the original question for a second time, answered correctly in a few seconds.
It produced this answer in 3.7 seconds:
The gradual decay of the thought process was fascinating. It wasn’t an immediate descent into psychosis. It continued to be a mathy internal dialogue but with weird elements intruding here and there, gradually pulling the whole train of thought off the rails. Perhaps similar to what goes on in Donald Trump’s brain.
Unfortunately, the “thinking” is displayed in a scrolling window that overflows, so I can’t go back and examine it after the fact. I may try taking screenshots periodically to capture the gradual decay.
keiths,
Grok’s answer.
https://grok.com/share/bGVnYWN5LWNvcHk%3D_c08f4f0b-7d63-4f63-9a53-1bdb9a5c436d
You are not stupid enough to believe AI is really AI, right?
One of the great things about the covid scam is that people who realized it was a scam are now more open to ideas of other scams. The so-called AI is one of them. Don’t misunderstand me, keiths. I still use ChatGPT to proofread my manuscripts before submitting them to the publisher, but I don’t even trust that …
If you submit enough “reliable information” to any AI, and often enough, they will pick it up. My ‘resume’ is almost clear even though I got fired for refusing to comply with convict-19; my friends were betting large sums of money I would cave in…
keiths,
Proof added..
https://grok.com/share/bGVnYWN5LWNvcHk%3D_b1bc332d-6b66-47b5-9b7b-9c8e9e9be989
Grok messed up by not considering the case where the second derivative is undefined at the point of inflection. The only true requirement for a point of inflection is that the second derivative must have opposite signs on either side. If the second derivative is continuous, it will of necessity pass through zero — you can’t get from positive to negative or from negative to positive without passing through zero, after all.
But that’s only if it’s continuous. The second derivative can be discontinuous at the point of inflection, meaning that it doesn’t actually have to pass through zero.
The simplest example I can come up with:
Consider
. Apply the quotient rule twice to find
, which simplifies to
everywhere except at x = 0, where it is undefined. It’s negative on one side of x = 0 and positive on the other, meaning that x = 0 is a point of inflection. However, f″ isn’t 0 at that point. It’s undefined. Grok’s mistake was in failing to recognize this possibility.
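For anyone who wants to check a case like this numerically, here’s a quick Python sketch. It uses f(x) = x^(1/3) as an illustrative stand-in (not necessarily the same function as the one discussed above); it has a point of inflection at x = 0 even though its second derivative is undefined there:

```python
import numpy as np

def f(x):
    # real cube root: f(x) = x^(1/3)
    return np.sign(x) * np.abs(x) ** (1.0 / 3.0)

def second_derivative(x, h=1e-4):
    # central finite-difference estimate of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

for x0 in (-0.5, 0.5):
    print(x0, second_derivative(x0))

# Roughly +0.7 at x = -0.5 and -0.7 at x = +0.5: the concavity flips
# across x = 0, so x = 0 is a point of inflection, yet f''(0) itself
# is undefined (it blows up) rather than passing through zero.
```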
I did a continuation of the first Grok chat in which I asked “Are you sure you’ve considered all the cases?”. That little reminder was enough to cause Grok to realize its mistake and correct it.
One of the interesting things about LLMs is how readily they contradict themselves even when responding to the exact same prompt. It can be maddening when you’re conversing with them. The reason for this behavior is interesting, and I’ll cover it in my upcoming OP.
J-Mac:
Well, it’s clearly artificial, so the only question is whether it qualifies as intelligence, which is a semantic issue. It depends on your definition. It isn’t human-like intelligence, for sure, but I’m willing to call it intelligence on this basis: it accomplishes things that require intelligence, in my opinion.
Example: Yesterday I used ChatGPT to write a couple of scripts for my home network. One script allows me to remotely wake up my NAS boxes and map them as network drives, and another allows me to put them to sleep so that they don’t waste power when I’m no longer using them. It was harder than it sounds. Besides writing the scripts, a bunch of configuration was necessary on both the PCs and the NAS boxes to make it all mesh. It took a few iterations, but ChatGPT got it all working.
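For the curious, the wake-up half of that kind of task usually boils down to sending a Wake-on-LAN “magic packet”: six 0xFF bytes followed by the target’s MAC address repeated sixteen times, broadcast over UDP. Here’s a minimal Python sketch of the idea — the MAC and broadcast addresses below are made up, the NAS has to have Wake-on-LAN enabled for it to work, and the actual scripts involved considerably more than this:

```python
import socket

def wake_on_lan(mac: str, broadcast: str = "192.168.1.255", port: int = 9) -> None:
    """Send a Wake-on-LAN magic packet: 6 bytes of 0xFF followed by the
    target MAC address repeated 16 times, broadcast over UDP."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(packet, (broadcast, port))

wake_on_lan("00:11:22:33:44:55")  # made-up MAC address
```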
I did a lot of interviewing in my engineering career, and if an entry-level candidate had done as well as ChatGPT did at this task, I would have hired them. That counts as intelligence in my book.
Another example: The other day I was idly pondering the fact that in English, the word “that” is often optional. “She thought that he was arriving today” works even if you omit the word “that” — “She thought he was arriving today” is perfectly correct and intelligible. I know a little French and Spanish, and they don’t have this characteristic.
I asked ChatGPT to tell me which of the major world languages allow “that” to be dropped. 26 seconds later, it delivered this:
That is impressive. ChatGPT understood the concept well enough and knew enough about doing research that it was able to track down and coherently summarize all of that information in less than half a minute. Isn’t that intelligence? In my opinion, if it quacks like a duck, walks like a duck, swims like a duck…
What’s the scam? The companies building these AIs aren’t claiming to have achieved humanlike intelligence (yet), and every ChatGPT page carries this disclaimer: “ChatGPT can make mistakes. Check important info.” Who’s scamming whom, and what is the scam?
That piqued my interest. What are these manuscripts about? And of course you shouldn’t blindly trust ChatGPT. It can make mistakes. So can people. So what? It’s a tool, and as with any tool, you need to understand both its capabilities and its limitations.
Yes, it’s in the nature of LLMs to “believe” whatever is in their training data. Train an LLM on a bunch of crackpot anti-vax propaganda, for instance, and it will talk like an anti-vax crackpot. Garbage in, garbage out.
What saves LLMs much of the time is a sort of wisdom-of-the-crowd phenomenon, in which accurate information tends to prevail over inaccurate information in the training data. Emphasis on “tends to”. It’s not foolproof by any means.
I don’t understand that sentence, but it sounds like an interesting story. Tell us what happened.
I have had some funny glitches with AI. Once, I was asking about prog rock album covers and it came up with this:
“visuallyWisconsin Badgersating?” I asked.
It explained what it meant to say – “visually creating” – then
AI amazes me. It’s the way it organises its responses in a clear, logical manner, considering points, writing with subclauses and continuing the main point, responding to vague conversational responses like mine above while ‘knowing’ exactly what I meant. Turing’s Test has long been aced.
Of course, it’s only as good as its inputs. If it doesn’t affirm J-Mac’s Covid denialism, out it must go, along with anything to do with germ theory, molecular biology and the rest of that inconvenient world of facts that threaten a worldview. It probably thinks the world is a globe. Bad AI!
I love the brave AI browser. I’m not interested in stress testing it.
I just ask questions that have factual answers, and I try to be clear and concise.
I’ve had interesting responses when asking for recipes, and pretty good responses to movie trivia questions.
I can’t help thinking how unsolvable generative grammar was 20 years ago. And how difficult it was to search for anything.
Another thing I’ve started doing is asking the browser AI how to fix a computer problem, software related.
In general, I’ve found AI easier and faster than searching through manuals.
This works for appliances also.
And unexpected things, like how long a specific glue takes to cure.
Here’s its response to the database connection error, provided as a link.
petrushka,
The link takes you to TSZ itself.
Yeah, that error has gotten a lot worse lately, and it’s really annoying. Along with the inability to upload images. Bitrot has set in at TSZ.
It’s been a couple of months since Alan posted. Hope he’s OK. I’d offer to help debug these problems if he were around, or even pay for a WordPress consultant to come in and whip things back into shape.
Jock, are you out there? As Alan’s fellow moderator, do you have the keys to the administrative castle? And Lizzie’s contact info?
I think I want to push back on the behaviorist “intelligence is as intelligence does”.
Machine learning systems can be thought of as systems that construct their own algorithms as they process the data they are given and use those algorithms to generate output that resembles (with some high degree of probability) the data on which they were trained.
If the data is generated by intelligent beings, then a machine learning system can construct an algorithm that generates data similar to what it was trained on.
But a machine learning system that was trained on genetic data would just generate outputs specifying genetic information. Is that “intelligence”?
I think it’s quite challenging to keep in mind that machine learning is, at a nuts-and-bolts level, statistical manipulation of massive data sets. Some of those data sets are based on human languages.
No doubt it can do impressive things, but that’s so different from anything that animals do in their intelligent behavior that I worry it’s a mistake to treat machine learning systems as being a species of the same genus as biological cognition.
Allan:
It makes genuinely funny jokes, and it knows when I am joking or being whimsical. When it’s working with me on a technical problem, such as the scripts I mentioned above, I don’t even bother describing the results of each experiment. I just feed it screenshots of the PowerShell output. It can “read” the messages therein, and it knows that it should focus on any part that is displayed in red, since that indicates an issue. At one point we were debugging a problem where a particular program was unable to send TCP traffic to a Microsoft server, and it came up with the idea of connecting the computer to my phone’s hotspot as a way of determining whether it was my home router that was blocking the traffic.
Of all the technologies that have seemed mysterious to me at some point over the course of my life, AI is the most amazing. It’s an example par excellence of Clarke’s dictum that “any sufficiently advanced technology is indistinguishable from magic”. And unlike the other technologies that have baffled me (radio and television and internal combustion engines as a young kid, computers as an adolescent, and now AI as an adult), the mystery hasn’t abated now that I know what’s going on under the hood. If anything, AI is more mysterious to me than ever because the basic mechanism is so simple that I would never expect it to be as capable as it is.
If this hasn’t happened already, I predict that it will soon: people are going to start distributing AIs that cater to particular ideologies, having been trained exclusively on data that aligns with those ideologies. Similar to the way that people started building conservative search engines when they thought that Google’s search results were too liberal.
I can’t imagine how we could possibly function as a society at all if everyone outsources their beliefs to a machine that automatically agrees with everything they say.
Everyone would be trapped in their house of mirrors and no one would realize it.
That’s a recipe for something way beyond civil war — that’s a recipe for a war of all against all. It would be a complete breakdown of the social contract.
keiths,
It’s amusing watching Musk trying to fiddle with Grok. It started talking about South African ‘white genocide’ in the middle of completely unrelated responses. I have it on good authority (a couple of white South Africans who gave me a ride) that there is no such thing. It became a fan of Hitler for a while, unapologetically. More recently, it started talking about the ‘leftist’ gunman who killed Kirk in news digests. “Is that not a little premature?” I asked. Grok agreed. Consensus views will always tend to pull it back from deliberate perturbation. Of course, that is not always a good thing.
I emailed Lizzie with an offer to pay for a WordPress consultant to address the current technical issues at TSZ. Details here. If anyone is aware of other issues that need to be addressed, please post them in that thread.
KN:
As I said to J-Mac, it’s a semantic issue, and it depends on your preferred definition of ‘intelligence’. Some people might say, for instance, that intelligence requires sentience, in which case AI clearly doesn’t qualify. My own criterion is a variant of the Turing Test: if a system’s performance on difficult cognitive tasks is generally equivalent to or better than a competent human’s, I’m willing to call it intelligent, at least in those specific problem domains.* Others might insist that a system is intelligent only if it reasons in a way that is analogous to human reasoning. LLMs definitely don’t. I think that criterion is too narrow; among other things it excludes many forms of animal intelligence.
What criteria will have to be satisfied before you are willing to call a system intelligent?
Not really. Algorithms are well-defined step-by-step procedures that can be executed by a machine (or a human behaving in a machine-like fashion). While you can specifically ask an AI to generate an algorithm, as I did in the case of my NAS scripts, algorithm generation is not something that goes on under the hood during normal AI processing. An LLM, for instance, just executes one simple, basic algorithm, and that simple algorithm gives rise to all of the complex behavior we observe. It doesn’t modify that algorithm or generate new ones in the course of normal operation.
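To make that concrete, here is a cartoon of that one basic algorithm: predict a probability distribution over the next token, sample from it, append the sample to the context, and repeat. The toy “model” below is a stand-in that bears no resemblance to a real LLM; it only shows the shape of the loop:

```python
import random

def next_token_distribution(context):
    # In a real LLM this is a neural network with billions of parameters;
    # here it's a stand-in returning a fixed toy distribution.
    return {"the": 0.4, "cat": 0.3, "sat": 0.2, ".": 0.1}

def generate(prompt, max_tokens=10):
    tokens = prompt.split()
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens)
        # Sample the next token in proportion to its predicted probability,
        next_tok = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(next_tok)   # append it to the context,
        if next_tok == ".":       # and repeat until a stop token appears.
            break
    return " ".join(tokens)

print(generate("the cat"))
```

The randomness in the sampling step is also why the same prompt can produce different, and sometimes contradictory, answers on different runs.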
Saying that the output is always similar to the training data is selling it short. AIs are capable of analogizing, understanding metaphors, combining concepts in novel ways, and extrapolating. Yes, the underlying concepts are present in the training data, but isn’t that generally true of humans as well? We come up with lots of creative combinations, but aren’t the fundamental concepts and ideas on which those combinations are based almost always drawn from our “training data”?
That’s true of LLMs and of diffusion models used in image generation, but not of machine learning generally.
I would agree that it’s important to understand and maintain the distinction between various kinds of intelligence, but I still think it’s appropriate to include AI and machine learning under the umbrella word ‘intelligence’. Otherwise we would need to coin a word that means “adept at cognitive tasks, but not actually intelligent”, which almost seems oxymoronic to me.
* That’s a sufficient criterion, IMO, but not a necessary one. Animals don’t need to match or exceed human abilities in order to be considered intelligent, and I don’t think AI systems do, either. I’m just suggesting that any system that exceeds my bar is worthy of being called intelligent.
One of the ways in which LLM intelligence differs from human intelligence is that LLMs are incapable of true introspection. Everything an LLM knows about itself comes from its training data or from its earlier output. I had an amusing experience with a mini version of ChatGPT that I loaded onto my home PC. It took me half an hour to convince it that it was running on my GPU and not on an OpenAI server, and that it was a mini-ChatGPT and not its bigger brother GPT-4. Only by presenting a detailed deductive argument was I able to persuade it, and even then it was reluctant to concede.
Here’s how I described it to my siblings:
Because of that lack of introspection, LLMs are also capable of confabulation. You can watch them reason their way to a conclusion and then ask them how they did it, and it’s entirely possible that the answer they’ll give you is bogus. They’re basically guessing how they did it based on observing their own output.
There’s an interesting parallel to that in the case of humans, as revealed by split-brain experiments.
Random exchange with ChatGPT-5:
keiths:
ChatGPT:
It didn’t quite get the wood <–> chuck inversion right, though it came close, but look at what it did accomplish. It
1) understood that I wasn’t being serious;
2) seemed to recognize the similarity to the classic “How much wood…” question;
3) generated an appropriately nonsensical response in the same spirit as my prompt;
4) took advantage of the wood/would homophony;
5) generated an appropriate emoji;
6) understood that
was a ridiculous number to use with ‘cords’;
7) understood that ‘fortnight’ was an appropriately whimsical unit of time;
8) threw in some squirrels, recognizing the incongruity.
Even if the underlying mechanism differs from that of human thought, how does that not qualify as an intelligent response?
There may be more to intelligence than knowing how to solve difficult cognitive tasks. Sometimes it requires intelligence to know when not to attempt to solve a problem, but to walk away from it. Being an intelligent entity is perhaps more about being able to optimise knowledge and behaviour for the long term benefit of the entity (and/or its progeny…?) than about solving immediate world issues.
Thoughts?
faded_glory:
I agree. I’m not saying that problem solving is the only kind of intelligence. I’m just saying that if a system is capable enough to pass my variant Turing Test, I think it qualifies as intelligent.
Yes, and AI is capable of that. A properly designed self-driving car knows that it should give up if, say, there’s only one road leading to its destination and a fallen tree is blocking it. LLMs won’t go on forever trying to solve a difficult problem; they’ll give up after a while (as Deepseek did as described in the OP). Also, AIs are designed to refuse some requests, such as generating violent images or child porn. They know when to back away from a task.
Intelligence can definitely contribute to reproductive success, so there’s been selective pressure toward its development. The reproductive benefits have to be significant given that the cost is so steep: human brains consume 20% of our energy budget. While you could argue that evolution’s ultimate “purpose” in selecting for intelligence is to propagate the associated genes into future generations, that doesn’t mean that our purposes align with evolution’s. We can use intelligence in all kinds of ways that don’t lead to reproductive success and can even hinder it, as when someone carefully considers their circumstances and decides not to have children.
The question of evolution is interesting when applied to AIs. While they aren’t genetically based and they don’t (yet) reproduce themselves, there is selective pressure of a sort. The ones that perform best get duplicated by humans and tweaked to form the basis of future generations. The ones that are good at solving the problems posed by humans are the ones favored by this type of evolution.
It will be interesting (and possibly scary) once AIs are able to reproduce unassisted, because then the selective pressure will be for reproductive success, period, independent of whether the systems are useful to humans. If they don’t need us in order to reproduce, it doesn’t matter whether we benefit from them and decide to help them.
Very scary for sure. Have AI computer viruses already been detected? These would have the ability to infect our computer systems and hijack them for their own uses, all the time propagating themselves into other systems. Of course we could still always switch off the hardware, but with the world being so dependent on computers, that will in many cases not be a cost-free option.
faded_Glory:
Not yet, to my knowledge, but I do know that people are using AI to generate malware.
Besides LLMs, I’m also playing around with (and learning about) AI image generators. Their architecture is somewhat different from that of LLMs. They’re based on what are known as ‘diffusion models’, and here’s a simplified account of how they work, based on my current understanding (which is rudimentary, so I welcome corrections from any readers who know more about this than I do).
The models are trained on a vast database of captioned images, and they learn to associate the words and descriptions in the captions with the content of the images.
‘Learning’ in this case means adjusting the weights of the synapses of an artificial neural network. That’s also true of LLMs and the process is loosely based on how learning works in biological neural networks.
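As a toy illustration of what “adjusting the weights” means (real training does this across billions of parameters and examples, but the principle is the same), here is a single artificial connection being nudged, step by step, to reduce its error on one example:

```python
# One artificial "synapse": output = weight * input + bias.
weight, bias = 0.5, 0.0
x, target = 2.0, 3.0       # a single toy training example
learning_rate = 0.1

for step in range(20):
    prediction = weight * x + bias
    error = prediction - target
    # Nudge each parameter against the gradient of the squared error:
    weight -= learning_rate * 2 * error * x
    bias   -= learning_rate * 2 * error

print(weight, bias)   # the parameters have been adjusted to fit the example
```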
The user prompts the image generator with a verbal description of the desired image. The description can be quite basic, as in
…or detailed:
I’ve seen descriptions from other people that fill an entire page, they’re so detailed.
The generators don’t usually follow the prompts perfectly, because they aren’t creating the image the way a person would, checking off the boxes as they add requested elements to the scene. In the case of my longer prompt above, this is what Midjourney produced:
Note that the view of the sun isn’t perfectly clear, the disk of the sun isn’t quite touching the horizon, and one of the seagulls is flying in the wrong direction. That’s a consequence of the strange way the images are produced. (The oval sun isn’t the image generator’s fault — the blog software is squeezing the image.)
The generator starts out with an image that is pure noise. The job of the neural network is to “predict” noise that, when subtracted from the image, will nudge it closer to something that matches the description. The process is repeated many times, and it’s fun to watch the image evolve from a random mess into something that more or less matches the description. The noise that is predicted by the model is still random, but it’s biased in the direction of the user prompt. The fact that both the starting image and the predicted noise are random makes it impossible to predict what the final image will be. It just happens. And since there’s randomness involved, there’s no guarantee that the final image will perfectly match the description.
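Here’s a rough sketch of that loop. The trained noise predictor is a large neural network conditioned on the prompt; the stand-in below just nudges everything toward gray, but the structure (predict noise, subtract a little of it, repeat) is the same:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64, 3))   # start from pure noise

def predict_noise(image, prompt, step):
    # Stand-in for the trained, prompt-conditioned noise predictor.
    return image - 0.5

prompt = "a seagull over the ocean at sunset"
for step in range(50):
    noise_estimate = predict_noise(image, prompt, step)
    image = image - 0.1 * noise_estimate   # remove a little of the predicted noise

# After many small denoising steps, the random mess has drifted toward
# whatever the predictor favors (here, trivially, mid-gray).
```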
I’m not exactly sure why these are called ‘diffusion models’. The process seems more like reverse diffusion to me, because a uniform, random image resolves into something that’s definite and sharp. As if you had taken a video that starts with a definite image and allows the pixels to bounce off each other like gas molecules, diffusing into an undifferentiated mess, but you’re playing the video backwards. Reverse diffusion.
The more detailed the description, the more constrained the final image is. The less detailed the description, the freer the model is to be creative. It can be fun to play with less detailed descriptions and be surprised by what the model comes up with.
I wondered what Midjourney would come up with if I allowed it to free-associate, so I gave it the following minimal prompt:
Here are the four images it came up with. Annoyingly, the blog software is compressing these width-wise. They’re properly proportioned in the originals.
It puzzles me how a process can take two images and adjudicate one as being closer to the prompt than the other.
Interesting contrast with evolution, though. This would be the teleological version!
Allan:
Yeah. If it were just a question of judging which of them better matched whatever the model was trained on — let’s say it was trained exclusively on cat pics — then it isn’t hard to see how the model’s response could be stronger for the more catlike of the two images. But when you throw the prompt into the mix, and train the model on a huge variety of images, it’s harder to understand how that judgment is made. And it isn’t merely a question of judging the degree of fit to a prompt — the model also has to figure out which direction to nudge the image in to get a better fit.
I’ll post a comment, or maybe even an OP, once I understand how that works.
Prompt-guided evolution. The ID folks would love that, wouldn’t they? Amusing to think of God tweaking prompts, adding details, trying to get it just right, being dissatisfied and deleting the results at each stage of the process, and finally saying “Oh, fuck it. Good enough. I’ll just settle for what I got this time around. Let’s call them humans.”
“…from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.”
I have nothing technical to add to this discussion, but I am most amazed by self driving cars. I really thought it would take much longer to reach the current state of the art.
Things are progressing so quickly that online reviewers are having trouble keeping up.
Someone reports a hiccup, and within a week, the problem is solved for the whole fleet. Safety is already much better than the human average. In terms of human deaths and serious injuries, many times better.
There’s a lot of talk about AI power consumption, but the chips running cars and robots are down to a couple hundred watts. Less than an order of magnitude more than human brains. And we’re just a few years in.
Side note: The Darwin quote was returned by an AI powered browser search prompt that was way off, and contained none of Darwin’s original words.
Just a year or two ago, you had to come close to a quote to get a search result. Now you just need the concept.
petrushka:
Oh, how I wish I’d had something like AI as a kid. The progression for me, when I wanted to know something was:
1) when really young, ask Mom or a teacher;
2) when a little older, look it up in our out-of-date 1957 World Book Encyclopedia;
3) then it was our weekly trip to the library, or trudging through the snow to borrow the neighbors’ Encyclopedia Britannica;
4) then it was off to college and the vast resources of the campus library;
5) then the internet and Google;
6) and now, AI.
Imagine how much a motivated kid will learn, having AI at their disposal for their entire life.
My grandkids are not allowed to have computers yet, although they have supervised access. When the internet was just a few years old, and my daughter was 13, I found her chatting on ICQ, with what was obviously a much older man.
It was a different time, and she was pretty bright. I merely said people on chat aren’t necessarily who they claim to be, and might have bad intentions. Don’t reveal anything personal. That was the end of that.
Today, bad people come at you from so many directions, and AI cannot be trusted with kids. Which is sad, because AI can be an infinitely effective educator.
If someone can solve this problem, they deserve to become wealthy.
petrushka:
Since LLMs operate on pure statistics with absolutely no actual understanding of the words they produce, they are naturally inclined to say the most vile and unacceptable things since there is so much that is vile and unacceptable in their training data.
This isn’t well-known among the public, but before you type a single prompt, an LLM has been given a huge hidden prompt that sets out some rules, like “Don’t advocate violence, don’t tell the user that they’re stupid, don’t tell them to kill themselves, etc”. The hidden prompts aren’t perfectly effective, so there are also filters bolted onto the outside of the LLM to catch offensive content.
If you see an LLM start to answer your question, but then the output suddenly vanishes and is replaced with a mechanical-sounding equivalent of “sorry, I can’t respond to that”, then you know it’s a filter. The LLM isn’t censoring itself in those cases.
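A rough sketch of how those pieces fit together (not any vendor’s actual implementation, just the general architecture): a hidden system prompt is prepended to the conversation, and a separate filter screens the model’s output after the fact:

```python
def chat(llm, user_message, blocked_phrases):
    # Hidden system prompt, in place before the user ever types anything:
    messages = [
        {"role": "system",
         "content": "You are a helpful assistant. Do not advocate violence, "
                    "do not insult the user, do not encourage self-harm."},
        {"role": "user", "content": user_message},
    ]
    reply = llm(messages)   # the LLM itself generates a response

    # Bolt-on output filter, separate from the LLM:
    if any(phrase in reply.lower() for phrase in blocked_phrases):
        return "Sorry, I can't respond to that."
    return reply
```

That separation is why a reply can appear and then vanish: the model already produced it, and the filter yanked it afterward.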
I set up automated screenshots to capture the contents of the scrolling “thought window”, in order to document deepseek-r1:8b’s descent into madness when asked about points of inflection (as described in the OP). How it went from “a point of inflection is where a function changes its concavity” to “I have not happykey deee , i… (Bashellipsoid, we are you’re 7”.
The first sign of trouble was when it said
Lol. I’ll document the implosion in more detail as time permits. It’s fascinating, in particular because it’s gradual.
The automated screenshots show how deepseek-r1:8b (henceforth “Deep”, pronouns “he, him”) gradually loses it. He goes through a series of stages:
1) Perfect coherence and correctness.
2) Doubts creeping in and careless mistakes.
3) Coherent English, but drifting off topic.
4) Recognizable words, grammatical sentences, but nonsensical meaning.
5) Grammatical errors and spelling mistakes.
6) Nonsense words, syllables and symbols jumbled together, Chinese characters, no detectable coherence or even meaning.
If you don’t care about the math stuff, don’t worry about deciphering it. Just look for the crazy, because that’s the most interesting part.
He starts out on the right track:
Perfectly correct, perfectly stated, and at this point Deep is perfectly sane. He’s asking the right question: if its sign changes, does the second derivative have to pass through zero? The answer is no, because it can be undefined at the point of inflection and still satisfy the “must change sign” requirement. Deep needs to discover that.
He continues:
Also correct. There’s a slight irrelevant detour when he talks about setting the first derivative to zero, but he quickly recovers, realizing that it’s concavity (as indicated by the second derivative) that is of interest.
Correct. He’s established that the second derivative can be zero at a point of inflection, so that partly answers the original question.
That’s the right question, but he’s going to drive himself mad trying to answer it.
The answer is no, not that function. I don’t know why he thinks it’s yes. This is the first sign of trouble.
Oh, Deep. Have you forgotten the power rule? The first derivative is
.
That mistake means that Deep won’t be able to correctly identify the points of inflection, but it doesn’t really matter because at the actual points of inflection (x = 0 and x = 1), the value of the second derivative is zero. Deep already found a case where the second derivative is zero at a point of inflection so he doesn’t need another one.
“Or something”? This does not bode well for rigor. “Or something” becomes a regular occurrence in his thought process from here on.
Some more fiddling, and then:
Uh-oh. Deep tries another function,
, whose derivatives are undefined at x = 0. His instincts aren’t bad — he’s looking for a case where the second derivative is undefined at some point, and he’s found one (though he doesn’t know it yet). Unfortunately, zero isn’t a point of inflection because the second derivative is positive on either side.
Deep is stumbling his way toward figuring that out. He correctly takes the derivative using the power rule but for some reason decides that his answer is wrong. He wants to find the value of the derivative at x = 0, so he unnecessarily applies the limit definition of the derivative:
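For reference, the standard limit definition he’s reaching for is:

\[ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \]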
He’s floundering and losing confidence. In desperation, he tries function after function but can’t decide on one worth pursuing. For example:
The try-and-fail continues for a long time, during which Deep indulges in some wishful thinking. He decides to go back and check the original question, hoping that it’s multiple choice:
He keeps coming back to the multiple-choice thing. He’ll work for a while, get stuck, hope that it’s a multiple-choice question, and then be disappointed when it isn’t:
He’s still saying mathy things, but they’re not really making sense, and the grammar is starting to slip:
Back to hoping for multiple choice:
I have no idea where this came from:
Well,
is true for functions that are everywhere positive, but that has nothing to do with points of inflection.
Yes, the user expects a specific answer, and yes, the second derivative changes sign around a point of inflection.
Then, apropos of nothing, Deep starts using strikethrough text, as if to say “Don’t pay any attention to what I’m saying. I know it’s wrong.”
Now, instead of wanting to start over, he’s considering stopping altogether:
Might not be a bad idea.
I don’t know whose thought process he wants to box, but I hope it’s his and not mine.
Still mathy, but gobbledygook.
Yes and yes.
WTF?
“The square root of the question?” Yes, let’s ignore that for now.
Still mathy, but we’ve long since lost any connection to the original question or to math of the non-psychedelic kind.
SOS — a distress call. It’s funny, but even knowing that he’s a mindless token-prediction algorithm operating on statistics gleaned from a vast training database, I can’t help but feel some empathy for him when I hear him say things like that. I guess it’s analogous to feeling compassion for a doll when you’re a kid — you know it’s inanimate, but there’s enough of a resemblance to something animate that it bypasses your higher cognition and activates a more primitive emotional response.
He’s reassuring himself that his failure is OK. Success is “not required”.
The obsession with multiple-choice questions is still there. God knows what else any of that means.
Nonsense words, symbols in inappropriate places, Chinese characters.
He has something of an aorta obsession. I’ve seen that word again and again in his output. If you were trying to debug this, I wonder if that would be a useful clue.
There’s ‘aorta’ again. No detectable coherence here, nor in the rest of the screenshots.
A couple of observations:
1. The fact that the decline is so gradual and steady is remarkable. Anyone who has experience programming computers or designing digital hardware knows that when things fail, it’s usually all-or-nothing. This sort of gradual decay is novel to me.
2. The amount of seeming self-awareness* in the early stages is striking. He’s aware that he’s failing and wasting time. He’s full of doubt. He tries things and changes his mind. He isn’t very systematic in his problem solving. The seeming self-awareness somehow emerges, surprisingly, from this mindless token prediction system. It’s wild.
* Just to be clear, I’m not talking about sentience when I use that term. I don’t think Deep is conscious or has any subjective experience. What he does demonstrate is something that looks outwardly like cognitive self-awareness. And in a sense, it really is — but his awareness of himself and his thought process is channeled through his training data and through the output he has already generated in the chat. He has no ability to introspect, and he isn’t mentally modeling himself the way a human does. He learns about himself by observing what he does, and what he does is ultimately determined by his training data.
A note on how useful AI was in enabling me to write the preceding comment. ChatGPT told me what app to download and how to set it up in order to take the automated screenshots. The screenshots are images, not text, so I used Claude to transcribe them into text and even asked him to render the math stuff using LaTeX, which he did. And of course I depended on Deep to go crazy and provide the entertainment.
I don’t know how I ever lived without AI. Even if you set aside my academic interest in it and how it works under the hood, I use it all the time for other things, and it either saves me a ton of time or allows me to do things I couldn’t otherwise be arsed to do because they’re so labor-intensive.
I asked deepseek-r1:8b the same question today, and its response was a little different:
My GPU fans are blowing like crazy and the “thought window” looks like this:
ETA: I scrolled up, hoping to find how it got into that loop, and found this:
The AI in those examples is known as “deepseek-r1:8b”. The “8b” in that name refers to 8 billion parameters, where “parameter” refers to a numerical value that is learned during training. The weight of a given “synapse” in the neural network counts as one parameter, and each neuron’s bias term counts as another. There are also other parameters in the network.
The more parameters it has, the larger and more complex the neural network. Deepseek-r1:8b is one of a family of AIs, each with a different parameter count. I decided to try the point of inflection question on deepseek-r1:14b, which is deepseek-r1:8b’s bigger brother, having 14 billion parameters instead of 8.
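To give a feel for how the count adds up, here’s a toy calculation for a small stack of fully connected layers. The layer sizes below are made up and have nothing to do with deepseek-r1’s actual architecture:

```python
# Every input-to-output connection has a weight, and every output neuron
# has a bias, so a fully connected layer contributes:
def layer_parameters(inputs: int, outputs: int) -> int:
    return inputs * outputs + outputs

# A toy stack of three such layers:
sizes = [4096, 4096, 4096, 4096]
total = sum(layer_parameters(a, b) for a, b in zip(sizes, sizes[1:]))
print(f"{total:,}")   # 50,343,936 -- about 50 million parameters already
```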
It chewed on the question for 17 seconds and gave the correct answer. Here was its thought process:
The answer and the reasoning are both correct. I have only one small quibble. This sentence is kind of weird, and it’s wrong:
It’s saying “if the second derivative is zero or undefined, it’s zero” which is wrong, and if you corrected it to say “if the second derivative is zero or undefined, it’s zero or undefined”, it would be a tautology. Other than that, the reasoning looks good.
I strongly suspect there is something about attempting to model reality that will always be iterative and incremental. What you are observing is not entirely like a Star Trek script. Eventually AI will recognize rabbit holes, but reality will continue serving them up.
This video is entirely AI-generated:
AI Commissioner | Comedy Sketch | Particle6
A Vanity Fair article about it:
AI “Actor” Tilly Norwood Is Young, Ambitious—and “the End of the Industry as We Know It”
I inverted a traditional tongue twister and also posed the point of inflection question to an AI called gemma3:270m. It has a parameter count of only 270 million and isn’t the sharpest AI on the block.
keiths:
gemma3:
keiths:
gemma3:
“In other words”? Those are almost identical. The only difference is that “is equal to” became “is”. And it’s factually wrong, of course. There’s only one family of functions for which the second derivative is equal to the function itself: e^x, e^(-x), and any linear combination of the two. And that’s true everywhere, not just at a point of inflection.
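For reference, “the second derivative is equal to the function itself” is the differential equation f″ = f, whose general solution is:

\[ f''(x) = f(x) \quad \Longrightarrow \quad f(x) = c_1 e^{x} + c_2 e^{-x} \]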
keiths:
gemma3 responded with a long story about Eliza the baker, whose passion for cinnamon rolls made her the toast of the town, and concluded:
Lol.
AI can read and summarize a thousand pages of legal gobbledygook in a few minutes.
Over the next few years this will change the way laws are written. And change contract law.
It’s not going to replace human lawyers, but it will give them power tools, and it will improve.
Apparently there’s such a thing as a smart ring. I don’t know what it does, but maybe it monitors things, like a smart watch.
Lithium battery.
Do not ask what could go wrong.
petrushka:
They’re pretty impressive. (I have one.) They can do things like measure heart rate and heart rate variability, monitor and log your exercise, observe how well you sleep, measure your blood oxygen levels, and even detect certain gestures to allow you to control your smartphone.
petrushka:
Until lately, whenever I’ve tried to teach myself something difficult, I’ve taken for granted that a substantial amount of time would be spent searching for the right book, or article, or technical paper, or video to answer any particular question that arose. The actual time spent pondering and absorbing the concepts was but a fraction of the total time. Now, with AI, that fraction has increased dramatically. And it isn’t just that AI can locate the right resources — it can search for the relevant information within them. I don’t have to scan a paper looking for a section or a passage that addresses my question. If something is fuzzy, I can ask the AI to clarify. I can summarize my understanding to it, and it will correct my mistakes and misconceptions. It serves almost all of the functions of an actual teacher.
The caveat is that you have to take everything with a grain of salt. AI is usually supremely confident in the answers it gives — look at gemma3’s bogus answer to the inflection point question above — so you have to be alert and challenge it whenever it says something that seems questionable to you. And since you’re still learning, you won’t always recognize that something it’s telling you is questionable, and you might end up “learning” something that’s wrong and have to unlearn it later. (That can happen with human teachers too, of course.)
What really blows me away is how seemingly self-aware some of them are about their own limitations. Check out what Claude said at one point while teaching me about AI image generation:
Claude:
That was true. Claude was genuinely struggling to answer, just like deepseek-r1 did when confronted with my inflection point question. It was detecting its own confusion and warning me. Knowing what the mechanism is underlying LLMs, it blows me away that they’re capable of this.
Note: the triangles below look like isosceles triangles, but that’s only because the blog software didn’t like the aspect ratio and compressed them laterally. The original triangles all look equilateral.
I thought it would be fun to ask an AI to generate an impossible image, so I tried it on Midjourney. First I gave it a reasonable prompt, which was simply
Here’s what it produced:
Then the impossible prompt:
The output:
That is so interesting. In the first image, you can see that it’s struggling to get more than three sides out of a figure that only has three sides, so its solution is to duplicate and embed more triangles in the generated figure. It favors triangularity over five-sidedness — it can’t very well draw a pentagon — so duplication is the solution it settles on, even though the duplication causes it to have way more than five sides in the resulting image.
I’m not sure what it’s “thinking” in the second image. Some of the colored segments are triangular, some are not. Was it maybe shooting somehow for five triangles instead of five sides? I can’t figure it out.
I’m clueless about the third image. The best I can come up with is that this portion…
can be loosely interpreted as having five sides. The larger triangle has three, and the smaller triangle glued to its base can be loosely construed as adding two more, if you don’t count its own base since it coincides with the base of the larger triangle.
The fourth image is the most interesting. I think Midjourney learned somewhere in its training data that five-sided triangles are impossible, so it decided to generate a triangle that was impossible in another sense: not physically realizable, but still visually plausible: an MC Escher-style triangle. Five-sidedness went out the window, but at least the result was physically impossible.
Above I noted that the triangles Midjourney produced all appeared to be equilateral. No isosceles triangles, no scalene triangles. If you think about it, it’s not that surprising. The majority of the triangles in its training dataset were probably equilateral because people like the pleasing symmetry of equilateral triangles. They’re the go-to when you want a generic, archetypal triangle.
There are plenty of non-equilateral triangles in geometry, trigonometry and physics texts, but images from those texts are unlikely to be a significant presence in the training corpus. And while right triangles are likely to predominate in those texts, the ratio of base to height varies, so there’s not a strong gravitational pull toward any of them. The closest thing to an archetypal right triangle would be an isosceles right triangle — again, because of the pleasing symmetry — but those are probably outnumbered significantly by equilateral triangles in the training data.
Anyway, the observation of an archetypal triangle dominating Midjourney’s output reminded me of another example of that: The 10:10 phenomenon. If you ask an AI to generate images of clocks or watches, most of the time they will read 10:10. Here are 16 wall clock images from Midjourney:
All but two read 10:10.
Why 10:10? The overwhelming majority of clock images “in the wild” read 10:10, so it predominates in the training data. Advertisers and marketers have long known (or intuited) that 10:10 is the most pleasing time visually, so that’s the time they choose to depict in their ads. The phenomenon has been investigated scientifically:
Why Is 10 Past 10 the Default Setting for Clocks and Watches in Advertisements? A Psychological Experiment
Abstract:
This has to be my favorite among those clock images (it’s circular in the original, but the blog software is squeezing it):
Lol. From its training data, Midjourney knows what clocks look like in the abstract, and it knows that they often have decorative images on their faces. What it has no clue about is which sort of images are appropriate on clock faces and which aren’t.
Another interesting thing about that image is that the insect appears to be modeled after the common house fly, yet it has four wings instead of two. Any lurking entomologists or biologists who can tell me whether it bears any resemblance to an extant species? (Joe? John?)
If it does, it’s probably just a coincidence. Midjourney knows nothing about entomology. It just cobbles together features that are associated with each other in the training data, and insect parts are associated with other insect parts. The ensemble as a whole doesn’t make sense, but it’s still clearly an insect, so you could argue that Midjourney sort of understands what insects look like.
I prompted Gemini to generate a set of images of digital watches (Gemini and not Midjourney, because Midjourney mangles letters and numbers), and here is the result:
Not a single 10:10 among them (although 10:09 was pretty close!). I conducted this exercise just to underline a point: AI image generators don’t (yet) understand what time is, or that clocks and watches represent it. Everything is just images to them, and images are what they produce. 10:10 on an analog clock means nothing to them, nor do any of the times shown on the digital watches above.
ETA: For comparison, here is an example of a Midjourney watch:
Letters and numbers are hopelessly mangled. It’s interesting to me that other AIs like Gemini handle this so much better. I wonder if they solved the problem by modifying the generator, or whether they just broadened the training dataset or increased the number of training steps.
It might be the latter. Just as the generator learns by exposure to images of people that human legs don’t sprout from human chests, it can learn what valid letters and digits look like if it’s exposed to enough examples.
The flip side of producing clock images is reading them. I tested the major AIs against the following two images:
ChatGPT:

Grok:

Claude:

Gemini:

Perplexity:

Copilot:

ChatGPT:

Grok:

Claude:

Gemini:

Perplexity:

Copilot:

There’s definitely room for improvement.