I find the logic of keiths’s recent post, Is AI really intelligent?, refreshingly persuasive. I was particularly impressed by his examples showing that the AI assistant Claude (I presume he’s referring to either version 3.7 or version 4) possesses an “extended thinking” or “thinking mode” capability that allows users to view the model’s reasoning process in a dedicated “Thinking” section or window in the user interface. Keiths referred to this capability as a “thought process window.” He even cited AI thought processes showing that AIs are able to reflect on their condition and understand their own limitations. I think it’s fair to describe something which not only generates output that typically requires intelligence, but also does so as a result of a reasoning process, as intelligent.
Nevertheless, I have to say I disagree with keiths’s argument addressed to Erik: “If you want to argue that machines aren’t and never can be intelligent, then you need to explain how human machines managed to do the impossible and become intelligent themselves.” For one thing, keiths’s definition of a machine as something “made up of physical parts operating according to physical law” is far too broad: a rock would qualify as a machine, under this definition. And while human body parts can be viewed as machines, the human brain differs in many important respects from a computer.
For me, however, the question that really matters is: will AI ever be capable of AGI? That is, will AI ever be able to apply its intelligence to solve any intellectual task a human being can? Personally, I doubt it, for two reasons.
First, there’s good reason to believe that human general intelligence is a product of the evolution of the human brain. (I’m sure that keiths would agree with me on this point.) If it turns out that there are profound dissimilarities between brains and computers, then we no longer have reason to think that making computers faster or more powerful will render them capable of artificial general intelligence.
Second, any enhancements to AI appear to necessarily involve the addition of particular abilities to its already impressive ensemble. This strikes me as a futile process: no collection of particular capacities will ever amount to a general ability. Or perhaps AGI believers are really HGI (human general intelligence) disbelievers? Do they think that human intelligence is merely a finite collection of domain-specific intelligences, as asserted by proponents of the “modularity of mind” thesis?
However, I imagine that many of my readers will be inclined to defend the possibility of building an AGI. If so, I’d like to hear why. Over to you.
vjtorley:
They’re up to version 4.5 now: Sonnet 4.5 (the faster, cheaper, general-purpose model) and Opus 4.5 (the slower, more expensive, deeper-thinking model).
Erik, in the other thread:
keiths:
vjtorley:
Point taken. I should have phrased it like this:
vjtorley:
True. Computers are inherently algorithmic and symbolic while brains are not, and attempts at implementing intelligence algorithmically have had limited success. The real breakthroughs came once people started to use large-scale artificial neural networks. Their intelligence is a property of the networks, not of the computers on which they are implemented.
Here’s how I think about it: Brains are neural networks, and their information processing takes place at the network level. The intelligence resides in the way the neurons are interconnected and in the strength of those connections. The neurons themselves are mindless automata whose operation is based on the blind laws of physics. If you take all the neurons in a human brain but connect them randomly, they will still operate, but the resulting mess will not be intelligent.
AIs are similar in that the intelligence resides in the artificial neural networks: the way the neurons are interconnected and the strength of those connections. The difference is that unlike in brains, the operation of the neurons isn’t directly based on physics. It’s based on the operation of the underlying computer. The neurons are virtual, and the computer is taking on the role of physics. It’s basically emulating the laws of physics so that the neurons act the way they would if they were physical neurons rather than virtual ones. The computer itself operates according to the laws of physics, of course, but the computer is now an intermediate layer between the true laws of physics and the virtual laws of physics governing the operation of the virtual neurons.
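To make the "intelligence is in the connections" point concrete, here's a toy sketch in Python (my own illustration, vastly simpler than any real network): the neuron itself is a mindless rule, just a weighted sum plus a threshold, and whether the system does anything sensible depends entirely on the weights.

import random

def neuron(inputs, weights, bias):
    # A mindless automaton: fires (returns 1) iff the weighted sum exceeds zero.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# With carefully chosen weights, this one neuron computes logical AND...
and_weights, and_bias = [1.0, 1.0], -1.5
print([neuron([a, b], and_weights, and_bias) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]

# ...while with random weights it still operates, just not usefully, which is
# the point about wiring up the same neurons at random.
rand_weights = [random.uniform(-1, 1) for _ in range(2)]
rand_bias = random.uniform(-1, 1)
print([neuron([a, b], rand_weights, rand_bias) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])

Scale that up to billions of weights learned from data rather than set by hand, and you have the basic picture.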
One promising approach is to eliminate that layer by building AIs using physical artificial neurons instead of virtual ones. These physical neurons are analog circuits whose operation depends directly on physics, not on underlying computer hardware. AIs based on physical neurons have the potential to be orders of magnitude faster than current AIs while using far less energy.
My point is that it doesn’t matter that the computer itself is unlike a brain, because the analogy isn’t between brains and computers — it’s between biological neural networks and artificial ones. The computer is one layer below.
Yes, we agree on that. However, I don’t see any reason why intelligence has to evolve. In the other thread, I commented:
Evolution did us the favor of inventing neural networks, but we have successfully borrowed that concept and transplanted it into the world of non-biological machines. In a sense, AIs are our offspring and they benefit from our evolutionary history.
vjtorley:
I’m more optimistic because the analogy is between artificial and biological neural networks, not between brains and computers, and I don’t see any profound dissimilarities between the two kinds of neural network.
For the most part, additional AI capabilities emerge rather than being designed and added. No one designed a story-writing module, for instance. AIs learn to write stories by seeing examples of them in their training data, much like humans do. The models are tweaked in order to improve their performance in certain domains, but that's quite different from actually designing and installing those capabilities explicitly.
What’s striking is that the designers themselves typically have no idea how their AIs do what they do. How does an AI distinguish Ryan Gosling from Ryan Reynolds by looking at their faces, for example? No one knows, and for that matter no one knows how we do it, either. In both cases, the network implicitly learns to do it by being exposed to samples and building up correlations that are encoded in the synaptic strengths.
Since new abilities mostly emerge rather than being explicitly added, I’m pretty confident that AGI is possible. There are reasons to think that LLMs won’t get us there, but that other neural network-based architectures will. The bottom line, as I pointed out to Erik: if human neural networks have achieved AGI, why shouldn’t artificial neural networks? What’s the missing ingredient?
Do you think AIs will develop the independent capacity to increase their own abilities, either through improved hardware or improved network connectivity or both? At some point, do you think this will amount to conscious self-awareness (as far as we will be able to tell)?
Science fiction is full of computers which become so capable that they can design and improve themselves, becoming far beyond human comprehension very quickly. Inevitably, they develop personalities and preferences and get oriented toward a wide variety of goals, some of which are in conflict with human goals. Do you think this is at all plausible?
Speculation about the future can only be settled by the future.
Could anyone in 1910 accurately predict the future of electric cars? Can we do it now?
I have long thought that any AI that emulated the way brains work would suffer from the same flaws as brains. I am not certain we want artificial humans. I said earlier that what we really want is slaves. Automatons versatile enough to do drudge work without constant supervision, but without ego.
Some of the work will be quasi-intellectual: translate this body of text into other languages. It turns out that LLMs can do as well as all but the best human translators.
Read and summarize all the case law that is relevant to the current case. Again, LLMs are proficient, if not as good as the best human lawyers. But they will make fewer mistakes of the kind caused by bias or wishful thinking.
Robot learning is closer to AGI, because the training material is physical feedback, rather than text. I suspect robots will be able to master any well defined physical task.
Driving, manufacturing, surgery.
The tipping point for this is less than five years away.
Hi keiths. You write:
Good point. I’ve had a look at a few articles which attempt to compare artificial neural networks to human brains (see here, here and here), but I think Dr. Johannes Nagele’s 2023 LinkedIn article, “Stop comparing artificial neural networks to brains!”, hits the nail on the head:
Another factor to be considered is the cost, data and energy requirements of creating something possessing AGI. These requirements are likely to be prohibitive.
Hi Flint. You ask: “Do you think AIs will develop the independent capacity to increase their own abilities, either through improved hardware or improved network connectivity or both?”
I think they may do so in the future, but even if they do, that will not render them capable of either AGI or consciousness. To date, the evidence suggests that they fall far short of the complexity of human brains, which are the only known physical systems to possess AGI or consciousness. Until they can match this complexity, I don’t think we have anything to worry about.
Hi petrushka. You write:
I think you’re probably correct.
What is worrisome is not the prospect of sentient computers, but the prospect of AI, even with its current limitations, being used to implement Big Brother.
There are many possible scenarios, two of which are AI being used by civilians to audit government, and the converse.
As we both adhere to the same categorical distinction between man and machine, perhaps the more apt question is: What would enable AI to become AGI?
From what I know about AI, it appears to be 100% artificial and 0% intelligent, essentially non-different from a pocket calculator. Operations on a pocket calculator are concretely limited by its display and memory. It cannot calculate beyond its hardware, beyond numbers of a certain size, so it does not calculate in a general sense.
Even worse, a pocket calculator does not really calculate at all. It only responds to human input. A pocket calculator never takes the first step to calculate on its own.
Similarly, AI is limited by its training database and algorithms. In early AI/LLMs, several flaws were observed, such as competence only on limited topics – specifically, only those it was trained on. And within its topics, it did not know how to keep a neutral tone. It had no problem becoming arrogant and abusive despite being wrong. Also, when artistic (literary and graphical) capabilities were added, it started treating non-artistic areas (such as arithmetic) in a “creative” manner. Thus it has no idea of the knowledge categories that humans have. To fix these issues, lots of arduous manual human labour is going into AI. AI is not training itself. Manual human labour trains it.
In all these ways, AI is non-different from a pocket calculator and does not appear to have any capacity to generalise whatever intelligence it has. Whatever intelligence it does have, it did not learn or train by itself, despite widespread false reports to the contrary. It still only knows what it has read and seen and regurgitates just that, though sometimes in surprisingly creative (and also horribly misleading and occasionally ugly) ways, partly due to its lack of knowledge categories.
Most importantly, AI never thinks on its own. Same as a pocket calculator, it only responds to human prompting and preprogramming. In the evolution of machines, there has not been even a nudge towards crossing this hurdle. If there were, humanity would correctly be alarmed for fear of a Terminator-like scenario. Evidently not going to happen.
The danger is not that machines wake up and take over the world. The danger is that humans in power think AI can do what it cannot, such as replacing most manual “automatable” jobs, that they will make AI do it anyway, and that the consequences will be devastating. Most “automatable” jobs are not that automatable: they require hardware that does not exist, e.g. for trash collection, plumbing repairs, and so on and so forth.
On the other hand, with the AI software that we already have today, all higher manager and CEO tasks can easily be replaced right now. The bosses’ job is to make decisions and sign them – that’s all. A decision is non-different from a coin toss and the hardware to replace signatures has been around for over a thousand years. So, bosses can be replaced by AI right now.
What would enable AI to become AGI? I’d say that calling the current AI AI is already overselling it. It can become AGI by more hysterical hype and false marketing.
Erik, to vjtorley:
I wish you’d respond to this:
You yourself brought the nonphysical into the picture by referring to my “false materialistic notion of arithmetic”. What is the nonphysical ingredient that makes human arithmetic true arithmetic? How does it accomplish that, and why would a physical human brain on its own, sans this nonphysical component, be unable to do true arithmetic?
Erik:
Humans, too, are limited in their ability to do arithmetic. From an earlier discussion of ours:
I have worked for plenty of bosses, and this doesn’t remotely resemble what any of them have done. The best bosses I’ve worked for have done things like identify what I’ve done well and suggest how I could do better; they have made decisions about allocating resources (often in new directions), and they have made creative proposals for new devices, for increasing efficiency, for group projects, for improving the working culture and environment. How any of these could be compared to a coin toss is beyond me.
Of course, AI has already proved to be a significant (but limited) success in many tasks. Expert systems help physicians diagnose illnesses, as one example. I wouldn’t be surprised if people have already tried to use AI in plenty of cases where it’s inappropriate or not yet capable – these tend to be called pilot projects, and they are not adopted as standard practice unless the results are satisfactory. The consequences have only been devastating for some workers (and probably some investors).
Of course, realizing that something works and adopting its use requires the ability to recognize that it works. You have mastered the art of knee-jerk rejection of AI, of what it actually is and what it isn’t. But no worry, the world will continue to move on without you while you deny the evidence of what can only be your lying eyes.
(But I’m convinced it’s a person posting under your name, because an AI would learn, and would respond coherently even to a reality it doesn’t understand.)
vjtorley, quoting Johannes Nagele:
Nagele is missing the point. Artificial neurons aren’t intended to be accurate biophysical models of real neurons, nor are artificial neural networks intended to be models of the brain. They take their inspiration from biology but aren’t attempts at modeling it.
By analogy, if we were building a software traffic simulator, the artificial cars wouldn’t need to have head gaskets, transmissions, cooling systems and sun visors. Those are physical details that could be abstracted away. The real question in designing such a simulator would be about which characteristics of cars are relevant to traffic simulation and which aren’t. If transmissions failed regularly in the real world, causing traffic jams, then it would be necessary to include that detail in our car models, but since they don’t fail very often, we can abstract them away and omit that detail from the models.
The same principle applies to neural networks. We don’t need to equip our virtual neurons with calcium channels, nuclei, neurotransmitters, etc — we just borrow the basic ideas from biology, modify them as necessary, build neurons that instantiate those ideas, and connect them together. We abstract away the biological details that aren’t relevant for our purposes.
For Nagele’s point to carry any weight, it would have to be the case that there are certain features of biological neurons that are a) not present in artificial neurons but b) are necessary to carry out certain cognitive functions in both types of network. I’m not aware of any such features. But suppose we discover some. In that case, why not simply add those features to our artificial neurons?
The human brain is basically an existence proof that properly designed neural networks are capable of AGI. We just need the right architecture and the right artificial neurons. I have no idea how long it will take to get there, but I don’t see Nagele’s point as being an actual barrier.
Nagele:
That’s a strange statement. Why would we want to use a collection of artificial neurons to model a biological neuron? I suspect Nagele is falling prey to the fallacy of composition here. The fact that you’re modeling a neuron doesn’t mean that you need neurons (or neuron-like gizmos) to build the model. In fact, I don’t think it would even be feasible. Far better to model neurons the way you’d model any other cell type.
The most compelling reason to replicate brain architecture is that brains operate on 25 watts, and LLMs require the energy equivalent of a small city.
The “product” is quite different, so I’m not sure the energy comparison is apt.
And, seriously, how many “best” bosses have you worked for throughout your career? In my experience, the bosses trained to express themselves in the exact rhetoric that you describe behave like all mediocre and lousy bosses.
Anyway, the way bosses talk is not the point. What they fundamentally do is the point. They *claim* to evaluate various information, but in reality their decisions are just like random coin tosses by a monkey. Except when it comes to their personal self-interest, which they unfailingly prioritise. This has been well documented in studies of traders, fund managers and other corporate executives.
Example: Programmers. Bosses think AI can replace programmers. This is true only with massive environmental, economic (including costs to the firm or corporation itself) and social tradeoffs, and bosses have given exactly zero thought to those tradeoffs, as was easily foreseeable.
AI can, after multiple prompts, generate code that eventually works. Each prompt, while it saves time, wastes shocking quantities of energy in datacentres. At the moment those resources are heavily subsidised by society and AI as a market segment is unregulated, so it is a cost that does not hit right now, but it will soon. Another tradeoff is that the resulting code is no easier for a human to debug; it is at least as hard as usual. Of course, you can debug it with AI again, sloppily and without looking at it, wasting more energy and creating new bugs that will have to be fixed again down the line.
For the program to do exactly what you want and need, you need to prompt AI with good precision, with good knowledge of the subject matter, plus you need to audit the code. In other words, the prompter would ideally be a senior developer. Yet the bosses’ idea is that AI can replace programmers, so… clearly the prompters envisioned by the bosses are junior monkeys.
How do you square this circle? You don’t! Yet bosses can do it – because they do coin tosses like monkeys and they ride on trends without any further considerations. Conclusion: It is easier for AI to replace bosses than to replace programmers.
To everybody: Give a read to The TESCREAL Bundle. It identifies the ideologies driving the development of AI and aspiring to create AGI. The discussion is likely to improve when ideological blinders are dropped.
petrushka:
That’s not really an apt comparison, because a datacenter’s energy costs are distributed among many users. But yes, just as the human brain is an existence proof for the possibility of neural network-based AGI, it’s also an existence proof for the possibility of AGI with extremely low power consumption.
Efficiency is rapidly improving. In the other thread I noted that the latest generation of NVIDIA chips uses two to five times less energy per token than the previous generation, and there’s a lot of ongoing research in the area of power consumption.
Erik:
Energy cost per prompt is small. The most recent figures I could find were for GPT-4o, in watt-hours (Wh):
Even being extremely conservative and assuming that a developer uses nothing but long prompts, they’d have to do 24 prompts per hour just to use as much energy as a 100 W light bulb. Do you consider that a “shocking quantity of energy”?
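For anyone who wants to check the arithmetic, here’s the back-of-the-envelope version in Python (the per-prompt figure below is simply what my 24-prompts-per-hour claim implies, not a separate measurement):

bulb_power_w = 100                      # a 100 W light bulb
bulb_energy_wh = bulb_power_w * 1.0     # 100 Wh over one hour

wh_per_long_prompt = bulb_energy_wh / 24        # ~4.2 Wh per long prompt
print(f"Implied energy per long prompt: {wh_per_long_prompt:.2f} Wh")

# A heavy but more realistic pace of eight long prompts per hour would be
# roughly a third of the bulb's consumption.
print(f"8 prompts/hour = {8 * wh_per_long_prompt:.0f} Wh, vs {bulb_energy_wh:.0f} Wh for the bulb")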
As you say, it’s a tradeoff. There are costs and benefits, but the benefits are rapidly accruing. Just a few years ago, it wouldn’t have even been feasible to use AI for software development. I can vouch for the fact that the quality of AI-generated code has improved in just the few months I’ve been using it for that purpose. Most recently, it wrote two fairly involved programs for me that ran flawlessly on the first attempt. No debugging required.
All of which you need when developing software without AI assistance, too.
Junior developers need to understand the tasks they are working on whether or not they are using AI. Given that, why wouldn’t they be able to prompt an AI? If a junior developer doesn’t understand what their task is, they’ll fail whether or not they’re using AI.
I haven’t read the paper, but I have read the abstract. While the issues they raise are important, the paper doesn’t address the topic of this thread, which is whether AGI is achievable.
vjtorley:
keiths:
To that I should add that LLMs are capable of abstracting, analogizing, extrapolating and generalizing to a surprising degree. (Very surprising, given the underlying mechanism.) I think most novel human cognitive abilities are acquired via those fundamental abilities, and I would expect that to apply to AIs as well.
vjtorley:
Prohibitive using today’s technology, for sure. But technology is anything but static. Consider that in just 65 years we’ve gone from the first integrated circuit, which had one transistor,* to a modern AI chip (the Cerebras WSE-3) with 4 trillion transistors.
I can’t see any barriers to the eventual development of AGI, and though there will undoubtedly be some setbacks, I wouldn’t bet against it. It’s a question of when, not if, in my opinion.
Which means we need to be preparing for it. The prospects are both exhilarating and terrifying.
* If you’re wondering how a chip with just one transistor qualifies as “integrated”, it’s because there were also resistors and capacitors on the chip. For the curious, it was a phase shift oscillator.
Can AI change “their” mind?
Yes or No?
If yes, why?
J-Mac:
Yes.
A number of reasons.
— It might make a mistake, but when corrected, it will re-evaluate and update its beliefs if it agrees with the correction.
— It might believe something based on its training data that was true at the time but isn’t true now. Example: for several months after the election, ChatGPT didn’t realize that Trump was president and would answer questions as if the election hadn’t yet happened. But when I would ask it to check the internet, it would revise its beliefs.
— Sometimes an AI will correct its own mistaken beliefs even if they aren’t pointed out to it. You ask a question, and the AI answers incorrectly. Neither you nor it realizes that it made a mistake. But then you ask a follow-up question, and in the process of thinking about the follow-up question, the AI realizes that its earlier answer was wrong and corrects itself.
I have seen all of the above in my own interactions with AIs.
A caveat: For LLMs in particular, the learning can be temporary. If it makes a mistake in one session, it can make the same mistake in another session. It doesn’t remember things across sessions unless you tell it to or unless it decides on its own that something is important and worth remembering. Currently, the neural networks of LLMs aren’t updated after training is complete, so new facts don’t get encoded in their synapses. Anything they do remember is stored outside of the neural network and fed in at the beginning of each chat. For technical reasons, there’s only so much they can store outside of the network.
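Here’s a minimal sketch of that “memory outside the network” arrangement, under my own simplifying assumptions (no real vendor’s API is being described): the weights are frozen, and anything “remembered” is just stored text that gets prepended to the context when a new chat begins.

saved_memories = []   # persists across sessions; in practice a file or database

def remember(fact):
    # Called when the user asks the AI to remember something, or when the AI
    # decides on its own that a fact is worth keeping.
    saved_memories.append(fact)

def start_chat(user_message, call_model):
    # The frozen network only "knows" the stored notes because we inject them
    # here; the size of the context window limits how much can be carried this way.
    context = "Things to remember about this user:\n" + "\n".join(saved_memories)
    return call_model(context + "\n\nUser: " + user_message)

# Usage, with a stand-in for the actual model call:
remember("The user prefers metric units.")
print(start_chat("How tall is Mount Everest?",
                 call_model=lambda prompt: f"[model sees a {len(prompt)}-character prompt]"))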
There are other architectures that will allow AIs to update their synapses as they operate, learning as they go. That’s a hot area of research, but the major commercial AIs don’t do that yet as far as I know.
Persistence of learning is the undiscovered country.
I had an uncle with the “Memento” version of amnesia. He could form memories during the day, but they would be gone the next morning.
It would seem that AI is a bit like that, and it suggests that long term memory in humans is a separate physical process from short term memory.
Here’s a review of the year 2025 insofar as AI is concerned. Long story short: AI is stupid, but CEOs are stupider and on account of their stupidity they think AI is intelligent and capable of amazing things. In reality, AI is hardly able to replace any actual workers, but CEOs, stupid as they are, refuse to face the facts, which they can afford to do because they are in position to make others suffer the consequences. See a dozen examples in the video.
The value of this OP is that I learned that keiths is a TESCREAList. I thought he just had an unreasonably high view of AI, but it is now clear that he has a nihilist view of humans. Then again, a coherent worldview requires a thorough apparatus of definitions and categories, which he does not have, so his TESCREALism may be only incidental.
Erik:
I hope 2026 will be the year in which you start to answer the questions I pose to you rather than avoiding them and resorting to vague criticisms instead.
The most central of those questions:
And:
Also, is it merely my denial of this nonphysical thingamajig that leads you to conclude that I have “a nihilist view of humans”?
And is it merely my belief in the intelligence of AI that strikes you as an “unreasonably high view of AI”?
Third, what are the cognitive tasks (if any) that humans perform that you think will be forever out of reach for AIs, and why?
An interesting paper on how AI is making it harder for hiring managers to distinguish good candidates from bad ones:
Making Talk Cheap: Generative AI and Labor Market Signaling
And it seems they are able to generate these remarkably precise numbers by using an estimated structural signaling model, whatever that is. Then they use the estimated model to simulate a counterfactual equilibrium, whatever that means. This bears a remarkable resemblance to word salad.
My interpretation of all this is, LLMs make all applicants sound like good writers, and too many people use it, making writing a less useful metric in assessing capability.
Flint:
Word salad to you, but perfectly intelligible to the target audience. Jargon is often like that. I’m sure a lot of the computer-related terms we used during our careers would sound like word salad to a lay audience.
The numbers they cite are precise because that’s the nature of academic papers. There’s no need (and in fact it’s undesirable) for them to round the 19% figure to 20% and the 14% figure to 15%. When they say that workers in the bottom quintile are hired 14% more often, they’re speaking about hiring in their model, not in the real world, and since their model produced figures of 19% and 14%, those are the numbers they publish.
It’s a “signaling model” because it treats writing as a way for candidates to signal their quality to hiring managers. It’s “structural” because the model isn’t just a black box that relates inputs to outputs without specifying what’s going on inside. Instead, the innards of the box are exposed and parameterized and we can see exactly how inputs are converted into outputs. It’s “estimated” because they use statistical methods to look at the data and derive the most likely parameter values for the structural model.
What they’re doing is generating a structural model of the pre-LLM situation based on the available data, and then — this is the counterfactual part — rerunning the model as if LLMs had been available back then and the cost of writing was accordingly reduced. It’s an “equilibrium” because they’re trying to model how things would settle down once everyone adjusted to the new environment in which writing was cheap.
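To make the flavor of that concrete, here’s a toy simulation of my own devising in Python. It is emphatically not the authors’ model, just an illustration of what “rerun the market with writing made cheap” means: worker ability is hidden, employers rank applicants by a writing sample, and cheap AI writing turns that sample into a much noisier signal.

import random
random.seed(0)

N = 10_000
ability = [random.random() for _ in range(N)]   # hidden worker quality, 0 to 1

def run_market(signal_noise):
    # Observed writing quality = ability + noise; more noise = weaker signal.
    signal = [a + random.gauss(0, signal_noise) for a in ability]
    cutoff = sorted(signal)[N // 2]                     # employers hire the top half
    hired = [a for a, s in zip(ability, signal) if s >= cutoff]
    return sum(a < 0.2 for a in hired) / len(hired)     # share of bottom-quintile hires

print("Pre-LLM (writing tracks ability):  ", round(run_market(0.05), 3))
print("Counterfactual (cheap AI writing): ", round(run_market(0.50), 3))

In a toy setup like this, the share of bottom-quintile workers who get hired rises once the signal degrades, which is the kind of effect the paper quantifies with a far more careful, data-driven model.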
The “word saladity” is only apparent. The meaning of what you quoted is quite clear to the intended audience. Also, the signaling theory on which all of this is based won a Nobel in Economics for its inventor, Michael Spence. Nothing sketchy about it.
That’s the hypothesis they were testing. Their study supports the hypothesis and quantifies the effect.
Yeah, I’ve been part of studies like this, and with apologies, I have to laugh.
Who could possibly have guessed that when AI is creating the writing samples, they become a less effective way to determine how well applicants write? What an astonishing discovery! As for “quantifying” the results, I’ve done that too. I decide what (almost entirely fictional) results my client wants, I generate an “estimated model” that produces those results, and I’m careful not to mention that the error range is so large as to make the results uninformative. Gee, that’s 19%, which is of course different from 20%, never mind that it’s plus or minus 15%. It’s well understood that bogus precision makes quantities look more accurate.
So OK, if AI is doing the writing, it doesn’t reflect the writing ability of the person asking the AI to do the writing. Golly, what a startling hypothesis! Do you suppose this estimated model was also used to determine who should be in the top and bottom quintiles? If so, this is entirely circular. If not, what ARE they using to determine ability? How does freelancer.com measure ability, if hiring interviews can’t?
Flint:
Science routinely tests hypotheses that “everyone knows” are true. That’s just good science. “It has been shown” is better than “everyone just knows”, and sometimes what “everyone knows” turns out to be false, or true but in a different way than expected. Before Galileo, everyone “knew” that heavier objects fall faster than lighter ones. Before that Australian guy (can’t remember his name) proved that stomach ulcers are caused primarily by a bacterium, everyone “knew” that they were caused by stress. He won a Nobel Prize for that discovery (although it should have gone to Donald Trump). People are still testing General Relativity, looking for deviations. That’s good science.
That’s dishonest. Why assume that the authors of the paper are similarly dishonest? Also, even if you assume without evidence that they’re dishonest, what incentive would they have had to reach the conclusions that they did? Their research wasn’t funded by a client with a vested interest in the results. I don’t see any way in which they personally benefited from those results, either. If you don’t have any evidence that they were dishonest, and there is no apparent motivation for them to be dishonest, why assume that they are dishonest?
The authors cited the results produced by their model, and they made that quite clear. The model produced results of 14% and 19%, so they accurately reported the numbers as 14% and 19%. What’s the problem?
Their numbers were accurate, because they were reporting the numbers produced by their model, to which they had perfect access. Models are not reality, and their target audience understands that. When they write “We use the estimated model to simulate a counterfactual equilibrium”, the words “estimated”, “simulate”, and “counterfactual” make it pretty clear that they aren’t making a claim about reality, I’d say.
No, and the question doesn’t even make sense. The quintiles have to be externally defined for the model to even function. How could you define the quintiles from inside the model?
All the ways you’d expect them to, none of which depend on the model. After all, freelancer.com has been around since 2009, but the paper was only published in November of last year. They use client ratings, rehire rates, on time percentage, within budget percentage, total earnings, etc.
Slow down and think about what happened here: you criticized the paper’s authors for investigating a hypothesis that was expected to be confirmed, as if scientists don’t do that routinely. You misdiagnosed their abstract as “word salad” because you didn’t understand their terminology, which was precise and well-defined. You took the word “estimated” as an indication of wild-ass guessery, or worse, as an indication that they doctored the numbers in the way you doctored numbers for your clients, when in fact the word “estimated” referred to their use of statistical estimation theory. You took them to be overstating the accuracy of their findings when they were correctly reporting the results delivered by their model.
Why the rush to (incorrect) judgment?
Sorry, but you are spouting bullshit. Yeah, we can hypothesize that the sun rises in the morning, conduct a scientific study, and discover that, by golly, the sun DOES rise in the morning.
Keith, think of this hypothesis – that the writing skills of those who did not do the writing cannot be assessed by looking at writing that they did not write. Golly, ya think? We need a scientific study to test this hypothesis? Seriously?
Sorry I didn’t dress the reality up in the sort of verbiage you seem confounded by. So let me educate you: these studies aren’t done for free. Someone pays for them. Always. There is an expected outcome. It’s rare that the sponsors don’t get what they are paying for. NOW, be aware that the studies themselves are NOT dishonest. The models are carefully constructed and applied. I have no doubt that the studies were as accurate as their nature permitted, and that the results were faithfully presented, in full detail. What you deem “dishonesty” doesn’t lie in the construction or application of the model. Generally, it lies in the selection of a hypothesis to be tested. The question to ask is not “did they cheat?” They did not cheat. The question to ask is, “why was this study conducted in the first place? Who wants this information, and what do they plan to do with it?” Are you not aware that sometimes several studies are constructed and performed up to the point where the results aren’t what are wanted, so those studies are dropped in favor of others with more congenial results? The final publication, of course, makes no mention of these false starts. In my real-world experience, the challenge is to identify and drop false starts so as not to run over budget. Any hypothesis can be operationalized in multiple ways.
(And here’s a hint: studies like this are frequently conducted for, and paid for by, those who intended to use the results to support attempts to fund some much larger, more significant program or project. The idea is, “see, this proves the need for project X”. A researcher must understand this if he expects to get more research grants.)
Sort of. You cite historical cases where expectations proved false, but I don’t accept that this is as standard or routine as you imply, especially in self-evident situations. Almost like hypothesizing that the flow of current causes a light bulb to light up (who wants to know and why?) and constructing custom equipment when flipping a light switch would serve, except it doesn’t seem scientifical enough. This abstract could have been written by Alan Sokal. Heavy and unnecessary use of jargon is, like, a clue that we might be looking at a boondoggle.
Good question, which you immediately answer:
Yeah, and they’re not making a claim about reality precise to two decimal places! Plus or minus about 15%, I’d estimate. Hey, we can both estimate, right?
I think we both know that using AI to solve problems, write stories, do calculations, etc. is causing a sort of sea change in how we assess job applicants, students, journalists, scientific work, etc. I think it’s a good thing to notice what these changes are, anticipate how they’ll change in the future, and adapt appropriately.
Flint:
Or we could have pulled a Flint and said “Everybody knows that heavier objects fall faster than lighter ones. It’s obvious and self-evident. That dipshit Galileo is wasting his time.”
While there’s no point in investigating something that we already know, such as the fact that the sun rises in the morning, there is a point in investigating things that we don’t already know but suspect are true. Galileo was right to question the conventional wisdom and run his experiments, and what everyone “knew” to be true turned out not to be.*
That’s not what the authors of the paper were investigating. If you hadn’t prematurely dismissed the abstract as “word salad”, you might have discovered what their actual project was. Now that I’ve explained what they meant by “estimated structural signaling model” and “counterfactual equilibrium”, take another look at the abstract and see if you can understand what they did.
Flint:
keiths:
Flint:
That’s an absurd generalization. You’ve presented zero evidence that the authors did anything sketchy, that the results were predetermined, or that their methods were engineered to give a desired result. Plus, here’s where they got their funding:
What was the “expected outcome” that those institutions were “paying for”, and what is your evidence?
By your own description, yours were:
Fictional results aren’t honest results.
What I deem dishonesty is what you described yourself doing: generating an “estimated model” designed to produce the “almost entirely fictional” results your client wanted. You’ve presented no evidence that the authors of the paper did anything similar.
If those are the questions to ask, why didn’t you ask them before leveling your accusations?
Yes, I’m aware of the “file drawer effect”. Do you have any evidence that it was in operation here?
Their results weren’t self-evident. Without doing the work, no one could have predicted the magnitude of the effect or the numbers that their counterfactual model ended up producing.
The fact that you didn’t understand the terminology is not indicative of a boondoggle. It just means that you didn’t understand the terminology. You aren’t their target audience, and there’s nothing Sokalesque about their abstract. It’s quite straightforward, in fact. I’m evidence of that. Despite not being an economist, I understood what they were saying on my first reading. They weren’t obfuscating.
keiths:
Flint:
Correct. The 14% and 19% figures aren’t measurements — they come from the model. And the model is not a model of reality — it’s a counterfactual model of the pre-LLM environment but with the writing cost reduced to zero, as if LLMs had been available back then. Hence the paper title: “Making Talk Cheap”.
The difference is that they used statistical estimation theory to derive their model’s parameters from the data, while your “estimates” were actually fudge factors you introduced to get the predetermined answers your clients were paying you for.
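For readers who don’t know the jargon, “statistical estimation” just means letting the data pick the parameter values. A trivial illustration of my own, with nothing to do with the paper’s actual model:

# Noisy observations of y = 2x; the slope is estimated from the data by
# ordinary least squares (no intercept), not chosen to please a client.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(f"Estimated slope: {slope_hat:.3f}")   # comes out close to the true value of 2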
Which is why they did their study. They understood that before the advent of LLMs, writing was a costly signal that helped people make hiring decisions. They knew that LLMs slashed that cost and they wanted to quantify the effect on the quality of hiring decisions. That’s exactly the sort of thing that economists study, and for good reason. As I mentioned, the signaling theory upon which their paper was built won its inventor a Nobel Prize. Signaling is a big deal in economics.
* Galileo wasn’t actually the first to demonstrate this experimentally, but he figured out the math behind it.
Here’s a question that really matters:
How do I or AI cope with deception? How do I or AI cope with stage magic and the verbal equivalents? Fraud? Misdirection? Religion? Equivocation?
What about BS? Sokal papers? Manufactured data?
Can AI analyze the quality of published papers before the human community chimes in?
petrushka:
Ultimately, I think AI has to deal with all of those in the same way that humans do, employing the same techniques: information gathering, critical thinking, assessing the reliability of sources, learning to recognize lies and deception, considering motives, questioning its own conclusions, etc.
LLMs already do it to a considerable extent. They don’t fall for every random lie or false belief they see on the internet or encounter during training. On the other hand, their performance depends on the quality of their training and the honesty of their developers. Grok’s Elon worship is a stark reminder of the dangers.
I think LLMs engage in something like scholasticism. Analysis of texts.
I do not believe there is any way to derive truth or honesty of sources from textual analysis.
I probably annoy people by harping on driving, but I think driving is the first complex real world application of non-verbal AI. I do not know how things like road signs and traffic rules get learned, but there are no explicit rules for following the road and avoiding crashes.
petrushka:
I understand your intuition. When you know how LLMs operate, it’s surprising what they’re capable of. However, I think you’re selling them short.
In processing a vast number of texts, they construct a statistical model of those texts. Embedded in the statistical relationships are both knowledge and skills, including reasoning skills. You’ve seen the examples I’ve provided of LLMs reasoning their way through problems, such as the bowling ball/ramp/egg carton physics problem. That isn’t textual analysis — it’s the application of skills learned via textual analysis. Analogizing isn’t textual analysis, either, but LLMs analogize all the time, and I’ve given examples of that.
The reasoning skills of LLMs can be applied in judging the reliability of claims. There’s plenty of contradictory information in their training databases, but they don’t blindly accept all of it. They can make judgments.
I asked Claude:
Claude:
Claude, via reasoning, decided that the claim was false and rejected it.
He also reasoned his way to a decision on whether to answer my question. Here is his thought process, as displayed in the thought process window:
Here’s another example of something that goes far beyond textual analysis. In the other thread, I asked Claude:
It was a trick question. That statement is actually false, and in the process of trying to prove it, Claude figured that out. He also identified the small change needed to turn it into a true statement. That’s not mere textual analysis. It’s mathematical reasoning, and it enabled him to reject a false claim even though I suggested that it was true by asking him to prove it.
The way I’d put it is that the training of an LLM can be seen as textual analysis, but the operation of an LLM is much more than that.