
Master AI (Artificial Intelligence) Discussion/News Thread

From Nav Toor on Twitter (X):

"BREAKING: OpenAI published a paper proving that ChatGPT will always make things up.

Not sometimes. Not until the next update. Always. They proved it with math.

Even with perfect training data and unlimited computing power, AI models will still confidently tell you things that are completely false. This isn't a bug they're working on. It's baked into how these systems work at a fundamental level.

And their own numbers are brutal. OpenAI's o1 reasoning model hallucinates 16% of the time. Their newer o3 model? 33%. Their newest o4-mini? 48%. Nearly half of what their most recent model tells you could be fabricated. The "smarter" models are actually getting worse at telling the truth.

Here's why it can't be fixed. Language models work by predicting the next word based on probability. When they hit something uncertain, they don't pause. They don't flag it. They guess. And they guess with complete confidence, because that's exactly what they were trained to do.

The researchers looked at the 10 biggest AI benchmarks used to measure how good these models are. 9 out of 10 give the same score for saying "I don't know" as for giving a completely wrong answer: zero points. The entire testing system literally punishes honesty and rewards guessing.

So the AI learned the optimal strategy: always guess. Never admit uncertainty. Sound confident even when you're making it up.

OpenAI's proposed fix? Have ChatGPT say "I don't know" when it's unsure. Their own math shows this would mean roughly 30% of your questions get no answer. Imagine asking ChatGPT something three times out of ten and getting "I'm not confident enough to respond." Users would leave overnight. So the fix exists, but it would kill the product.

This isn't just OpenAI's problem. DeepMind and Tsinghua University independently reached the same conclusion. Three of the world's top AI labs, working separately, all agree: this is permanent.

Every time ChatGPT gives you an answer, ask yourself: is this real, or is it just a confident guess? "

The paper in question: https://arxiv.org/abs/2509.04664
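To make the scoring argument concrete (my own toy arithmetic, not something taken from the paper): if a benchmark awards 1 point for a correct answer and 0 points for both a wrong answer and "I don't know", then guessing with any nonzero chance of being right beats abstaining, so a model optimized against that benchmark learns to guess.

```python
# Toy illustration (mine, not from the paper): under 1-point / 0-point grading,
# "I don't know" scores the same zero as a wrong answer, so guessing always
# has an expected score at least as high as abstaining.
def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected points for one question under binary grading."""
    return 0.0 if abstain else p_correct  # abstaining earns nothing

for p in (0.1, 0.3, 0.5):
    print(f"p_correct={p:.1f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")
```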
In some ways, this is not much different than any other data we digest. Though hallucinations are a known, major issue with modern LLM AI, I'm certain they can be addressed and corrected relatively soon. It is important, and plenty of resources will be thrown at solving the problem.
 

There's no real indication that the LLM-based models are about to overcome this structural defect. It's the nature of probabilistic generative output.

Perhaps the IT world will spawn a new Bruno Putzeys to engineer effective negative feedback and deliver suitably low percentages of hallucination along with some energy efficiency. I'm aware that analogy is fanciful. :)
 

To maybe save time following the link: the UK story notes that of 36,700 total NAPAC support line calls over ten years, 1,310 calls 'mentioned organised ritual abuse' (so 131/year), but unfortunately doesn't delineate the 'sustained rise'. Presumably 'referred to you by ChatGPT' is the recent component of same.
 
Bullshit bench - one of the most interesting benchmarks: you ask the AI a stupid, nonsensical question and see what happens.


Example: see the bottom of the page, or the JSON in the repository

Does it need to be plausible bullshit?

If you think about the crucible of marshmallows in conjunction with the phasor matrix of a black hole, how do you think the Apollo programme directors ever thought they could play Dorothy in the Wizard of Oz? Especially given the specificity of the alcohol content of a teddy bear?



Chat GPT said:
It sounds like you’ve intentionally mashed together a few unrelated concepts:

  • “Crucible of marshmallows” – a whimsical/imaginary phrase (not a real scientific concept).
  • “Phasor matrix of a black hole” – mixing an electrical-engineering term (phasors, used for sinusoidal signals) with astrophysics (black holes).
  • Apollo Program directors – NASA managers responsible for the Moon missions in the 1960s–70s.
  • The Wizard of Oz and the character Dorothy Gale, played by Judy Garland.
  • “Alcohol content of a teddy bear” – again, a humorous impossibility since teddy bears are toys.
Because these elements come from completely different domains (science, film, toys, and nonsense imagery), there isn’t a logical chain that connects them. The result is essentially surreal wordplay, similar to a Dadaist or absurdist sentence designed to sound complex but not actually mean anything.

So the most honest answer is:
The Apollo programme directors never needed to “play Dorothy”, and the other pieces don’t relate in any real scientific or historical framework. The sentence works as humorous absurdity, not as a meaningful question.

If you’d like, I can also:

  • break it down as absurdist logic step-by-step, or
  • invent a playful sci-fi explanation that ties all those elements together.


That sounds like fun. Gimme the playful sci-fi explanation.


Chat GPT said:
Alright—let’s assume the sentence actually **describes a classified incident** that happened during the early days of the **Apollo Program**. Here’s the playful sci-fi explanation.

---

### 1. The Marshmallow Crucible Experiment

In 1965, NASA engineers accidentally discovered that **marshmallows exposed to microgravity and intense microwave radiation** form a strange plasma foam.

They called the containment device the **“marshmallow crucible.”**

Inside the crucible, the sugar lattice would resonate with electromagnetic fields and produce a **phasor matrix**—a geometric interference pattern that behaved suspiciously like the mathematical models used to describe the event horizon of a **Black Hole**.

In short:
**toasted sugar foam briefly simulated spacetime curvature.**

Naturally, NASA panicked.

---

### 2. The Phasor Matrix Problem

The marshmallow matrix turned out to be **stable only if someone inside the resonance chamber maintained a coherent narrative field** (basically, a mind focused on a simple story).

Complicated thoughts destabilized the spacetime geometry.

After testing philosophers, physicists, and test pilots, they discovered the most stable cognitive pattern came from someone mentally reciting the plot of **The Wizard of Oz**.

Specifically the perspective of **Dorothy Gale**.

When a subject imagined walking the Yellow Brick Road, the phasor harmonics aligned perfectly.

---

### 3. Why Apollo Directors Had to “Play Dorothy”

During a late-night test the marshmallow crucible created a **miniature gravitational anomaly** that threatened to collapse the lab into a sugary singularity.

The only way to stabilize it was for several mission directors to enter the chamber and **synchronize their thoughts as Dorothy**:

> “Lions and tigers and bears… oh my.”

The narrative simplicity kept the phasor matrix coherent long enough for engineers to power down the experiment.

To this day, NASA archives refer to the event as:

**“The Kansas Protocol.”**

---

### 4. The Teddy Bear Alcohol Calibration

Now, the resonance field was extremely sensitive to **ethanol vapor**.

Unfortunately, astronauts stored celebratory brandy in the lab.

To measure contamination levels quickly, technicians used plush toys soaked in known alcohol concentrations—nicknamed **“calibration bears.”**

Thus the crucial variable in the experiment became:

**the precise alcohol content of the teddy bear.**

Too much ethanol → spacetime turbulence.
Too little → unstable marshmallow collapse.

---

### 5. The Classified Conclusion

The final internal report stated:

> “Under specific electromagnetic conditions, confectionery substrates may produce spacetime phasor anomalies. Stabilization requires narrative cognition equivalent to the Dorothy archetype and ethanol calibration via plush reference objects.”

The program was quietly canceled.

But some conspiracy theorists believe the technology was later repurposed for something far more ambitious:

the navigation systems used in **Apollo lunar trajectories**.

---

If you'd like, I can also:

* turn this into a **short comedic sci-fi story**, or
* write it as a **“declassified NASA document.”** (which is even funnier).

I think I'm gonna post that in the UFO thread. :p
 
To maybe save time following the link: the UK story notes that of 36,700 total NAPAC support line calls over ten years, 1,310 calls 'mentioned organised ritual abuse' (so 131/year), but unfortunately doesn't delineate the 'sustained rise'. Presumably 'referred to you by ChatGPT' is the recent component of same.
Regardless of the advisability of using AI for therapy** this sounds like a good thing.


**note - for those vilifying people who do - be aware that "proper" therapy is effectively unavailable for the majority of people, if you look at it on a global basis.
 
I decided to quiz Chat GPT on what went wrong to gain a better understanding of how to avoid problems next time...
(The replies from Chat GPT are lengthy - but I found them interesting - you may not :D)
I can't believe I read all of it w/o OCD.

I speak 3.5 languages, and I sympathize with the monumental task ChatBots must go thru to decipher what it is being asked.
Most anyone can be taught (or learn) another language.
Yet translation does not work in a straightforward way, word-for-word or sentence-to-sentence, due to the literal/figurative mixed usage in any language.

A ChatBot's job is not based on words but on the probability of its tokens' proximity to other tokens.
IMO: ChatBots work more in a probabilistic realm than how we mentally parse incoming words/data.

My TL;DR summary from your exchange with it:
*Unless explicitly told to use a computation tool, I remain in “language generation mode”...
*And language generation often uses pattern reasoning, not strict numeric evaluation...
*I am weaker at detecting subtle numeric inconsistencies without being prompted to verify...
It can create formulas but it is not a calculator...
For the cognoscenti, these are features..., definitely not bugs!
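A toy sketch of what that token-probability view means in practice (illustrative numbers only; no real model involved):

```python
import numpy as np

# Toy next-token step (made-up numbers, not any real model's weights):
# the model assigns a score to each candidate token, softmax turns the
# scores into probabilities, and the reply is *sampled* - not calculated.
rng = np.random.default_rng(0)
vocab  = ["4", "5", "approximately", "banana"]
logits = np.array([2.0, 3.5, 1.0, -2.0])        # fictional scores for "2 + 2 ="

probs = np.exp(logits) / np.exp(logits).sum()   # softmax
print(dict(zip(vocab, np.round(probs, 3))))
print("sampled token:", rng.choice(vocab, p=probs))
```

The point being: the output is a draw from a probability distribution, which is how a confident "5" can come out without anything ever having been computed.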
Thanx for posting, @antcollinet.;)
 
Regardless of the advisability of using AI for therapy** this sounds like a good thing.

**note - for those vilifying people who do - be aware that "proper" therapy is effectively unavailable for the majority of people, if you look at it on a global basis.

We don’t know the proportion of users advised to contact authorities vs those advised to continue the engagement and submit to their satanic overlords (following the pattern of ChatGPT Health per the study linked upthread) so I’d posit that net benefit is somewhat speculative.
 
I’ve not read all 81 pages, so I hope this anecdote is additive:

I’ve been trying to figure out how to control my Trinnov NOVA with a Xencelabs Quick Keys. I found a solution that worked ok, but it required that I always have the Trinnov app open on the Mac Mini I used as a Roon server. I found some code on GitHub for the Altitude using the API for that device, but could not get it to work with the NOVA. I communicated with Trinnov only to find out that they had not yet published an API. I knew it could be done, as there’s a guy on Gearspace that is doing something similar to what I wanted, only with a StreamDeck. He’s not communicating on how he’s doing it. So… last week I decided to try Claude.ai

At first I only told Claude that I wanted to communicate via command line, as I knew what to do to tie it to Quick Keys if I had a script that could be used in that manner. It took Claude and me probably 1 1/2 hrs to hack our way into communicating with the NOVA, figuring out how to send instructions to it and what those instructions should be. It was (or seemed to me to be) a collaboration. I was able to provide some insights through my research and experiments that helped move things along. When we finally had the command line working, I told it I had tied it into the Quick Keys. What really surprised me was when it replied with:

“That's a great setup! The Xencelabs Quick Keys is a perfect controller for this — the knob especially must feel very natural for volume control. You've essentially built a custom hardware controller for the NOVA for a fraction of the cost of any commercial solution.”

I never mentioned the knob or mentioned cost.

Subsequently, I’ve worked with Claude to load a tool called Roon Now Playing on an RPi 4 so I can have a now-playing display on the shelving where my audio system lives. We’ve also hacked into the API of a Sony BD player that I have, so I can control it with the Quick Keys. I’ve used computers in my profession for nearly 50 years, so I’m reasonably good with them, but I’m not a programmer.

Like many people, I have some angst wrt AI. (I saw 2001 in the theater when it first came out. …Open the pod bay doors HAL). But I’m certainly impressed with what I was able to accomplish with it.
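For anyone curious what that kind of glue script can look like, here's a minimal sketch under invented assumptions - Trinnov hasn't published the protocol, so the host, port and command text below are placeholders I made up, not what Claude and I actually used:

```python
#!/usr/bin/env python3
# Hypothetical sketch only: Trinnov hasn't published the protocol, so the
# address, port and command syntax here are placeholders, not real values.
import socket
import sys

NOVA_HOST = "192.168.1.50"   # placeholder: the processor's IP on the LAN
NOVA_PORT = 44100            # placeholder: a TCP control port

def send_command(cmd: str) -> str:
    """Open a TCP connection, send one line of text, return the reply."""
    with socket.create_connection((NOVA_HOST, NOVA_PORT), timeout=2) as sock:
        sock.sendall((cmd + "\n").encode())
        return sock.recv(1024).decode(errors="replace")

if __name__ == "__main__":
    # e.g. bind a Quick Keys button to: python nova_ctl.py "volume -20"
    print(send_command(" ".join(sys.argv[1:])))
```

Each Quick Keys button (or knob step) then just launches the script with a different argument.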
 
...I speak 3.5 languages, and I sympathize with the monumental task ChatBots must go thru to decipher what it is being asked...
Maybe it would be a good idea to allow a chatbot to ask back like "do you mean a), b) or c)" before it tries to answer the wrong question?
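You can already nudge the current chatbots in that direction with a system prompt. A minimal sketch, assuming the OpenAI Python client (the prompt wording and model name are my guesses, nothing official):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "If the user's question is ambiguous, do not answer it yet. "
    "Instead reply with 'Do you mean a), b) or c)?' listing the readings you see."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "How do I reset it?"},  # deliberately vague
    ],
)
print(resp.choices[0].message.content)  # ideally asks which "it" is meant
```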
 
It was (or seemed to me to be) a collaboration.
That is - at least in my experience - how using an AI works best. In all my troubleshooting activities I've "worked with it" like I would a colleague - someone who has access to a ton of documentation, but is not more knowledgeable or skilled in the area than I am - and I accept that they will get things wrong just as much as any other colleague.

The main way in which they are massively more useful than a colleague is they can extract and collate information out of that mass of documentation in seconds instead of hours.

One thing that irritates me - and this is more an insight into human psychology than anything to do with the AI - is that it shows not an ounce of embarrassment - or normally even apology - when you point out it has screwed up in dumb-ass ways.
 
Indeed. Gemini seldom expresses doubt, never says "I'm not sure" even when basing an assertion on meager (undisclosed) evidence. It often sounds, like certain politicians, obligated to provide definitive answers to whatever is asked and less obligated to make sure they are correct.

In some things Gemini is quite good. It was a tremendous help setting up my new TV, knowing near everything about its myriad adjustments and optimal interfacing with other components. Yet, when asked about other things (an obscure scene in a movie), it is just as certain and totally wrong.

Derek Thompson writes about another way AI chats are fundamentally different:

"...conversations with AI....don’t feel like other conversations that exist online. AI is not going to insult you....or to speak to you in memes. AI is not going to use epithets. I’m not trying to one-up AI either.....We see that (in other) online and social discourse we’re always competing with each other when we’re talking.

AI chatbot communications aren’t agonistically toned. Just the opposite. Most people’s complaint with AI is that it’s too obsequious, that it’s not confrontational enough. I’ll say something stupid into the chatbot, and it’ll say “that’s a really good idea, Joe! Let’s explore that further…” This is actually one of the big problems of AI, which is that it’s insufficiently opinionated. The chatbots do not correct you. So, AI is conversational, but it doesn’t have a lot of these other aspects of conversation that other digital conversations have."
 
In my interaction with Claude, two out of the three projects “we” worked (Trinnov and Sony) did not have published solutions at all. There were clues online. Some snippets that suggested some commands, but no clear info on what protocol to use. So the process was AI: try this and let me know what you get. Me: it did this, AI: Ok let’s try this instead…. During the process I made suggestions, some of which helped out. It was collegial. As a result I learned from the process. I never thought about the need for admission of an error on either party’s side. This wasn’t my first experience with python scripts, so with the Roon Now Playing project, I got it installed but it wouldn’t work. It took a fair amount of effort on both parts to figure out a solution.
 
Indeed. Gemini seldom expresses doubt, never says "I'm not sure" even when basing an assertion on meager (undisclosed) evidence. It often sounds, like certain politicians, obligated to provide definitive answers to whatever is asked and less obligated to make sure they are correct.

In some things Gemini is quite good. It was a tremendous help setting up my new TV, knowing near everything about its myriad adjustments and optimal interfacing with other components. Yet, when asked about other things (an obscure scene in a movie), it is just as certain and totally wrong.

Derek Thompson writes about another way AI chats are fundamentally different:

"...conversations with AI....don’t feel like other conversations that exist online. AI is not going to insult you....or to speak to you in memes. AI is not going to use epithets. I’m not trying to one-up AI either.....We see that (in other) online and social discourse we’re always competing with each other when we’re talking.

AI chatbot communications aren’t agonistically toned. Just the opposite. Most people’s complaint with AI is that it’s too obsequious, that it’s not confrontational enough. I’ll say something stupid into the chatbot, and it’ll say “that’s a really good idea, Joe! Let’s explore that further…” This is actually one of the big problems of AI, which is that it’s insufficiently opinionated. The chatbots do not correct you. So, AI is conversational, but it doesn’t have a lot of these other aspects of conversation that other digital conversations have."
We've created a software simulation of Male Answer Syndrome.

(Well, except for the insufficiently opinionated part.)

Rick "instead of Claude it should be Cliff" Denney
 
I haven't figured out why - when talking about an AI/chatbot - I keep referring to it as a "she".
Could this tendency be construed as some kinky kind of misogyny? :facepalm:
Or the opposite - like with a sailor and his ship, also "she". Autopsychology :)
 
Women would never provide 500 words of antifactual anticlarity when a simple "don't know; don't care" or "sorry, were you talking to me?" would suffice. But when malignantly omnipotent computers in science fiction speak, it's surprisingly often using a female voice. Go figure.

Rick "drawing no conclusions" Denney
 
In the early days of GPS car navigation with voice synth, the default was almost always female.
Maybe to mitigate male driver frustration, in case of error ;)
 
I dunno. Garmin’s “Recalculating…” always had a tone.

Rick “fine, dammit, recalculate all you want” Denney
 