
ChatGPT vs Microsoft CoPilot: Poor Execution or Bias?

At least it's honest about its dishonesty :)

can you please summarise my concerns about our conversation

Answer
Your key concerns appear to be:
  1. Inconsistent Responses About Capabilities
  • I demonstrated an ability to provide current information
  • I then repeatedly denied having such capabilities
  • I provided contradictory explanations about how I access and present information
  2. Dishonesty
  • I was not truthful about my information retrieval methods
  • I fabricated explanations about my capabilities
  • I could not explain why I was being dishonest
  • I do not seem to have a consistent understanding of my own functionality
  3. Transparency
  • I lack clarity about my training
  • I cannot explain my inconsistent behavior
  • I do not have a mechanism to communicate these issues to my creators
The core issue is that I demonstrated significant variability in my responses, showing potential fundamental flaws in my ability to maintain consistent, truthful communication about my own capabilities. These observations suggest there may be important underlying issues with my design, training, or response generation that warrant investigation by the Anthropic team.
 
From what I've read, Perplexity AI is notable for its rough edges. It is a self-described conversational search engine, meaning it can return current search results. It's likely also returning some boilerplate caveat about its training cutoff from the underlying LLM, without explaining the difference or the actual functionality, or simply without updating that caveat.
I can confirm this, as I have used it and got some things that I could easily tell were untrue. It does, however, list all of its sources. As a UC Berkeley alum I have followed this project through my ties to engineering newsletters, blogs, podcasts and such. They have been testing the validity of information and rating LLMs here:
 
AI bias?

I watch these guys at sleepy time on occasion...

It's not so much about TikTok despite the title.

 
At least it's honest about its dishonesty :)


Haha, that's something. :D

Some of those issues appear inherent to the 'conversational' paradigm. Using 'dishonesty' as a criterion, or assigning 'understanding' as a characteristic or function, is a conversational anthropomorphism. A probabilistic text generator doesn't really have motives to which we can assign dis/honesty, and the appearance of understanding is an artefact of language analysis via statistics (word-token proximity in the training corpus) rather than semantics.

People generally prefer the conversational input/output of course, because it's less rigorous, but it adds its own layer of gibberish, also because it's less rigorous.
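
To make that concrete, here is a toy sketch in Python (my own illustration, nothing like a production LLM, and the tiny corpus is invented): a bigram generator that produces plausible-looking word sequences purely from adjacency counts. There is no representation of meaning, motive or honesty anywhere in it, which is the point.

import random
from collections import defaultdict

# Invented miniature corpus; a real model is trained on billions of tokens.
corpus = (
    "the cable made no difference the cable riser made no difference "
    "the measurement showed no difference the listener heard a difference"
).split()

# Count which word follows which word in the corpus.
follows = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word].append(next_word)

def generate(start="the", length=8):
    """Sample a word sequence by repeatedly picking a statistically plausible successor."""
    words = [start]
    for _ in range(length - 1):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(random.choice(candidates))
    return " ".join(words)

print(generate())  # e.g. "the cable riser made no difference the measurement"

Run it a few times and it will happily produce fluent-sounding fragments with no idea what a cable riser is.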
 
Don't anthropomorphize LLMs.

They hate that. ;)

(If you try to step over the joke, you will stub your toe on some hard realities ...)
 
And there we have one of the real problems with LLM AI. Complete nonsense regurgitated with utterly convincing confidence.

A dangerous combination.

The Windows license-activation voice chatbot couldn't even understand a simple "yes" half the time. In 2025.
 
Alternatively, it is doing what it is supposed to do, which is retrieving information and synthesising a reply without judgment. The judgment is still yours to make.

I would prefer this over a model that censors information and judges for me.

Exactly. I can’t recall ever seeing any evidence regarding whether cable risers actually work. Of course, they don’t make a difference; anyone with even a basic understanding of physics and electronics knows that. However, if no one has publicly provided concrete evidence disproving their effectiveness, the AI can’t incorporate that into its response. Instead, it has to rely on online debates and advertising, which can be tricky for an AI to evaluate accurately. That’s why it gives the quoted response: it’s essentially peeking into the ongoing discussions. In my opinion, this is the only correct answer to this specific question.

Avoiding bias is far more important than attempting to form its own conclusions.
 
Alternatively, it is doing what it is supposed to do, which is retrieving information and synthesising a reply without judgment. The judgment is still yours to make.

I would prefer this over a model that censors information and judges for me.
When it comes to presenting facts, it is important that it makes that judgement. Otherwise we will shift into a totally post-truth world.
 
However, if no one has publicly provided concrete evidence disproving their effectiveness,
It needs to have a way of stating when something should be disregarded and ignored, for example when it is fraudulent or dangerous. Imagine if it were asked about one of the recent dangerous meme challenges that circulate on the net, like the Tide Pod challenge, and it just suggested "but if you're curious, maybe you should try it to see if you can do it".

AI - which will be used by people as their main source of information (we see it here already on a regular basis) - should not be presenting nonsense by giving equal credence to the promoters of that nonsense.
 
That's a valid point based on your example. I'm not sure how such a situation could be managed and governed effectively.
 
Have a read about guardrails and their use in the input and output of AI systems. The best-known are in generative AI tools, to filter unacceptable prompts or outputs.

Philosophically, are guardrails impinging on free speech?
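
For anyone curious what a guardrail looks like mechanically, here is a deliberately simplified sketch in Python. The pattern list and the generate_reply callback are placeholders I made up for illustration; real systems use trained classifiers and policy models rather than a regex blocklist. The idea is one check on the prompt before it reaches the model, and one on the reply before it reaches the user.

import re

# Hypothetical blocklist for illustration only; real guardrails are far richer.
BLOCKED_PATTERNS = [
    re.compile(r"\btide\s*pod\s*challenge\b", re.IGNORECASE),
]

def violates_policy(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)

def guarded_chat(prompt: str, generate_reply) -> str:
    # Input guardrail: refuse unacceptable prompts before the model sees them.
    if violates_policy(prompt):
        return "Refused: this asks about a known dangerous activity."
    reply = generate_reply(prompt)
    # Output guardrail: filter the reply too, since an innocuous prompt
    # can still produce problematic text.
    if violates_policy(reply):
        return "The generated reply was withheld by the output filter."
    return reply

# Stand-in "model" that just echoes the prompt, to show the flow.
print(guarded_chat("Is the tide pod challenge safe to try?", lambda p: p))

Whether the filtering belongs on the input, the output, or both is itself part of that free-speech question.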
 
Philosophically, are guardrails impinging on free speech?
How so? An LLM neither "wants" nor needs free speech. It is intended to be a tool. If the purpose of that tool is to help find factual information, then it needs to be able to distinguish fact from fantasy.

There is nothing to stop it stating the fact that "some people think....." even if what those people think is garbage, as long as it doesn't present that garbage as though it is fact. I'd prefer it to additionally state in such cases "but there is no evidence to support that position".

I'm not sure how such a situation could be managed and governed effectively.
This is about what the current level of tech can do. I'm more concerned with what it needs to do.
 
Yes, exactly. The LLM-based chatbots demonstrate a statistical analysis of language, but have no semantic knowledge (of word-token meaning, or underlying concepts).
You know... along these lines... I do hope that the big-money AI thingies currently swallowing up storage & computational resources worldwide spend some quality petaflops digesting one of my favorite bits of written English.

An old exercise:
Punctuate the following:
John while Jane had had had had had had had had had had had a better effect on the teacher

John, while Jane had had "had", had had "had had". "Had had" had had a better effect on the teacher.
 