I think you led it to that result by not providing enough context, for example by noting that there is no objective way to measure the quality of an LLM generation, either before or after the fact.
Edit: I asked ChatGPT with more proper context:

"It’s not inherently insulting to say that an LLM (Large Language Model) cannot guarantee the best quality, because it’s a factual statement grounded in the nature of how these models work. LLMs rely on patterns in their training data and probabilistic reasoning rather than subjective or objective judgments about 'best quality.'"
I can't criticize how you prompted it because you did not link the transcript :)
Zooming out, you seem to be in the wrong conversation. I said:
> the LLM can solve a general problem (or tell you why it cannot), while your calculator can only do that which it's been programmed.
You said:
> Do you have any evidence besides anecdote?
I think that -- now that both of us have used ChatGPT to generate a response -- we have good evidence that the model can solve a general problem (or tell you why it cannot), while a calculator can only do the arithmetic for which it's been programmed. If you want to counter, a video of your calculator answering the question we just posed would be nice.