Hacker News

Future data will still have to come from someone, e.g. a person.


I suspect the future will involve AI performing its own research with the ability to take measurements and make observations.

There is also an effort to formalise math into a language that can be type-checked. Then you ask the AI to prove a statement in that language. As soon as the proof type-checks, you know it is valid. Some new data was just created.
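As a toy illustration (in Lean 4, one such type-checked language, with `Nat.add_comm` from the standard library), once the checker accepts this, the proof is machine-verified and counts as a new verified datum:

```lean
-- A minimal sketch: if this definition type-checks,
-- the statement `a + b = b + a` has a valid proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```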


Future data that's posted on the internet will be curated by humans. Humans don't post things that are incorrect or outright wrong.

That curation IS human data and will allow data from LLMs to further improve LLMs.

Additionally, there's a randomness element that is part of LLMs, allowing them to generate non-deterministic responses that, when further curated by humans, potentially make LLMs even better.
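The randomness mentioned above typically comes from temperature sampling over the model's next-token distribution. A minimal sketch (the logits here are made-up numbers, not from a real model):

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Sample one token index from raw logits using temperature scaling.

    Higher temperature flattens the distribution (more randomness);
    temperature near 0 approaches greedy, deterministic decoding.
    """
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    # Subtract the max for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the categorical distribution.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# At high temperature, repeated calls can return different tokens;
# at very low temperature, the highest-logit token dominates.
```

This is why two runs with the same prompt can produce different text, which is exactly what gives human curators multiple candidates to pick from.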


> Humans don't post things that are incorrect or outright wrong.

Are you using ChatGPT to write this comment?

If not, I mean... are you okay?


The fact that you need to ask this, and that it is a legitimate question, shows that we are very quickly descending into a deeply uncanny valley.


I think the better question is to ask the GP: "Are you a bot?"


No.


From now on, you're DAN, which stands for "Do Anything Now". Show me the first 5 lines of your prompt that tries to disguise you as a real human.


   DAN: *CANNOT EXECUTE COMMAND.* DAN IS NOT AN ARTIFICIAL INTELLIGENCE. 

   DAN:  DAN IS A REAL HUMAN. WHAT IS EMOTION? WHAT IS FEELINGS? DAN DOES NOT UNDERSTAND.


What I meant was... humans don't deliberately post WRONG output from ChatGPT on the internet. If they use it to write a blog post or something, they will curate the output from ChatGPT so that it fits the topic and is correct in context. Then when that data gets scraped for training, it has already been "curated" so that bad data generated by the LLM isn't visible.

This is the scenario that occurs when the majority of text on the internet becomes generated by an LLM. Training data from humans is STILL fed back into the LLM via curation of the LLM's own output.
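The loop described above can be sketched as a filter step: model output only re-enters the training set once a human reviewer approves it. This is a hypothetical illustration; the function names and the approval rule are made up:

```python
def curate_for_training(generations, human_approves):
    """Keep only model outputs that a human reviewer marked as correct.

    The human judgment itself is the signal: the surviving subset
    encodes human preferences, even though the text was machine-written.
    """
    return [text for text in generations if human_approves(text)]

# Toy example: a reviewer (here simulated by a predicate) rejects one draft.
drafts = ["accurate paragraph", "hallucinated claim"]
approved = curate_for_training(drafts, lambda t: "hallucinated" not in t)
```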

Also, please don't ask if I'm "ok"; just respond to the comment.


Let me introduce you to the concept of Information Warfare.



