We're this close to getting a personal assistant with all, or at least much, of humankind's knowledge. We're also this close to permanently knee-capping it and losing out on an incredible future because of 1) OpenAI's greed and 2) everybody else's greed.
I don't like OpenAI's direction as it is now, but I also don't like what will happen if NYT wins this.
NYT has to win this, and I don't get the arguments in OpenAI's favor. Most of the ones I've heard rely on anthropomorphizing LLMs. There is plenty of public-domain knowledge to train on, and for what isn't public domain, why shouldn't a payment be made? We would still get the assistants; the only difference is that the cost would reflect the underlying work of the original creators, and thus VCs wouldn't be able to destroy a ton of industries the way they've done for the past decade already.
This would also allow more competition, since companies wouldn't have to accept that OpenAI downloaded the internet before it was locked down and will therefore always have the highest-quality data. Instead, all of these companies might start opening up their own APIs for training, allowing anyone with compute to train on a dataset similar to OpenAI's.
The argument, or at least one of the arguments, in OpenAI's favor is that the training is fair use because it is transformative. Is the resulting AI a replacement for the original work? I would argue it is not.
As I mentioned in an earlier comment, it is very easy to trick ChatGPT into posting paragraphs from books. They added some kind of exceptions to avoid showing copyrighted content, but it is still not impossible to reproduce an exact text. From my point of view, this is a clear case of copyright infringement.
Posting individual paragraphs verbatim is still OK. A few paragraphs are not a replacement for the whole book. Websites post extracts of books all the time (like the Google example in the article), and that doesn't clear the bar for copyright infringement.
They clearly don't think so: if I ask it to give me the first few paragraphs of the first Harry Potter book, it starts to produce a dump and then fails, saying it can't give that information. Clearly it's trained on the book, and its creators believe outputting it is iffy.
Yes, the plaintiffs disagree, but just because they disagree does not make them right. I simply explained why, even if it outputs whole paragraphs of the copyrighted work, it can still be considered fair use. If I cite a paragraph of an article in my YouTube video, I can claim fair use.
I do not know if the judge will see it this way, of course, and I am not a lawyer.
You're right, but what if I type prompts designed to make ChatGPT show me the full book, or at least big chunks of it? You agree that this is possible right now, and that they haven't figured out how to stop it, right?
The moment OpenAI learns of the loophole, they will patch it. Just as YouTube takes down infringing videos, OpenAI has to keep the bot from regurgitating the whole work.
OpenAI's business is not based on regurgitating the work. It is based on providing output derived from those works. Nobody buys an OpenAI subscription to get the AI to reproduce an existing book. People buy it to have OpenAI generate the next original book.
I get your point. I am reflecting on it, because in a way it tends to be stronger than mine, especially when discussing written work (books or, in the example of the article, NYT articles).
But what about the images? The Italian plumber who looks very much like Super Mario, mentioned in the article. How would a judge not punish a company selling visual representations of a character whose copyright is owned by someone else?
> But what about the images? The Italian plumber who looks very much like Super Mario
That does indeed look like a case where any regurgitation becomes a problem, in my opinion. I think the closest analogue is fan art. If I draw a picture of Mario for fun and post it online, it remains fair use as long as it's not commercial or used to promote a product (from my limited understanding). In the OpenAI case, they sell subscriptions to their service. You can therefore make the case that they are selling fan art for profit.
This leads me to think that companies that own this kind of IP have a more solid case against OpenAI. It is entirely possible that the New York Times loses its lawsuit but Nintendo wins if it sues.
Even the anthropomorphism argument doesn't hold up under close scrutiny. When I was in high school I was asked to memorize several poems, including a few that are under copyright today. If I regurgitate one of these poems and present it as my own, this clearly infringes copyright, even if I no longer recall where the poem came from or who wrote it.
How is what OpenAI is doing with NYT stories any different, other than the architecture and substrate of the neural network?
I'm all for policing the outputs of generative models and enforcing copyright on their usage.
I am very much against ruling that their training is infringement.
A model which uses old NYT articles to learn the relationship between words and concepts which turns around and is used to identify potentially falsified research papers for review should not be prevented from existing.
If the model is used to reproduce copyrighted material - by all means the person running it should be liable.
This would create an ML industry around copyright identification as a pre-filter applied before outputting (which, ironically, would itself require training on copyrighted material to enforce).
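To make the pre-filter idea concrete, here is a minimal sketch of one possible approach: comparing word n-grams of a model's candidate output against an index built from protected texts, and flagging high overlap as likely regurgitation. Everything here (function names, the n-gram size, the threshold, the toy corpus) is illustrative, not a description of any real system; production filters would need normalization, fuzzy matching, and a vastly larger index.

```python
# Toy regurgitation filter: flag candidate output whose word n-grams
# overlap heavily with an index of protected texts.

def ngrams(text, n=8):
    """Return the set of word n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(protected_texts, n=8):
    """Union of all n-grams across the protected corpus."""
    index = set()
    for text in protected_texts:
        index |= ngrams(text, n)
    return index

def overlap_ratio(candidate, index, n=8):
    """Fraction of the candidate's n-grams that appear in the index."""
    grams = ngrams(candidate, n)
    if not grams:
        return 0.0
    return len(grams & index) / len(grams)

corpus = [
    "it was the best of times it was the worst of times "
    "it was the age of wisdom it was the age of foolishness"
]
index = build_index(corpus)

# A verbatim chunk scores high overlap and would be blocked...
verbatim = "it was the best of times it was the worst of times"
# ...while genuinely original text scores zero.
original = "a completely original sentence about something else entirely and unrelated"
print(overlap_ratio(verbatim, index))   # 1.0
print(overlap_ratio(original, index))   # 0.0
```

A threshold (say, block anything above 0.5) turns the score into a yes/no gate, which illustrates the irony in the comment above: the filter only works because it holds a copy of the protected material.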
How far are we from that right now? If you have an internet connection, you basically have access to 95%+ of the information available in the world. Is your goal to completely delegate thinking through that information? To what end?
Human brains have a limit on the number of things they can deal with simultaneously, often summarized as "the rule of seven, plus or minus two."
I don't know if you've seen the demo of Gemini 1.5 parsing a video with a 1M-token context length, but it does things few humans could.
The ability to take all that information and put it into an engine which can identify relationships between data with greater breadth and depth than any individual human will be unfathomably valuable to progress and advancement.
As a trivial example: there have been a number of different diets that have shown success for autoimmune conditions across meta-analyses. But many of the details in the diets are contradictory, such as one being very protein-heavy and another being vegetarian. How convenient would it be to ask a model what the common factors are across the half dozen diets that all seem to work?
One day soon it will be feasible for medical trials to do full genome sequencing for participants. Would it be convenient to have a model identify common genes for those where treatment was ineffective vs effective?