> The piracy comes first, and it's exactly the same thing. GenAI Corp. can't train models on illicitly obtained media before illicitly obtaining said media.
My contention is that this is not happening. Most generative AI companies do not source their training data from illegal torrents, and the few that do are currently paying for it. Further, I suspect the companies that get away with it today are _smaller_, not larger.
Training data is typically sourced by scraping the publicly available web.
> Of course it's not the same thing -- it's way worse.
Setting aside your own moral standards here, we should at least be able to agree that, from a legal standpoint, training a model is not copyright infringement.