As much ChatGPT says I’m basically a genius for asking it a good Vegan cake reci...

DenisM · 2025-10-31T15:26:23 1761924383

ChatGPT chat logs contain a massive amount of data teased out of people’s brains. But much of it is lore, biases, misconceptions, memes. There are nuggets of gold in there but it’s not at all clear if there’s a good way to extract them, and until then chat logs will make things worse, not better.

I’m thinking they’ll eventually figure out who is the source of good data for a given domain, maybe.

Even if that is solved, models are terrible at the long tail.

api · 2025-10-31T16:17:18 1761927438

When I say models will plateau I don't mean there will be no progress. I mean progress will slow down since we'll be scraping the bottom of the barrel for training data. We might never quite run out but once we've sampled every novel, web site, scientific paper, chat log, broadcast transcript, and so on, we've exhausted the rich sources for easy gains.

DenisM · 2025-10-31T17:09:04 1761930544

Chat logs don’t run out. We may run out of novelty in those logs, at which point we may have run out of human knowledge.

Or not - there is still knowledge in people’s heads that is not bleeding into AI chat.

One implication here is that chats will morph to elicit more conversation to keep mining that vein, which may lead to a need to enrage users to keep engagement up.

alonmower · 2025-10-31T22:54:24 1761951264

The necessity of higher quality data from vetted experts is why Mercor just raised at 10B.

delis-thumbs-7e · 2025-11-01T13:30:14 1762003814

I’m afraid I don’t share your optimism. I think we are more or less seeing the limitations of the transformer architecture.