If you think this is proof that it's true, then I am both worried and astonished. How about looking up the information yourself instead of relying on LLMs? This is HN, I thought?!
I have it, and it does not. Well, it may; it depends on the firmware you install on the cable. Depending on the firmware, different things will be broken. I tried them all. There's no version that consistently supports 2160p@120, 4:4:4/RGB, HDR, and VRR all at once without random handshake issues.
This doesn't even seem to look at "predictions" if you dig into what it actually did. Looking at my own example (#210 on https://karpathy.ai/hncapsule/hall-of-fame.html with 4 comments), very little of what I said could be construed as "predictions" at all.
I got an A for commenting on DF saying that I had not personally seen save corruption and listing weird bugs. It's true that weird bugs have long been a defining feature of DF, but I didn't predict it would remain that way or say that save corruption would never be a big thing, just that I hadn't personally seen it.
Another A for a comment on Google Wallet just pointing out that users are already bad at knowing which links to trust. Sure, that's still true (and probably will remain true until something fundamental changes), but it was at best half a prediction, as it wasn't forward-looking.
Then something on hospital airships from the 1930s. I pointed out that one could escape pollution; I never said I thought it would be a big thing. Airships haven't really ever been much of a thing, except in fiction. Maybe that could change someday, but I kinda doubt it.
Then lastly there was the design patent famously referred to as the "rounded corner" patent. It dings me for simplifying it to that label, even though my actual statement was that yes, there's more to it, but minor details like that can be sufficient for infringement. The LLM says I'm right about the ties to the Samsung case but still oversimplifying it. Either way, none of this was really a prediction to begin with.
Not sure what the fuss in this thread is about; this is a completely believable claim. In Table 5 he gets 83.26% with labels only (which I assume means not using the teacher) and 91.40% with the teacher. It's a nice result, though not hugely groundbreaking I'd say. Maybe training longer or using some clever normalisation would even close the gap. It's not something you can call 224x compression though, so I would remove that claim everywhere.
This is basically a variation of distillation through the entire network, not just the last layer as is typical.
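Roughly, in generic PyTorch terms (a toy sketch of the distinction, not the paper's actual code): classic distillation only matches the teacher's final logits, while distilling "through the network" also matches intermediate activations at each layer.

    import torch.nn.functional as F

    def logit_distill_loss(student_logits, teacher_logits, T=4.0):
        # Classic last-layer distillation: KL between softened output distributions.
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

    def feature_distill_loss(student_feats, teacher_feats):
        # "Through the network": match activations layer by layer against the frozen teacher.
        return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))

In practice you'd weight and sum the two terms together with the plain label loss.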
I appreciate this take. I largely agree with the framing, and I think this is closer to the intended reading than some of the more heated responses in the thread. (I understand this is what's expected in the forum, and now I welcome it.)
You’re on point that the result is believable and not presented as some singular, world-ending breakthrough. Not at all. The point of Table 5 was to show that a surprisingly large amount of task-relevant signal survives under very strict constraints, not to claim that this alone replaces full inference or training. In that sense, calling it “nice but not shocking” is totally fair. It also makes a lot of the other takes in this thread more confounding than anything.
On the 224× compression language, the claim is specifically about task-specific inference paths, NOT about compressing the entire model or eliminating the teacher. I agree that if someone reads it as end-to-end model compression, that framing invites confusion. That's good feedback; I'm taking it seriously and will tighten up the framing going forward.
I also agree that, viewed narrowly, this overlaps with distillation. The distinction I'm trying to surface (the part that's interesting here) is where and how early the structure appears, and how stable it is under freezing and extreme dimensional collapse. The paper deliberately avoids additional tricks, longer training, or normalization schemes precisely so that the effect size is not inflated. In other words, this is closer to a lower bound than an optimized ceiling.
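To make that concrete, here's a toy sketch of the kind of setup I mean (hypothetical names and sizes, not the paper's actual implementation): freeze the backbone, project one intermediate activation down to a tiny bottleneck, and train only a small head on top of it.

    import torch
    import torch.nn as nn

    class TinyTaskHead(nn.Module):
        def __init__(self, feat_dim=2048, bottleneck=16, num_classes=10):
            super().__init__()
            self.project = nn.Linear(feat_dim, bottleneck)   # extreme collapse, e.g. 2048 -> 16
            self.classify = nn.Linear(bottleneck, num_classes)

        def forward(self, frozen_feats):
            return self.classify(torch.relu(self.project(frozen_feats)))

    # The backbone is never updated; only the tiny head trains.
    # with torch.no_grad():
    #     feats = frozen_backbone(images)   # hypothetical frozen teacher
    # logits = TinyTaskHead()(feats)

If accuracy survives a projection that aggressive with the backbone untouched, the task-relevant structure was already sitting in the frozen features.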
What I would add is this: believe it or not, the paper is actually intentionally conservative, contrary to what the thread may suggest. It isolates one axis of the problem to make the geometry visible. There's ongoing work that relaxes some of those constraints and explores how these representations compose, persist across tasks, and interact with different extraction points. It's not ready to be released yet (and may never be), but it does address several of the gaps you’re pointing out.
So basically I don’t disagree with your characterization. This is exactly what it is: a first, deliberately narrow step rather than the full story. Thanks for engaging with it at that level. I appreciate your time.
> On the 224× compression language, the claim is specifically about task-specific inference paths, NOT about compressing the entire model or eliminating the teacher.
I understand that after reading the paper, but it's not in the title and that's what people read first. Omitting it from the title might have given you a much more favorable reception.
It's not easy to get noticed when you're not from a big lab, don't get discouraged. It's nice work.
The numbers put the idea of total extinction of life on Earth way out there in the realm of the improbable, and suggest that perhaps some living things will survive
the expansion of our Sun when it inevitably turns into a red giant, or at least until some later phase in that process.
It also means that irate aliens looking to rid the universe of Earth life have got their work cut out for them.
When the Sun turns into a red giant, the water on Earth won't stay liquid for much longer. Lack of water will be a much bigger challenge for life before the planet ever hits that 122°C limit.
Look up what kind of tracking UK ISPs are mandated to do by law and how easy it is to request that information. Your VPN can't possibly be worse than that.