
If anything, performance should get better with time.


What I mean is that resources will be limited, or slightly worse models will be released that are much more cost-effective, even if not quite as good.

This is often the case with these types of technologies.


So far the pattern with these technologies has been that new tech is both cheaper and better than what came before.

You don't have to look far - GPT-3.5 Turbo.


Again, not what I'm saying.


Why do you think that would be the case?


Moore's-law-ish optimization.

Your Z80 computer cost $700 in the late '70s... they're now in sub-$1 embedded controllers.


But what is being optimized? Hardware sure isn't getting faster in a hurry, and I don't see anything on the horizon that will aid in optimizing software.


The various open source LLMs are doing things like reducing bits-per-parameter to lower hardware requirements, and if they're using COTS hardware it almost certainly isn't optimised for their specific models. Moore's Law also gets pretty heavily reinterpreted: we normally care about "operations per second at a fixed number of monies", but what matters here is "joules per operation", which can improve by a huge margin even before reaching human-level efficiency - and human level itself appears to be a long way from the limits of the laws of physics. And even if we were near the end of Moore's Law and there was only a 10% total improvement available, that's 10% of a big number.
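
To put a rough number on the bits-per-parameter point (back-of-the-envelope only; the 7B parameter count is just an assumed example size, not any particular model):

    params = 7e9                      # assumed example: a 7B-parameter model
    for bits in (16, 8, 4, 3):
        gib = params * bits / 8 / 2**30
        print(f"{bits:2d} bits/param -> ~{gib:.1f} GiB just for the weights")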


Moore's law was an effect that stemmed from the locally exponential efficiency increase from designing computers using computers, each iteration growing more powerful and capable of designing still more powerful hardware.

10% here and there is very small compared to the literal orders of magnitude improvements during the reign of Moore's Law.

I don't really see anything like that here.


> 10% here and there is very small compared to the literal orders of magnitude improvements during the reign of Moore's Law.

I can't confirm it, but I noticed this comment says "gpu tech has beat Moore’s law for DNNs the last several years":

https://news.ycombinator.com/item?id=35653231


We're actually at an inflection point where this isn't the case anymore.

For a long time, GPU hardware basically became more powerful with each generation while prices stayed roughly the same, plus or minus inflation. In the last couple of years, that trend has broken: you pay double or even quadruple the price for a relatively modest increase in performance.


We said that in 1982, and 1987, and 1993, and 1995, and 2001, 2003, 2003.5

You get the point.

There's always local optimization that leads to improvements. Look at the Apple M1 chip rollout as a prime example of that. Big/little cores, on-package RAM, memory shared with the GPU and Neural Engine, power integration with the OS.

LOTS of things that led to a big leap forward.


Big difference now is that we have a clear inflection point. Die processes aren't going to get much smaller than they already are. A sub-nanometer process would involve arranging single-digit counts of atoms into a transistor. A sub-Å process would involve single-atom transistors. A sub-0.5Å process would mean making them out of subatomic particles. This isn't even possible in sci-fi.

You can rearrange things for minor boosts, and sure, double the performance a few times, but that's not the sustained month-on-month improvement we've had in the past.

As anyone who has ever optimized code will attest, optimization within fixed constraints typically hits diminishing returns very quickly. You have to work harder and harder for every win, and the wins get smaller and smaller.


Current process nodes are mostly 5 nm, with 3 nm being rolled out. Atomic scale is ~0.1 nm, which leaves roughly 30x linear and 900x by area.

However, none of that is actually important when the thing people care about most right now is energy consumed per operation.

This metric dominates for anything battery powered for obvious reasons; less obvious to most is that it's also important for data centres where all the components need to be spread out so the air con can keep them from being damaged by their own heat.

I've noticed a few times where people have made unflattering comparisons between AI and cryptocurrency. One of the few I'd agree with is that the power requirements are basically "as much as you can".

Because of that:

> double the performance a few times sure, but that's not a sustained improvement month upon month like we have in the past.

"Doubling a few times" is still huge, even if energy efficiency was perfectly tied to feature size.

But as I said before, the maximum limit for energy efficiency is on the order of a billion-fold, not the 900x limit in areal density, and even our own brains (which have the extra cost of being made of living cells that need to stay that way) are an existence proof that it's possible to be tens of thousands of times more energy efficient.
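
To sketch what that calculation looks like (illustrative only: the accelerator and brain figures below are assumed ballpark values, brain "ops/s" estimates in particular span several orders of magnitude, and only the Landauer constant is a hard physical number, so treat the outputs as shapes rather than facts):

    from math import log

    k_B = 1.380649e-23                     # Boltzmann constant, J/K
    landauer = k_B * 300 * log(2)          # ~2.9e-21 J per bit erased at 300 K

    # Assumed ballpark figures, not measurements:
    gpu_j_per_op = 700 / 1e15              # ~700 W accelerator doing ~1e15 low-precision ops/s
    brain_j_per_op = 20 / 1e18             # ~20 W brain; published "ops/s" estimates span ~1e15-1e18

    print(f"vs brain:    ~{gpu_j_per_op / brain_j_per_op:,.0f}x headroom")
    print(f"vs Landauer: ~{gpu_j_per_op / landauer:.1e}x headroom")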


That's not true. You can buy a Raspberry Pi, which is 10x cheaper and 10x more powerful than the computers from the beginning of the 2000s.

Ditto with mobile phones. The iPhone may be more expensive than when it launched, but you can buy dirt-cheap Chinese smartphones with performance similar to, if not higher than, the first iPhones.


I don't think this contradicts what I'm saying. This is happening now. Not 15 years ago.


Is that because of things unrelated to normal operations - crypto mining, COVID, and now AI? I guess we may have to wait and see.


> 10% here and there is very small compared to the literal orders of magnitude improvements during the reign of Moore's Law.

Missing the point, despite being internally correct: 10% of $700k/day is still $25M/y.

If you'd instead looked at my point about energy cost per operation, there's room for something like a 46,000x improvement just to reach human level, and 5.3e9x to the Landauer limit.
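
For the record, the cost arithmetic, reusing the $700k/day figure above as-is:

    # 10% saving on an assumed $700k/day running cost, annualised
    print(f"${0.10 * 700_000 * 365 / 1e6:.1f}M per year")   # -> $25.6M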


There are a few avenues. Further specialization of hardware around LLMs, better quantization (3 bits per parameter seems promising), improved attention mechanisms, use of distilled models for common prompts, etc.
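
On the quantization point, a minimal sketch of plain round-to-nearest uniform quantization (real schemes such as GPTQ are far more sophisticated; this only shows the basic bits-vs-error trade-off, and the weights here are just random stand-ins):

    import numpy as np

    def quantize(w, bits):
        """Toy uniform symmetric round-to-nearest quantization."""
        qmax = 2 ** (bits - 1) - 1                 # e.g. qmax = 3 for bits = 3 (signed range -3..3)
        scale = np.abs(w).max() / qmax
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    w = np.random.randn(4096).astype(np.float32)   # stand-in for one row of weights
    for bits in (8, 4, 3):
        err = np.abs(quantize(w, bits) - w).mean()
        print(f"{bits}-bit: mean abs error ~{err:.4f}")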


Those would be optimizations, which is not really the same thing as Moore's-law-like growth. That growth was absolutely mind-boggling - it's hard to even wrap your head around how fast tech was moving in that period, since humans don't really grok exponentials; we just think they look like second-degree polynomials.


Probabilistic computing offers the potential of a return to that pace of progress. We spend a lot of silicon on squashing things to 0/1 with error correction, but using analog voltages to carry information and relying on parameter redundancy for error correction could lead to much greater efficiency both in terms of OPS/mm^2 and OPS/watt.


I am wondering about this as well - wondering how difficult it would be to build an analog circuit for a small LLM (7B?). And wondering if anyone's working on that yet. Seems like an obvious avenue to huge efficiency gains.


Seems very unrealistic when considering how electromagnetic interference works. Clamping the voltages to high and low goes some way to mitigate that problem.


That's only an issue if the interference is correlated.
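
A toy simulation of exactly that point (pure noise statistics, nothing to do with any real analog hardware): averaging over redundant copies washes out uncorrelated noise but is helpless against a shared offset.

    import numpy as np

    rng = np.random.default_rng(0)
    w, x = rng.standard_normal(1024), rng.standard_normal(1024)
    exact = w @ x
    copies, sigma = 64, 0.5                        # redundant copies and per-read noise level

    # Uncorrelated: each copy sees independent noise, so averaging washes it out.
    uncorr = np.mean([exact + rng.normal(0, sigma) for _ in range(copies)])

    # Correlated: every copy sees the same offset, so averaging does nothing.
    offset = rng.normal(0, sigma)
    corr = np.mean([exact + offset for _ in range(copies)])

    print(f"error with uncorrelated noise: {abs(uncorr - exact):.3f}")   # ~ sigma/sqrt(copies)
    print(f"error with correlated noise:   {abs(corr - exact):.3f}")     # ~ sigma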


> Hardware sure isn't getting faster in a hurry

How is it not?

These LLMs were recently trained using NVidia A100 GPUs.

Now NVidia has H100 GPUs.

The H100 is up to nine times faster for AI training and 30 times faster for inference than the A100.


Not soon, but all the major players are making even more AI-specialized silicon.



