I don’t use Reddit, but I did occasionally read AskHistorians threads when they were linked from other places on the internet:
Good luck trying to find the actual historians who made the subreddit worth visiting (many of them mods themselves, I presume) once said mods are demoted and policies are changed.
The AskHistorians mods themselves have said that there's no alternative. One way or another, they will be back. As for the rest of the subs that do not need subject matter experts, like r/funny, their demodding will not matter.
What I mean is that resources will be limited, or that models which are much more cost-effective but not quite as good will be released instead.
This is often the case with these types of technologies.
But what is being optimized? Hardware sure isn't getting faster in a hurry, and I don't see anything on the horizon that will aid in optimizing software.
The various open source LLMs are doing things like reducing bits-per-parameter to cut hardware requirements, and if they're using COTS hardware it almost certainly isn't optimised for their specific models. Moore's Law gets pretty heavily reinterpreted here: we normally care about "operations per second at a fixed number of monies", but what matters in this case is "joules per operation", which can improve by a huge margin even before reaching human-brain efficiency, which itself appears to be a long way from the limits of the laws of physics. And even if we were near the end of Moore's Law and there was only a 10% total improvement available, that's 10% of a big number.
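To put rough numbers on the bits-per-parameter point, here's a back-of-the-envelope sketch; the 70B parameter count is just an assumed example, not a specific model from this thread:

```python
# Weight memory scales linearly with bits per parameter.
# Assumed example: a hypothetical 70B-parameter model.
params = 70e9

for bits in (16, 8, 4, 3):
    gigabytes = params * bits / 8 / 1e9
    print(f"{bits:>2} bits/param -> ~{gigabytes:.0f} GB of weights")
# 16 bits/param -> ~140 GB, down to ~26 GB at 3 bits/param
```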
Moore's law was an effect that stemmed from the locally exponential efficiency increase from designing computers using computers, each iteration growing more powerful and capable of designing still more powerful hardware.
10% here and there is very small compared to the literal orders-of-magnitude improvements during the reign of Moore's Law.
We're actually at an inflection point where this isn't the case anymore.
For a long time, GPU hardware basically became more powerful with each generation while prices stayed roughly the same, plus or minus inflation. In the last couple of years that trend has broken: you pay double or even quadruple the price for a relatively modest increase in performance.
We said that in 1982, and 1987, and 1993, and 1995, and 2001, 2003, 2003.5
You get the point.
There's always local optimization that leads to improvements. Look at the Apple M1 chip rollout as a prime example of that: big/little cores, on-package RAM, memory shared with the GPU and Neural Engine, and power integration with the OS.
The big difference now is that we have a clear inflection point. Process nodes can't get much smaller than they already are. A sub-nanometer process would involve arranging single-digit counts of atoms into a transistor. A sub-ångström process would involve single-atom transistors. A sub-0.5 Å process would mean making them out of subatomic particles. That isn't even possible in sci-fi.
You can re-arrange them for minor boosts and double the performance a few times, sure, but that's not a sustained month-on-month improvement like we had in the past.
As anyone who has ever optimized code will attest, optimization within fixed constraints typically hits diminishing returns very quickly. You have to work harder and harder for every win, and the wins get smaller and smaller.
Current process nodes are mostly 5 nm, with 3 nm being rolled out. Atomic scale is ~0.1 nm, which is about 30x linear and 900x by area.
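Spelling out that arithmetic (same figures as above, with the linear shrink squared to get the areal one):

```python
current_nm, atomic_nm = 3.0, 0.1      # 3 nm-class node vs ~atomic scale
linear = current_nm / atomic_nm       # ~30x linear shrink remaining
areal = linear ** 2                   # ~900x by area
print(f"linear headroom ~{linear:.0f}x, areal headroom ~{areal:.0f}x")
```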
However, none of that is actually important when the thing people care about most right now is energy consumed per operation.
This metric dominates for anything battery powered for obvious reasons; less obvious to most is that it's also important for data centres where all the components need to be spread out so the air con can keep them from being damaged by their own heat.
I've noticed a few times that people have made unflattering comparisons between AI and cryptocurrency. One of the few I would agree with is that the power requirement is basically "as much as you can get".
Because of that:
> double the performance a few times, sure, but that's not a sustained month-on-month improvement like we had in the past.
"Doubling a few times" is still huge, even if energy efficiency was perfectly tied to feature size.
But as I said before, the maximum limit for energy efficiency is on the order of a billion-fold, not the ~900x limit in areal density, and even our own brains (which have the extra cost of being made of living cells that need to stay that way) are an existence proof that it's possible to be tens of thousands of times more energy efficient.
That's not true. You can buy a Raspberry Pi, which is 10x cheaper and 10x more powerful than computers from the early 2000s.
Ditto with mobile phones. The iPhone may be more expensive than when it launched, but you can buy dirt-cheap Chinese smartphones with performance similar to, if not higher than, the first iPhones.
> 10% here and there is very small compared to the literal orders-of-magnitude improvements during the reign of Moore's Law.
Missing the point, despite being internally correct: 10% of $700k/day is still $25M/y.
If you'd instead looked at my point about energy cost per operation, there's room for something like a 46,000x improvement just to reach human level, and 5.3e9x to the Landauer limit.
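For concreteness, here's a rough sketch of where ratios like those come from. The Landauer bound itself (k_B · T · ln 2 per bit erased) is standard physics; the ~15 pJ per operation for current hardware and the ~20 W / ~6e16 ops/s for the brain are assumptions I've plugged in that roughly reproduce the numbers above, not measurements.

```python
import math

# Landauer limit: minimum energy to erase one bit, E = k_B * T * ln(2).
k_B = 1.380649e-23                    # Boltzmann constant, J/K
T = 300.0                             # room temperature, K
landauer_J = k_B * T * math.log(2)    # ~2.9e-21 J per bit

# Assumed, not measured: ~15 pJ per operation for current hardware,
# and a brain doing ~6e16 synaptic ops/s on ~20 W.
current_J_per_op = 15e-12
brain_J_per_op = 20.0 / 6e16

print(f"Landauer limit: {landauer_J:.2e} J/bit")
print(f"headroom to brain level: ~{current_J_per_op / brain_J_per_op:,.0f}x")
print(f"headroom to Landauer:    ~{current_J_per_op / landauer_J:.1e}x")
```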
There are a few avenues: further specialization of hardware around LLMs, better quantization (3 bits/parameter seems promising), improved attention mechanisms, use of distilled models for common prompts, etc.
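As a sketch of what "better quantization" means in practice, here's a minimal uniform-quantization toy, assuming numpy; real schemes pack several values per byte and use per-group scales, this just shows the idea:

```python
import numpy as np

def quantize(w, bits=3):
    # Symmetric uniform quantization: round weights onto a small signed
    # integer grid and keep one float scale per tensor.
    qmax = 2 ** (bits - 1) - 1                    # e.g. 3 at 3 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                               # int8 for simplicity here;
                                                  # real code packs to 3 bits

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)      # toy weight vector
q, s = quantize(w, bits=3)
print("mean abs reconstruction error:", float(np.abs(w - dequantize(q, s)).mean()))
```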
These would be optimizations, which is not really the same thing as Moore's-law-like growth. That growth was absolutely mind-boggling; it's hard to even wrap your head around how fast tech was moving in that period, since humans don't really grok exponentials too well. We just think they look like second-degree polynomials.
Probabilistic computing offers the potential of a return to that pace of progress. We spend a lot of silicon on squashing things to 0/1 with error correction, but using analog voltages to carry information and relying on parameter redundancy for error correction could lead to much greater efficiency both in terms of OPS/mm^2 and OPS/watt.
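A toy illustration of the "redundancy instead of bit-level error correction" idea (pure simulation, assuming numpy, with Gaussian noise standing in for whatever a real analog circuit would suffer; this is not a circuit model):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_dot(w, x, noise_std):
    # "Analog" multiply-accumulate: every product picks up additive noise.
    return float(np.sum(w * x + rng.normal(0.0, noise_std, size=w.shape)))

w = rng.normal(size=256)
x = rng.normal(size=256)
exact = float(w @ x)

# Redundancy as error correction: average R independent noisy copies;
# the error shrinks roughly as 1/sqrt(R).
for R in (1, 4, 16, 64):
    estimate = np.mean([noisy_dot(w, x, noise_std=0.5) for _ in range(R)])
    print(f"{R:>3} redundant copies -> |error| ~ {abs(estimate - exact):.2f}")
```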
I am wondering about this as well: how difficult would it be to build an analog circuit for a small LLM (7B?), and is anyone working on that yet? Seems like an obvious avenue to huge efficiency gains.
Seems very unrealistic when you consider how electromagnetic interference works. Clamping the voltages to high and low goes some way toward mitigating that problem.
This; there's an endless line of companies waiting to snatch OpenAI's employees right outside the door. A $200k average comp at OpenAI would be laughable.
As an aside, I am a bit shocked by these numbers. Is this an American thing? I understand myself to be a good software engineer with well-rounded experience of 14+ years, yet my income, in Europe, is barely above 100k.
What I am wondering, for those earning 500k: how big is your workload/stress? Is this a 9-5 job you leave at the office when going home, or does a job that earns so much consume your life?
Honestly, it depends. Some teams at FAAMNG are really stressful, and if you work on a Tier 1 service, even with loads of SRE support, you have to be working a fair bit. That being said, the pay is for design decisions at the higher IC levels (senior or staff), and most people at that level are very smart. I’m not saying this salary is for 10x engineers or anything.
I would say it's 50% that the work is harder and more consuming, and 50% that they can just afford to pay you more and lock up talent because of the wild margins on their products.
I’ve been through both horror (endless 100 hour weeks) and bliss (just attending meetings and not really stressing about much of anything) in that range. It’s highly variable.
Your standard of living might be comparable. Your retirement is taken care of, you have a reasonable amount of vacation, you have better job security, your health care (in most European countries) involves much less hassle, and your property costs are lower.
I am seriously considering a move if my husband can find an academic job over there. The retirement won't be a great lure (fewer years in the system) but we almost have enough to coast from here, so it's about the rest.
Amazon has a terrible reputation for internal infrastructure issues, with "on call" being a truly shitty experience for employees; burning out within a year is common.
Note that there's likely to be some variation per team, but Amazon is famously bad, so ... ;)
Taxes in the Bay Area can be insane, ~40% if I remember correctly. On top of that you have crazy-expensive healthcare and crazy-expensive housing.
~100k€ in (western) Europe may be comparable to ~200k€ in the Bay Area.
I'd argue it's the opposite. We're coming off a decade of free money driving a second tech boom.
If interest rates stay elevated, and value investing becomes valuable again, it will be interesting to see how the tech space transforms. When start-ups have to compete with money market funds or even treasuries for investor cash, things become orders of magnitude tighter and more challenging.
Yes, though Switzerland approaches it. If you want to see how much people with various levels of experience get paid at different companies and in different locations, go to levels.fyi.
Americans get paid much, much more than anyone else.
I'm 20 years into programming and a senior architect and lead on an enterprise project.
I don't even make that first number.
But I value certain things way more than other things, and my current job provides it. Fully remote, leaves me completely alone to accomplish what they need done (and I get it done), unlimited vacation, great benefits, zero pointless meetings (almost an empty calendar).
I'm sure these other companies offer some of that but 500k?! That is absurd.