There are a few avenues: further specialization of hardware around LLMs, better quantization (3 bits per parameter seems promising), improved attention mechanisms, use of distilled models for common prompts, etc.
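For a rough sense of what ~3 bits per parameter means in practice, here's a minimal sketch in plain NumPy (the group size of 64 and absmax scaling are just illustrative assumptions; real schemes like GPTQ or AWQ are considerably more sophisticated):

```python
import numpy as np

def quantize_3bit(weights: np.ndarray, group_size: int = 64):
    """Toy 3-bit absmax quantization: each group of weights is scaled
    into the 8-level integer range [-4, 3], with one fp scale per group."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 4.0
    q = np.clip(np.round(w / scales), -4, 3).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_3bit(w)
print("mean abs error:", np.abs(dequantize(q, s) - w).mean())
```

The storage win is the point: 3 bits plus a shared scale per group, versus 16 bits per weight.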
Those would be optimizations, which isn't really the same thing as Moore's-law-like growth. That growth was absolutely mind-boggling; it's hard to even wrap your head around how fast tech was moving in that period, since humans don't really grok exponentials too well. We tend to think they look like second-degree polynomials.
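To make the "looks like a second-degree polynomial" point concrete, a quick back-of-envelope comparison (purely illustrative, not tied to any real process node): a quantity that doubles every generation versus n².

```python
# Doubling per generation (Moore's-law-like) vs. a quadratic.
# They track each other early on, then diverge absurdly.
for n in range(0, 21, 4):
    print(f"gen {n:2d}: exponential {2**n:>9,d}   quadratic {n*n:>4d}")
```

At generation 4 the two are identical (16 vs 16); by generation 20 the exponential is over a million while the quadratic is 400.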
Probabilistic computing offers the potential of a return to that pace of progress. We spend a lot of silicon on squashing things to 0/1 with error correction, but using analog voltages to carry information and relying on parameter redundancy for error correction could lead to much greater efficiency both in terms of OPS/mm^2 and OPS/watt.
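As a toy sketch of the redundancy idea (not modeled on any particular analog design; the noise level and vector sizes are made-up assumptions): simulate a noisy "analog" dot product and average R redundant copies, and the error shrinks roughly as 1/sqrt(R).

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024)
x = rng.standard_normal(1024)
exact = w @ x

def noisy_dot(w, x, noise_std=0.05):
    """Pretend each multiply-accumulate is an analog op with additive noise."""
    return np.sum(w * x + rng.normal(0.0, noise_std, size=w.shape))

for r in (1, 4, 16, 64):
    est = np.mean([noisy_dot(w, x) for _ in range(r)])
    print(f"{r:3d} redundant copies -> error {abs(est - exact):.3f}")
```

The trade being proposed is that the silicon saved by skipping digital error correction buys more than enough redundant analog units to keep the effective noise acceptable.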
I've been wondering about this as well: how difficult would it be to build an analog circuit for a small LLM (7B?), and is anyone working on that yet? Seems like an obvious avenue to huge efficiency gains.
Seems very unrealistic when you consider how electromagnetic interference works. Clamping the voltages to high and low goes some way toward mitigating that problem.