GPUs are not really the ideal architecture for running neural networks; they are heavily bottlenecked by memory bandwidth and struggle to keep all their tensor cores supplied with data.
There is significant room to make more specialized neural network accelerators with new compute-in-memory architectures.
If the brain can run roughly 86 billion neurons on about 20W, far more efficient hardware must be possible.
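To put a rough number on the memory bandwidth point, here is a back-of-the-envelope roofline sketch for a single matrix-vector multiply, which is roughly what batch-size-1 inference looks like. The peak compute and bandwidth figures are illustrative round numbers I picked, not the spec of any particular GPU, and the function names are just for this example.

```python
def matvec_arithmetic_intensity(rows, cols, bytes_per_weight=2):
    """FLOPs per byte for y = W @ x at batch size 1.

    Assumes weight traffic dominates: every weight is read from memory once
    and used for one multiply and one add.
    """
    flops = 2 * rows * cols
    bytes_moved = rows * cols * bytes_per_weight
    return flops / bytes_moved


def machine_balance(peak_flops, mem_bandwidth):
    """FLOPs the chip can execute per byte it can fetch from memory."""
    return peak_flops / mem_bandwidth


# Assumed, illustrative hardware numbers (not a real datasheet).
PEAK_FLOPS = 1.0e15  # 1000 TFLOP/s of tensor-core throughput
MEM_BW = 3.0e12      # 3 TB/s of memory bandwidth

intensity = matvec_arithmetic_intensity(8192, 8192)  # 16-bit weights
balance = machine_balance(PEAK_FLOPS, MEM_BW)

# Upper bound on the fraction of peak compute that can be kept busy.
utilization = min(1.0, intensity / balance)

print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")
print(f"machine balance:      {balance:.1f} FLOP/byte")
print(f"max compute utilization: {utilization:.1%}")
```

Under those assumptions the workload offers 1 FLOP per byte while the chip needs a few hundred FLOPs per byte to stay busy, so the compute units sit idle well over 99% of the time. Batching raises the intensity, but the gap is the reason compute-in-memory designs look attractive.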
There are already some companies building specialised inference hardware, Cerebras Systems for example. Such designs are still in their early days, and I wouldn't be surprised to see more innovation there, though because custom silicon design takes time I expect a multi-year cycle.
For training, I'm not sure. But even if training stays on GPUs, once you have the model the main cost is inference.