> I think for the general public it sounds like what Nvidia has is unique (like in gaming)

Gaming is where Nvidia is least unique; they run standards-compliant software by supporting well-documented APIs in hardware (e.g. DX12/Vulkan). AMD does the same thing, and even Intel has managed to scale up a simple dGPU setup.

The hard part is software. Nvidia "won" because they spent 10 years developing CUDA when everyone else was smothering their OpenCL implementations in the crib. Now it doesn't matter what Nvidia ships, as long as it's fast and supports CUDA. The Blackwell/Grace systems seem like a good example of this.
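To make that lock-in concrete: frameworks are written against CUDA's API surface, so any chip that speaks CUDA inherits the whole software stack for free, and any chip that doesn't has to port everything first. A toy sketch of that dispatch pattern (all names here are hypothetical, not any real framework's API):

```python
# Toy illustration of API lock-in: frameworks code against one backend
# interface, so new hardware competes on implementing *that* interface.
# All names are hypothetical, not a real framework's API.

backends = {}

def register_backend(name):
    def wrap(cls):
        backends[name] = cls()
        return cls
    return wrap

@register_backend("cuda")
class CudaBackend:
    def matmul(self, a, b):
        # A real framework would call cuBLAS here; we fake it on the CPU.
        return [[sum(x * y for x, y in zip(row, col))
                 for col in zip(*b)] for row in a]

def matmul(a, b, device="cuda"):
    # Every model written against this function implicitly assumes a
    # "cuda" backend exists -- that assumption is the moat.
    if device not in backends:
        raise RuntimeError(f"no '{device}' backend: port the stack first")
    return backends[device].matmul(a, b)

print(matmul([[1, 2]], [[3], [4]]))  # [[11]]
```

Calling `matmul(..., device="npu")` raises immediately: the hardware could be excellent, but without the backend the ecosystem can't reach it.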

TPUs have a slim shot at disrupting things; NPUs are pretty much dead on arrival. TPUs are promising because they're a genuine clean-slate project that can ignore CUDA semantics and simply infer or train faster. It will be hard to make TPUs as efficient as TSMC-manufactured Nvidia chips, but there's room for disruption given how expensive a single GH200 is.

NPUs... I hate to be a pessimist, but they don't have a very bright future. In the best case, an NPU is redundant silicon that either sits idle or is used to take pressure off the more powerful GPU and CPU. That seems great on paper, until you start scaling to LLM/Stable Diffusion sizes: now you're bottlenecked by a low-power component and have to fall back to the GPU, which was more powerful all along. In the worst case, the NPU is an expensive waste of space on your SoC. Unlike TPU pods, NPUs have no hope of competing directly with CUDA. If anything, they increase demand for high-performance training compute, which Nvidia monopolizes.
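The bottleneck claim is easy to sanity-check with back-of-envelope arithmetic: single-token LLM decoding is memory-bandwidth bound, so tokens/sec is roughly bandwidth divided by the bytes of weights read per token. The numbers below are rough ballpark assumptions for illustration, not measurements of any specific chip:

```python
# Why NPUs choke at LLM scale: decoding one token reads all weights,
# so tokens/sec ~ memory bandwidth / model size in bytes.
# All figures are rough illustrative assumptions, not measurements.

def tokens_per_sec(bandwidth_gb_s, params_billions, bytes_per_param=2):
    weight_gb = params_billions * bytes_per_param  # e.g. 7B fp16 -> 14 GB
    return bandwidth_gb_s / weight_gb

# A SoC NPU shares ~100 GB/s LPDDR; a dGPU has ~1000 GB/s GDDR/HBM.
print(f"NPU-class:  {tokens_per_sec(100, 7):.1f} tok/s")   # ~7.1
print(f"dGPU-class: {tokens_per_sec(1000, 7):.1f} tok/s")  # ~71.4
```

Under these assumptions the NPU is an order of magnitude slower than the GPU sitting next to it on the same machine, which is the "why did we bother" scenario above.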
