
All you have to look at is ASIC miners. Once those arrived, they were easily 10x faster than GPUs and made GPUs useless for those algorithms. Something very similar could happen here soon.


The fundamentals are different. Bitcoin mining is not intrinsically suited to acceleration on a GPU. It is a narrow, serial integer operation (a double SHA-256 over the block header).
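
A toy sketch of that inner loop (dummy header bytes and an absurdly easy difficulty target, just so it terminates quickly):

    import hashlib, struct

    # Toy model of Bitcoin's proof-of-work loop: double SHA-256 over an
    # 80-byte header with an incrementing nonce. The 64 rounds inside
    # each SHA-256 are strictly sequential, which is exactly what a
    # fixed-function ASIC pipeline exploits.
    header_prefix = b"\x00" * 76      # version/prev-hash/merkle/time/bits (dummy)
    target = 2 ** 248                 # ~1-in-256 odds per attempt, so it finishes fast

    nonce = 0
    while True:
        header = header_prefix + struct.pack("<I", nonce)
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
        if int.from_bytes(digest, "little") < target:
            break
        nonce += 1
    print(nonce, digest.hex())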

AI inference, on the other hand, is basically just very large floating-point matrix multiplication. What does an ASIC for matmul look like? A GPU.
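
As a toy illustration (shapes made up, not taken from any real model), the hot path of a transformer feed-forward block is just two large matmuls:

    import numpy as np

    batch, seq, d_model, d_ff = 2, 128, 1024, 4096   # illustrative shapes
    x  = np.random.randn(batch, seq, d_model).astype(np.float32)
    w1 = np.random.randn(d_model, d_ff).astype(np.float32)
    w2 = np.random.randn(d_ff, d_model).astype(np.float32)

    h = np.maximum(x @ w1, 0.0)   # up-projection + ReLU
    y = h @ w2                    # down-projection
    print(y.shape)                # (2, 128, 1024)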


No, GPUs are not a particularly ideal architecture for AI inference; it's just that inference needs way more memory bandwidth than a general-purpose CPU's memory hierarchy can handle.

> What does an ASIC for matmul look like?

A systolic array, and ultimately something quite different from a GPU. This is why TPUs et al. are a thing.

In general a systolic array gives you a quadratic payoff: an NxN array performs N^2 MACs per wavefront. For example, a 256x256 array takes 256 cycles to shift operands in and another 256 to shift results out, but along the way it accomplishes 65,536 MACs in those 512 cycles, a 128x speedup over a serial unit doing one MAC per cycle (and far more once the pipeline is kept full).
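
A minimal NumPy sketch of that dataflow, output-stationary style (a toy cycle-level model, not any vendor's actual design):

    import numpy as np

    def systolic_matmul(A, B):
        # Output-stationary N x N systolic array: PE (i, j) accumulates
        # C[i, j]. Rows of A enter from the left and columns of B from
        # the top, each skewed one cycle per row/column so matching
        # operands meet in the right PE at the right time.
        n = A.shape[0]
        C = np.zeros((n, n))
        a_reg = np.zeros((n, n))        # A operands flowing rightward
        b_reg = np.zeros((n, n))        # B operands flowing downward
        for t in range(3 * n - 2):      # fill + compute + drain
            a_reg = np.roll(a_reg, 1, axis=1)   # shift right one PE
            b_reg = np.roll(b_reg, 1, axis=0)   # shift down one PE
            for i in range(n):          # inject the skewed wavefront at the edges
                k = t - i
                a_reg[i, 0] = A[i, k] if 0 <= k < n else 0.0
                b_reg[0, i] = B[k, i] if 0 <= k < n else 0.0
            C += a_reg * b_reg          # every PE does one MAC per cycle
        return C

    A = np.random.randn(8, 8)
    B = np.random.randn(8, 8)
    assert np.allclose(systolic_matmul(A, B), A @ B)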


The tensor cores in your NVIDIA chip are systolic arrays.


Not in the same way as a TPU, and again, the ISA and overall threading architecture matter.


Sorta? If that were the full story, the TPU would not be a thing.


I'm not an expert in chip design by any means, but I think it's fair to say that TPU is a marketing term and that it's not substantially different from a GPU like the H100. The H100's compute units are also called "Tensor Cores."


You are entirely mistaken. The TPU and GPU are organized very differently, particularly in how the memory subsystem works.

In the big picture, TPUs are systolic arrays. They don't have threading, divergence, or similar.

GPUs, in the big picture, are SIMT: a hybrid of SIMD and multithreading in which the lockstep data lanes of SIMD are relaxed so individual threads can diverge somewhat.
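
A toy model of what divergence costs under SIMT: both sides of a branch execute across all lanes, and a per-lane mask selects which result is kept (real GPUs use hardware execution masks and reconvergence stacks, but the cost structure is the same):

    import numpy as np

    x = np.arange(8)               # one "warp" of 8 lanes
    mask = (x % 2 == 0)            # per-lane branch condition

    then_result = x * 10           # every lane executes the "then" path...
    else_result = x - 100          # ...and the "else" path
    result = np.where(mask, then_result, else_result)
    print(result)                  # [  0 -99  20 -97  40 -95  60 -93]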

Memory-wise, the TPU can keep the partial products right in the array. Parameters and weights are held in large on-die scratch memories, and backing those are streams coming from HBM. The TPU acts as a single giant CISC coprocessor with much more predictable memory and communication patterns than a GPU, which its design exploits for higher efficiency at inference.
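
A toy model of that streaming pattern: weight tiles are pulled from HBM into a small resident scratchpad while partial sums stay put in on-chip accumulators (tile size and names are illustrative, not Google's actual numbers):

    import numpy as np

    TILE = 128

    def tiled_matmul(x, w):
        m, k = x.shape
        _, n = w.shape
        acc = np.zeros((m, n), dtype=np.float32)   # on-chip accumulators
        for k0 in range(0, k, TILE):
            w_tile = w[k0:k0 + TILE, :]            # "DMA" one weight tile from HBM
            acc += x[:, k0:k0 + TILE] @ w_tile     # MACs against the resident tile
        return acc                                 # partial sums never left the chip

    x = np.random.randn(256, 512).astype(np.float32)
    w = np.random.randn(512, 384).astype(np.float32)
    assert np.allclose(tiled_matmul(x, w), x @ w, atol=1e-3)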

So even if both use the word "tensor" and both have HBM-based memory systems, how they are actually architected is very different.


TPUs are not fundamentally different or more efficient than NVIDIA hardware. They are just cutting out the middleman.


An ASIC for matmul is a systolic array, more or less.



