> For the CPU AMD Ryzen 7 9800X3D mentioned in the repo, just reading 100 bytes from RAM to L1 should take ~100 nanos.
It's important to note that throughput is not just an inverse of latency, because modern OoO cpus with modern memory subsystems can have hundreds of requests in flight. If your code doesn't serialize accesses, latency numbers are irrelevant to throughput.
It's important to note that throughput is not just an inverse of latency, because modern OoO cpus with modern memory subsystems can have hundreds of requests in flight. If your code doesn't serialize accesses, latency numbers are irrelevant to throughput.