How much do memory and compute overlap in a GEMM?

isaaclee2313 · February 3, 2020, 2:00pm

Should I think of the overall latency as memory latency + compute latency, or as max(memory latency, compute latency)? The former would imply that memory access and compute mostly do not overlap, and the latter would imply that they mostly do.

Thanks!

NVES_R · February 3, 2020, 10:31pm

Hi,

Are you referring to a specific latency that’s being reported by some tool? Or latency in general?

If it’s the latter, there is a general guide on how GPU performance is measured here: https://docs.nvidia.com/deeplearning/sdk/dl-performance-guide/index.html. Overall performance can be limited by compute, memory bandwidth, or other factors, depending on the problem.
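
As a rough illustration of the two models from the question, here is a minimal back-of-the-envelope sketch. It is not taken from the guide or from any tool: the function name `gemm_time_bounds`, the peak throughput, and the bandwidth figures are all hypothetical, and the memory traffic assumes each matrix is moved exactly once. Under those assumptions, max(memory, compute) is the fully overlapped estimate and memory + compute is the fully serialized one; a real kernel would be expected to land somewhere between the two.

```python
def gemm_time_bounds(m, n, k, peak_flops, mem_bandwidth, bytes_per_elem=2):
    """Estimate times (in seconds) for C = A @ B with A (m x k) and B (k x n).

    Returns (compute_s, memory_s, overlapped_s, serialized_s).
    All inputs are assumptions supplied by the caller, not measured values.
    """
    # One multiply and one add per term of each inner product.
    flops = 2.0 * m * n * k
    # Idealized traffic: read A and B once, write C once (perfect on-chip reuse).
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)

    compute_s = flops / peak_flops
    memory_s = bytes_moved / mem_bandwidth

    overlapped_s = max(compute_s, memory_s)   # memory and compute fully overlap
    serialized_s = compute_s + memory_s       # memory and compute never overlap
    return compute_s, memory_s, overlapped_s, serialized_s


# Hypothetical device: 100 TFLOP/s peak math, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
MEM_BANDWIDTH = 1e12

for shape in [(4096, 4096, 4096), (4096, 1, 4096)]:
    compute_s, memory_s, overlapped_s, serialized_s = gemm_time_bounds(
        *shape, PEAK_FLOPS, MEM_BANDWIDTH
    )
    limiter = "compute" if compute_s > memory_s else "memory"
    print(
        f"M,N,K={shape}: compute={compute_s:.2e}s memory={memory_s:.2e}s "
        f"max={overlapped_s:.2e}s sum={serialized_s:.2e}s ({limiter}-limited)"
    )
```

With these made-up numbers, the large square GEMM comes out compute-limited and the matrix-vector shape comes out memory-limited. In both cases one term dominates, so the gap between the max and sum estimates is small; the choice of model matters most when the two terms are of similar size.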