my GPU is GM107 (Maxwell)
for example, if the memory latency is 400 cycles, how do I calculate the number of warps needed to hide the latency?
64 warps * number of SMs
How do you get the 64 warps? Could you explain it to me?
and 64 warps means 2048 threads? isn't that too many?
64 warps = 2048 threads
look up the CUDA "max threads per multiprocessor" spec (maxThreadsPerMultiProcessor in the device properties), then divide it by the warp size of 32
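to spell out the arithmetic, here is a rough sketch in Python. The numbers are assumptions, not something read from your card: 2048 max resident threads per SM and a warp size of 32 are the values I'd expect for a Maxwell (compute capability 5.0) part, and the SM count of 5 is only a placeholder. The function names are made up for illustration. On a real device you would read maxThreadsPerMultiProcessor, warpSize and multiProcessorCount from cudaGetDeviceProperties instead.

```python
# Assumed limits for a compute capability 5.0 (Maxwell) GPU.
# Verify them on your own device via cudaGetDeviceProperties.
MAX_THREADS_PER_SM = 2048  # assumed maxThreadsPerMultiProcessor
WARP_SIZE = 32             # assumed warpSize


def max_resident_warps_per_sm(max_threads_per_sm=MAX_THREADS_PER_SM,
                              warp_size=WARP_SIZE):
    """Warps that can be resident on one SM at full occupancy."""
    return max_threads_per_sm // warp_size


def max_resident_warps_on_device(num_sms,
                                 max_threads_per_sm=MAX_THREADS_PER_SM,
                                 warp_size=WARP_SIZE):
    """The "64 warps * number of SMs" figure from the chat."""
    return num_sms * max_resident_warps_per_sm(max_threads_per_sm, warp_size)


if __name__ == "__main__":
    num_sms = 5  # placeholder SM count; use multiProcessorCount on your GPU
    print("warps per SM:", max_resident_warps_per_sm())
    print("warps on device:", max_resident_warps_on_device(num_sms))
```

so 64 is just the hardware ceiling on resident warps per SM (2048 / 32), not a number derived from the 400-cycle latency. Whether you actually need all 64 to hide the latency depends on how much independent work each warp has between memory accesses.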