Unaccounted memory consumption while running the kernel

Local memory is stored in the same physical backing (GPU DRAM) as the logical global space. Therefore a large per-thread local allocation will use up a large amount of that space. The amount reserved is determined by the size of the per-thread allocation and the characteristics of your GPU (such as how many threads it can keep resident at once), rather than by the grid size you launch, which is why it appears to be “constant”.
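To get a feel for the numbers, here is a rough sketch that queries the per-thread local size of a kernel and multiplies it by the number of threads the device can keep resident. The kernel `bigLocalKernel` and the helper `estimateLocalReservation` are made up for illustration, and the formula is only an approximation of what the driver reserves; the actual figure may differ by driver and architecture.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: a large, dynamically indexed per-thread array like this
// is likely (not guaranteed) to be placed in local memory by the compiler.
__global__ void bigLocalKernel(float *out, int n)
{
    float scratch[1024];
    for (int i = 0; i < 1024; ++i)
        scratch[i] = static_cast<float>(i + threadIdx.x);
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = scratch[(idx * 7) % 1024];
}

// Rough estimate (an assumption, not a documented formula):
// per-thread local bytes * SM count * max resident threads per SM.
size_t estimateLocalReservation(size_t localBytesPerThread,
                                int smCount, int maxThreadsPerSM)
{
    return localBytesPerThread * static_cast<size_t>(smCount)
                               * static_cast<size_t>(maxThreadsPerSM);
}

int main()
{
    cudaFuncAttributes attr;
    cudaDeviceProp prop;
    if (cudaFuncGetAttributes(&attr, bigLocalKernel) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "CUDA query failed\n");
        return 1;
    }

    size_t estimate = estimateLocalReservation(attr.localSizeBytes,
                                               prop.multiProcessorCount,
                                               prop.maxThreadsPerMultiProcessor);

    printf("local bytes per thread : %zu\n", attr.localSizeBytes);
    printf("SMs x threads per SM   : %d x %d\n",
           prop.multiProcessorCount, prop.maxThreadsPerMultiProcessor);
    printf("estimated reservation  : %.1f MiB\n",
           estimate / (1024.0 * 1024.0));
    return 0;
}
```

Note that the estimate does not depend on the launch configuration at all, which matches the "constant" overhead you are seeing.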

The memory is not released immediately when your kernel finishes. It should be released when your application exits (more precisely, when the CUDA context is destroyed).
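If you want to observe this yourself, something along these lines should work. It is a minimal sketch: `bigLocalKernel` and `freeBytes` are made-up names, the numbers reported by `cudaMemGetInfo` also include other context overhead, and the exact behavior may vary with driver version.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel with a large per-thread array that is likely to end up
// in local memory.
__global__ void bigLocalKernel(float *out, int n)
{
    float scratch[1024];
    for (int i = 0; i < 1024; ++i)
        scratch[i] = static_cast<float>(i + threadIdx.x);
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = scratch[(idx * 7) % 1024];
}

// Free device memory in bytes, as reported by the runtime (0 on error).
static size_t freeBytes()
{
    size_t freeMem = 0, totalMem = 0;
    if (cudaMemGetInfo(&freeMem, &totalMem) != cudaSuccess)
        return 0;
    return freeMem;
}

int main()
{
    const int n = 1 << 16;
    float *out = nullptr;

    cudaFree(0);  // force context creation so its overhead is not counted below
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    size_t before = freeBytes();
    bigLocalKernel<<<n / 256, 256>>>(out, n);
    cudaDeviceSynchronize();
    size_t afterKernel = freeBytes();  // kernel is done, reservation may remain

    cudaDeviceReset();                 // destroys the context and frees `out`
    size_t afterReset = freeBytes();   // this call creates a fresh context

    printf("free before launch : %zu MiB\n", before >> 20);
    printf("free after kernel  : %zu MiB\n", afterKernel >> 20);
    printf("free after reset   : %zu MiB\n", afterReset >> 20);
    return 0;
}
```

If the free memory after the kernel stays lower than before the launch and recovers after the reset, that gap is the local memory reservation being held for later launches.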
