The CUDA profiler displays both static & dynamic shared memory per block.
What is the difference between the two?
Static shared memory per block has a non-zero value even when no shared variable is defined in the kernel function. What is stored in the static shared memory?
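For reference, here is a minimal sketch of the two kinds of shared memory the profiler distinguishes. Static shared memory is sized at compile time by a `__shared__` array declaration, while dynamic shared memory is declared `extern __shared__` and sized by the third launch-configuration argument. The kernel names, `THREADS`, and buffer names are hypothetical and chosen only for illustration. As for the non-zero value with no `__shared__` declaration: on devices of compute capability 1.x, kernel arguments are passed through shared memory, which is a likely (but here unverified) source of those bytes; on compute capability 2.x and later they go through constant memory instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64  // hypothetical block size for this sketch

// Static shared memory: the size (THREADS * 4 bytes) is fixed at compile time.
__global__ void staticKernel(float *out)
{
    __shared__ float staticBuf[THREADS];
    staticBuf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = staticBuf[THREADS - 1 - threadIdx.x];
}

// Dynamic shared memory: the size is supplied at launch time.
__global__ void dynamicKernel(float *out)
{
    extern __shared__ float dynamicBuf[];
    dynamicBuf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = dynamicBuf[blockDim.x - 1 - threadIdx.x];
}

int main()
{
    float *dOut = NULL;
    if (cudaMalloc(&dOut, THREADS * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    // No third launch argument: only the static buffer is used.
    staticKernel<<<1, THREADS>>>(dOut);
    // Third launch argument: bytes of dynamic shared memory per block.
    dynamicKernel<<<1, THREADS, THREADS * sizeof(float)>>>(dOut);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel error: %s\n", cudaGetErrorString(err));
        cudaFree(dOut);
        return 1;
    }

    // sharedSizeBytes reports only the statically allocated shared memory;
    // dynamically requested bytes are not included in it.
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, staticKernel);
    printf("staticKernel  static shared bytes: %zu\n", attr.sharedSizeBytes);
    cudaFuncGetAttributes(&attr, dynamicKernel);
    printf("dynamicKernel static shared bytes: %zu\n", attr.sharedSizeBytes);

    cudaFree(dOut);
    return 0;
}
```

The exact numbers printed depend on the device and toolkit version, so comparing them across a compute capability 1.x card and a newer one is a reasonable way to see where the extra static bytes come from.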
I have this problem too, and it is really confusing me.
In my code there is no declaration of any __shared__ variable or function, but the profiler reported 40 for static shared memory.
The program runs fast, however, even though it executes on a mobile G210M (from 2800 ms on the CPU down to 8-10 ms with CUDA), so I was very excited to compare it on my GTX 560 Ti at home.
Surprisingly, the code ran slower there, at about 20 ms. The profiler reported zero for both dynamic and static shared memory. This difference between the two sets of values really puzzles me.
FYI, on the G210M the profiler and toolkit were version 3.1, and on the GTX 560 Ti they are the latest version, 4.0. The code computes a DFT (Discrete Fourier Transform) using a two-level nested loop, with 2048 iterations at each level.
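To make the discussion concrete, below is a rough sketch of what a DFT kernel of this shape might look like. It is not the poster's code: it assumes the outer loop over output bins is mapped to one thread per bin and the inner loop over the 2048 input samples stays inside the kernel. The name `naiveDft`, the split real/imaginary arrays, and the block size are all hypothetical. Note that it declares no `__shared__` variables, so any static shared memory the profiler reports for a kernel like this would have to come from somewhere else, such as the kernel arguments on compute capability 1.x devices.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define N 2048  // transform length, matching the 2048 iterations mentioned above

// One thread per output bin k; each thread runs the inner loop over all inputs.
__global__ void naiveDft(const float *inRe, const float *inIm,
                         float *outRe, float *outIm, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;

    const float twoPi = 6.283185307179586f;
    float sumRe = 0.0f, sumIm = 0.0f;
    for (int t = 0; t < n; ++t) {
        // Reduce k*t modulo n first so the angle stays small and accurate.
        float angle = -twoPi * (float)((k * t) % n) / (float)n;
        float s, c;
        sincosf(angle, &s, &c);
        // (re + i*im) * (c + i*s)
        sumRe += inRe[t] * c - inIm[t] * s;
        sumIm += inRe[t] * s + inIm[t] * c;
    }
    outRe[k] = sumRe;
    outIm[k] = sumIm;
}

int main()
{
    // Test input: a unit impulse at index 0, zero elsewhere.
    std::vector<float> hRe(N, 0.0f), hIm(N, 0.0f), hOutRe(N), hOutIm(N);
    hRe[0] = 1.0f;

    const size_t bytes = N * sizeof(float);
    float *dRe, *dIm, *dOutRe, *dOutIm;
    cudaMalloc(&dRe, bytes);
    cudaMalloc(&dIm, bytes);
    cudaMalloc(&dOutRe, bytes);
    cudaMalloc(&dOutIm, bytes);
    cudaMemcpy(dRe, hRe.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dIm, hIm.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 128;                        // hypothetical block size
    const int blocks = (N + threads - 1) / threads; // enough blocks to cover N bins
    naiveDft<<<blocks, threads>>>(dRe, dIm, dOutRe, dOutIm, N);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hOutRe.data(), dOutRe, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hOutIm.data(), dOutIm, bytes, cudaMemcpyDeviceToHost);
    printf("X[0] = %f + %fi, X[%d] = %f + %fi\n",
           hOutRe[0], hOutIm[0], N - 1, hOutRe[N - 1], hOutIm[N - 1]);

    cudaFree(dRe); cudaFree(dIm); cudaFree(dOutRe); cudaFree(dOutIm);
    return 0;
}
```

With an impulse input, every output bin of a DFT has real part 1 and imaginary part 0, which gives a quick sanity check before timing the kernel on different cards.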