There is something about separate compilation I don’t understand; it is related to shared memory usage.
Let’s take the example code from the official CUDA 5.5 documentation.
Consider the files a.cu, b.cu and b.h, and add the verbose flag to the compile line:
nvcc --ptxas-options -v -arch=sm_20 -dc a.cu b.cu
The verbose output reports that kernel foo requires 64 bytes of shared memory (I would expect 8 x 4 = 32 bytes).
Now if you build the exact same app without separate compilation, the verbose report says kernel foo requires 32 bytes of shared memory as expected.
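For reference, here is a minimal single-file sketch of the kind of build being compared. It is a reconstruction of the documentation example from memory, merged into one translation unit, so everything apart from the kernel name foo and the 8-element, 4-byte shared array is an assumption: the names g, bar and N, and the host code in main, may differ from the official listing.

```cuda
#include <cstdio>

#define N 8  // 8 ints x 4 bytes = 32 bytes of shared memory expected

// In the split build these would live in b.h / b.cu (assumed layout).
__device__ int g[N];

__device__ void bar(void)
{
    g[threadIdx.x]++;
}

// In the split build this would live in a.cu.
__global__ void foo(void)
{
    __shared__ int a[N];  // the only shared-memory allocation in the kernel

    a[threadIdx.x] = threadIdx.x;
    __syncthreads();

    // Read the shared array in reverse order, then let bar() increment g.
    g[threadIdx.x] = a[blockDim.x - threadIdx.x - 1];
    bar();
}

int main(void)
{
    int host_g[N] = {0};

    foo<<<1, N>>>();

    // Copy the device array back; this also waits for the kernel to finish.
    cudaError_t err = cudaMemcpyFromSymbol(host_g, g, sizeof(host_g));
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; ++i)
        printf("%d ", host_g[i]);
    printf("\n");
    return 0;
}
```

Saved under a hypothetical name such as single.cu, this can be built with the same verbose flag but without -dc (nvcc --ptxas-options -v -arch=sm_20 single.cu), so the shared-memory figure for foo can be compared directly with the one from the separate-compilation build.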
I don’t understand what’s going on here. Is the reported figure real? Is this normal behaviour, or is it a bug?
Thanks for your help.