Separate compilation and shared memory reports

Hello,

There is something about separate compilation I don’t understand; it is related to shared memory usage.

Let’s take an example and use the code from the official CUDA 5.5 documentation:
/usr/local/cuda-5.5/doc/html/cuda-compiler-driver-nvcc/index.html#examples

Consider the files a.cu, b.cu and b.h, and add the verbose flag to the compile line:

nvcc --ptxas-options -v -arch=sm_20 -dc a.cu b.cu

It reports that kernel foo requires 64 bytes of shared memory (I would expect 8 x 4 = 32 bytes, since the kernel declares a shared array of 8 ints).

Now if you build the exact same app without separate compilation, the verbose report says kernel foo requires 32 bytes of shared memory as expected.
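For reference, a build without separate compilation could look like the sketch below: the three files merged into a single translation unit. The kernel and device function are reconstructed from the documentation example, so treat the details as assumptions; the file name merged.cu and the use of cudaMemcpyFromSymbol in main are hypothetical choices made here for brevity, not taken from the documentation.

```cuda
// merged.cu (hypothetical name): a.cu, b.cu and b.h combined into one file.
// Possible build line, same flags as above but without -dc:
//   nvcc --ptxas-options -v -arch=sm_20 merged.cu
#include <stdio.h>

#define N 8

// From b.cu: global device array and a device function that updates it.
__device__ int g[N];

__device__ void bar(void)
{
    g[threadIdx.x]++;
}

// From a.cu: the kernel whose shared memory usage is being reported.
__global__ void foo(void)
{
    // The only shared array in the kernel: N ints = 8 x 4 = 32 bytes.
    __shared__ int a[N];

    a[threadIdx.x] = threadIdx.x;
    __syncthreads();

    g[threadIdx.x] = a[blockDim.x - threadIdx.x - 1];
    bar();
}

int main(void)
{
    int hg[N];

    foo<<<1, N>>>();

    // Assumption: copy the result back by symbol instead of by address.
    cudaError_t err = cudaMemcpyFromSymbol(hg, g, N * sizeof(int));
    if (err != cudaSuccess) {
        printf("copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; i++) {
        printf("g[%d] = %d\n", i, hg[i]);
    }
    return 0;
}
```

In this form the compiler sees foo and bar in the same translation unit, which is the main difference from the -dc build, where bar is an external device function resolved at link time.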

I don’t understand what’s going on here. Is the extra shared memory really allocated? Is this normal behaviour, or is it a bug?

Thanks for your help.

Pierre.