Shared memory variables are strictly block scope and must be declared inside global functions.
Also, the concept of extern in this context is meaningless. All CUDA declarations are file scope only. You cannot declare device variables extern, and there is no linking of device code that could make externally declared symbols work, even if it were permitted in the language.
I’m afraid you are misinformed. nvcc does support extern __shared__ declarations outside the kernel, and the programming guide states that this is possible. While I agree that it makes more sense to declare the array inside the kernel, I still find myself copying and pasting old code that has it outside. Proof: https://codeblue.umich.edu/hoomd-blue/trac/…uteThermoGPU.cu compiles and works 100% correctly with CUDA 3.0.
To the OP:
Are you certain that you are requesting the proper amount of extern shared memory in your kernel launch? The size in bytes goes in the third parameter of the execution configuration:
kernel<<<grid, threads, shared_mem_bytes>>>(...)
This is an error I often make myself. It leads to runtime errors or incorrect behavior in device code that uses extern shared memory arrays.
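For illustration, here is a minimal sketch of a launch that does request the dynamic shared memory. The names block_sum, sum_on_gpu and sdata are hypothetical and not taken from the OP's code, and the sketch assumes a power-of-two block size, a non-empty input, and a toolkit new enough to have cudaDeviceSynchronize (on CUDA 3.0 the equivalent call was cudaThreadSynchronize).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// File-scope declaration of a dynamically sized shared array. Its size is not
// known at compile time; it comes from the third launch parameter.
extern __shared__ float sdata[];

// Hypothetical kernel: each block sums its slice of `in` into out[blockIdx.x].
// Assumes blockDim.x is a power of two.
__global__ void block_sum(const float *in, float *out, int n)
{
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // One float of shared memory per thread; pad the last block with zeros.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction within the block.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: returns the sum of a non-empty h_in.
float sum_on_gpu(const std::vector<float> &h_in)
{
    const int n = static_cast<int>(h_in.size());
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // Must match what the kernel indexes: one float per thread in the block.
    const size_t shared_mem_bytes = threads * sizeof(float);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // The third parameter sizes sdata[]. Omitting it allocates 0 bytes.
    block_sum<<<blocks, threads, shared_mem_bytes>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    std::vector<float> h_out(blocks, 0.0f);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Finish the reduction over the per-block partial sums on the host.
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += h_out[b];
    return total;
}
```

On a working device, sum_on_gpu(std::vector<float>(1000, 1.0f)) should return 1000. If shared_mem_bytes is dropped from the launch, the same kernel indexes an array with no storage behind it and the result is undefined.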
I don’t know. I didn’t take the time to try to understand what your code is doing. I just offered up a common cause of errors with this type of shared memory…
In that case, you clearly have this error. Without a third parameter in the execution configuration, 0 bytes are allocated for the extern shared array. That means all of your blocks are writing past the end of their shared memory allocation, to unpredictable locations, and may or may not produce correct results. On GTX 480, this will produce an unspecified launch failure.