Toolkit 3.0 bug with extern __shared__ terrible template temporary type troubles

I think I hit an nvcc bug: some code that works in toolkit 2.3 won’t even compile in 3.0. The difference seems to be some compiler confusion when two different functions both declare a pointer to the dynamic shared memory using the extern __shared__ int mydata[]; style of declaration. In toolkit 2.3 these declarations are local, but in 3.0 they seem to have become global, so they can collide and kill the compiler.

I use this in templates a lot when I know the first (n) bytes of my dynamic shared memory are scratch space, so I can call functions which use that dynamic buffer for their own temporary storage.
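To make the scratch-space convention concrete, here is a minimal sketch, assuming a single block whose size is a power of two. The names blockSum, scratchDemo and runScratchDemo are made up for illustration and are not taken from the attached demo. The helper treats the first blockDim.x ints of dynamic shared memory as scratch, and the kernel keeps its own data right after that region.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: block-wide sum that uses the first blockDim.x ints
// of dynamic shared memory as scratch. Assumes blockDim.x is a power of two.
__device__ int blockSum(int val)
{
    extern __shared__ int scratch[];   // starts at the base of dynamic shared memory
    scratch[threadIdx.x] = val;
    __syncthreads();
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    int total = scratch[0];
    __syncthreads();                   // scratch may be reused after this point
    return total;
}

// Hypothetical kernel: its own data lives after the scratch region.
__global__ void scratchDemo(int *out)
{
    extern __shared__ int smem[];      // same base address as scratch[] above
    int *mine = smem + blockDim.x;     // skip the blockDim.x scratch ints
    mine[threadIdx.x] = threadIdx.x;   // each thread only touches its own slot
    int total = blockSum(mine[threadIdx.x]);
    if (threadIdx.x == 0)
        *out = total;
}

// Host wrapper (error checking omitted to keep the sketch short).
int runScratchDemo()
{
    const int threads = 256;
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    // Dynamic shared memory: scratch ints plus the kernel's own ints.
    scratchDemo<<<1, threads, 2 * threads * sizeof(int)>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With 256 threads the wrapper sums 0 through 255, so it should return 32640. The important part is the launch: the caller has to request enough dynamic shared memory for both the scratch region and its own data.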

This is the troubling definition.

template<typename T>
__device__ T sumReduce(T val)
{
  extern __shared__ T array[];
  // ... rest of the function omitted
}

Attached is a short demo program that sums the numbers from 0 to 255 both as ints and floats. It compiles and runs fine in toolkit 2.3 but in toolkit 3.0 you get a compilation error:

*** start of expression ***

[lvalue] variable: name = array, node type: array [0] of float

*** end of expression *** internal error: check_type_of_variable_node: enk_variable has wrong type

1 catastrophic error detected in the compilation of "/tmp/tmpxft_00001b37_00000000-4_externtemplate.cpp1.ii".

I was about to simply report this to NV as a bug, but before I did, I realized that the description of the extern __shared__ declaration in the programming guide doesn’t really say which namespace the declaration lives in. Perhaps my scratch space definitions themselves are illegal and just happen to work in 2.3?

I just hit the same problem. How did you fix yours?

Have you tried the CUDA 3.1 beta yet?

OK, I found this:

	//extern __shared__ T s_array[];  // replaced by the following 2 lines:
	extern __shared__ char __s_array[];
	T* const s_array = reinterpret_cast<T*>(__s_array);

which comes from here: …bba69a881c/diff

which solved the problem. Now I have new compilation errors:

Unaligned memory accesses

Hmm… actually, I am more tempted to downgrade to 2.3, where my code compiled all right…

I added __align__(4) and it is fixed.

extern __shared__ __align__(4) char __s_array[];
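Putting the two pieces of the workaround together, here is a minimal sketch of a templated reduction that sums 0 to 255 as ints and as floats. It is not the attached test case. The names s_raw, sumKernel and runSum are made up for illustration, and it assumes a single block whose size is a power of two. Every instantiation declares the same aligned char buffer, so the declarations no longer differ by type.

```cuda
#include <cuda_runtime.h>

template <typename T>
__device__ T sumReduce(T val)
{
    // Identical declaration in every instantiation; only the cast depends on T.
    // 4-byte alignment covers int and float (assumption: T is no wider than that).
    extern __shared__ __align__(4) char s_raw[];
    T *const s_array = reinterpret_cast<T *>(s_raw);

    s_array[threadIdx.x] = val;
    __syncthreads();
    // Tree reduction; assumes blockDim.x is a power of two.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            s_array[threadIdx.x] += s_array[threadIdx.x + stride];
        __syncthreads();
    }
    T total = s_array[0];
    __syncthreads();
    return total;
}

// Hypothetical kernel: each thread contributes its own index.
template <typename T>
__global__ void sumKernel(T *out)
{
    T total = sumReduce(static_cast<T>(threadIdx.x));
    if (threadIdx.x == 0)
        *out = total;
}

// Host wrapper (error checking omitted to keep the sketch short).
template <typename T>
T runSum()
{
    const int threads = 256;
    T *d_out = nullptr;
    T h_out = T();
    cudaMalloc(&d_out, sizeof(T));
    sumKernel<T><<<1, threads, threads * sizeof(T)>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With 256 threads, both runSum<int>() and runSum<float>() should return 32640. A wider element type such as double would need a larger alignment value than 4 on the buffer declaration.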

SPWorley’s test case compiles and runs fine in 3.1 beta.

NVIDIA just closed my report as “Not a Bug” but I bet that’s because it’s working in 3.1 and therefore needed no action.

BTW, NVIDIA, I like the more rapid toolkit updates! That helps stabilize the tool chain. Some people would argue that rapid releases are unstable since everyone runs different versions, but when a tool is newer and evolving, rapid releases are still better since we get the small bug fixes (like the namespace issue above) fixed fast and we don’t need to use workarounds for a year.