Strange behaviour with more kernels


I tried to put two kernels into one file and declared at the top:

__shared__ float4 smem1[512];
__shared__ float4 smem2[512];

and then

__global__ void func1() {
    … // something with smem1
}

__global__ void func2() {
    … // something with smem2
}

I got some weird results back from func1 (although func2 was never executed).

I commented out func2 and smem2, and then func1 worked properly. I tried to narrow down the cause, but after uncommenting func2 and smem2 again it still worked, so I have no idea what caused the problem.

Any ideas?


It is a memory problem: both arrays start at the same memory address!
See the Variable Type Qualifiers (__shared__) section on pages 19 and 20 of CUDA_Programming_Guide_0.8.pdf for details on how to define arrays in shared memory without running into this problem.


Yes, but this shouldn’t affect anything, because the kernels don’t run concurrently.

Or did I miss something?

This is not correct. Please read the section again carefully. Shared arrays with a specified dimension are always disjoint. You only need to calculate offsets for extern __shared__ arrays (where the size is specified at kernel invocation).
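To illustrate the difference between the two cases, here is a minimal sketch. The kernel names, the `pool` buffer, the size `N`, and the `run_demo` helper are all made up for this example and do not come from the original code. It assumes a single block of `N` threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 64  // threads per block and elements per array (illustrative value)

// Case 1: sizes known at compile time. Each array gets its own storage.
__global__ void static_kernel(float *out)
{
    __shared__ float a[N];
    __shared__ float b[N];
    int i = threadIdx.x;
    a[i] = 1.0f;
    b[i] = 2.0f;
    __syncthreads();
    // Read a neighbour's slot of a; the sum is 3.0f only if a and b are disjoint.
    out[i] = a[(i + 1) % N] + b[i];
}

// Case 2: size given at launch. Every extern __shared__ array begins at the
// same address, so declare one pool and compute the offsets by hand.
extern __shared__ float pool[];

__global__ void dynamic_kernel(float *out, int n)
{
    float *a = pool;      // elements [0, n)
    float *b = pool + n;  // elements [n, 2n)
    int i = threadIdx.x;
    a[i] = 1.0f;
    b[i] = 2.0f;
    __syncthreads();
    out[i] = a[(i + 1) % n] + b[i];
}

// Launch both kernels and check that every thread produced 3.0f.
bool run_demo()
{
    float h_out[N] = {0};
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) return false;

    bool ok = true;
    for (int pass = 0; pass < 2; ++pass) {
        if (pass == 0)
            static_kernel<<<1, N>>>(d_out);
        else  // third launch argument: bytes of dynamic shared memory
            dynamic_kernel<<<1, N, 2 * N * sizeof(float)>>>(d_out, N);
        if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess)
            ok = false;
        for (int i = 0; i < N; ++i)
            if (h_out[i] != 3.0f) ok = false;
    }
    cudaFree(d_out);
    return ok;
}

int main()
{
    bool ok = run_demo();
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

If `b` in `dynamic_kernel` were declared as a second extern array instead of `pool + n`, it would alias `a`, which is the situation the guide warns about.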


I don’t know; what you say is correct, but maybe there is a memory initialization or something similar that runs in parallel with the kernel and corrupts your calculations!

Let’s first try the correct allocation, and if the problem is still there, then we’ll think again!

Sorry, I didn’t notice that our case is static allocation!

I tried the example with two statically allocated arrays and two kernels, and I called only one kernel. Everything was OK and no data corruption happened!
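A test along these lines might look like the sketch below. It is not the code that was actually run: the kernel bodies, the `out` parameter, and the `run_func1_only` helper are assumptions made up for illustration, and it assumes the toolkit accepts file-scope `__shared__` declarations as in the first post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 512  // array length from the original post, also used as block size

__shared__ float4 smem1[N];
__shared__ float4 smem2[N];

// Hypothetical body: fill smem1, then read another thread's element.
__global__ void func1(float *out)
{
    int i = threadIdx.x;
    smem1[i] = make_float4((float)i, 2.0f * i, 3.0f * i, 4.0f * i);
    __syncthreads();
    out[i] = smem1[N - 1 - i].x;  // x component written by thread N-1-i
}

// Hypothetical body: same pattern on smem2. This kernel is never launched.
__global__ void func2(float *out)
{
    int i = threadIdx.x;
    smem2[i] = make_float4(-1.0f, -1.0f, -1.0f, -1.0f);
    __syncthreads();
    out[i] = smem2[N - 1 - i].y;
}

// Launch only func1 and verify its output on the host.
bool run_func1_only()
{
    static float h_out[N];
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) return false;

    func1<<<1, N>>>(d_out);
    bool ok = cudaMemcpy(h_out, d_out, N * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < N; ++i)
        if (h_out[i] != (float)(N - 1 - i)) ok = false;

    cudaFree(d_out);
    return ok;
}

int main()
{
    bool ok = run_func1_only();
    printf("%s\n", ok ? "func1 output is intact" : "func1 output is corrupted");
    return ok ? 0 : 1;
}
```

Checking the return value of `cudaMemcpy` also catches a failed launch, which is one way "weird results" can come from the host side rather than from shared memory.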

Please check your main program; there must be something wrong in your code!

I finally found the problem … perhaps a CUDA bug?

The declaration at the top was wrong … this works perfectly:
__shared__ char array[512*sizeof(float4)];

__global__ void func1() {
    float4* ptr = (float4*) array;
    … // something with ptr
}

__global__ void func2() {
    float4* ptr = (float4*) array;
    … // something with ptr
}

Why does this work and my first attempt not?