Weird problem copying from global to shared memory

I have a weird problem when loading data into shared memory from global memory.
I have a two-dimensional array in shared memory and a one-dimensional array in device (global) memory. I access the data like this:

shared[i][j] = global[k]

It crashes immediately when debugging normally, but doesn't crash in EmuDebug.
Also, if I first copy the global element to a local variable and then copy the local variable to shared memory, it works fine:

temp = global[k]
shared[i][j] = temp

This version works fine.

global, shared, and temp are all of type float3.
Any help would be appreciated.

Thank you.
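
For reference, here is a minimal sketch of the staged copy described above (global to a local temporary, then temporary to shared). The tile dimensions, the index mapping k = i * TILE_COLS + j, the kernel name stagedCopy, and the host helper runStagedCopy are all assumptions made for illustration; the original post does not show how i, j, and k are computed.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE_ROWS 4   // hypothetical tile height
#define TILE_COLS 8   // hypothetical tile width

// Each thread copies one float3 from global to shared memory through a
// local temporary, then writes it back out so the host can verify it.
__global__ void stagedCopy(const float3* global, float3* out)
{
    __shared__ float3 shared[TILE_ROWS][TILE_COLS];

    int i = threadIdx.y;          // row in the shared tile
    int j = threadIdx.x;          // column in the shared tile
    int k = i * TILE_COLS + j;    // assumed linear index into global

    float3 temp = global[k];      // global -> local temporary
    shared[i][j] = temp;          // local temporary -> shared
    __syncthreads();

    out[k] = shared[i][j];        // read back for checking
}

// Hypothetical host helper: returns true if every element survives the
// round trip through shared memory unchanged.
bool runStagedCopy()
{
    const int n = TILE_ROWS * TILE_COLS;
    const size_t bytes = n * sizeof(float3);
    std::vector<float3> hIn(n), hOut(n);
    for (int k = 0; k < n; ++k)
        hIn[k] = make_float3((float)k, k + 0.5f, (float)-k);

    float3 *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    bool ok = cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        stagedCopy<<<1, dim3(TILE_COLS, TILE_ROWS)>>>(dIn, dOut);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dIn);
    cudaFree(dOut);

    for (int k = 0; ok && k < n; ++k)
        ok = hOut[k].x == hIn[k].x && hOut[k].y == hIn[k].y && hOut[k].z == hIn[k].z;
    return ok;
}
```

Calling runStagedCopy() from main and checking its return value gives a quick pass/fail result. Replacing the two staged lines with a direct shared[i][j] = global[k]; gives the variant that crashed in the original report.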

I suspect this is a problem with CUDA itself. I ran into something similar (http://forums.nvidia.com/index.php?showtopic=32495).
In my case I copy the data to shared memory as unsigned char, regardless of the type of the original data. For example:

// unsigned char *global, *shared;

for (int i = 0; i < N; i++)
    shared[i] = global[i];

If I use more complex types (for example float, float2, float3), I get errors. My method is longer, but it works.
If in your case the construct (temp = global[k]; shared[i][j] = temp;) works, then use it. I expect this problem will be fixed in a new version of the Toolkit.
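
A rough sketch of how the byte-wise copy could look for float3 data, in case it helps. The element count N_ELEMS, the kernel name byteCopy, and the host helper runByteCopy are hypothetical; only the idea of copying through unsigned char pointers comes from the post above.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N_ELEMS 32   // hypothetical number of float3 elements per block

// Fills a shared float3 array one byte at a time, then writes the
// elements back to global memory so the host can verify them.
__global__ void byteCopy(const float3* global, float3* out)
{
    __shared__ float3 shared[N_ELEMS];

    // View both arrays as raw bytes.
    const unsigned char* src = reinterpret_cast<const unsigned char*>(global);
    unsigned char* dst = reinterpret_cast<unsigned char*>(shared);

    const int nBytes = static_cast<int>(N_ELEMS * sizeof(float3));
    for (int b = threadIdx.x; b < nBytes; b += blockDim.x)
        dst[b] = src[b];          // one byte per loop iteration
    __syncthreads();              // all bytes written before anyone reads

    for (int k = threadIdx.x; k < N_ELEMS; k += blockDim.x)
        out[k] = shared[k];
}

// Hypothetical host helper: returns true if the data read back from
// shared memory matches the input.
bool runByteCopy()
{
    const size_t bytes = N_ELEMS * sizeof(float3);
    std::vector<float3> hIn(N_ELEMS), hOut(N_ELEMS);
    for (int k = 0; k < N_ELEMS; ++k)
        hIn[k] = make_float3((float)k, k + 0.25f, (float)(2 * k));

    float3 *dIn = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    bool ok = cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        byteCopy<<<1, 64>>>(dIn, dOut);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    ok = ok && cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dIn);
    cudaFree(dOut);

    for (int k = 0; ok && k < N_ELEMS; ++k)
        ok = hOut[k].x == hIn[k].x && hOut[k].y == hIn[k].y && hOut[k].z == hIn[k].z;
    return ok;
}
```

The byte loop issues many more memory transactions than a per-element copy, so this is a workaround rather than something to keep once the direct assignment works.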

From the declaration of float3 you can see what alignment it has. Do you account for that in the shared array layout?
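
One way to check the layout in question on a given toolkit is to ask the compiler directly. This is only a sketch: the helper names (float3Size, float3Alignment, tileRowStrideBytes, printFloat3Layout) and the idea of a tile with a fixed number of columns are assumptions for illustration, not anything from the original kernel.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>

// Size in bytes of one float3 as the compiler lays it out.
constexpr std::size_t float3Size() { return sizeof(float3); }

// Alignment in bytes that the compiler requires for float3.
constexpr std::size_t float3Alignment() { return alignof(float3); }

// Byte stride between consecutive rows of a hypothetical tile declared
// as float3 tile[rows][cols].
constexpr std::size_t tileRowStrideBytes(std::size_t cols)
{
    return cols * sizeof(float3);
}

// Prints the values so they can be compared with the layout the kernel
// assumes for its shared array.
void printFloat3Layout(std::size_t cols)
{
    std::printf("sizeof(float3)   = %zu\n", float3Size());
    std::printf("alignof(float3)  = %zu\n", float3Alignment());
    std::printf("row stride bytes = %zu\n", tileRowStrideBytes(cols));
}
```

Calling printFloat3Layout with the tile width used in the kernel shows whether each row of the shared array starts at an offset that is a multiple of the reported alignment.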


I'm not 100% sure what you mean by alignment, but if I'm using the same built-in type (float3) for both of them, why would it behave this way?

I mean, the kernel works perfectly in EmuDebug, and also in normal debug when I use the intermediate temp variable. But when I assign directly, it crashes in normal debug and not in EmuDebug. That makes no sense to me at all.