Hello,
I have a problem accessing shared memory.
If I allocate the memory in the following way:
__shared__ char shared_mem[8192];
and access it (the first 4 bytes) like this:
unsigned int test1 = *(unsigned int *)(shared_mem);
I get the nvcc compiler error:
“Error: Unaligned memory accesses not supported”
I don't know why this error occurs, because the access is aligned to
4 bytes and starts at the very beginning (offset 0) of the allocated memory…
I figured out that I can work around this problem by allocating the shared memory
as an array of 4-byte values:
__shared__ unsigned int shared_mem[8192/4];
As a result, I don't get the compile error, and in a first test
everything seems to work fine…
BUT!
Now I have another problem…
In contrast to the char array solution above, the profiler now tells
me that CUDA has to serialize the threads of the warp for the memory access…
But it should be a broadcast, because all threads access the same area…
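To make the access pattern concrete, here is a minimal sketch of the kind of kernel I mean. Only the `unsigned int` shared array comes from my actual code; the kernel name `broadcast_read`, the single-warp launch, and the fill value are made up for illustration. Every thread reads the same 32-bit word, which is the case I would expect to be served as a broadcast:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name and fill value are assumptions for illustration):
// every thread in the block reads the same 32-bit word from shared memory.
__global__ void broadcast_read(unsigned int *out)
{
    __shared__ unsigned int shared_mem[8192 / 4];

    // One thread writes the word, then the whole block synchronizes
    // so that the write is visible to all threads.
    if (threadIdx.x == 0)
        shared_mem[0] = 0xDEADBEEFu;
    __syncthreads();

    // All threads read the same address (index 0).
    unsigned int test1 = shared_mem[0];
    out[threadIdx.x] = test1;
}

int main()
{
    const int threads = 32;  // one warp, chosen only for this sketch
    unsigned int *d_out = nullptr;
    unsigned int h_out[threads];

    cudaMalloc(&d_out, threads * sizeof(unsigned int));
    broadcast_read<<<1, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Check on the host that every thread saw the same value.
    for (int i = 0; i < threads; ++i) {
        if (h_out[i] != 0xDEADBEEFu) {
            printf("mismatch at thread %d\n", i);
            return 1;
        }
    }
    printf("all threads read the same value\n");
    return 0;
}
```

This only checks that the values are correct. Whether the profiler counts the read as serialized is a separate matter that the sketch cannot show.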
Is there another way or workaround to solve the problem with the
“Unaligned memory accesses” error?
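One thing that might be worth trying, as a sketch and not something I have verified on this toolchain: keep the `char` array but request 4-byte alignment with the `__align__` specifier, on the assumption that the compiler complains because a plain `char` array only guarantees 1-byte alignment. The kernel name `read_first_word` and the test value are hypothetical:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name and test value are assumptions for illustration).
__global__ void read_first_word(unsigned int *out)
{
    // Assumption: __align__(4) tells the compiler that the byte array starts
    // on a 4-byte boundary, so a 32-bit access at offset 0 is aligned.
    __shared__ __align__(4) char shared_mem[8192];

    // Thread 0 stores a known 32-bit value into the first 4 bytes.
    if (threadIdx.x == 0)
        *(unsigned int *)(shared_mem) = 12345u;
    __syncthreads();

    // Same access as in the original code: read the first 4 bytes as one word.
    unsigned int test1 = *(unsigned int *)(shared_mem);
    out[threadIdx.x] = test1;
}

int main()
{
    const int threads = 32;  // one warp, chosen only for this sketch
    unsigned int *d_out = nullptr;
    unsigned int h_out[threads];

    cudaMalloc(&d_out, threads * sizeof(unsigned int));
    read_first_word<<<1, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Check on the host that every thread read back the stored value.
    for (int i = 0; i < threads; ++i) {
        if (h_out[i] != 12345u) {
            printf("mismatch at thread %d\n", i);
            return 1;
        }
    }
    printf("aligned char array read OK\n");
    return 0;
}
```

If this compiles without the error, the storage stays a byte array and the cast can remain as it is. Whether it also avoids the serialization reported by the profiler would still have to be measured.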
Thank you!