Memory Access

Hello,

I have a problem accessing shared memory.

If I allocate the memory in the following way:

__shared__ char shared_mem[8192];

and access the first 4 bytes like this:

unsigned int test1 = *(unsigned int *)(shared_mem);

I get the nvcc compiler error:

“Error: Unaligned memory accesses not supported”

I don't know why this error occurs, because the access is aligned to 4 bytes and starts at offset 0 of the allocated memory…

I figured out that I can work around this problem by allocating the shared memory as an array of 4-byte values:

__shared__ unsigned int shared_mem[8192/4];

As a result, I don't get the compile error, and in a first test everything seems to work fine…

BUT!

Now I have another problem…

In contrast to the solution above with the char array, the profiler now tells me that CUDA has to serialize the threads of a warp for the memory access…

But it should be a broadcast because all threads access the same area…
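To make the broadcast case concrete, here is a minimal sketch of a kernel in which every thread of a warp reads the same 32-bit shared word. The names broadcast_read and count_broadcast_hits, the fill value, and the single-warp launch are assumptions made for illustration and are not taken from the original code. Whether the profiler counts such a read as serialized can depend on the hardware generation and on the other shared memory accesses in the kernel.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reads the SAME 32-bit shared word.
__global__ void broadcast_read(unsigned int *out)
{
    __shared__ unsigned int shared_words[8192 / 4];

    // One thread fills the word, then all threads wait until it is visible.
    if (threadIdx.x == 0) {
        shared_words[0] = 0xCAFEF00Du;
    }
    __syncthreads();

    // All threads read one identical address, the pattern eligible for broadcast.
    out[threadIdx.x] = shared_words[0];
}

// Hypothetical host helper: launches one warp and counts the threads
// that received the expected value.
int count_broadcast_hits()
{
    const int n = 32;
    unsigned int host[n];
    unsigned int *dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, n * sizeof(unsigned int));
    assert(err == cudaSuccess);

    broadcast_read<<<1, n>>>(dev);

    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dev);

    int hits = 0;
    for (int i = 0; i < n; ++i) {
        hits += (host[i] == 0xCAFEF00Du) ? 1 : 0;
    }
    return hits;
}
```

Calling count_broadcast_hits() from main should return 32 on a working device. If the real kernel reads different bytes or words of the same area per thread rather than one identical word, the access is no longer this pattern.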

Is there another way or workaround to solve the problem with the “Unaligned memory accesses” error?

Thank you!

Hi micha,

unsigned int test1 = *(unsigned int *)(shared_mem);

As nvcc says, this access is not guaranteed to be 4-byte aligned: a char array only requires 1-byte alignment, so the compiler cannot assume that shared_mem % 4 == 0.

Yes, now shared_mem is 4-byte aligned, like every int.

Hmm, that depends on how you access the memory right now…

But serialization seems quite probable with 8-bit accesses.
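As a rough sketch of why 8-bit accesses can serialize: on compute capability 1.x hardware, shared memory banks are 32 bits wide, so four consecutive bytes live in the same bank. The kernel below contrasts a packed byte read, where neighbouring threads share a word, with a read strided by 4, where each thread touches its own word. The names byte_access_patterns and sum_byte_patterns and the 128-byte buffer are hypothetical, and newer GPU generations treat the packed case differently, so the profiler remains the authority.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel, meant to be launched with exactly 32 threads.
__global__ void byte_access_patterns(unsigned int *out)
{
    __shared__ char shared_bytes[128];
    const unsigned int tid = threadIdx.x;

    // Each thread initialises its own 4 bytes with the byte index (0..127).
    for (unsigned int i = 0; i < 4; ++i) {
        shared_bytes[4 * tid + i] = (char)(4 * tid + i);
    }
    __syncthreads();

    // Packed: threads 0..3 read bytes of the same 32-bit word,
    // which is the conflict-prone pattern on 1.x hardware.
    char packed = shared_bytes[tid];

    // Strided: each thread reads the first byte of a different word.
    char strided = shared_bytes[4 * tid];

    // packed == tid and strided == 4 * tid, so the sum is 5 * tid.
    out[tid] = (unsigned int)(unsigned char)packed
             + (unsigned int)(unsigned char)strided;
}

// Hypothetical host helper: runs one warp and sums the per-thread results.
unsigned int sum_byte_patterns()
{
    const int n = 32;
    unsigned int host[n];
    unsigned int *dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, n * sizeof(unsigned int));
    assert(err == cudaSuccess);

    byte_access_patterns<<<1, n>>>(dev);

    err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dev);

    unsigned int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += host[i];
    }
    return sum;
}
```

Both reads return correct data. The difference, where there is one, shows up only in the serialization counter of the profiler.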

How could you compare it to the former code, when nvcc didn't compile it? ;)

yours,

moik

Hi mikemoik,

thanks…
It seems you can only work with 4-byte aligned arrays in shared memory…
Is there no way to work with char arrays and access them only in a 4-byte aligned way?
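One possible approach, sketched here under assumptions, is to keep the char array but request 4-byte alignment explicitly with the __align__ specifier, so that the cast to unsigned int * refers to an aligned address. Whether the nvcc version that produced the error above accepts this is not confirmed by the thread. The kernel and helper names (read_first_word, read_first_word_on_device) and the byte values are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: byte-wise writes, one aligned 32-bit read.
__global__ void read_first_word(unsigned int *out)
{
    // __align__(4) asks the compiler to place the byte array on a
    // 4-byte boundary (assumption: the toolchain honours it here).
    __shared__ __align__(4) char shared_mem[8192];

    // Fill the first four bytes individually.
    if (threadIdx.x == 0) {
        shared_mem[0] = 0x01;
        shared_mem[1] = 0x02;
        shared_mem[2] = 0x03;
        shared_mem[3] = 0x04;
    }
    __syncthreads();

    // Same access as in the question, now on an aligned base address.
    unsigned int test1 = *(unsigned int *)(shared_mem);
    out[threadIdx.x] = test1;
}

// Hypothetical host helper: returns the word seen by thread 0.
unsigned int read_first_word_on_device()
{
    const int n = 32;
    unsigned int host[n];
    unsigned int *dev = nullptr;

    cudaError_t err = cudaMalloc(&dev, n * sizeof(unsigned int));
    assert(err == cudaSuccess);

    read_first_word<<<1, n>>>(dev);

    err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dev);

    // GPU memory is little-endian, so the bytes 01 02 03 04
    // are read back as 0x04030201.
    return host[0];
}
```

An alternative that avoids the cast on the word side is to keep the unsigned int array from the workaround above and view it through a char * whenever byte access is needed.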