Arrays in shared memory: how to handle non-4-byte elements?

The programming guide shows how to declare arrays in dynamically allocated shared memory:

extern __shared__ char array[];

__device__ void func() // __device__ or __global__ function
{
    short* array0 = (short*)array;
    float* array1 = (float*)&array0[128];
    int* array2 = (int*)&array1[64];
}

Everything looks reasonable. I declare four arrays in this manner: one array of floats and three arrays of integers. When the three integer arrays really are arrays of int, everything works fine, but when I declare them as arrays of unsigned char (to minimize shared memory usage) the kernel crashes. This happens on the device only; emulation mode does not crash.

The programming guide says a lot about bank conflicts and memory bandwidth, and it mentions that each shared memory access is 32 bits wide, but it gives no information on possible issues when working with data elements smaller than 4 bytes.

Why do non-ints fail? My impression is that reading or writing shared memory always affects 4 bytes even though the arrays consist of unsigned chars, so each read fetches 4 chars instead of one, and each write clobbers the three neighbours of the char that should actually be rewritten.

Well … in case someone else runs into this:
The solution is extremely simple: the start of each array must be aligned to 4 bytes.
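
For anyone who wants to see the fix spelled out, here is a minimal sketch of how the byte offsets could be computed on the host so that every sub-array starts on a 4-byte boundary. It is plain C++ (nvcc accepts it too, C++14 or later). The names align_up, Layout and make_layout, and the element counts n_float and n_byte, are made up for illustration and do not come from the programming guide.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// Assumption: `alignment` is a power of two (4 in this thread).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Byte offsets of the four sub-arrays inside one shared buffer
// (hypothetical layout: one float array, three unsigned char arrays).
struct Layout {
    std::size_t floats;  // start of the float array
    std::size_t bytes0;  // start of the first unsigned char array
    std::size_t bytes1;  // start of the second unsigned char array
    std::size_t bytes2;  // start of the third unsigned char array
    std::size_t total;   // total buffer size in bytes
};

// n_float and n_byte are hypothetical element counts.
constexpr Layout make_layout(std::size_t n_float, std::size_t n_byte)
{
    Layout l{};
    l.floats = 0;
    // Pad the end of each array so the next one starts on a multiple of 4.
    l.bytes0 = align_up(l.floats + n_float * sizeof(float), 4);
    l.bytes1 = align_up(l.bytes0 + n_byte, 4);
    l.bytes2 = align_up(l.bytes1 + n_byte, 4);
    l.total  = align_up(l.bytes2 + n_byte, 4);
    return l;
}
```

In the kernel, each pointer would then be formed from the char buffer plus the offset, for example (unsigned char*)(array + layout.bytes1), and layout.total would be passed as the dynamic shared memory size in the launch configuration.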

Thanks for that post!

To be precise, every type must be aligned on its own size (2 bytes for short, 4 bytes for int and float; an unsigned char array can start at any offset).
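
A small host-side sketch of that rule, in case it helps: it checks whether a byte offset is a legal start for an array of a given type, and finds the next legal start if it is not. The helper names is_aligned_for and next_start are made up for illustration. The sketch relies on alignof(T) being equal to sizeof(T), which holds for the built-in scalar types discussed here on common platforms.

```cpp
#include <cstddef>

// True if byte offset `offset` is a valid start for an array of T,
// i.e. a multiple of T's alignment.
template <typename T>
constexpr bool is_aligned_for(std::size_t offset)
{
    return offset % alignof(T) == 0;
}

// Smallest offset >= `offset` at which an array of T may start.
template <typename T>
constexpr std::size_t next_start(std::size_t offset)
{
    return (offset + alignof(T) - 1) / alignof(T) * alignof(T);
}
```

With the layout from the first post, the float array begins after 128 shorts, at byte 256, which is fine. If an unsigned char array of 129 elements came first instead, the float array would begin at byte 129, which is not a multiple of 4; next_start<float>(129) gives 132 as the first usable offset.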

Thank you for the clarification. However, this is what I observe: aligning on four bytes not only makes things correct but also gives a significant speedup. Alignment on its own size, on the other hand, is the minimal requirement …