Error 'Unaligned memory accesses not supported' in a dynamic allocation

Hi, I have this dynamic allocation of shared memory in my kernel code:

[codebox]extern __shared__ char array3[];

__global__ void aKernel(int* d_a, int* d_b, …){
	__shared__ int s_a;
	__shared__ int s_b;

	if(threadIdx.x == 0){
		s_a = *(d_a+0);
		s_b = *(d_b+0);
	}
	__syncthreads(); // s_a and s_b must be visible to every thread before use

	// Carve the single dynamic allocation into six float sub-arrays
	float* s_c = (float*)array3;
	float* s_d = (float*)&s_c[s_a];
	float* s_e = (float*)&s_d[s_a];
	float* s_f = (float*)&s_e[s_a];
	float* s_g = (float*)&s_f[s_a/s_b];
	float* s_h = (float*)&s_g[s_a/s_b];

	…
}[/codebox]


The kernel call code is:

[codebox]memory = sizeof(float)*s_a*3 + sizeof(float)*(s_a/s_b)*3;

aKernel<<< grid, block, memory >>>(d_a, d_b, …);[/codebox]

Then I get the following error at compile time:

1>./c:\cuda\include\device_functions.h(1330): Error: Unaligned memory accesses not supported

When not using dynamic allocation in that kernel, no compilation errors appear and the execution runs successfully. I have other kernels in my code using the same technique to allocate shared memory, but none of them presents this problem. I think the problematic line is probably this: ‘float* s_c = (float*)array3;’.

Could someone please help me to solve this? Thanks.

Just a suggestion that may work…

If you are using all floats (which it appears you are), could you try declaring the shared array as float instead of char? I think I have attempted this before, but I’m not sure whether it worked correctly…

extern __shared__ float array3[];

Yes, all the vectors allocated were floats. I tried your solution and it worked perfectly. I still don’t know why the error appeared when accessing the extern char array in that kernel but not in the others, though.

Thank you very much!
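
For anyone who finds this thread later, here is a minimal sketch of the float-typed variant that worked above. It is not the original kernel: the output buffer `d_out`, the sizes `a = 64` and `b = 4`, and the single-block launch are assumptions made only so the example is complete. As for why the char version failed, one plausible explanation (a guess, not something confirmed in this thread) is that a char array is only guaranteed 1-byte alignment, so the compiler cannot always prove that float accesses through the cast pointer are 4-byte aligned. Declaring the array as float removes that doubt.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Dynamic shared memory declared as float, so every sub-array carved
// out of it starts on a float-aligned address.
extern __shared__ float array3[];

// d_out is a hypothetical output buffer added for this sketch only.
__global__ void aKernel(const int* d_a, const int* d_b, float* d_out)
{
    __shared__ int s_a;
    __shared__ int s_b;

    if (threadIdx.x == 0) {
        s_a = d_a[0];
        s_b = d_b[0];
    }
    __syncthreads();  // make s_a and s_b visible to the whole block

    // Same partitioning as in the question, without any casts.
    float* s_c = array3;
    float* s_d = s_c + s_a;
    float* s_e = s_d + s_a;
    float* s_f = s_e + s_a;
    float* s_g = s_f + s_a / s_b;
    float* s_h = s_g + s_a / s_b;

    // Write to the first and last sub-arrays to exercise both ends
    // of the allocation.
    for (int i = threadIdx.x; i < s_a; i += blockDim.x)
        s_c[i] = 1.0f;
    for (int i = threadIdx.x; i < s_a / s_b; i += blockDim.x)
        s_h[i] = 2.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        d_out[0] = s_c[s_a - 1];
        d_out[1] = s_h[s_a / s_b - 1];
    }
}

int main()
{
    const int a = 64, b = 4;  // example sizes, not from the original post

    int *d_a, *d_b;
    float* d_out;
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));
    cudaMalloc(&d_out, 2 * sizeof(float));
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    // Three sub-arrays of a floats plus three of a/b floats,
    // matching the size formula in the question.
    size_t memory = sizeof(float) * a * 3 + sizeof(float) * (a / b) * 3;

    aKernel<<<1, 32, memory>>>(d_a, d_b, d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float out[2];
    cudaMemcpy(out, d_out, sizeof(out), cudaMemcpyDeviceToHost);
    printf("s_c last = %f, s_h last = %f\n", out[0], out[1]);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return 0;
}
```

If the sub-arrays ever need mixed types, the same idea should still apply: declare the array with the most strictly aligned type and place the sub-arrays in decreasing order of alignment, so that each cast pointer stays aligned.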