Hi, I am dynamically allocating shared memory in my kernel code:
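[codebox]// Dynamically sized shared buffer; its size is the third launch parameter
extern __shared__ char array3[];

__global__ void aKernel(int* d_a, int* d_b, …){
    __shared__ int s_a;
    __shared__ int s_b;

    // Thread 0 loads the two sizes into shared memory for the whole block
    if(threadIdx.x == 0){
        s_a = *(d_a+0);
        s_b = *(d_b+0);
    }
    __syncthreads();

    // Carve the buffer into consecutive float arrays:
    // s_c, s_d, s_e hold s_a elements each; s_f, s_g, s_h hold s_a/s_b each
    float* s_c = (float*)array3;
    float* s_d = (float*)&s_c[s_a];
    float* s_e = (float*)&s_d[s_a];
    float* s_f = (float*)&s_e[s_a];
    float* s_g = (float*)&s_f[s_a/s_b];
    float* s_h = (float*)&s_g[s_a/s_b];
    …
}[/codebox]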
The kernel call code is:
[codebox]memory = sizeof(float)*s_a*3+sizeof(float)*(s_a/s_b)*3;
aKernel<<< grid, block, memory >>>(d_a, d_b, …);[/codebox]
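To make the size computation explicit, here is a minimal host-side sketch of the same formula. The helper name `sharedBytes` and its parameters `n` and `m` are hypothetical, introduced only for illustration; they stand for host copies of the values the kernel reads through `d_a` and `d_b`.

```cpp
#include <cstddef>

// Hypothetical helper (not in my actual code): byte size of the dynamic
// shared buffer for three float arrays of n elements (s_c, s_d, s_e)
// followed by three float arrays of n / m elements (s_f, s_g, s_h).
// Assumes m is non-zero; n / m is integer division, as in the kernel.
std::size_t sharedBytes(int n, int m) {
    std::size_t large = static_cast<std::size_t>(n);
    std::size_t small = static_cast<std::size_t>(n / m);
    return sizeof(float) * (3 * large + 3 * small);
}
```

With host values `h_a` and `h_b` (again assumed names), the result would go in as the third launch parameter: `aKernel<<< grid, block, sharedBytes(h_a, h_b) >>>(d_a, d_b, …);`.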
Compiling this gives the following error:
1>./c:\cuda\include\device_functions.h(1330): Error: Unaligned memory accesses not supported
When I do not use dynamic allocation in that kernel, it compiles without errors and runs successfully. Other kernels in my code use the same technique to allocate shared memory, and none of them shows this problem. I suspect the problematic line is this one: ‘float* s_c = (float*)array3;’.
Could someone please help me solve this? Thanks.