Hi! I want to store 128*32 = 4096 float values for smem_b in dynamic shared memory, but I found that I have to start smem_a at an offset of 8192 instead of 4096; otherwise I get incorrect values:
extern __shared__ __align__(16 * 1024) float smem[];
float* smem_b = smem;
float* smem_a = (float*)&smem_b[4096]; // Here: it only works with 8192; with 4096 the results are incorrect.
Also, does smem_b here still take up only 128*32*4/1024 = 16 KB, or does using 8192 make it take 32 KB? Thank you!
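A minimal sketch of how a layout like this is usually sized, assuming the kernel is launched with an explicit dynamic shared memory size (the third `<<<...>>>` parameter). The pointer offset only decides where smem_a starts inside the single `smem` allocation and reserves nothing by itself. With 4096 floats, smem_b covers 16 KB of that allocation, and the launch has to request enough bytes for both regions. `SMEM_A_FLOATS`, the kernel name `partition_demo`, and the 256-thread block are hypothetical placeholders, not taken from the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// From the post: smem_b holds 128*32 = 4096 floats (16 KB).
constexpr int SMEM_B_FLOATS = 128 * 32;
// Hypothetical size for smem_a; the post does not say how big it is.
constexpr int SMEM_A_FLOATS = 128 * 32;

__global__ void partition_demo(float* out) {
    // One dynamic allocation; its byte size is fixed at launch time.
    extern __shared__ __align__(16) float smem[];
    float* smem_b = smem;                  // floats [0, 4096)
    float* smem_a = smem + SMEM_B_FLOATS;  // starts right after smem_b

    // Block-stride loops keep every index inside its own region.
    for (int i = threadIdx.x; i < SMEM_B_FLOATS; i += blockDim.x) smem_b[i] = 1.0f;
    for (int i = threadIdx.x; i < SMEM_A_FLOATS; i += blockDim.x) smem_a[i] = 2.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        out[0] = smem_b[SMEM_B_FLOATS - 1];  // last element of smem_b
        out[1] = smem_a[0];                  // first element of smem_a
    }
}

int main() {
    // The offset does not reserve memory; this launch size does.
    const size_t smem_bytes = (SMEM_B_FLOATS + SMEM_A_FLOATS) * sizeof(float);
    printf("smem_b: %zu KB, total dynamic smem: %zu KB\n",
           SMEM_B_FLOATS * sizeof(float) / 1024, smem_bytes / 1024);

    float* d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(float));
    partition_demo<<<1, 256, smem_bytes>>>(d_out);

    cudaError_t err = cudaGetLastError();              // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    float h_out[2];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("smem_b[last] = %.1f, smem_a[0] = %.1f\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

The first printed line reports 16 KB for smem_b and 32 KB in total for these placeholder sizes. Moving smem_a to an offset of 8192 would put 32 KB in front of it, so the launch size would have to grow by the same amount. If 4096 gives wrong results, one possible cause (an assumption, not confirmed by the post) is an index into smem_b that reaches 4096 or beyond, or a launch size that does not cover smem_a. Requests above 48 KB per block also need an opt-in via `cudaFuncSetAttribute` with `cudaFuncAttributeMaxDynamicSharedMemorySize`.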