I need to allocate two-dimensional arrays dynamically in shared memory in my kernel. In the code below I declare the arrays as one-dimensional arrays and locate the elements with explicit index arithmetic. However, the extra operations needed to index the elements this way are hurting the performance of my kernel: for some inputs it takes twice as long as the same kernel with statically allocated shared memory. Does anybody know another way to allocate two-dimensional arrays dynamically in shared memory without hurting performance?
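extern __shared__ float array[];                            // dynamic shared memory, size set at kernel launch
float *sData = array;                                       // first block_size*matrix_block floats
int *shared_index = (int*) &array[block_size*matrix_block]; // ints stored right after the floats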
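A flat index `r * width + c` and a 2D access `tile[r][c]` compute the same address. One plausible reason the static version is faster is that its dimensions are compile-time constants, so the compiler can fold the multiplications and unroll loops. The sketch below assumes the row width can be made a template parameter. It reinterprets the dynamic buffer as a pointer to rows of `WIDTH` floats, then places an int array after it, mirroring the `sData`/`shared_index` layout above. The kernel `reverse_rows`, the helper `count_mismatches`, and the names `WIDTH`, `ROWS` and `src_row` are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block loads a ROWS x WIDTH tile into shared memory,
// then writes the rows back in reverse order using an int index array that
// also lives in the same dynamic shared buffer.
template <int WIDTH>
__global__ void reverse_rows(const float *in, float *out, int rows)
{
    extern __shared__ float smem[];  // one dynamic buffer, sized at launch
    // View the float part as rows of WIDTH: tile[r][c] == smem[r * WIDTH + c].
    float (*tile)[WIDTH] = reinterpret_cast<float (*)[WIDTH]>(smem);
    // The int part starts right after rows*WIDTH floats (float and int are
    // both 4 bytes, so the ints stay correctly aligned).
    int *src_row = reinterpret_cast<int *>(&smem[rows * WIDTH]);

    const float *block_in = in + blockIdx.x * rows * WIDTH;
    float *block_out = out + blockIdx.x * rows * WIDTH;

    for (int r = threadIdx.y; r < rows; r += blockDim.y) {
        if (threadIdx.x == 0) src_row[r] = rows - 1 - r;
        for (int c = threadIdx.x; c < WIDTH; c += blockDim.x)
            tile[r][c] = block_in[r * WIDTH + c];
    }
    __syncthreads();  // tile and src_row are now fully written

    for (int r = threadIdx.y; r < rows; r += blockDim.y)
        for (int c = threadIdx.x; c < WIDTH; c += blockDim.x)
            block_out[r * WIDTH + c] = tile[src_row[r]][c];
}

// Runs the kernel on test data; returns the number of wrong elements,
// or -1 if a CUDA call failed.
int count_mismatches()
{
    constexpr int WIDTH = 32, ROWS = 8, BLOCKS = 4;
    constexpr int N = BLOCKS * ROWS * WIDTH;
    static float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: ROWS*WIDTH floats followed by ROWS ints.
    size_t shmem = ROWS * WIDTH * sizeof(float) + ROWS * sizeof(int);
    reverse_rows<WIDTH><<<BLOCKS, dim3(32, 8), shmem>>>(d_in, d_out, ROWS);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int b = 0; b < BLOCKS; ++b)
        for (int r = 0; r < ROWS; ++r)
            for (int c = 0; c < WIDTH; ++c) {
                int base = b * ROWS * WIDTH;
                if (h_out[base + r * WIDTH + c] != h_in[base + (ROWS - 1 - r) * WIDTH + c])
                    ++bad;
            }
    return bad;
}

int main()
{
    int bad = count_mismatches();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the width really is only known at run time, a comparable option is to compute a row pointer such as `float *row = sData + r * width;` once, outside the inner loop, so the inner loop only adds `c`. Whether either change recovers the static-allocation timing is an assumption worth checking with a profiler.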