proper size for dynamic shared memory

Hello,

I wanted to ask whether I am sizing dynamic shared memory properly, because when I run my code it reports an illegal address, and I have to use a large size (for example, 500) for it to work.

```
typedef struct
{
    float X, Y;
    int Value;
} Points;
```

In the kernel:

```
int x = ( blockIdx.x * blockDim.x ) + threadIdx.x;
int y = ( blockIdx.y * blockDim.y ) + threadIdx.y;

extern __shared__ Points points[ ];
float distX, distY;

// Note: always true, since every thread's x and y are below these grid-wide totals
if ( x < blockDim.x * gridDim.x && y < blockDim.y * gridDim.y )
{
    for ( int i = 0; i < NbOfNodes; i++ )
    {
        // Calculate distances for all the points
        for ( int j = 0; j < NbOfNodes; j++ )
        {
            distX = points[ j ].X - x;
            distY = points[ j ].Y - y;
            ...
```

So my shared memory size should be `ThreadsPerBlockX + ThreadsPerBlockY * BlocksPerGridX`, right?

I get the illegal address error at the `distX` line.
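A host-side sketch of the sizing arithmetic may help here. The third launch parameter is the dynamic shared memory of *one block*, so it must cover the highest index any single block touches. Assuming (as a hypothetical layout) that the elided part of the kernel fills the buffer with `points[stride * threadIdx.y + threadIdx.x]`, and that the distance loop above then reads `points[j]` for every `j < NbOfNodes`, the buffer needs the larger of the two extents. `requiredSharedElems` and `requiredSharedBytes` are illustrative names, not part of the original code.

```cuda
#include <cstddef>

typedef struct
{
    float X, Y;
    int Value;
} Points;

// Hypothetical helper: number of Points one block's dynamic shared buffer
// must hold when each thread writes points[stride * threadIdx.y + threadIdx.x]
// and every thread later reads points[0 .. nbOfNodes - 1].
// Shared memory is per block, so gridDim only matters if it is used as 'stride'.
// Assumes blockY >= 1.
size_t requiredSharedElems(size_t stride, size_t blockX, size_t blockY,
                           size_t nbOfNodes)
{
    // Largest write index is stride * (blockY - 1) + (blockX - 1),
    // so the writes need one more element than that.
    size_t writeExtent = stride * (blockY - 1) + blockX;
    // The distance loop reads indices 0 .. nbOfNodes - 1.
    size_t readExtent = nbOfNodes;
    return writeExtent > readExtent ? writeExtent : readExtent;
}

// Bytes to pass as the third <<< >>> launch parameter.
size_t requiredSharedBytes(size_t stride, size_t blockX, size_t blockY,
                           size_t nbOfNodes)
{
    return requiredSharedElems(stride, blockX, blockY, nbOfNodes) * sizeof(Points);
}
```

For example, with 16×16 threads per block, `stride` equal to a `gridDim.x` of 16, and a hypothetical `NbOfNodes` of 300, the writes need 16 * 15 + 16 = 256 elements but the reads need 300, so the buffer must hold 300 elements. A size derived only from thread and block counts would let the `distX` reads run past the end of the buffer in that case.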

In the calling function:

```
dim3 BlocksDim ( BlocksPerGridX, BlocksPerGridY );

for ( int i = 0; i < NbOfNodes; i++ )
{
    // Call kernel
    mykernel<<< BlocksDim, ThreadsPerBlock, DynamicSharedSize * sizeof(Points) >>>(
    ...
```
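Because an out-of-bounds shared-memory access is only reported once the kernel actually runs, it helps to check both the launch and the execution. This is a hedged sketch: `launchMyKernel` is a hypothetical wrapper, the `mykernel` body is a stand-in (the real one is elided in the post), and the per-block size is assumed to be `NbOfNodes * sizeof(Points)`, i.e. one copy of every node. It compares the request against `cudaDeviceProp::sharedMemPerBlock`, then uses `cudaGetLastError` for launch-configuration errors and `cudaDeviceSynchronize` for errors raised inside the kernel, such as an illegal address.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct
{
    float X, Y;
    int Value;
} Points;

// Hypothetical stand-in for mykernel: each block copies all nodes into
// its dynamic shared buffer (the real body is elided in the post).
__global__ void mykernel(const Points* d_points, int NbOfNodes)
{
    extern __shared__ Points points[];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    for (int j = tid; j < NbOfNodes; j += blockDim.x * blockDim.y)
        points[j] = d_points[j];
    __syncthreads();
    // ... distance computation would follow here ...
}

// Launches mykernel with NbOfNodes * sizeof(Points) bytes of dynamic shared
// memory per block and reports launch-time and run-time errors.
cudaError_t launchMyKernel(const Points* d_points, int NbOfNodes,
                           dim3 BlocksDim, dim3 ThreadsPerBlock)
{
    int device = 0;
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDevice(&device);
    if (err == cudaSuccess)
        err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess)
        return err;  // no usable device

    size_t sharedBytes = static_cast<size_t>(NbOfNodes) * sizeof(Points);
    if (sharedBytes > prop.sharedMemPerBlock)
    {
        fprintf(stderr, "requested %zu bytes of shared memory, limit is %zu\n",
                sharedBytes, prop.sharedMemPerBlock);
        return cudaErrorInvalidValue;
    }

    mykernel<<<BlocksDim, ThreadsPerBlock, sharedBytes>>>(d_points, NbOfNodes);
    err = cudaGetLastError();        // bad launch configuration
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();  // faults inside the kernel, e.g. illegal address
}
```

Inside the loop over `NbOfNodes` launches, checking the returned `cudaError_t` after each call points at the first launch that fails, rather than at some later, unrelated API call.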

Any ideas?

Thanks!

Shared memory is allocated per thread block, so this address computation using gridDim looks suspicious:

```
points[ gridDim.x * threadIdx.y + threadIdx.x ]
```
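A minimal sketch of per-block indexing, assuming one shared slot per thread. Inside a block, the row stride of a 2D thread layout is `blockDim.x`; `gridDim.x` counts blocks, not threads in a block row, so it does not belong in a shared-memory index. The kernel `loadOnePerThread`, the helper `perBlockSharedBytes`, and the global layout (each block owning a contiguous run of `d_points`) are hypothetical.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

typedef struct
{
    float X, Y;
    int Value;
} Points;

// One shared slot per thread of the block, indexed with blockDim.x
// (threads per block row), not gridDim.x (blocks per grid row).
__global__ void loadOnePerThread(const Points* d_points, int total, Points* d_out)
{
    extern __shared__ Points points[];
    int local = blockDim.x * threadIdx.y + threadIdx.x;   // 0 .. threadsPerBlock - 1
    int blockId = blockIdx.y * gridDim.x + blockIdx.x;    // linear block id
    int gid = blockId * blockDim.x * blockDim.y + local;  // hypothetical global layout

    if (gid < total)
        points[local] = d_points[gid];
    __syncthreads();  // reached by every thread of the block
    if (gid < total)
        d_out[gid] = points[local];  // placeholder use of the shared copy
}

// Dynamic shared memory for the launch above: depends only on the block shape.
size_t perBlockSharedBytes(dim3 threadsPerBlock)
{
    return static_cast<size_t>(threadsPerBlock.x) * threadsPerBlock.y *
           threadsPerBlock.z * sizeof(Points);
}
```

Launched as `loadOnePerThread<<<BlocksDim, ThreadsPerBlock, perBlockSharedBytes(ThreadsPerBlock)>>>(...)`, a 16×16 block gets 256 elements whether the grid has 16 or 32 blocks per row.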

Hmm... right! I missed that! Thank you!

I did that because I want to use 16 threads per block in each of the x and y dimensions, and 16 or 32 blocks per grid, depending on the number of elements.

The problem is that if I swap the threads-per-block and blocks-per-grid dimensions (i.e., use 32 threads and 16 blocks), performance is worse than with the original setup...

Hmm... unfortunately, this didn't work.

What is the correct size that I have to use?

Any ideas?
Thanks!

I forgot to mention that the input data values (X and Y) are in the range 0-256.

Hmm. I changed it to:

```
points[ NbOfNodes * threadIdx.y + threadIdx.x ].X = X[ i ];
```

and it seems to work.
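On the changed index: with a row stride of `NbOfNodes`, the highest index a block writes is `NbOfNodes * (blockDim.y - 1) + blockDim.x - 1`, so that layout needs roughly `NbOfNodes * blockDim.y` elements per block, one row of nodes per thread row. If every thread only reads the same `NbOfNodes` points, one staged copy per block is enough. The following sketch shows that alternative. It assumes `X`, `Y` and `Value` are device arrays of `NbOfNodes` elements; the kernel name, the `width`/`height` bounds and the `minDist` output (nearest squared distance, standing in for the elided work) are hypothetical.

```cuda
#include <cfloat>
#include <cstddef>
#include <cuda_runtime.h>

typedef struct
{
    float X, Y;
    int Value;
} Points;

// Sketch: every block stages ONE copy of all NbOfNodes nodes, filled
// cooperatively by all of its threads, then each thread scans that copy.
__global__ void stagedNodesKernel(const float* X, const float* Y,
                                  const int* Value, int NbOfNodes,
                                  int width, int height, float* minDist)
{
    extern __shared__ Points points[];

    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    for (int i = tid; i < NbOfNodes; i += blockDim.x * blockDim.y)
    {
        points[i].X = X[i];
        points[i].Y = Y[i];
        points[i].Value = Value[i];
    }
    __syncthreads();  // the copy must be complete before any thread reads it

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;  // real bounds check, unlike x < blockDim.x * gridDim.x

    float best = FLT_MAX;
    for (int j = 0; j < NbOfNodes; j++)
    {
        float distX = points[j].X - x;
        float distY = points[j].Y - y;
        float d = distX * distX + distY * distY;
        if (d < best)
            best = d;
    }
    minDist[y * width + x] = best;
}

// Per-block dynamic shared memory for the launch: one Points per node.
size_t stagedSharedBytes(int NbOfNodes)
{
    return static_cast<size_t>(NbOfNodes) * sizeof(Points);
}
```

Pass `stagedSharedBytes(NbOfNodes)` as the third launch parameter. For a hypothetical `NbOfNodes` of 300 with 16×16 blocks, the row-stride layout needs 300 * 15 + 16 = 4516 elements (54,192 bytes if `sizeof(Points)` is 12), which exceeds the 48 KB per-block default on many GPUs, whereas one staged copy needs 300 elements (3,600 bytes).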