What is the optimal way to run this code?

Hi,

Could you tell me what the optimal way to write this sample code would be?

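__global__ void StoreTable(signed short int* Table)
{
    // Table size is 256 * 511
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int i = y * 256 + x;
    Table[i] = 8;
    __syncthreads();
}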

My table size is 256 * 511 (width * height).

What should the calling configuration (grid and block size) be for this kernel?

I tried it with

dim3 blockSize(16, 16);
dim3 gridSize( 256/16, 511/16 );
StoreTable<<<gridSize,blockSize,0>>>(table);

but it takes 4000+ ms.
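One thing worth checking in the configuration above: 511/16 is 31 in integer division, so the grid covers only 31 * 16 = 496 rows and rows 496 to 510 are never written. Below is a minimal sketch, assuming the 256 (width) by 511 (height) layout stated in the post, that rounds the grid up and guards the extra threads. `WIDTH`, `HEIGHT` and `fillTable` are hypothetical names. It also drops `__syncthreads()`, since no thread reads data written by another thread.

```cuda
#include <cuda_runtime.h>

#define WIDTH  256   // hypothetical name: table width from the post
#define HEIGHT 511   // hypothetical name: table height from the post

// One thread per element; threads that fall past the edge of the table do nothing.
__global__ void StoreTable(signed short int* Table)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < WIDTH && y < HEIGHT)
        Table[y * WIDTH + x] = 8;
    // No __syncthreads() needed: threads do not share any data.
}

// Hypothetical host helper: fills a device table and returns the CUDA status.
cudaError_t fillTable(signed short int* d_table)
{
    dim3 blockSize(16, 16);
    // Round up so the last partial row of blocks is launched:
    // (256 + 15) / 16 = 16 blocks across, (511 + 15) / 16 = 32 blocks down.
    dim3 gridSize((WIDTH + blockSize.x - 1) / blockSize.x,
                  (HEIGHT + blockSize.y - 1) / blockSize.y);
    StoreTable<<<gridSize, blockSize>>>(d_table);
    return cudaDeviceSynchronize();
}
```

The bounds check costs one comparison per thread and lets the same kernel handle table sizes that are not multiples of 16.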

Please help.

Me again,

I forgot to mention that I am using an NVIDIA Quadro NVS 290.

Check the following code:

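__global__ void StoreTable(signed short int* Table)
{
    // Table size is 256 * 511
    // bi and ti are taken to mean blockIdx.x and threadIdx.x;
    // with this launch blockDim.x == BLOCK_SIZE
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    Table[i] = 8;
    __syncthreads();
}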

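const int count = 256 * 511;               // 130816 elements
const int BLOCK_SIZE = 32;
const int GRID_SIZE = count / BLOCK_SIZE;  // 4088 blocks (exact, since count is a multiple of 32)
dim3 dimBlock(BLOCK_SIZE, 1, 1);
dim3 dimGrid(GRID_SIZE, 1, 1);
StoreTable<<<dimGrid, dimBlock>>>(table);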
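With a 1D launch, 32 threads per block is likely too small for this GPU. As far as I know the Quadro NVS 290 is a compute capability 1.1 part, and on 1.x hardware a multiprocessor holds at most 8 resident blocks, so 32-thread blocks cap it at 256 of its 768 resident threads. Here is a sketch with 256 threads per block (a common but arbitrary choice) and a bounds check so any `count` works; `COUNT`, `THREADS`, `StoreTable1D` and `fillTable1D` are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define COUNT   (256 * 511)  // 130816 elements, as in the post
#define THREADS 256          // hypothetical choice: a multiple of 32, at most 512 on CC 1.x

// One thread per element of the flattened table.
__global__ void StoreTable1D(signed short int* Table, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)           // guard in case count is not a multiple of THREADS
        Table[i] = 8;
}

// Hypothetical host helper: launches enough blocks to cover COUNT elements.
cudaError_t fillTable1D(signed short int* d_table)
{
    int blocks = (COUNT + THREADS - 1) / THREADS;  // rounds up; 511 blocks here
    StoreTable1D<<<blocks, THREADS>>>(d_table, COUNT);
    return cudaDeviceSynchronize();
}
```

Since 256 * 511 = 130816 is an exact multiple of 256, this launches exactly 511 blocks and the guard never fires here; it only matters if the table size changes.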

Thanks, Tushar.

The time is now 2719 ms.

But when I do it sequentially on the CPU:

for (int i = 0; i < 256; ++i)
{
    for (int j = 0; j < 511; ++j)
    {
        Table[i * 511 + j] = 8;
    }
}

It takes 300+ ms.
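For a fair comparison it helps to time only the loop itself. A minimal sketch using the standard C `clock()` function follows; `fillTableCPU` and `timeCPUFillMs` are hypothetical names, and `clock()` measures processor time with fairly coarse resolution. Writing 130,816 two-byte values (about 256 KB) would usually be expected to take far less than 300 ms, so a figure that large may include something besides the loop.

```cuda
#include <time.h>

// Sequential host fill: every element of the 256 x 511 table gets 8.
void fillTableCPU(signed short int* Table)
{
    for (int i = 0; i < 256; ++i)
        for (int j = 0; j < 511; ++j)
            Table[i * 511 + j] = 8;
}

// Hypothetical helper: processor time for one fill, in milliseconds.
double timeCPUFillMs(signed short int* Table)
{
    clock_t t0 = clock();
    fillTableCPU(Table);
    clock_t t1 = clock();
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```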

So with threads it should take less than 300 ms, at the very least.

How can I achieve this?
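Part of the 2719 ms is probably not the kernel at all. The first CUDA runtime call in a process creates the context, which can take hundreds of milliseconds, and a host timer around the whole sequence also counts `cudaMalloc`, `cudaMemcpy` and launch overhead. Below is a sketch that times only the kernel with CUDA events, after one warm-up launch; it assumes the table is already allocated on the device, and `COUNT`, `THREADS`, `StoreTable1D` and `timeKernelMs` are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define COUNT   (256 * 511)  // 130816 elements, as in the post
#define THREADS 256          // hypothetical block size

__global__ void StoreTable1D(signed short int* Table, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count)
        Table[i] = 8;
}

// Hypothetical helper: returns the kernel time in milliseconds, or -1 on error.
float timeKernelMs(signed short int* d_table)
{
    int blocks = (COUNT + THREADS - 1) / THREADS;

    // Warm-up launch so one-time costs are not included in the measurement.
    StoreTable1D<<<blocks, THREADS>>>(d_table, COUNT);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    StoreTable1D<<<blocks, THREADS>>>(d_table, COUNT);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // block until the kernel and the stop event complete

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

If the kernel time turns out small while the end-to-end time stays large, the cost is in setup and transfers, which the GPU version pays and the CPU loop does not.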