Hi,
What would be the optimal way to write and launch this sample code?
__global__ void StoreTable(signed short int* Table)
{
    // Table size is 256 * 511
    Table[i] = 8;
    __syncthreads();
}
My table size is 256 * 511 (width * height).
What would be a suitable launch configuration for this kernel?
I tried it with:
dim3 blockSize(16, 16);
dim3 gridSize( 256/16, 511/16 );
StoreTable<<<gridSize,blockSize,0>>>(table);
but it takes 4000+ ms.
Please help.
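One detail about the configuration above: 511/16 is integer division and gives 31, so a 16 x 31 grid of 16 x 16 blocks covers only 496 of the 511 rows. Below is a minimal sketch of a 2D launch that rounds the grid size up and guards the table edge. It assumes a row-major table with width 256 and height 511; the names ceilDiv, StoreTable2D and launchStoreTable2D are hypothetical and only for illustration.

```cuda
#include <cuda_runtime.h>

#define TABLE_WIDTH  256   // width, as stated in the post
#define TABLE_HEIGHT 511   // height, as stated in the post

// Integer division rounded up, so a partial tile still gets a block.
inline int ceilDiv(int n, int d) { return (n + d - 1) / d; }

// Each thread writes one element; threads past the table edge do nothing.
__global__ void StoreTable2D(signed short int* Table, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (x < width && y < height) {
        Table[y * width + x] = 8;  // row-major layout (assumption)
    }
}

// Launches the kernel on a device buffer of TABLE_WIDTH * TABLE_HEIGHT elements.
// Hypothetical helper; dTable must already be allocated with cudaMalloc.
cudaError_t launchStoreTable2D(signed short int* dTable)
{
    dim3 blockSize(16, 16);
    dim3 gridSize(ceilDiv(TABLE_WIDTH, blockSize.x),
                  ceilDiv(TABLE_HEIGHT, blockSize.y));  // 16 x 32 blocks
    StoreTable2D<<<gridSize, blockSize>>>(dTable, TABLE_WIDTH, TABLE_HEIGHT);
    return cudaGetLastError();  // reports launch-configuration errors
}
```

The bounds check matters here because the rounded-up grid has 512 rows of threads for a table with 511 rows.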
Me again.
I forgot to say that I am using an NVIDIA Quadro NVS 290.
Tushar
Check the following code…
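#define BLOCK_SIZE 32

__global__ void StoreTable(signed short int* Table)
{
    // Table size is 256 * 511
    int bi = blockIdx.x;   // block index within the grid
    int ti = threadIdx.x;  // thread index within the block
    int i = BLOCK_SIZE * bi + ti;
    Table[i] = 8;
    __syncthreads();
}

const int count = 256 * 511;               // 130816 elements
const int GRID_SIZE = count / BLOCK_SIZE;  // 4088 blocks, divides evenly
dim3 dimBlock(BLOCK_SIZE, 1, 1);
dim3 dimGrid(GRID_SIZE, 1, 1);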
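To show the pieces around the kernel, here is a minimal sketch of the host side for the 1D configuration above. The helper names gridSizeFor and fillTable, the extra count parameter and the error handling are illustrative assumptions, not part of the reply. A bounds check is added so the kernel stays safe if count is ever not a multiple of BLOCK_SIZE.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 32

// One thread per element; the count parameter is an addition for the bounds check.
__global__ void StoreTable(signed short int* Table, int count)
{
    int i = BLOCK_SIZE * blockIdx.x + threadIdx.x;
    if (i < count) {
        Table[i] = 8;
    }
}

// Number of blocks needed to cover count elements, rounded up.
inline int gridSizeFor(int count) { return (count + BLOCK_SIZE - 1) / BLOCK_SIZE; }

// Fills hostTable (count elements) with 8 on the GPU and copies the result back.
// Hypothetical helper for illustration.
cudaError_t fillTable(signed short int* hostTable, int count)
{
    signed short int* dTable = 0;
    size_t bytes = count * sizeof(signed short int);

    cudaError_t err = cudaMalloc((void**)&dTable, bytes);
    if (err != cudaSuccess) return err;

    StoreTable<<<gridSizeFor(count), BLOCK_SIZE>>>(dTable, count);
    err = cudaGetLastError();  // launch errors
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(hostTable, dTable, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dTable);
    return err;
}
```

Calling fillTable(table, 256 * 511) on a host array of 130816 elements would launch 4088 blocks of 32 threads.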
Thanks, Tushar.
The time is now 2719 ms.
But when I do it sequentially…
for (register int i = 0; i < 256; ++i)
{
    for (register int j = 0; j < 511; ++j)
    {
        Table[i * 511 + j] = 8;
    }
}
It takes 300+ ms.
So with threads it should be at least below 300 ms.
How can I achieve this?
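When comparing the two numbers, it may help to time the kernel on its own. The first CUDA call in a process also pays for one-time initialization, and a wall-clock measurement around the first launch would include that. Below is a minimal sketch using CUDA events, assuming a device pointer that was already allocated with cudaMalloc. The names blocksFor and timeStoreTableMs, the count parameter and the warm-up launch are illustrative assumptions, not something from this thread.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 32

__global__ void StoreTable(signed short int* Table, int count)
{
    int i = BLOCK_SIZE * blockIdx.x + threadIdx.x;
    if (i < count) {
        Table[i] = 8;
    }
}

// Number of blocks needed to cover count elements, rounded up.
inline int blocksFor(int count) { return (count + BLOCK_SIZE - 1) / BLOCK_SIZE; }

// Returns the time of one kernel run in milliseconds, or a negative value on error.
// Hypothetical helper; dTable must hold at least count elements on the device.
float timeStoreTableMs(signed short int* dTable, int count)
{
    int blocks = blocksFor(count);

    // Warm-up launch so one-time setup is not part of the measurement.
    StoreTable<<<blocks, BLOCK_SIZE>>>(dTable, count);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    StoreTable<<<blocks, BLOCK_SIZE>>>(dTable, count);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the kernel and stop event complete

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (err == cudaSuccess) ? ms : -1.0f;
}
```

This measures only the kernel; allocation and host-device copies would need to be timed separately if they are part of the comparison.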