Hi,
I might be missing something here (or I'm just too tired :) ). I have a GTX 280 and use the following code:
If I count the operations correctly, I get: 26 * 26 * 3 * 4 * 256 * 2700 * 3
which stands for: mydim.x * mydim.y * time loops * threads per block * trace-index iterations * operations in the inner loop.
This takes ~87 ms, which means I get ~180 GFLOPS???
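Spelled out with the post's own factors (26 * 26 * 3 = 26 × 78 blocks, and the 3 operations per inner iteration taken at face value), the arithmetic is:

$$
\underbrace{26 \times 78}_{\text{blocks}} \times \underbrace{256}_{\text{threads/block}} \times \underbrace{4}_{\text{time loops}} \times \underbrace{2700}_{\text{trace indices}} \times \underbrace{3}_{\text{ops}} = 16\,821\,043\,200 \approx 1.68 \times 10^{10}
$$

$$
\frac{1.68 \times 10^{10}\ \text{ops}}{0.087\ \text{s}} \approx 1.93 \times 10^{11}\ \text{ops/s} \approx 193\ \text{GFLOPS}
$$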
Any suggestions are more than welcome…
[codebox]
dim3 mydim( 26, 26 * 3 );           // 26 x 78 = 2028 blocks
kernel<<< mydim, 256 >>>( pOut );   // 256 threads per block
// …
[/codebox]
[codebox]
__global__ void kernel( float *pOut )
{
    float f1 = 0.0f;
    for ( int iCurrentTimeLoop = 0; iCurrentTimeLoop < 4; iCurrentTimeLoop++ )
    {
        for ( int iTraceIndex = 0; iTraceIndex < 2700; iTraceIndex++ )
        {
            f1 += iTraceIndex;  // int-to-float conversion, then a float add
        }
        // Every block writes the same 256 entries, so this += races across blocks
        pOut[ threadIdx.x ] += f1;
    }
}
[/codebox]
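For reference, a minimal timing sketch, assuming the CUDA runtime API and that `pOut` only needs 256 floats (the real allocation is elided in the post). It does one warm-up launch, times a second launch with CUDA events, and applies the same 3-ops-per-iteration count; the helper names `opCount` and `gflops` are hypothetical. The 3-ops figure is the post's own assumption: the compiler may unroll the inner loop and drop the counter increment and compare, so the real instruction count is worth checking in the generated code (e.g. with `cuobjdump -sass`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the post, unchanged apart from the missing closing brace.
// Every block writes the same 256 entries of pOut, so the += races across
// blocks; that affects the stored values, not the arithmetic being timed.
__global__ void kernel(float *pOut)
{
    float f1 = 0.0f;
    for (int iCurrentTimeLoop = 0; iCurrentTimeLoop < 4; iCurrentTimeLoop++)
    {
        for (int iTraceIndex = 0; iTraceIndex < 2700; iTraceIndex++)
        {
            f1 += iTraceIndex;
        }
        pOut[threadIdx.x] += f1;
    }
}

// Hypothetical helper: total ops under the post's assumption of
// 4 time loops * 2700 trace indices * 3 ops per thread.
unsigned long long opCount(unsigned gridX, unsigned gridY, unsigned threads)
{
    return 1ULL * gridX * gridY * threads * 4 * 2700 * 3;
}

// Hypothetical helper: ops and elapsed milliseconds -> GFLOPS.
double gflops(unsigned long long ops, float ms)
{
    return ops / (ms * 1.0e-3) / 1.0e9;
}

int main()
{
    float *pOut = NULL;
    cudaMalloc((void **)&pOut, 256 * sizeof(float));
    cudaMemset(pOut, 0, 256 * sizeof(float));

    dim3 mydim(26, 26 * 3);
    kernel<<<mydim, 256>>>(pOut);   // warm-up launch, not timed
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    kernel<<<mydim, 256>>>(pOut);   // timed launch
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);     // wait until the kernel has finished

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    unsigned long long ops = opCount(mydim.x, mydim.y, 256);
    printf("%.3f ms, %llu ops, %.1f GFLOPS\n", ms, ops, gflops(ops, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(pOut);
    return 0;
}
```

Timing with events on the GPU avoids counting launch overhead or host-side work that a CPU timer around the launch might include.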