Hi, I am fairly new to OpenCL and have been implementing a DSP algorithm to compare its performance on different GPUs against the standard CPU implementation.
I have achieved a large overall speed-up, but what I find strange is that I get the same
gain on a GT240 as on the much faster GTX 480. My program executes two kernels, and while one
speeds up on the GTX 480, the other slows down.
GT240:   Kernel 1: 226 us, Kernel 2: 103 us
GTX 480: Kernel 1: 35 us,  Kernel 2: 293 us
Below is the code for Kernel 2, which is almost 3 times slower on the bigger card.
__kernel void max_curve_fit_gpu(__global float* fCorrelationResult,
                                const int iNumAngles,
                                const int iTotalBins,
                                __global float* fDirection_rad,
                                const int iBatchIndex)
{
    const int iBinNum = get_global_id(0);
    const int iCorrBatchOffset = iBatchIndex*(iNumAngles*iTotalBins) + iBinNum*iNumAngles;
    const int iResultBatchOffset = iBatchIndex*iTotalBins;

    // Find the max for this bin
    float fMax = 0;
    int iMaxIndex = 0;
    for (int iAngle = 0; iAngle < iNumAngles; iAngle++)
    {
        if (fMax < fCorrelationResult[iCorrBatchOffset + iAngle])
        {
            fMax = fCorrelationResult[iCorrBatchOffset + iAngle];
            iMaxIndex = iAngle;
        }
    }

    // Do the curve fit (three-point parabolic interpolation around the max);
    // the "+ iNumAngles" keeps the wrap-around index non-negative when iMaxIndex is 0
    float fPrev, fNext, fA, fB, fAxis;
    fPrev = fCorrelationResult[iCorrBatchOffset + (iMaxIndex - 1 + iNumAngles)%iNumAngles];
    fNext = fCorrelationResult[iCorrBatchOffset + (iMaxIndex + 1)%iNumAngles];
    fB = (fPrev - fNext)*0.5f;
    fA = (fNext + fPrev) - fMax*2.0f;
    fAxis = fB / fA;
    fDirection_rad[iResultBatchOffset + iBinNum] = iMaxIndex + fAxis;
}
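For reference, this is the scalar computation each work-item is meant to perform for a single bin (a minimal sketch with a hypothetical helper name, not my actual CPU implementation):

    /* Scalar sketch of what one work-item does for one bin: find the peak over
       iNumAngles correlation values, then refine its position with a three-point
       parabolic fit. "max_curve_fit_ref" is a hypothetical name for illustration. */
    static float max_curve_fit_ref(const float* fCorr, int iNumAngles)
    {
        float fMax = 0.0f;
        int iMaxIndex = 0;
        for (int iAngle = 0; iAngle < iNumAngles; iAngle++)
        {
            if (fMax < fCorr[iAngle])
            {
                fMax = fCorr[iAngle];
                iMaxIndex = iAngle;
            }
        }
        float fPrev = fCorr[(iMaxIndex - 1 + iNumAngles) % iNumAngles];
        float fNext = fCorr[(iMaxIndex + 1) % iNumAngles];
        float fB = (fPrev - fNext) * 0.5f;
        float fA = (fNext + fPrev) - fMax * 2.0f;
        return iMaxIndex + fB / fA;   /* fractional index of the fitted peak */
    }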
Can somebody please point out what could be causing this?