Hello!
I wanted to benchmark my GTX 460 against my Core i5-750. For the CPU I compiled OpenCV with all the optimizations available for this processor (except TBB). The timing is done around this code segment:
#pragma omp parallel for
for (int j = 0; j < nIterations; j++) {
    cvSmooth(img, dst, CV_MEDIAN, 5, 5);
}
For the GTX 460 I used NPP. Since kernel launches in CUDA are asynchronous, I time the loop together with the final copy back to the host:
for (int j = 0; j < nIterations; j++) {
    eStatusNPP = nppiFilterMax_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                      oDeviceDst.data(), oDeviceDst.pitch(),
                                      oSizeROI, oMaskSize, oAnchor);
}
oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch());
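The timing pattern described above (run the loop, then wait for the device before stopping the clock) can be sketched roughly as follows. This is only a minimal sketch: `timeLoopSeconds`, `work`, and `sync` are hypothetical names introduced for illustration and are not part of NPP or OpenCV. In the GPU case `sync` would be whatever blocks until the queued work is done, for example the copy back to the host.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of NPP or OpenCV): runs `work` nIterations
// times, then calls `sync` once before stopping the clock, so that any
// still-queued asynchronous work is included in the measurement.
double timeLoopSeconds(int nIterations,
                       const std::function<void()>& work,
                       const std::function<void()>& sync)
{
    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < nIterations; ++j) {
        work();  // e.g. the NPP filter call from the loop above
    }
    sync();      // e.g. the blocking copy back to the host
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For the CPU loop `sync` can simply be an empty lambda, since cvSmooth returns only after the work is finished.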
In the CPU case all 4 cores go up to 100%, as expected. In the GPU case one core goes to 100%; I think this is because it is constantly sending commands to the GPU, right?
But to my surprise, both libraries take the same time (~260 seconds) to do the processing!
nIterations = 100000
Image Size: 2048x1024
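To put these numbers in perspective, the total time can be converted into a per-call time and a pixel throughput. The helper names below are hypothetical and only illustrate the arithmetic; with roughly 260 s, 100000 iterations, and a 2048x1024 image this comes out at about 2.6 ms per call and about 807 MPixel/s.

```cpp
#include <cstdint>

// Average time per filter call in milliseconds.
double msPerIteration(double totalSeconds, std::int64_t nIterations)
{
    return totalSeconds * 1000.0 / static_cast<double>(nIterations);
}

// Pixel throughput in megapixels per second (1 MPixel = 1e6 pixels).
double megapixelsPerSecond(int width, int height,
                           std::int64_t nIterations, double totalSeconds)
{
    const double totalPixels = static_cast<double>(width) *
                               static_cast<double>(height) *
                               static_cast<double>(nIterations);
    return totalPixels / totalSeconds / 1.0e6;
}
```

Since the 260 s figure is approximate, these values are estimates rather than exact measurements.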
I’m using 64-bit Ubuntu Linux with the 260.19.06 driver.
nppGetGpuName() returns GeForce GTX 460, and nvidia-settings shows the GPU at performance level 3 while processing.
Am I missing something with NPP?