I am currently writing a small application to compare different GPU sorting algorithms. My timing code looks like this:
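[codebox]unsigned int timer = 0;
cutCreateTimer(&timer);
// Make sure all previously queued GPU work has finished before timing starts.
cudaThreadSynchronize();
cutStartTimer(timer);
cudppSort(sortplan, ddata2, ddata1, size);
// Wait for the sort to finish so the timer covers the whole operation.
cudaThreadSynchronize();
cutStopTimer(timer);
*timerValue = cutGetTimerValue(timer);[/codebox]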
This method works for several other algorithms, but not for the CUDPP sort function: I always get a time of 0.0, as if cudaThreadSynchronize() were never called. Am I missing something obvious? My system runs openSUSE 11.1 with a Tesla C870 card, the most recent drivers, and CUDA 2.2.
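
To rule out the timer itself, here is a minimal, self-contained sketch of the same measurement done with CUDA events and explicit error checks instead of the cutil timer. The `CHECK` macro and `dummyKernel` are placeholders made up for illustration and are not part of CUDPP or cutil; in the real code the kernel launch would be the cudppSort call, whose return value I would compare against CUDPP_SUCCESS, since a call that fails immediately could also explain a near-zero time. That is only a guess at this point.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the work being timed (the sort, in my case).
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data = NULL;
    CHECK(cudaMalloc((void **)&d_data, n * sizeof(float)));
    CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Events are recorded in the same stream as the kernel, so the
    // elapsed time is measured on the GPU rather than on the host.
    CHECK(cudaEventRecord(start, 0));
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    CHECK(cudaGetLastError());          // reports launch failures
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));  // block until 'stop' is reached

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("elapsed: %f ms\n", ms);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d_data));
    return 0;
}
```

If the event-based time is also zero, or one of the checks reports an error, the problem is more likely in the call being timed than in the timer.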