I am currently writing a small application to compare different GPU sorting algorithms. My timing code looks like this:
[codebox]unsigned int timer;
cudppSort(sortplan, ddata2, ddata1, size);
*timerValue = cutGetTimerValue(timer);[/codebox]
This method works for several other algorithms, but for some reason not for the CUDPP sort function: I always get a time of 0.0, as if cudaThreadSynchronize() had not been called. Am I missing something obvious? My system runs openSUSE 11.1 and has a Tesla C870 card with the most recent drivers and CUDA 2.2.
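One cross-check that might help rule out the cutil timer itself is to time the call with CUDA events rather than a host-side timer. Below is a minimal sketch of that pattern, not my actual benchmark code: `dummyKernel` and `timeKernelMs` are hypothetical names, and the kernel is only a placeholder for the spot where the `cudppSort` call would go. It assumes a single device and the default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the sort being timed (hypothetical, not CUDPP code).
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Hypothetical helper: returns the GPU time in milliseconds for one launch,
// or -1.0f if the device allocation fails.
float timeKernelMs(int n)
{
    float *d_data = 0;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                        // enqueue start marker on stream 0
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n); // the work being measured
    cudaEventRecord(stop, 0);                         // enqueue stop marker after the work
    cudaEventSynchronize(stop);                       // block the host until stop is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);           // elapsed time between the two events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}

int main()
{
    printf("elapsed: %f ms\n", timeKernelMs(1 << 20));
    return 0;
}
```

Because the events are recorded on the device, the measured interval should not depend on whether the host has synchronized before reading its own clock, so a nonzero result here alongside 0.0 from the cutil timer would point at the host-side measurement rather than the sort.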