Measuring the running time of CUDPP sort functions

I am currently writing a small application to compare different GPU sorting algorithms. My timing code looks like this:

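[codebox]unsigned int timer;
cutCreateTimer(&timer);

// Wait for any earlier GPU work so it is not included in the measurement.
cudaThreadSynchronize();
cutStartTimer(timer);

cudppSort(sortplan, ddata2, ddata1, size);

// Block until the sort has finished before stopping the timer.
cudaThreadSynchronize();
cutStopTimer(timer);

*timerValue = cutGetTimerValue(timer);  // elapsed time in milliseconds[/codebox]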

This method works for several other algorithms, but not for the CUDPP sort function: I always get a time of 0.0, as if cudaThreadSynchronize() were never called. Am I missing something obvious? My system runs openSUSE 11.1 and has a Tesla C870 card with the most recent drivers and CUDA 2.2.
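One way to narrow this down might be to time the call with CUDA events and to check every return code, since a call that fails before launching any GPU work would also produce a near-zero time. The following is only a sketch under assumptions: `dummyKernel` and `timeDummyKernelMs` are hypothetical names standing in for the real workload, and nothing here confirms what actually causes the 0.0 reading. In the real program the kernel launch would be replaced by the `cudppSort` call, whose return value can be compared against `CUDPP_SUCCESS`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the work being timed (not part of CUDPP).
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

// Times one launch of dummyKernel with CUDA events.
// Returns the elapsed time in milliseconds, or -1.0f if any CUDA call failed.
float timeDummyKernelMs(float *d_data, int n)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess)
        return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start, 0);
    // In the real program, the call to be measured goes here instead.
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaError_t err = cudaGetLastError();   // catches launch errors
    cudaEventRecord(stop, 0);

    // Block the host until everything recorded before 'stop' has finished.
    cudaError_t syncErr = cudaEventSynchronize(stop);
    if (err == cudaSuccess)
        err = syncErr;

    float ms = -1.0f;
    if (err == cudaSuccess) {
        if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess)
            ms = -1.0f;
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = 0;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    float ms = timeDummyKernelMs(d_data, n);
    if (ms < 0.0f)
        printf("timing failed\n");
    else
        printf("elapsed: %f ms\n", ms);

    cudaFree(d_data);
    return ms < 0.0f ? 1 : 0;
}
```

Events are recorded on the GPU's own timeline, so this measurement does not depend on the resolution of the host timer. If the event-based time is also close to zero, printing the status returned by the sort call is a reasonable next check.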