Problem with execution time measurement?


I have a function that runs on the CPU, and it is set up and called like this:

void CPUFun( … )
{
    // allocate nearly 15 pointers with cudaMalloc()
    // call 7 GPU functions here

    // all GPU functions use the launch configuration <<<1,1>>>

    GPUFun1<<<1,1>>>( … );
    … up to …
    GPUFun7<<<1,1>>>( … );
}

int main()
{
    // time the call to CPUFun()
    STARTTIME; // a macro
    CPUFun( … );
    STOPTIME; // a macro
    printf("CPUFun() execution time: %f ms\n", time); // assuming time is a float in ms
}

Among the 7 GPU functions, 6 take no more than 1 ms each to execute, and one takes 1.5 ms.

But when I print the CPUFun() execution time, it shows 30+ ms.

Is this CPUFun() execution time reasonable?
I think the CPUFun() execution time should be no more than 10 ms.
Please help with this…

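Kernel launches are asynchronous: control returns to the CPU before the kernel has finished. To get accurate timing measurements, call cudaThreadSynchronize() (deprecated in newer CUDA versions in favor of cudaDeviceSynchronize()) before stopping the timer. Also note that your timed region includes the cudaMalloc() calls and, if they are the first CUDA calls in the program, the one-time context initialization. Either of these can take longer than the kernels themselves.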
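
To see where the time actually goes, it may help to time the three phases separately. The following is only a rough sketch, not your actual code: dummyKernel is a made-up stand-in for GPUFun1 … GPUFun7, msSince is a hypothetical helper, a single small allocation replaces your ~15 cudaMalloc() calls, and error checking is left out for brevity. It assumes a CUDA version that provides cudaDeviceSynchronize() and a compiler with C++11 support.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for GPUFun1 ... GPUFun7 (the real kernels are not shown).
__global__ void dummyKernel(float *data)
{
    data[0] += 1.0f;
}

// Hypothetical helper: wall-clock milliseconds elapsed since t0.
static double msSince(std::chrono::steady_clock::time_point t0)
{
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    // Phase 1: the first runtime call usually pays for context creation.
    auto t0 = std::chrono::steady_clock::now();
    cudaFree(0);
    printf("context init : %.3f ms\n", msSince(t0));

    // Phase 2: device allocation, timed on the host.
    t0 = std::chrono::steady_clock::now();
    float *d_data = nullptr;
    cudaMalloc(&d_data, sizeof(float));
    cudaMemset(d_data, 0, sizeof(float));
    cudaDeviceSynchronize();
    printf("allocation   : %.3f ms\n", msSince(t0));

    // Phase 3: the kernels, timed on the device with CUDA events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 7; ++i)
        dummyKernel<<<1, 1>>>(d_data);
    cudaEventRecord(stop);

    // Block the host until every launch recorded before 'stop' has finished.
    cudaEventSynchronize(stop);

    float kernelMs = 0.0f;
    cudaEventElapsedTime(&kernelMs, start, stop);
    printf("7 kernels    : %.3f ms\n", kernelMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If the first two numbers account for most of your 30+ ms, then the kernels are not the problem and the macro timer is mostly measuring setup cost. Events are used for the kernel phase because they are recorded on the GPU, so the result does not depend on when the host happens to read its clock.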