Hi,
I want to measure the execution time of a kernel. This is what I have so far:
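// Host-side timer from the CUDA SDK's cutil helpers
unsigned int timer1;
cutCreateTimer(&timer1);
cutStartTimer(timer1);
// The launch returns to the host immediately (asynchronous)
Muld<<<dimGrid, dimBlock>>>(Ad, Bd, wA, wB, Cd);
// Blocking copy: waits for the kernel to finish, then copies the result
cudaMemcpy(C, Cd, size, cudaMemcpyDeviceToHost);
cutStopTimer(timer1);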
Since the kernel launch is asynchronous, I have to put cutStopTimer after the cudaMemcpy, so the measured time also includes the device-to-host copy.
Is there another way to get only the kernel time?
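One option I am considering is timing with CUDA events instead of the host timer. Below is a minimal sketch of that idea, not tested against my actual code. The kernel dummyKernel, the helper timeKernelMs and the problem size are hypothetical stand-ins (my real kernel would be Muld), and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; in my case this would be Muld.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 2.0f * data[i];
}

// Hypothetical helper: returns the GPU time of one kernel launch in
// milliseconds, measured between two CUDA events.
float timeKernelMs(int n)
{
    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start, 0);               // enqueue start marker
    dummyKernel<<<blocks, threads>>>(d, n);  // asynchronous launch
    cudaEventRecord(stop, 0);                // enqueue stop marker
    cudaEventSynchronize(stop);              // host waits until stop is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time between the events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return ms;
}

int main()
{
    printf("kernel time: %f ms\n", timeKernelMs(1 << 20));
    return 0;
}
```

Because both events are recorded in the same stream as the kernel, the elapsed time should cover only the kernel and not the later cudaMemcpy. I assume that calling cudaDeviceSynchronize() right after the launch and before cutStopTimer would also work with my existing timer. Is one of these the recommended approach?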
Best,
Yixun