Hi everyone,
A quick question: I just finished coding a matrix-matrix multiplication in CUDA, and now I would like to count the FLOPs to get an idea of how well my implementation performs. Is there any tool that can do this?
Thanks a lot!
What about measuring time instead? With CUDA events you can measure a kernel's execution time with a resolution of around half a microsecond.
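```cuda
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);  // enqueue the start marker on the default stream
// launch the kernel to be timed here
cudaEventRecord(stop, 0);   // enqueue the stop marker after the kernel
cudaEventSynchronize(stop); // block the host until the stop event has completed

float et;
cudaEventElapsedTime(&et, start, stop); // elapsed time between the two events, in ms

cudaEventDestroy(start);
cudaEventDestroy(stop);
```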
et now holds the elapsed time in milliseconds. A lower value means a faster implementation :)