I wrote a stopwatch class that I use to time the CUDA part of my application. It might be useful to others, so I’d like to share it.
Without the class, you have to write:
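```cuda
cudaEvent_t start, stop;
float t; // elapsed time in milliseconds
cudaEventCreate(&start);
cudaEventCreate(&stop);  // create both events before timing starts
cudaEventRecord(start, 0);
stuffYouWantToTime();
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop); // wait until the GPU has reached the stop event
cudaEventElapsedTime(&t, start, stop);
printf("time: %1.2f\n", t);
cudaEventDestroy(start);
cudaEventDestroy(stop);
```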
With the class, you can write:
```cuda
#include "cudastopwatch.h"

cudaStopWatch sw(10); // create a stopwatch that allows up to 10 stacked timers
sw.start();
stuffYouWantToTime();
printf("time: %1.2f\n", sw.stop());
```
You can also stack (nest) timers like this:
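```cuda
#include "cudastopwatch.h"

cudaStopWatch sw(10); // create a stopwatch that allows up to 10 stacked timers
sw.start();           // outer timer
sw.start();           // first inner timer
firstPartYouWantToTime();
printf("first part: %1.2f\n", sw.stop());  // stops the first inner timer
sw.start();           // second inner timer
secondPartYouWantToTime();
printf("second part: %1.2f\n", sw.stop()); // stops the second inner timer
printf("overall: %1.2f\n", sw.stop());     // stops the outer timer
```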
I have tested it on Windows XP x64.
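The contents of cudastopwatch.h are not reproduced in this post. As a rough illustration of how a stack of timers can be built on CUDA events, here is a minimal sketch that assumes the same start()/stop() interface and a fixed nesting capacity passed to the constructor. The class name `StackedCudaTimer` and all of its internals are hypothetical; this is not the attached header.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical stack-based stopwatch built on CUDA events; a sketch of the
// idea, not the contents of cudastopwatch.h.
class StackedCudaTimer {
public:
    // maxDepth: how many timers may be running (nested) at the same time.
    explicit StackedCudaTimer(int maxDepth) : starts_(maxDepth), depth_(0) {
        for (cudaEvent_t &e : starts_) cudaEventCreate(&e); // one start event per nesting level
        cudaEventCreate(&stop_);
    }
    ~StackedCudaTimer() {
        for (cudaEvent_t e : starts_) cudaEventDestroy(e);
        cudaEventDestroy(stop_);
    }
    StackedCudaTimer(const StackedCudaTimer &) = delete;
    StackedCudaTimer &operator=(const StackedCudaTimer &) = delete;

    // Push a timer: record its start event on the default stream.
    void start() {
        assert(depth_ < static_cast<int>(starts_.size()) && "too many nested timers");
        cudaEventRecord(starts_[depth_++], 0);
    }

    // Pop the innermost timer and return its elapsed time in milliseconds.
    // This blocks the host until the GPU has reached the stop event.
    float stop() {
        assert(depth_ > 0 && "stop() without matching start()");
        cudaEventRecord(stop_, 0);
        cudaEventSynchronize(stop_); // after this, stop_ can safely be re-recorded
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, starts_[--depth_], stop_);
        return ms;
    }

private:
    std::vector<cudaEvent_t> starts_; // start events, indexed by nesting level
    cudaEvent_t stop_;                // shared stop event, reused by every stop()
    int depth_;                       // number of timers currently running
};
```

Sharing a single stop event works in this sketch only because stop() synchronizes before returning, so the event is never re-recorded while an earlier measurement still depends on it.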
There is still room for improvement: at the moment, every stop() call blocks the host until the GPU has caught up. A future version may synchronize only when the user explicitly asks for a result, which would avoid unnecessary busy waiting and fully exploit the asynchronous behavior of the new API.
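One possible shape for such a version, sketched under the assumption that callers are happy to collect results later: stop() only records an event and returns a handle, and the host blocks only when elapsed() is called for that handle. The class `DeferredCudaTimer` and its `stop()`/`elapsed()` signatures are hypothetical and not part of the attached header. Because measurements may be read long after they are taken, every start() and stop() gets its own fresh event instead of reusing one.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stopwatch that defers host/GPU synchronization until a
// result is actually requested.
class DeferredCudaTimer {
public:
    DeferredCudaTimer() = default;
    DeferredCudaTimer(const DeferredCudaTimer &) = delete;
    DeferredCudaTimer &operator=(const DeferredCudaTimer &) = delete;
    ~DeferredCudaTimer() {
        for (cudaEvent_t e : open_) cudaEventDestroy(e);
        for (auto &m : done_) {
            cudaEventDestroy(m.first);
            cudaEventDestroy(m.second);
        }
    }

    // Push a timer with a fresh start event (never reused, so a pending
    // measurement cannot be overwritten before it is read).
    void start() {
        cudaEvent_t e;
        cudaEventCreate(&e);
        cudaEventRecord(e, 0);
        open_.push_back(e);
    }

    // Pop the innermost timer: record its stop event WITHOUT waiting for the
    // GPU and return a handle for reading the result later.
    std::size_t stop() {
        assert(!open_.empty() && "stop() without matching start()");
        cudaEvent_t e;
        cudaEventCreate(&e);
        cudaEventRecord(e, 0);
        done_.emplace_back(open_.back(), e);
        open_.pop_back();
        return done_.size() - 1;
    }

    // The only blocking call: wait for this measurement's stop event, then
    // return the elapsed time in milliseconds.
    float elapsed(std::size_t handle) const {
        assert(handle < done_.size());
        cudaEventSynchronize(done_[handle].second);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, done_[handle].first, done_[handle].second);
        return ms;
    }

private:
    std::vector<cudaEvent_t> open_;                          // stack of running timers
    std::vector<std::pair<cudaEvent_t, cudaEvent_t>> done_;  // finished (start, stop) pairs
};
```

With this design, the printf calls in the stacked example would move to the end and read each result through elapsed(handle). The trade-off is that events accumulate until the timer is destroyed, so a long-running application would also need a way to release measurements it has already read.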
Attachment: cudastopwatch.h (1.77 KB)