How are other folks out there handling timing and profiling of multi-GPU programs?
After searching these forums I’ve just learned that the cutil timers aren’t thread-safe, which explains the random segfaults and glibc memory corruption errors I was getting when I added timers to my multi-GPU implementation. However, I didn’t manage to find any good solutions. What’s the alternative?
Is there a recommended high-performance timing library out there that’s pretty easy to use? I could probably get away with using the standard time.h clock() function for kernel timing. However, I’d also like to profile all of my CUDA memcpys, and those are normally so short that clock() doesn’t have enough resolution to time them properly.
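To make it concrete, here is a minimal sketch of the kind of timer I have in mind, assuming POSIX clock_gettime() with CLOCK_MONOTONIC is available. The names host_timer, host_timer_start and host_timer_elapsed_ms are hypothetical helpers I made up for illustration, not part of cutil or CUDA. All state lives in a struct owned by the caller, so each host thread (one per GPU) can keep its own timer without sharing anything.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-thread timer: no global or static state, so separate
 * host threads can each own one without interfering with each other. */
typedef struct {
    struct timespec start;
} host_timer;

/* Record the current monotonic time as the start point. */
static void host_timer_start(host_timer *t)
{
    assert(t != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t->start);
}

/* Return milliseconds elapsed since host_timer_start() was called on t. */
static double host_timer_elapsed_ms(const host_timer *t)
{
    struct timespec now;
    assert(t != NULL);
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - t->start.tv_sec) * 1e3
         + (double)(now.tv_nsec - t->start.tv_nsec) / 1e6;
}
```

As far as I understand, kernel launches return immediately, so I would have to call cudaThreadSynchronize() before reading the elapsed time for the number to mean anything. On older glibc versions this also needs -lrt at link time.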
Is there a way to get the CUDA profiler to generate multiple log files, one for each GPU? It wouldn’t be as nice as having an immediate printout of the time for each iteration in my program, but at least I could write a script to mine the results I’m looking for.
I’m using CUDA 2.3 on 64-bit Linux.