How are other folks out there handling the problem of timing and profiling multi-GPU programs?
I’ve just learned after searching these forums that the cutil timers aren’t thread-safe, which explains the random segfaults and glibc memory corruption errors I was getting when I added timers to my multi-GPU implementation. However, I didn’t manage to find any good solutions. What’s the alternative?
Is there a recommended high-performance timing library out there that’s pretty easy to use? I could probably get away with using the standard time.h clock() functionality for kernel timing, but I’d also like to profile all of my CUDA memcpys, and those are normally so short that clock() doesn’t have enough resolution to time them properly.
Is there a way to get the CUDA profiler to generate multiple log files, one for each GPU? It wouldn’t be as nice as having an immediate printout of the time for each iteration in my program, but at least I could write a script to mine the results I’m looking for.
A great example of why not to use cutil, I guess (tmurray will be thrilled)…
I think the Linux version of cutil uses gettimeofday for its timers, which isn’t a thread-safe function. If you are using OpenMP or pthreads, then your best bet is probably to use native POSIX timers, which are thread safe.
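Something along these lines is what I mean (just a minimal sketch: clock_gettime with CLOCK_MONOTONIC is safe to call from each host thread independently; the helper name is a placeholder, and on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

/* Return a monotonic timestamp in seconds. clock_gettime() can be called
   from multiple host threads without the corruption you see with the
   cutil timers, and it gives nanosecond-resolution timestamps. */
static double wallclock_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

int main(void)
{
    double start = wallclock_seconds();

    /* ... this thread's GPU work (kernel launches, cudaMemcpy) goes here ... */

    double elapsed = wallclock_seconds() - start;
    printf("elapsed: %.9f s\n", elapsed);
    return 0;
}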
I’ll read up on handling events manually and follow eyal’s suggestion, thanks.
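From what I’ve read so far, the event-based timing looks roughly like this (a sketch only: error checking is omitted and the kernel is a dummy placeholder; each host thread would create its own event pair after selecting its own device, since events belong to that GPU’s context):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    /* Assumes the calling thread has already done cudaSetDevice() for its GPU. */
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);           /* wait for the timed work to finish */

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   /* GPU-side timing, sub-microsecond resolution */
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}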
How do you set up multiple log files, one for each GPU? I was looking at the profiler config and all I saw was a single CUDA_PROFILE_LOG environment variable. Or did you mean that you have a separate log file for each GPU using your own event timers?
Visual Profiler supports profiling multiple GPU programs. The profiler output for each GPU will be shown under a different context.
If you are using driver-level profiling, set the CUDA_PROFILE_LOG environment variable with a ‘%d’ so that a separate log file is generated for each context:
export CUDA_PROFILE_LOG=cuda_profile_%d.txt
Look at the document “CUDA_Profiler_2.3.txt” included under the “doc” directory in the CUDA 2.3 toolkit for more details.
Has anyone here already experimented with the %d trick on CUDA_PROFILE_LOG?
It didn’t work for me: the %d always resolves to 0, and it seems to me that the devices overwrite each other’s values.
I am using CUDA 2.3 with driver 190.42 on a Linux x86_64 cluster; the Tesla S1070 is plugged into 2 nodes, so only 2 devices are visible per node.