Profiling multi-GPU programs


Is there a way to get CUDA profile info for all contexts of a multithreaded, multi-GPU application?
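When I previously developed an MPI application, I set CUDA_PROFILE_LOG with setenv() in each process before initialising the CUDA context, so every rank wrote its own log file. I can't do this in a multithreaded app because the environment is global to the process, so all threads (and therefore all contexts) share the same value.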
Ideally, I'd like to be able to set the profile file programmatically via the API. Is this possible?
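
For reference, the per-process approach described above for the MPI case can be sketched roughly as follows. This is a minimal sketch, not the exact code from that application: the helper name `set_profile_log_for_rank`, the `<prefix>_rank<N>.log` naming pattern, and the assumption that the rank comes from something like `MPI_Comm_rank` are all hypothetical. Only the `CUDA_PROFILE` and `CUDA_PROFILE_LOG` environment variables and the POSIX `setenv()` call are real.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: point the CUDA command-line profiler at a
 * per-process log file. It must run before the first CUDA call in
 * this process creates a context. Returns 0 on success, -1 on error. */
int set_profile_log_for_rank(const char *prefix, int rank)
{
    char path[256];
    int n;

    if (prefix == NULL || strlen(prefix) == 0)
        return -1;

    /* Assumed naming scheme: <prefix>_rank<N>.log */
    n = snprintf(path, sizeof path, "%s_rank%d.log", prefix, rank);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    /* Enable the profiler and set its output file for this process. */
    if (setenv("CUDA_PROFILE", "1", 1) != 0)
        return -1;
    return setenv("CUDA_PROFILE_LOG", path, 1);
}
```

In the MPI program each process would call this once with its own rank before touching the GPU. The same trick fails with threads because `setenv()` changes the one environment shared by the whole process, which is exactly the problem described above.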