We are running multiple MPI processes across multiple Tesla C2050 GPUs. Each process uses a single GPU during analysis. Is there any way to profile processes spawned by MPI? Launching through the Visual Profiler in the standard way (Process -> mpiexec.exe, arguments -> -n 2 executable.exe) gives no results.
We are using CUDA 5 on Windows.