Has anyone tried using the CUDA profiler with a multithreaded application? Here is exactly what I have:
An application that uses Qt threads to run two threads. Each thread creates its own context, but both contexts are on the same device (using the driver API, version 4.0). I can't use the GUI profiler because the application also runs with MPI (but that is another issue). I can use the command-line profiler, but it gathers information from one thread only (the main thread) and completely ignores the second thread!
What problem do you face when using the Visual Profiler with this MPI-based application?
Regarding the command-line profiler: make sure that you use "%d" in the COMPUTE_PROFILE_LOG file name. (If the COMPUTE_PROFILE_LOG environment variable is not defined, the profiler logs data to "cuda_profile_%d.log".) This generates a separate profiler output file for each context, with "%d" replaced by the context number. Refer to the Compute Command Line Profiler User Guide for details. Note, however, that profiler counters can be collected for only one active context at a time. Other profiling data can be collected for all contexts.
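In case it helps, here is a minimal sketch of setting this up from a small Python launcher. It assumes the legacy command-line profiler is switched on with COMPUTE_PROFILE=1; the helper names profiler_env and run_profiled are just made up for this example and are not part of any CUDA tool.

```python
import os
import subprocess


def profiler_env(log_pattern="cuda_profile_%d.log", base_env=None):
    """Return a copy of the environment with the command-line profiler enabled.

    profiler_env is a hypothetical helper name used only for illustration.
    """
    # Without "%d" every context would be pointed at the same log file name.
    if "%d" not in log_pattern:
        raise ValueError("log pattern must contain %d (the context number)")
    env = dict(os.environ if base_env is None else base_env)
    env["COMPUTE_PROFILE"] = "1"              # turn the profiler on
    env["COMPUTE_PROFILE_LOG"] = log_pattern  # one output file per context
    return env


def run_profiled(command, log_pattern="cuda_profile_%d.log"):
    """Run `command` (a list of arguments) with profiling enabled.

    Returns the exit code of the launched process.
    """
    completed = subprocess.run(command, env=profiler_env(log_pattern))
    return completed.returncode
```

You would call it as run_profiled(["./my_app"]), where ./my_app stands in for your own binary. With two contexts you should then get two log files, one per context number, but counters will still show up for only one context at a time.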
I think my problem is related to using multiple contexts from two different threads. So the profiler is reporting counters from only one of them!