After reading the Visual Profiler documentation [1], I still have some unanswered questions. I ran a program with 16 threads (CPU utilization is about 1600% for one process), and nvidia-smi shows one GPU process.
In the Visual Profiler, I see:
Context 1 (compute and streams)
Context 2 (compute and streams)
The first context is nearly empty, while the second context has some data. When I expand the streams, I see Stream 30 through Stream 60.
How should I interpret these numbers? Why are there two contexts? Why is there no Stream 29?
[1] http://docs.nvidia.com/cuda/profiler-users-guide/index.html