I am working on an application that uses the Robot Operating System (ROS). Some of its nodes use the GPU, and I want to profile the whole execution of one of them. The application works correctly, but when I run the node under nvprof it seems to get stuck even before the node itself starts, and I do not get any kernel information. This node spawns other processes, some of which execute CUDA kernels. I tried ‘--profile-all-processes’ and ‘--profile-child-processes’, with and without sudo, but I get the same result. I also tried ‘--profile-from-start off’, delimiting the profiled region to the start and end of the node from within the code. In this case profiling starts with the node, but it does not capture even my own kernels, and GPU memory usage is not the same as in a normal run. Again, the execution seems to get stuck at some point, as if nvprof were not letting it proceed into the spawned processes.
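The invocations described above would look roughly like the following sketch. It is not the exact set of command lines: the package and node names are placeholders, launching the node through rosrun is an assumption, and the -o output file names are only illustrative. By default the script prints each command instead of running it, so it can be checked before starting a long run.

```shell
#!/usr/bin/env bash
# Sketch of the nvprof invocations described above.
# PKG and NODE are placeholders, not the real package and node names.
PKG="my_package"
NODE="my_gpu_node"

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# 1) Profile the node and every process it spawns.
#    %p in the output name is replaced by the ID of each profiled process.
run nvprof --profile-child-processes -o "child_%p.nvvp" rosrun "$PKG" "$NODE"

# 2) Same, but collect nothing until the code calls cudaProfilerStart(),
#    and stop collecting at cudaProfilerStop().
run nvprof --profile-from-start off --profile-child-processes \
    -o "delimited_%p.nvvp" rosrun "$PKG" "$NODE"

# 3) System-wide mode: nvprof takes no application here. It is started first,
#    and the node is then launched normally from another terminal.
run nvprof --profile-all-processes -o "all_%p.nvvp"
```

Running it with RUN=1 executes the commands in order; in practice each variant would be launched on its own, since the third one keeps running until it is interrupted.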
I am using some closed-source libraries, so profiling only my own kernels is not enough. The application runs inside a Docker container, so I cannot use NVVP or Nsight. I tried nvprof with simpler programs inside the container, and it works fine. Each of the nvprof runs described above was left running for about 24 hours, so time is not the problem.
Thanks in advance.