How to get full profiling with Nsight system for a particular process

debjit.adak · May 21, 2024, 3:33am

Hi,

I am using DeepStream 6.4 and trying to profile my custom pipeline, with all its elements, to figure out where the bottleneck is and fix it. I also created a topic on the DeepStream forum, which I link below for better understanding.
I have a server.py that reads the config of multiple cameras and creates a process for building the pipeline.
My problem is that I'm not able to see how much time each pipeline element takes.
This is the command I used:
/opt/nvidia/nsight-systems/2024.3.1/bin/nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --python-backtrace=cuda --python-sampling=true -d 120 --delay=60 python3 server.py

I will share the profile report as well.
Nvidia_forums_nsight_system.zip (1.2 MB)

What I want is to check how much time each element of the pipeline takes. I need help with this quickly.

DeepStream topic link: Profiling Nsight system with deepstream-6.4

Please go through the topic above once to understand the setup properly.

dofek · May 21, 2024, 11:44am

Hi debjit.adak,
Nsight Systems captures traces of the target app and its child processes.
You wrote:

This is the command I used:
/opt/nvidia/nsight-systems/2024.3.1/bin/nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --python-backtrace=cuda --python-sampling=true -d 120 --delay=60 python3 server.py

The command line above is good, with one caveat. To collect Python backtrace for CUDA API you must enable the CUDA backtrace collection feature by adding --cudabacktrace=all. Note that this feature may incur significant overhead. You can opt to collect backtraces only for specific types of CUDA API calls. See cli profile command switch options for more details.

Looking into the report file that you attached it seems:

The profile command line is different from the one in your post. See attached screenshot.
CUDA API calls and NVTX annotations were not collected.

I suggest you try to capture a report with the command line you posted above, possibly with the modification I suggested.

Doron

debjit.adak · May 21, 2024, 12:07pm

@dofek I added what you suggested. Now my command looks like this:
/opt/nvidia/nsight-systems/2024.3.1/bin/nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --cudabacktrace=all --python-backtrace=cuda --python-sampling=true -d 120 --delay=60 python3 server.py
I am attaching the new report as well. My question is why I don't see any CUDA or NVTX information in this report.
new_report.zip (10.1 MB)

I have no idea why CUDA API calls and NVTX annotations are not being collected. Please check the new report, captured with the correct command, and suggest what to do.
Can you help me with this?

dofek · May 21, 2024, 12:29pm

Are the child processes created using fork()?
Can you try to add the line
multiprocessing.set_start_method('spawn')
somewhere at the beginning of your code and try to profile it?

debjit.adak · May 21, 2024, 4:13pm

No! In server.py the main process is created, and the pipeline creation runs as a child process under it. For the processes I'm using Python multiprocessing.

multiprocessing.set_start_method('spawn')
When I try to add this at the beginning of my code as you said, I get this error: "RuntimeError: context has already been set".

Guy_Sz · May 22, 2024, 8:04am

Hi @debjit.adak
Can you please make sure that you put the line "multiprocessing.set_start_method('spawn')" at the very top of your Python code? It should be before any imports (besides "multiprocessing" of course).
Also, you can try to change the line to "multiprocessing.set_start_method('spawn', force=True)"

debjit.adak · May 22, 2024, 4:21pm

@Guy_Sz @dofek

I did what you said and added multiprocessing.set_start_method('spawn', force=True), but now I get a similar kind of error:

RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

Guy_Sz · May 22, 2024, 7:27pm

Right, my bad. The set_start_method() must be called inside the if __name__ == '__main__' clause. Can you try that?
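
To illustrate the suggestion above, here is a minimal sketch of the main-module idiom. The names build_pipeline, configure_start_method and main are hypothetical stand-ins, not taken from the actual server.py, and the per-camera work is reduced to a print. It only assumes that the child processes are started with Python's multiprocessing module.

```python
import multiprocessing as mp
import os


def build_pipeline(camera_id):
    # Hypothetical stand-in for the per-camera pipeline process in server.py.
    print(f"building pipeline for camera {camera_id} in PID {os.getpid()}")


def configure_start_method(method="spawn"):
    # force=True replaces a start method that was already fixed implicitly,
    # which avoids "RuntimeError: context has already been set".
    mp.set_start_method(method, force=True)
    return mp.get_start_method()


def main(camera_ids):
    # One child process per camera, started with the configured start method.
    workers = [mp.Process(target=build_pipeline, args=(cid,)) for cid in camera_ids]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [w.exitcode for w in workers]


if __name__ == "__main__":
    # With 'spawn', each child re-imports this module, so the start method is
    # set and the processes are launched only inside this guard.
    configure_start_method("spawn")
    main([0, 1])
```

The guard matters because a spawned child imports the main module again. Any process creation at module level would then run during the child's bootstrapping and raise the error quoted above.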

If you see the previous error, "RuntimeError: context has already been set", it means that the start method had already been set. Note that this can happen implicitly: for example, multiprocessing.get_start_method() sets the start method as a side effect, and so does datasets.load_dataset().
So the call to set_start_method() must come before all of that.
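
The implicit case can be reproduced with the standard library alone. This is a small sketch with a hypothetical helper, start_method_already_fixed, that is not part of any library. It shows that a plain get_start_method() call fixes the start method, so a later set_start_method() without force=True is rejected.

```python
import multiprocessing as mp


def start_method_already_fixed():
    """Return True if set_start_method() without force is rejected."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        # This is the "context has already been set" case.
        return True
    return False


# Peeking with allow_none=True does not fix the start method.
peeked = mp.get_start_method(allow_none=True)

# The default call fixes it to the platform default as a side effect.
default_method = mp.get_start_method()
fixed = start_method_already_fixed()

# force=True still allows overriding the start method afterwards.
mp.set_start_method("spawn", force=True)
```

If some imported library does the equivalent of the plain get_start_method() call at import time, this would explain why the error appears even when set_start_method() looks like the first multiprocessing call in your own code.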

utkrishtp · September 23, 2024, 12:10pm

@Guy_Sz
Hey, I am also facing the same issue, where nsys is not profiling the CUDA kernels and APIs in the subprocesses being launched.
But given my setup and requirements, we have to use fork() and cannot use spawn().
I am using the command below, but it is not helping:

nsys profile --trace cuda -o arxiv_gpu_shm --force-overwrite true --trace-fork-before-exec true python node_clas.py --dataset ogbn-arxiv --epoch 1

I have tried with two nsys versions as below:

NVIDIA Nsight Systems version 2024.2.1.106-242134037904v0
NVIDIA Nsight Systems version 2022.1.3.3-1c7b5f7

Attached file for ref:
arxiv_gpu_shm.nsys-rep.zip (489.4 KB)
