Profiling a DeepStream 6.4 pipeline with Nsight Systems

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version: 6.4
• TensorRT Version: 8.5
• NVIDIA GPU Driver Version: 545

I’m running a custom pipeline with NVIDIA DeepStream. server.py reads the camera config and launches my pipeline.py as a separate Python process. In the Nsight Systems report I don’t get element-wise information, so I can’t tell where my bottleneck is. The report file is attached. The command I ran was:
/opt/nvidia/nsight-systems/2024.3.1/bin/nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --python-backtrace=cuda --python-sampling=true -d 120 --delay=60 python3 server.py

When I run the deepstream_python_apps rtsp-in-rtsp-out sample with the same options, I can see the nvinfer activity and a lot of NVTX information. That report is attached as well. The command I ran was:
/opt/nvidia/nsight-systems/2024.3.1/bin/nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --python-backtrace=cuda --python-sampling=true -d 120 --delay=60 python3 deepstream_test1_rtsp_in_rtsp_out.py -i rtsp://10.90.6.161:8554/cam192247

I really want to understand how to profile my custom DeepStream pipeline effectively, how to find the performance bottleneck, and how quickly I can debug it.
Nvidia_forums_nsight_system.zip (18.7 MB)

My custom pipeline report: report3.nsys-rep
deepstream-rtsp-in-out report: sample_app.nsys-rep

Can you share your pipeline?

The Nsight log alone is hard to analyze if we don’t know what your pipeline looks like and how you configure the elements in it.

Is the pipeline for live sources such as RTSP stream, camera devices,…? What is your expected performance with the pipeline and why?


I am attaching an image of my pipeline for your reference. I am using RTSP camera streams. I want to find out how many cameras I can run on a laptop 3070 GPU at 25 FPS, figure out where the bottleneck is, and learn how to fix it.
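Before reaching for a profiler, the 25 FPS target can be checked per stream with a simple frame counter. The following is a minimal sketch, not DeepStream API: `StreamFpsCounter`, `tick`, and `streams_below_target` are hypothetical names, and it assumes `tick` would be called once per frame from wherever the pipeline already accesses frame metadata.

```python
import time
from collections import defaultdict


class StreamFpsCounter:
    """Counts frames per stream and reports average FPS per interval."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the counter can be tested without sleeping.
        self._clock = clock
        self._start = clock()
        self._frames = defaultdict(int)

    def tick(self, stream_id):
        # Call once per processed frame (hypothetical hook in the pipeline).
        self._frames[stream_id] += 1

    def report(self):
        # Return {stream_id: fps} since the previous report, then reset.
        now = self._clock()
        elapsed = max(now - self._start, 1e-9)
        fps = {sid: count / elapsed for sid, count in self._frames.items()}
        self._frames.clear()
        self._start = now
        return fps


def streams_below_target(fps_by_stream, target_fps=25.0):
    # List the streams that are not keeping up with the target rate.
    return sorted(sid for sid, fps in fps_by_stream.items() if fps < target_fps)
```

Adding cameras one at a time and watching which streams fall below the target gives a rough capacity number, although it does not say which element is the bottleneck.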

1. There are more "nvstreammux" and "nvstreamdemux" elements in the pipeline than necessary. The pipeline can be simplified.
2. It seems you are using a model with batch size 8. Have you measured the model’s performance on the 3070? The “trtexec” tool can help you measure it.
3. For RTSP streams, the RTSP server quality and the network bandwidth may also affect the pipeline performance.
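A model throughput measurement translates into an upper bound on the number of streams with simple arithmetic. The sketch below assumes the measured figure is in batches per second and that each batch holds 8 frames; `max_streams` and the example numbers are hypothetical, and the bound ignores decoding, muxing, and any other work in the pipeline.

```python
import math


def max_streams(batches_per_second, batch_size, target_fps=25.0):
    # Frames the model alone can process per second.
    frames_per_second = batches_per_second * batch_size
    # Each stream needs target_fps frames per second.
    return math.floor(frames_per_second / target_fps)


if __name__ == "__main__":
    # Hypothetical measurement: 50 batches/s at batch size 8.
    print(max_streams(50, 8))
```

If the real pipeline supports far fewer cameras than this bound, the limit is probably somewhere other than inference.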

The nvstreammux and nvstreamdemux elements are necessary (we are building a parallel pipeline in Python).
I need help from you with Nsight Systems profiling!
The network is good because I’m using a LAN connection at the same location. Model performance is also fine.

First, please check my profile report and suggest why I’m not able to see element timings and NVTX in our custom pipeline.

I don’t understand why you say so. I can see the NVTX and the time for elements in the report file you sent here.

What I said is that sample_app.nsys-rep works fine. It comes from the DeepStream rtsp-in-out sample code.
The other report I shared comes from our custom code, which I explained at the start of this conversation. Please go through report3.nsys-rep, which was created with our custom code. There I am not able to see the elements and NVTX.

It seems the pipeline is not working. The biggest load is python3. Please check your pipeline.py to make sure it can run.

@Fiona.Chen pipeline.py is called internally under that big python3 entry. In server.py I read the config and URI for each stream, and it calls the pipeline.py class internally. The pipeline is running: it gives me frames, runs inference, and I am able to access the metadata and process the feed as well. My question is how I can see how my pipeline is working, how much time each part takes, and other profiling details. I need help with this!
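For timing your own Python code inside the pipeline process, custom NVTX ranges are one option. This is a minimal sketch assuming NVIDIA's `nvtx` Python package is installed (it falls back to a no-op otherwise); `nvtx_range` and `process_frame_meta` are hypothetical names, and the ranges only show up in a report if the process that executes them is actually being traced.

```python
import contextlib

try:
    import nvtx  # NVIDIA's NVTX Python bindings (pip install nvtx)

    def nvtx_range(name):
        # Context manager that opens a named NVTX range.
        return nvtx.annotate(message=name, color="green")

except ImportError:

    def nvtx_range(name):
        # No-op fallback so the sketch still runs without the nvtx package.
        return contextlib.nullcontext()


def process_frame_meta(frame_numbers):
    # Hypothetical stand-in for per-frame metadata handling in the pipeline.
    with nvtx_range("process_frame_meta"):
        return [n * 2 for n in frame_numbers]


if __name__ == "__main__":
    print(process_frame_meta([1, 2, 3]))
```

This marks only the Python-side work; it does not replace the NVTX ranges that the DeepStream elements emit themselves.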

Do you know whether your “pipeline.py” runs in the same process as your “server.py”? The Nsight profiling tool can only profile the process you specify (python3 server.py). Maybe you can raise a topic in the Nsight forum to check whether there is a method to profile a specific process: Latest Developer Tools/Nsight Systems topics - NVIDIA Developer Forums
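A quick way to answer the same-process question is to compare process IDs on both sides. The sketch below uses `subprocess` with a throwaway child interpreter as a stand-in for the real launcher, which is an assumption; with `multiprocessing.Process` the same `os.getpid()` comparison would apply inside the target function. The helper names are hypothetical.

```python
import os
import subprocess
import sys


def child_pid():
    # Launch a short-lived child interpreter that prints its own PID.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def runs_in_separate_process():
    # True when the child's PID differs from the PID of this process.
    return child_pid() != os.getpid()


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("separate process:", runs_in_separate_process())
```

Logging the PID from both server.py and pipeline.py in the same way makes it easy to match each one against the process rows in the Nsight Systems timeline.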

No. Inside server.py, I run the pipeline.py class as a separate process to create the pipeline. Okay, I will create a topic in the Nsight Systems forum and refer to this topic for better context.

I’m attaching the topic here as well. @Fiona.Chen
TOPIC :- How to get full profiling with Nsight system for a particular process

It has nothing to do with DeepStream. You can get the required data with the Python deepstream-test3 sample.

I want to profile my pipeline! I am already getting the required data. I asked about Nsight Systems and how I can profile my pipeline with it.