I have multiple RTSP sources and multiple services (CV models). My goal is to maximize the number of cameras and services I can run on a single Jetson.
This is what I am doing:
rtspsrc → nvv4l2decoder → tee ─┬─→ service_mux_A (batch=N) → nvinfer_A → probe
                               └─→ service_mux_B (batch=N) → nvinfer_B → probe
1 decoder per camera (shared across all services on that camera)
1 nvinfer per service type (shared across all cameras)
nvstreammux batch-size capped at 1 (TRT engines built with maxBatchSize=1)
CPU RGBA output from nvvideoconvert (no NVMM) to avoid VIC exhaustion
New camera hot-adds a decoder and connects its tee to all active service muxes simultaneously
Can this be optimized in any way to get better throughput?
Do you have multiple RTSP streams (cameras) to be added to the inference pipeline dynamically? The sample /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-server can handle such a case. Do you know the maximum number of streams (cameras)?
What is the relationship between the two models “nvinfer_A” and “nvinfer_B”? If both models infer on all input video streams, a single “nvstreammux” is enough.
Where and how do you implement “CPU RGBA output from nvvideoconvert (no NVMM)” in the pipeline?
What do you do in the “probe”?
What do you mean by “get a better throughput”? The FPS value?
Yes, I have multiple RTSP streams to be added dynamically.
I don't know the maximum number of streams; that is what I want to find out. I want the maximum possible number of streams.
The two models are different, and the number of streams each of them infers on is also dynamic; I add and remove streams at runtime.
This is how I use nvvideoconvert:
rtspsrc → nvv4l2decoder(num-extra-surfaces=0) → tee
tee → queue(leaky, max=2) → nvstreammux(batch=N) → nvinfer → nvvideoconvert(compute-hw=1)
→ capsfilter(video/x-raw,RGBA)
By better throughput I mean the number of streams I can add.
Also:
How does batch_size in the infer config affect the number of streams the engine can handle?
I had a “failed in mem copy” issue, which was fixed by using copy-hw=1 and scaling-compute-hw=1. Is this a correct fix?
Then I rebuilt the engines with batch_size=32. In testing, up to 32-33 RTSP streams were added on the same infer, and then I got the error “libnvrm_gpu.so: NvRmGpuLibOpen failed, error=6”. So is batch_size the maximum number of streams I can add? If yes, can I change the batch_size at runtime without having to rebuild the engine before the run?
From the DeepStream pipeline point of view, the maximum number of streams it can support depends on the slowest part of the pipeline. You need to find the bottleneck in your pipeline yourself.
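As a rough illustration of the "slowest part" point, a stream budget can be estimated from per-stage throughput once each stage has been measured. The stage names and all numbers below are made-up placeholders, not measurements from any Jetson, and the sketch assumes every stage processes every frame of every stream.

```python
import math

def max_streams(stage_capacity_fps, stream_fps):
    """Return (stream budget, limiting stage) for a pipeline.

    stage_capacity_fps: total frames/s each stage can sustain (measured).
    stream_fps: frame rate of each camera.
    """
    # The stage with the lowest sustained throughput limits the whole pipeline.
    bottleneck = min(stage_capacity_fps, key=stage_capacity_fps.get)
    budget = math.floor(stage_capacity_fps[bottleneck] / stream_fps)
    return budget, bottleneck

# Hypothetical capacities in frames/s; replace with your own measurements.
capacities = {
    "decode": 900.0,
    "nvinfer_A": 640.0,
    "nvinfer_B": 310.0,
    "convert_and_probe": 500.0,
}

budget, stage = max_streams(capacities, stream_fps=20)
print(f"about {budget} streams, limited by {stage}")
```

Stages that share the same hardware (for example two models on one GPU) slow each other down, so capacities measured in isolation give an upper bound rather than a guarantee.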
The performance of the model's TRT engine at different batch sizes can be measured with the TensorRT tool “trtexec”.
What did you do with the RGBA data after “nvvideoconvert”?
Can you elaborate on this clearly? The streams will be added/removed dynamically, but we want to know whether the two models run inference on exactly the same streams at the same moment. E.g., when there are 5 streams added to the pipeline, will model A infer on streams 1, 2, 3 while model B infers on streams 3, 4, 5? Or will both model A and model B infer on streams 1, 2, 3, 4, 5?
The nvinfer batch size is the TensorRT model engine batch size. If your model is built as a batch size 32 engine, the engine can infer on at most 32 frames at one time. If you build a batch size 1 engine, you need to run the engine 32 times for 32 frames. For most models we have tried, inferring on 32 frames once with a batch size 32 engine is faster than inferring 32 times with a batch size 1 engine. We don't know your models, so you may need to measure them yourself.
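The arithmetic in that reply can be sketched as follows. This is only the counting argument (how many engine runs a given number of frames needs); the function name is hypothetical, and how the plugin actually schedules those runs is not shown here.

```python
import math

def engine_runs(num_frames, engine_batch_size):
    """Number of engine executions needed to cover num_frames frames."""
    return math.ceil(num_frames / engine_batch_size)

# One frame from each of 32 streams, for a few engine batch sizes.
for batch in (1, 8, 32):
    print(f"batch size {batch:2d}: {engine_runs(32, batch)} run(s) for 32 frames")
```

Fewer runs per set of frames is usually cheaper, but the actual gain depends on the model, so it still has to be measured.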
It works.
No. As I have explained, the maximum number of streams depends on your pipeline.
Yes, both models run inference on exactly the same streams at the same moment: both model A and model B will infer on streams 1, 2, 3, 4, 5.
Yes, but I got the error “libnvrm_gpu.so: NvRmGpuLibOpen failed, error=6” exactly when the 33rd stream was added with the batch_size=32 engine, for both models tested separately, and RAM was not actually exhausted. My models are Yolov11n and RF-DETRs.
What does the error “libnvrm_gpu.so: NvRmGpuLibOpen failed, error=6” mean?
If I load the streams beyond the first 32 onto another infer instance of the same model, will that work and increase the number of streams?
How will the config parameter “interval” change the inference in my use case, and what is the suggested interval, considering all my streams run at 20 fps?
It seems the file descriptors (fd) are exhausted. Please try “ulimit -n 4096”.
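If the pipeline is launched from a Python process (an assumption; the thread does not say how the app is started), the same limit can be inspected and raised from inside the process with the standard `resource` module. This is a minimal sketch: the helper names are hypothetical, and the /proc-based count only works on Linux.

```python
import os
import resource

def raise_nofile_limit(target=4096):
    """Raise the soft open-file limit toward target, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if cap > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def open_fd_count():
    """Count descriptors open in this process, or None if /proc is unavailable."""
    fd_dir = "/proc/self/fd"
    return len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None

soft, hard = raise_nofile_limit(4096)
print(f"soft limit={soft}, hard limit={hard}, open fds={open_fd_count()}")
```

Logging the open descriptor count each time a stream is added shows whether the count really approaches the soft limit around the 33rd stream.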
Do you mean adding another pipeline? As I have said, the pipeline capability is decided by the slowest part; if the second pipeline shares the same resources, nothing will change.
The “interval” parameter skips inference on some batches. If the bottleneck is the GPU load of your models, it may help improve throughput. Every model is different: TensorRT engines built with different batch sizes for the same model perform differently, and the same model performs differently on different GPUs. The right value depends on your pipeline and GPU load; you need to measure it yourself.
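To make the effect of “interval” concrete, here is a small sketch of the arithmetic for 20 fps streams. It assumes that interval=k means k consecutive batches are skipped after each inferred batch; the function names and the stream count of 32 are only illustrative.

```python
def inferred_fps_per_stream(stream_fps, interval):
    """Frames per second per stream that actually reach the model."""
    return stream_fps / (interval + 1)

def total_inference_load(num_streams, stream_fps, interval):
    """Total frames per second the model must process across all streams."""
    return num_streams * inferred_fps_per_stream(stream_fps, interval)

for interval in (0, 1, 3):
    per_stream = inferred_fps_per_stream(20, interval)
    total = total_inference_load(32, 20, interval)
    print(f"interval={interval}: {per_stream:.1f} fps per stream, {total:.0f} frames/s total")
```

A larger interval lowers the GPU load at the cost of fewer detections per second on each camera, so the acceptable value depends on how fast the scene changes as well as on the measured GPU load.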
There has been no update from you for a while, so we assume this is no longer an issue and are closing the topic. If you need further support, please open a new one. Thanks.