Running n concurrent instances of DeepStream on a multi-GPU machine

• Hardware Platform (Jetson / GPU) GPU Tesla T4
• DeepStream Version SDK 5.0.1

Hello,

I am trying to run 12 DeepStream pipelines in parallel. (I am not using DeepStream's multi-source feature, because each pipeline uses different weight files and configs.) The machine has 4 GPUs, and I assign 3 pipelines to each GPU. The problem is that I cannot get more than 9 pipelines running concurrently: when I start the 10th pipeline, one of the already running pipelines is killed, and only then does the 10th one start.

Is there a way to increase this limit?

Thank you
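
For readers who want to reproduce the setup described above (12 pipelines, 3 per GPU on 4 GPUs), here is a minimal Python sketch of the assignment. It is not the poster's actual launcher. The config file naming (`pipeline_<i>.txt`) and the `configs` directory are hypothetical, and pinning a process to a GPU through the `CUDA_VISIBLE_DEVICES` environment variable is one possible approach; with that variable set to a single device, the process sees it as device 0, so any GPU index in the config would have to match.

```python
import os

def assign_gpus(num_pipelines, num_gpus):
    """Return a list where entry i is the GPU index for pipeline i.

    Pipelines are placed in contiguous blocks, e.g. 12 pipelines on
    4 GPUs gives 3 per GPU.
    """
    per_gpu = -(-num_pipelines // num_gpus)  # ceiling division
    return [i // per_gpu for i in range(num_pipelines)]

def build_launch(index, gpu, config_dir="configs"):
    """Build (command, environment) for one pipeline without starting it.

    The config file naming and config_dir are hypothetical.
    """
    config = os.path.join(config_dir, f"pipeline_{index}.txt")
    # Restrict the process to a single GPU (an assumption, see lead-in).
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return ["deepstream-app", "-c", config], env

if __name__ == "__main__":
    for index, gpu in enumerate(assign_gpus(12, 4)):
        cmd, env = build_launch(index, gpu)
        print(gpu, " ".join(cmd))
```

The commands are only built and printed here, so the mapping can be checked before anything is started.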

1 Do you have 4 Tesla T4s? Please redirect your logs to a file, then check whether anything is printed when a pipeline exits.
2 I will try to reproduce this on my machine.
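
One way to follow the suggestion above is to give every instance its own log file and record how each process ended. The sketch below uses only Python's standard `subprocess` module; the instance names and log file naming are hypothetical. On POSIX systems a negative return code means the process was terminated by that signal number, which helps tell a killed pipeline from one that exited on its own.

```python
import subprocess

def launch_logged(name, cmd, env=None):
    """Start cmd with stdout and stderr redirected to <name>.log.

    Returns the process and the open log file; the caller closes the
    file once the process has finished.
    """
    log = open(f"{name}.log", "w")
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT, env=env)
    return proc, log

def describe_exit(returncode):
    """Turn a Popen return code into a short human-readable status."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # POSIX: a negative value is the number of the terminating signal.
        return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"
```

Calling `proc.poll()` periodically on every instance and printing `describe_exit(proc.returncode)` shows which instance stopped and when.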

Hi @fanzh, thanks for your response. Yes, I do have 4 Tesla T4s. I will do that and post an update.

Thanks

Hi @MGh, any update from your side? Please monitor CPU and memory at the same time, and please share your configuration file.
I have only one Tesla T4. I started four instances of ./deepstream-app -c …/…/…/…/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt, and none of them exited.
source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt (5.6 KB)

Hi @fanzh,

I have been looking through the log but could not find the cause. I have attached the log, which shows that ds_cg2 and ds_cfg3 are terminated once all 12 DeepStream instances are running. (I named the i-th DeepStream instance ds_cfgi.)
The config file is attached as well.
golden_ds.txt (4.2 KB)
nohup.out (144.4 KB)

1 The information in your log is not enough. How do you know that ds_cg2 and ds_cfg3 were terminated?
2 I ran ten deepstream-app instances on my T4 and none of them exited. Here are the configuration file and the test scripts:
source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt (5.6 KB)
test.sh (1.6 KB)
top.sh (149 Bytes)
Please use top.sh to monitor memory while testing; I suspect the problem is memory related.
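
The contents of the attached top.sh are not reproduced in this thread. As a rough stand-in, the following Python sketch samples available memory and CPU load on Linux while the pipelines start up. It assumes a Linux `/proc/meminfo`; the function names and the output fields are illustrative only.

```python
import os

def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def snapshot():
    """Sample available memory (MB) and 1-minute load per CPU core."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    load1, _, _ = os.getloadavg()
    return {
        "mem_available_mb": mem.get("MemAvailable", 0) // 1024,
        "load_per_core": load1 / (os.cpu_count() or 1),
    }
```

Printing `snapshot()` every few seconds while the instances are launched one by one shows whether memory runs low or the load per core climbs well above 1 around the time an instance disappears.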

Hi @fanzh,

I monitored the CPU usage, and that turned out to be the issue. I increased the number of cores and the amount of memory, and it is working now. Thanks a lot for the pointers!
