Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) Tesla V100-PCIE
• DeepStream Version 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 220.127.116.11
• NVIDIA GPU Driver Version (valid for GPU only) 470.129.06
• Issue Type( questions, new requirements, bugs) Segmentation Fault
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing) Running multiple processes in parallel using the same DeepStream Python pipeline
Hi, we are trying to utilise all of the available GPU RAM by running multiple AI job workers that accept jobs from a queue. We have created five processes, each running the same DeepStream pipeline and listening on the queue for a job. When a job becomes available, one of the processes picks it up and runs inference on it; if more jobs arrive, the other idle pipeline processes pick them up and work on them in parallel.
However, the processes crash with a segmentation fault when they run in parallel and do inference at the same time. We would like to understand whether there is a workaround for running multiple DeepStream pipeline processes in parallel.
Please note that both GPU RAM and CPU RAM were monitored during testing; neither reached even 60% of its maximum capacity before the segmentation fault occurred, so the issue is definitely not RAM being overloaded. We would appreciate your help on this.
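For reference, the worker layout described above can be sketched with the Python standard library alone. This is a minimal sketch, not the actual application: `run_inference` is a hypothetical stand-in for the real DeepStream pipeline run, and the five queue-listening workers mirror the setup described in the post.

```python
import multiprocessing as mp

def run_inference(job):
    # Hypothetical stand-in for running the real DeepStream pipeline on a job.
    return f"processed {job}"

def worker(jobs, results):
    # Each worker blocks on the shared queue and handles jobs as they arrive.
    for job in iter(jobs.get, None):  # None acts as the shutdown sentinel
        results.put(run_inference(job))

if __name__ == "__main__":
    jobs, results = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=worker, args=(jobs, results)) for _ in range(5)]
    for w in workers:
        w.start()
    job_list = ["video_a.mp4", "video_b.mp4", "video_c.mp4"]
    for job in job_list:
        jobs.put(job)
    for _ in workers:          # one shutdown sentinel per worker
        jobs.put(None)
    out = sorted(results.get() for _ in job_list)  # drain results before joining
    for w in workers:
        w.join()
    print(out)
```

In this layout, idle workers simply block on `jobs.get`, so whichever process is free picks up the next job, matching the behaviour described above.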
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)
The information is not enough. Please provide simplified code to reproduce this issue, including the input and the configuration file.
I have shared the required files via the personal message option. We would appreciate your help on this issue.
The media pipeline is based on the DeepStream sample 3 application.
We ran the code with only the detector model in the pipeline and it still produced a segmentation fault, which suggests the issue is in the detector part of the code. When we replaced our detector with PeopleNet and tried running multiple parallel instances of the pipeline, the pipeline got stuck consistently. Could you suggest a method for running multiple pipelines in parallel so they can handle jobs as needed?
I am not sure the TAO-based samples would help with the problem we are solving, so let me reiterate what we are trying to do. We have a P40-based machine on Azure with 23 GB of GPU RAM, of which our DeepStream application uses roughly 1.5 GB. Since the GPU can do far more than run a single DeepStream pipeline handling one AI inference job on an average 15-second video, we want to run multiple DeepStream Python processes in parallel on the same GPU. This is where we encounter the segmentation fault. To recreate the issue we tried a pipeline with the YOLOv3 detector and a vehicle classification model as the SGIE, and saw the segmentation fault occur randomly. Since this relates to scaling the solution, it would be great if you could provide insights into handling it.
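One well-known pitfall when combining Python multiprocessing with CUDA-backed libraries is that forking a process after the CUDA driver has been initialised can crash the children, because CUDA contexts are not fork-safe. Whether that is the cause of the segfaults here is only a guess, but it is cheap to rule out by using the `spawn` start method so each worker starts from a fresh interpreter and initialises the GPU only inside the child. A minimal sketch, where `pipeline_main` is a hypothetical placeholder for the DeepStream pipeline setup:

```python
import multiprocessing as mp

def pipeline_main(worker_id):
    # Hypothetical placeholder: all GStreamer/DeepStream/CUDA initialisation
    # should happen here, inside the child process, never in the parent
    # before the workers are created.
    print(f"worker {worker_id} starting pipeline")

if __name__ == "__main__":
    # "spawn" gives every worker a fresh interpreter with no inherited
    # CUDA or GStreamer state from the parent.
    mp.set_start_method("spawn")
    procs = [mp.Process(target=pipeline_main, args=(i,)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

The key design point is that the parent only dispatches work; it never touches the GPU itself, so there is no driver state for the children to inherit.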
Sorry for the late reply; let's fix this issue first. You can use this GStreamer command to test your model:
$ cd /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-image-meta-test && gst-launch-1.0 filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! mux.sink_0 nvstreammux name=mux batch-size=1 width=1920 height=1080 nvbuf-memory-type=3 ! nvinfer config-file-path=./ds_image_meta_pgie_config.txt ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! nvv4l2h264enc ! h264parse ! qtmux ! filesink location=./out.mp4
You can find descriptions of the ds_image_meta_pgie_config.txt parameters at this link: Gst-nvinfer — DeepStream 6.1.1 Release documentation
Sure, let us try this and get back to you.
Do you still need support for this topic? Or should we close it? Thanks.
Please keep the ticket open. We were experimenting and did not get time to continue this thread.
OK, we will keep the thread open; kindly update the status once you have one. Thanks.
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one.