Error when using Triton Server for Inference on deepstream-imagedata-example

shettyashwath2010 · July 10, 2020, 2:12pm

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU
• DeepStream Version : 5
• TensorRT Version
• NVIDIA GPU Driver Version : 440.64.00

Hey, I was able to run the deepstream-ssd-parser Python app (deepstream_python_apps/deepstream_ssd_parser.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub) on a server running on gcloud.
The next thing I wanted to do was use the Triton server for inference in the deepstream imagedata example (deepstream_python_apps/deepstream_imagedata-multistream.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub). So I made the necessary changes: adding the helper libraries, changing the nvinfer element to nvinferserver, and parsing the output using pgie_probe. For the config file I used the same one as the ssd-parser example.

Weirdly, the pipeline works for the first 718 frames and then, exactly at the 719th frame, throws this error:

E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-07-10 10:22:44.807278: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)

It seems to have something to do with CUDA and TensorFlow, but I don’t get why the ssd-parser example was working then.

What are your suggestions to combat this issue?

mchi · July 13, 2020, 2:43am

Can you elaborate on what you did step by step, instead of just mentioning “necessary changes”, so that we can easily reproduce and look into the issue?

shettyashwath2010 · July 13, 2020, 10:42am

Sure, I am sorry for not attaching the code before.
Main code references:
ssd example : (deepstream_python_apps/deepstream_ssd_parser.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub)
imagedata-example : (deepstream_python_apps/deepstream_imagedata-multistream.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub)

This is the code I am using : deepstream-ssd-image.py - Google Drive

Step 1 : **This pipeline code is almost exactly the same as the pipeline for the imagedata example. The only change I made here was to add the pgie probe from the deepstream ssd example and change the inference element to nvinferserver.**
The config I am using is exactly the same as the one in the ssd example.

Step 2 : tiler probe function code :
Here the only changes I made from the imagedata example are in accessing the class names for the corresponding object IDs and vice versa, as the classes for the two models are different.

The error I get occurs on the exact same frame every time.

shettyashwath2010 · July 13, 2020, 10:52am

In an attempt to solve this problem, I tried working from the bottom up.
I started off with the ssd example you provided and iteratively added features from the imagedata example.

This code successfully processes all frames of the video but runs into the following error whenever I try to process an image using OpenCV:

This function call : n_frame=pyds.get_nvds_buf_surface(hash(gst_buffer),frame_meta.batch_id)
leads to a segmentation fault.

the code for this app : deepstream-ssd-parser-multi-src.py - Google Drive

I have the filter for RGBA frames in my pipeline, but I still run into this issue.

What do you think could be the possible reasons for this?

mchi · July 13, 2020, 2:00pm

As I asked, can you give a step-by-step repro?
Where should the files be placed? What’s the command to run for the repro? Can you do me the favor?

shettyashwath2010 · July 13, 2020, 3:14pm

Regarding your question about where to put the files:
I’m running the code in a docker container on a VM instance in Google Cloud.
Please download the two files I shared
1: deepstream-ssd-image.py (deepstream-ssd-image.py - Google Drive)
2: deepstream-ssd-parser-multi-src.py (deepstream-ssd-parser-multi-src.py - Google Drive)
and place them in the folder
sources/python/apps/deepstream-ssd-parser (this folder was generated for me when I extracted the Python bindings).

Also add sample_720p.h264 from the samples/streams folder to this folder.

Once in the folder,

please run : python3 deepstream-ssd-image.py file:///opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-ssd-parser/sample_720p.h264 frames
to reproduce issue 1 (Illegal Access)

and run
python3 deepstream-ssd-parser-multi-src.py file:///opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-ssd-parser/sample_720p.h264 frames

to reproduce issue 2 (Segmentation Fault)

shettyashwath2010 · July 14, 2020, 11:13am

Hey, I think I know what is causing both issues.
It is this piece of code:
if not is_aarch64():
    # Use CUDA unified memory in the pipeline so frames
    # can be easily accessed on CPU in Python.
    mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
    streammux.set_property("nvbuf-memory-type", mem_type)
    nvvidconv.set_property("nvbuf-memory-type", mem_type)
    nvvidconv1.set_property("nvbuf-memory-type", mem_type)
    tiler.set_property("nvbuf-memory-type", mem_type)

If it is included, it leads to the illegal access error; if it is left out, it causes the segmentation fault.
I guess it has to be included, since without it the frames cannot be read using the get_nvds_buf_surface function.

But do you know why it may be causing the illegal access error?

mchi · July 14, 2020, 11:27am

At which line does it crash?

If is_aarch64() is true, it indicates a Jetson/ARM64 platform.
So for your GPU platform, this snippet is needed.

shettyashwath2010 · July 14, 2020, 11:39am

If the above code snippet is included, it does not crash at a specific line but rather at a specific frame.

It processes a certain number of frames properly and then crashes at a specific frame, throwing this error:

2020-07-14 10:57:25.769295: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-07-14 10:57:25.769393: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
Aborted (core dumped)

This error always occurs at the same frame, and the specific frame number depends on which video is being processed, which is very weird.

mchi · July 14, 2020, 2:11pm

Hi @shettyashwath2010
Could you try a smaller or larger tf_gpu_memory_fraction in the SSD TF config file and see if it helps with this issue?

shettyashwath2010 · July 14, 2020, 2:18pm

Hey, I have already played around with that parameter in an attempt to solve this problem.

I tried values like 1, 0.01, 0.05 and also 0; each gave the same error at exactly the same frame.
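
A sweep like the one described above can be scripted. This is only a minimal sketch, assuming the nvinferserver config is a text file containing a line of the form `tf_gpu_memory_fraction: <number>`. The helper name `set_tf_gpu_memory_fraction` is hypothetical, and the sample text is an illustrative fragment rather than the real config file.

```python
import re

# Matches "tf_gpu_memory_fraction: <number>" and captures the key part.
_FRACTION_RE = re.compile(r"(tf_gpu_memory_fraction\s*:\s*)[0-9.]+")


def set_tf_gpu_memory_fraction(config_text, fraction):
    """Hypothetical helper: return config_text with the fraction replaced."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    new_text, count = _FRACTION_RE.subn(
        lambda m: m.group(1) + str(fraction), config_text
    )
    if count == 0:
        raise KeyError("tf_gpu_memory_fraction not found in config text")
    return new_text


# Illustrative fragment only; the real config file has more fields.
sample = "model_repo {\n  log_level: 2\n  tf_gpu_memory_fraction: 0.6\n}\n"
updated = set_tf_gpu_memory_fraction(sample, 0.05)
```

In practice the text would be read from and written back to the config file before each run; in this thread, changing the value did not affect the crash.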

shettyashwath2010 · July 16, 2020, 10:48am

Hey, do you have any other suggestions to try, or any guesses as to why this issue is occurring? Is there any more information I can give to help you?

I found a similar issue in TensorFlow CUDA-related crash when training simple network · Issue #25866 · tensorflow/tensorflow · GitHub, where they said it could either be a hardware issue or an issue with the CUDA version, and suggested downgrading CUDA.

As a last-ditch effort I am trying to reinstall DeepStream 5 with CUDA 11 first, then CUDA 10. Will CUDA 10.0 or 11 not work with DeepStream?

mchi · July 17, 2020, 3:09am

Hi @shettyashwath2010,
Thanks for the repro steps!
I can reproduce the issue on the dGPU platform. I’ll try on Jetson later, and then see how to debug this issue.
And, thanks for the TF link.

BTW, in this post, someone suggested to try new TF.

DS 5.0 does not support CUDA 11 or CUDA 10.0.

Thanks!

shettyashwath2010 · July 17, 2020, 4:06am

Hey, I am working with the dGPU platform only, so if you are able to find a solution for that, it’s perfect.

Thanks for the link, I’ll try updating TensorFlow. But something I’ve noticed is that the ds-triton docker container does not contain the tf library in Python (it says module not found when I try to import); I see it is present in the form of a .so file. Could you advise me how to update in such cases?
If I install the newer version of tf in python3, will the change be reflected in DeepStream? (I doubt this will work.)

mchi · July 18, 2020, 3:36am

Please check the change in the attached file and try again with the commands below for sample_720p.h264:
/opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-ssd-parser# rm -rf frames; rm ~/.cache/gstreamer-1.0/*
/opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-ssd-parser# python3 deepstream-ssd-image.py file:///opt/nvidia/deepstream/deepstream-5.0/sources/python/apps/deepstream-ssd-parser/sample_720p.h264 frames

deepstream-ssd-image.py-v_ok.txt (26.8 KB)

shettyashwath2010 · July 18, 2020, 5:30am

Hey, could you tell me what you think the technical reason behind the error was?
I briefly looked at the changes you made; they seem to be primarily in the resolution of the streammux and tiler.
Is it something I would have to change according to the video source I am using?

Also, if you don’t mind me asking, could you suggest any resources to look into so that I can debug errors like these on my own?

mchi · July 18, 2020, 8:53am

Thanks for chasing this!

After further debugging, I found that just changing "streammux.set_property('height', 1080)" to "streammux.set_property('height', 1088)" or some other value, e.g. 1096, makes the crash go away.
And "streammux.set_property('height', 1080)" works on the Jetson platform, so it seems there is an issue in TF preprocessing on the dGPU platform.

You can verify this workaround (WAR) and use it temporarily.
We will check internally whether this is a known issue.

Thanks!

shettyashwath2010 · July 18, 2020, 9:09am

ohhh got it. Thank you so much for helping in debugging.
I have tested your solution and it works perfectly fine for 1080 streams too

mchi · July 23, 2020, 3:47pm

Hi @shettyashwath2010
This can also be solved by removing the line below. Since tiler connects to nvvidconv, leave the memory type to be negotiated by these two plugins.

tiler.set_property("nvbuf-memory-type", mem_type)
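
With that line removed, the memory-type setup applies only to the remaining three elements. The following is a minimal sketch under stated assumptions: `set_unified_memory` is a hypothetical helper, `_RecordingElement` is a stand-in for real GStreamer elements so the snippet runs on its own, and `DEMO_MEM_TYPE` is a placeholder for `int(pyds.NVBUF_MEM_CUDA_UNIFIED)`.

```python
# Placeholder for the demo; the app uses int(pyds.NVBUF_MEM_CUDA_UNIFIED).
DEMO_MEM_TYPE = 3


def set_unified_memory(elements, mem_type, skip=("tiler",)):
    """Set nvbuf-memory-type on every named element except those in skip."""
    configured = []
    for name, element in elements.items():
        if name in skip:
            continue  # leave the memory type to be negotiated
        element.set_property("nvbuf-memory-type", mem_type)
        configured.append(name)
    return configured


class _RecordingElement:
    """Stand-in for a GStreamer element; records set_property calls."""

    def __init__(self):
        self.props = {}

    def set_property(self, name, value):
        self.props[name] = value


elements = {
    name: _RecordingElement()
    for name in ("streammux", "nvvidconv", "nvvidconv1", "tiler")
}
configured = set_unified_memory(elements, DEMO_MEM_TYPE)
```

In the app this would sit inside the existing `if not is_aarch64():` branch, with the real elements in the dictionary.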

shettyashwath2010 · July 24, 2020, 12:29pm

Ohh, great. This works for me too.

I also have a question about communicating efficiently between processes when using DeepStream. Would you recommend creating another thread for this? I am trying to send the entire frame and the data accessed in the probe function to another process running in parallel (using the Python multiprocessing library). I have tried shared memory through multiprocessing queues and server process managers, and all of them heavily bottleneck the fps. What would you suggest trying?