eglCreateStreamKHR Error When Running TF-TRT

Hi all,

I have a problem running a TF-TRT model (ssd_mobilenet_v1) in a Docker container on JetPack 4.6.1. I was able to use a container of mine that includes the following components on every release from JetPack 4.3 through JetPack 4.5.1:

Here is the error I get when my code tries to convert a saved_model.pb to a TensorRT plan, just before the script opens the camera for inference:

F tensorflow/contrib/tensorrt/log/trt_logger.cc:42] DefaultLogger Assertion failed: eglCreateStreamKHR != nullptr
dla/eglUtils.cpp:57
Aborting...

Aborted (core dumped)
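
For context, eglCreateStreamKHR is an EGL_KHR_stream entry point that TensorRT's DLA glue looks up at runtime, and the assertion fires when that lookup returns NULL. A hypothetical way to check whether a libEGL visible inside the container actually exports it (the paths are typical L4T locations and may differ on your setup):

```shell
# eglCreateStreamKHR belongs to the EGL_KHR_stream extension; the DLA code
# asserts when its runtime lookup returns NULL. Check whether any libEGL
# visible in the container exports it (paths are assumptions, adjust as needed).
for lib in /usr/lib/aarch64-linux-gnu/libEGL.so.1 \
           /usr/lib/aarch64-linux-gnu/tegra/libEGL.so; do
    if [ -e "$lib" ] && nm -D "$lib" 2>/dev/null | grep -q eglCreateStreamKHR; then
        echo "$lib exports eglCreateStreamKHR"
    fi
done
```

If neither library prints anything, the tegra EGL driver is most likely not mounted into the container.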

I already have my environment variables set as follows:

LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/aarch64-linux/lib::/usr/lib/aarch64-linux-gnu/tegra/:/usr/lib/aarch64-linux-gnu/tegra/

PATH=/usr/local/cuda-10.0/bin/nvcc:/usr/local/cuda-10.0/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
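
Incidentally, that LD_LIBRARY_PATH contains an empty entry (the double colon, which makes the dynamic loader also search the current directory) and a duplicated tegra path. Splitting the value, one entry per line, makes such issues easy to spot:

```shell
# Split the LD_LIBRARY_PATH quoted above into one entry per line; the
# empty line 2 is the '::' and lines 3 and 4 are the duplicated tegra path.
LD_LIBRARY_PATH='/usr/local/cuda-10.0/targets/aarch64-linux/lib::/usr/lib/aarch64-linux-gnu/tegra/:/usr/lib/aarch64-linux-gnu/tegra/'
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | nl -b a
```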

# dpkg -l | grep TensorRT
ii  graphsurgeon-tf                      5.1.6-1+cuda10.0                      arm64        GraphSurgeon for TensorRT package
hi  libnvinfer-dev                       5.1.6-1+cuda10.0                      arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                   5.1.6-1+cuda10.0                      all          TensorRT samples and documentation
hi  libnvinfer5                          5.1.6-1+cuda10.0                      arm64        TensorRT runtime libraries
hi  python3-libnvinfer                   5.1.6-1+cuda10.0                      arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev               5.1.6-1+cuda10.0                      arm64        Python 3 development package for TensorRT
ii  tensorrt                             5.1.6.1-1+cuda10.0                    arm64        Meta package of TensorRT
ii  uff-converter-tf                     5.1.6-1+cuda10.0                      arm64        UFF converter for TensorRT package

But I cannot figure out what NVIDIA changed in the latest JetPack release that could break an isolated, containerized application. A container should remain an isolated environment, and there are no major changes between JetPack 4.5.1 and JetPack 4.6.1 that should prevent me from doing this. Is there any chance there is a bug in the DLA code in the new release, since the source of the error points in that direction?

Hi,

You will need to use the same OS version in the container and on the Jetson host.
Please upgrade your Xavier to JetPack 4.6 and run the container again.

Thanks.

Hi,

I am already using JetPack 4.6.1 on my AGX Xavier. I also rebuilt my container on top of nvcr.io/nvidia/l4t-base:r32.6.1.

Now I get the following error:
tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.5: undefined symbol: NvMediaDlaGetMaxOutstandingRequests

But if I build the same stack on nvcr.io/nvidia/l4t-base:r32.5.1 and run it on JetPack 4.5.1, everything works like a charm. I know there is a DLA upgrade that causes the NvMediaDlaGetMaxOutstandingRequests error, but as you can see, everything gets stuck on JetPack 4.6.1.
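
The undefined-symbol case can be confirmed directly: list the NvMedia symbols that libnvinfer.so.5 expects but does not define itself, and see which NvMedia library it links against. A sketch, assuming the library path from the error message (the guard lets it run on machines without the library):

```shell
# Show the NvMedia symbols libnvinfer.so.5 imports and the shared objects
# it depends on; a symbol listed by nm but provided by no dependency is
# exactly what produces the "undefined symbol" error at load time.
LIB=/usr/lib/aarch64-linux-gnu/libnvinfer.so.5
if [ -e "$LIB" ]; then
    nm -D --undefined-only "$LIB" | grep -i nvmediadla || true
    ldd "$LIB" | grep -i nvmedia || true
else
    echo "libnvinfer.so.5 not present on this machine"
fi
```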

To summarize, it is a lose-lose situation:

  • If I completely isolate my container from the JetPack 4.6.1 host by editing /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv, the error is:
F tensorflow/contrib/tensorrt/log/trt_logger.cc:42] DefaultLogger Assertion failed: eglCreateStreamKHR != nullptr
dla/eglUtils.cpp:57
Aborting...

Aborted (core dumped)
  • If I don’t isolate the container and leave the mappings in /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv as they are, I end up with the following error:
tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.5: undefined symbol: NvMediaDlaGetMaxOutstandingRequests
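
For reference, the isolation in the first case amounts to removing host-library mappings from that CSV so the container keeps its own libraries. A hypothetical version of the edit, done on a copy first (the printf entry is a stand-in; actual entries and paths vary by release):

```shell
# Copy the nvidia-container-runtime mount list and drop the NvMedia/DLA
# mappings; review the copy before replacing the real file. The printf
# line is stand-in content so the sketch runs anywhere.
CSV=/etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv
WORK=$(mktemp)
if [ -e "$CSV" ]; then
    cp "$CSV" "$WORK"
else
    printf 'lib, /usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so\n' > "$WORK"
fi
sed -i '/nvmedia/Id' "$WORK"   # delete every line mentioning nvmedia
```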

Thanks.

Okay, I have tried too many combinations trying to fit the pieces together, but JetPack 4.6.1 only makes everything harder and more complicated. Whatever the new updates are, they work completely against the idea of CONTAINERIZATION.

I had an issue with the newer TensorFlow-TensorRT releases before, as I described in this topic a year ago: TensorRT 6.0.1 performs worse than TensorRT 5.1.6 on Jetson AGX Xavier. You explained there why the newer TensorFlow versions are much slower, so I have continued to use the old stack across all JetPack releases since then.

But what we have right now is exactly the situation Docker and containers exist for, yet I cannot even use them because NVDLA, which I never even intended to use in my container, received some version upgrades. I just want to be able to keep using my container with TensorFlow 1.13.1.

Hi,

The error comes from a dependency between the OS and the library.
To use r32.6.1, you will need to upgrade TensorRT to v8.0.

It is recommended to use pure TensorRT rather than TensorFlow (even with the integrated TF-TRT).
We have tested TensorRT performance for every JetPack release.
And all the required libraries can be installed through the SDK manager directly.

Thanks.