eglCreateStreamKHR Error When Running TF-TRT

Hi all,

I have a problem running a TF-TRT model (ssd_mobilenet_v1) in a Docker container on JetPack 4.6.1. I was able to use a container of mine that includes the following components on every release from JetPack 4.3 through JetPack 4.5.1:

Here is the error I get when my code tries to convert a saved_model.pb to a TensorRT plan, just before the script opens the camera for inference:

F tensorflow/contrib/tensorrt/log/trt_logger.cc:42] DefaultLogger Assertion failed: eglCreateStreamKHR != nullptr
dla/eglUtils.cpp:57
Aborting...

Aborted (core dumped)
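
For context, eglCreateStreamKHR is an EGL_KHR_stream entry point that TensorRT's DLA glue looks up at runtime, and the assertion fires when that lookup returns NULL. A hypothetical way to check whether a libEGL visible inside the container actually exports it (the paths are typical L4T locations and may differ on your setup):

```shell
# eglCreateStreamKHR belongs to the EGL_KHR_stream extension; the DLA code
# asserts when its runtime lookup returns NULL. Check whether any libEGL
# visible in the container exports it (paths are assumptions, adjust as needed).
for lib in /usr/lib/aarch64-linux-gnu/libEGL.so.1 \
           /usr/lib/aarch64-linux-gnu/tegra/libEGL.so; do
    if [ -e "$lib" ] && nm -D "$lib" 2>/dev/null | grep -q eglCreateStreamKHR; then
        echo "$lib exports eglCreateStreamKHR"
    fi
done
```

If neither library prints anything, the tegra EGL driver is most likely not mounted into the container.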

I already have my environment variables set as follows:

LD_LIBRARY_PATH=/usr/local/cuda-10.0/targets/aarch64-linux/lib::/usr/lib/aarch64-linux-gnu/tegra/:/usr/lib/aarch64-linux-gnu/tegra/

PATH=/usr/local/cuda-10.0/bin/nvcc:/usr/local/cuda-10.0/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
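
Incidentally, that LD_LIBRARY_PATH contains an empty entry (the double colon, which makes the dynamic loader also search the current directory) and a duplicated tegra path. Splitting the value, one entry per line, makes such issues easy to spot:

```shell
# Split the LD_LIBRARY_PATH quoted above into one entry per line; the
# empty line 2 is the '::' and lines 3 and 4 are the duplicated tegra path.
LD_LIBRARY_PATH='/usr/local/cuda-10.0/targets/aarch64-linux/lib::/usr/lib/aarch64-linux-gnu/tegra/:/usr/lib/aarch64-linux-gnu/tegra/'
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | nl -b a
```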

# dpkg -l | grep TensorRT
ii  graphsurgeon-tf                      5.1.6-1+cuda10.0                      arm64        GraphSurgeon for TensorRT package
hi  libnvinfer-dev                       5.1.6-1+cuda10.0                      arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                   5.1.6-1+cuda10.0                      all          TensorRT samples and documentation
hi  libnvinfer5                          5.1.6-1+cuda10.0                      arm64        TensorRT runtime libraries
hi  python3-libnvinfer                   5.1.6-1+cuda10.0                      arm64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev               5.1.6-1+cuda10.0                      arm64        Python 3 development package for TensorRT
ii  tensorrt                             5.1.6.1-1+cuda10.0                    arm64        Meta package of TensorRT
ii  uff-converter-tf                     5.1.6-1+cuda10.0                      arm64        UFF converter for TensorRT package

But I cannot figure out what NVIDIA changed in the latest JetPack release that could break an isolated, containerized application. A container should remain an isolated environment, and there are no major changes between JetPack 4.5.1 and JetPack 4.6.1 that should prevent me from doing this. Is there any chance there is a bug in the DLA code in the new release, since the source of the error points in that direction?

Hi,

You will need to use the same OS version in the container and on the Jetson host.
Please upgrade your Xavier to JetPack 4.6 and run the container again.

Thanks.

Hi,

I am already using JetPack 4.6.1 on my AGX Xavier. I also rebuilt my container on top of nvcr.io/nvidia/l4t-base:r32.6.1.

Now I get the following error:
tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.5: undefined symbol: NvMediaDlaGetMaxOutstandingRequests

But if I build the same stack on nvcr.io/nvidia/l4t-base:r32.5.1 and run it on JetPack 4.5.1, everything works like a charm. I know there is a DLA upgrade that causes the NvMediaDlaGetMaxOutstandingRequests error, but as you can see, everything gets stuck on JetPack 4.6.1.
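
The undefined-symbol case can be confirmed directly: list the NvMedia symbols that libnvinfer.so.5 expects but does not define itself, and see which NvMedia library it links against. A sketch, assuming the library path from the error message (the guard lets it run on machines without the library):

```shell
# Show the NvMedia symbols libnvinfer.so.5 imports and the shared objects
# it depends on; a symbol listed by nm but provided by no dependency is
# exactly what produces the "undefined symbol" error at load time.
LIB=/usr/lib/aarch64-linux-gnu/libnvinfer.so.5
if [ -e "$LIB" ]; then
    nm -D --undefined-only "$LIB" | grep -i nvmediadla || true
    ldd "$LIB" | grep -i nvmedia || true
else
    echo "libnvinfer.so.5 not present on this machine"
fi
```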

To summarize, it is a lose-lose situation:

  • If I completely isolate my container from the JetPack 4.6.1 host by editing /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv, the error is:
F tensorflow/contrib/tensorrt/log/trt_logger.cc:42] DefaultLogger Assertion failed: eglCreateStreamKHR != nullptr
dla/eglUtils.cpp:57
Aborting...

Aborted (core dumped)
  • If I don’t isolate the container and leave the mappings in /etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv as they are, I end up with the following error:
tensorflow.python.framework.errors_impl.NotFoundError: /usr/lib/aarch64-linux-gnu/libnvinfer.so.5: undefined symbol: NvMediaDlaGetMaxOutstandingRequests
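
For reference, the isolation in the first case amounts to removing host-library mappings from that CSV so the container keeps its own libraries. A hypothetical version of the edit, done on a copy first (the printf entry is a stand-in; actual entries and paths vary by release):

```shell
# Copy the nvidia-container-runtime mount list and drop the NvMedia/DLA
# mappings; review the copy before replacing the real file. The printf
# line is stand-in content so the sketch runs anywhere.
CSV=/etc/nvidia-container-runtime/host-files-for-container.d/l4t.csv
WORK=$(mktemp)
if [ -e "$CSV" ]; then
    cp "$CSV" "$WORK"
else
    printf 'lib, /usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so\n' > "$WORK"
fi
sed -i '/nvmedia/Id' "$WORK"   # delete every line mentioning nvmedia
```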

Thanks.

Okay, I have tried too many combinations trying to fit the pieces together, but JetPack 4.6.1 only makes everything harder and more complicated. Whatever the new updates are, they work completely against the idea of CONTAINERIZATION.

I had an issue with the newer TensorFlow-TensorRT releases before, as I described in this topic a year ago: TensorRT 6.0.1 performs worse than TensorRT 5.1.6 on Jetson AGX Xavier. You explained there why the newer TensorFlow versions are much slower, so I have continued to use the old stack across all JetPack releases since then.

But what we have right now is exactly the situation Docker and containers exist for, yet I cannot even use them because NVDLA, which I never even intended to use in my container, received some version upgrades. I just want to be able to keep using my container with TensorFlow 1.13.1.

Hi,

The error comes from a dependency between the OS and the library.
To use r32.6.1, you will need to upgrade TensorRT to v8.0.

It is recommended to use pure TensorRT rather than TensorFlow (even with the integrated TF-TRT).
We have tested TensorRT performance for every JetPack release.
And all the required libraries can be installed through the SDK manager directly.

Thanks.