TensorRT Python API bug when used simultaneously with pyds and Gst bindings


• Hardware Platform (Jetson / GPU): GPU, Tesla T4 (AWS g4dn.xlarge)
• DeepStream Version: 7.0 (nvcr.io/nvidia/deepstream:7.0-samples-multiarch)
• TensorRT Version: 8.6.1 (from nvcr.io/nvidia/deepstream:7.0-samples-multiarch)
• NVIDIA GPU Driver Version (valid for GPU only): 535.171.04
• Issue Type (questions, new requirements, bugs): bug
• How to reproduce the issue: build the attached Dockerfile and run the attached test script inside the container

When I try to cast the TensorRT tensor shape returned by ICudaEngine.get_tensor_shape() to a tuple, or iterate over it, the application crashes with Aborted (core dumped).

I'm attaching a minimal script that reproduces the issue, along with the Dockerfile of the container in which I ran the test (both renamed to .log to pass the forum filters).
ds70-py310-trt-test.Dockerfile.log (1.3 KB)
trt-test2.py.log (1.1 KB)

Steps to reproduce:

  • build the container (e.g. docker buildx build -f ds70-py310-trt-test.Dockerfile -t ds70-py310-trt-test:local .)
  • run the container (e.g. docker run --rm -it --gpus all ds70-py310-trt-test:local)
  • run the script within the container (python3 trt-test2.py)

I get the following output:
...

BINDING i=0, getting tensor name...
tname='input'
BINDING i=0, getting tensor shape...
tshape=(1, 3, 224, 224)
BINDING i=0, casting tensor shape to tuple...
Aborted (core dumped)

Note: the container includes the model https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-10.onnx, which I used for testing.

Note: the script first builds the engine file with trtexec and then runs the test.

Note: the bug disappears when we remove either import pyds or from gi.repository import Gst, but this is obviously not a solution. Our code, which manages DeepStream and dispatches GPU-resident buffers from DeepStream into TensorRT (and elsewhere, including CuPy ops), is a Python app, so we need both bindings.
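Because the crash kills the whole interpreter (bash prints Aborted (core dumped) for a process terminated by SIGABRT), one way to narrow down which import combination triggers it is to run each variant in a fresh child interpreter and inspect its exit status. Below is a minimal standard-library sketch. The helper names aborts and bisect_imports are hypothetical, and the probe body, including the placeholder engine path model.engine, is an assumption, not the contents of trt-test2.py.

```python
import itertools
import resource
import signal
import subprocess
import sys

# Hypothetical probe (NOT the original trt-test2.py): deserialize a prebuilt
# engine and cast the first tensor's shape to a tuple, the step that aborts.
# "model.engine" is a placeholder path.
TRT_PROBE = """
import tensorrt as trt
logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
name = engine.get_tensor_name(0)
print(tuple(engine.get_tensor_shape(name)))
"""


def _disable_core_dumps():
    # Runs in the child before exec: avoid writing a core file on every abort.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def aborts(import_lines, body):
    """Run `body` in a fresh interpreter after executing `import_lines`.

    Returns True only if the child was killed by SIGABRT.
    """
    code = "\n".join(import_lines) + "\n" + body
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        preexec_fn=_disable_core_dumps,
    )
    # subprocess reports death by signal N as a return code of -N.
    return proc.returncode == -signal.SIGABRT


def bisect_imports(optional_imports, body):
    """Try every subset of `optional_imports` and record which ones abort."""
    results = {}
    for r in range(len(optional_imports) + 1):
        for subset in itertools.combinations(optional_imports, r):
            results[subset] = aborts(list(subset), body)
    return results
```

Usage, assuming an engine exists at the placeholder path: bisect_imports(["import pyds", "from gi.repository import Gst"], TRT_PROBE). Going by the report above, only the subset containing both imports would be expected to abort. Running each variant in its own process keeps one crash from hiding the results of the others.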

This bug reminds me of another one I posted some time ago here:

Moving to TensorRT forum for better support, thanks.


Has anyone had the opportunity to examine this issue?

Is there any additional information I could or should provide?

We are investigating this problem. Thanks.


Hi @weary.gunfighter, could you try updating the pybind11 version used by pyds and reinstalling the whl?

$cd /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/3rdparty/pybind11
$git checkout master
$cd ../../bindings/build/
$make
$python3 -m pip install --force-reinstall pyds-1.1.11-py3-none-linux_x86_64.whl

OK, I tried that and it didn’t work; nothing has changed.

Details:

I’ve removed these lines from my Dockerfile:

RUN wget https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/releases/download/v1.1.11/pyds-1.1.11-py3-none-linux_x86_64.whl &&\
    python3 -m pip install pyds-1.1.11-py3-none-linux_x86_64.whl

and instead I’m doing this:

RUN mkdir pyds && cd pyds && \
    git clone -b v1.1.11 https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git && \
    cd deepstream_python_apps && \
    git submodule update --init 3rdparty/pybind11 && (cd 3rdparty/pybind11 && git checkout master) && \
    cd bindings && mkdir build && cd build && \
    cmake .. && make -j$(nproc) && \
    python3 -m pip install pyds-1.1.11-py3-none-linux_x86_64.whl

I reran the container and still get the same error:

...
BINDING i=0, getting tensor name...
tname='input'
BINDING i=0, getting tensor shape...
tshape=(1, 3, 224, 224)
BINDING i=0, casting tensor shape to tuple...
Aborted (core dumped)

I’ve also double-checked that I’m using pyds compiled against the latest pybind11 (git master branch).

Could you try rebuilding the whl and reinstalling it in your Docker container?

$cd ../../bindings/build/
$make clean
$make
$python3 -m pip install --force-reinstall pyds-1.1.11-py3-none-linux_x86_64.whl

I tried this initially and it didn’t work.
I checked again just now to be sure: I rebuilt pyds and re-confirmed that I’m using the new pybind11 and that the installed pyds is the same one I built with it (note that the checksums match):

root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/3rdparty/pybind11# git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/3rdparty/pybind11# git describe
v2.11.0-114-gab955f15

root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/3rdparty/pybind11# cd ../../bindings/build/

root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/bindings/build# md5sum pyds.so 
da416d4b9c1955be2c0d0e8c5039e798  pyds.so

root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/bindings/build# python3 -m pip show -f pyds
Name: pyds
Version: 1.1.11
Summary: Install precompiled DeepStream Python bindings extension
Home-page: nvidia.com
Author: NVIDIA
Author-email: 
License: UNKNOWN
Location: /usr/local/lib/python3.10/dist-packages
Requires: pgi, PyGObject
Required-by: 
Files:
  pyds-1.1.11.dist-info/INSTALLER
  pyds-1.1.11.dist-info/METADATA
  pyds-1.1.11.dist-info/RECORD
  pyds-1.1.11.dist-info/REQUESTED
  pyds-1.1.11.dist-info/WHEEL
  pyds-1.1.11.dist-info/direct_url.json
  pyds-1.1.11.dist-info/top_level.txt
  pyds.so

root@36d2a6008307:/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/bindings/build# md5sum /usr/local/lib/python3.10/dist-packages/pyds.so
da416d4b9c1955be2c0d0e8c5039e798  /usr/local/lib/python3.10/dist-packages/pyds.so

I also confirmed that Python is indeed loading this compiled pyds:

root@28974718835e:/opt/nvidia/deepstream/deepstream-7.0# strace -e trace=openat python3 trt-test2.py |& grep pyds.so
openat(AT_FDCWD, "/usr/local/lib/python3.10/dist-packages/pyds.so", O_RDONLY|O_CLOEXEC) = 3
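The same check can also be done from inside Python, without strace: importlib can report which file a module would be loaded from, and hashlib can checksum that file for comparison with the md5sum output above. A minimal sketch; module_origin and md5_of are hypothetical helper names.

```python
import hashlib
import importlib.util


def module_origin(name):
    """Path of the file `import name` would load, or None if not found.

    For a top-level module, find_spec locates it without executing it, so this
    is safe even for extensions whose import has side effects.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks (same value md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of(module_origin("pyds")) == md5_of("/opt/nvidia/deepstream/deepstream-7.0/pyds/deepstream_python_apps/bindings/build/pyds.so") would confirm that the installed extension is the freshly built one.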

Anyway, I prefer to test in Docker because its isolation ensures a clean environment.
Here is the updated Dockerfile for reference. Note that this container doesn’t contain a prebuilt pyds at all, only the version compiled against master-branch pybind11.

FROM nvcr.io/nvidia/deepstream:7.0-samples-multiarch

SHELL ["/bin/bash", "-c"]

ARG DEBIAN_FRONTEND=noninteractive
ARG TZ=Etc/UTC

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt update && apt install --no-install-recommends --no-install-suggests -y \
    tzdata wget curl sed grep less git git-lfs vim # build-essential equivs

RUN /opt/nvidia/deepstream/deepstream-7.0/user_additional_install.sh

RUN apt install --no-install-recommends --no-install-suggests -y \
   python3-gi python3-dev python3-gst-1.0 python-gi-dev git meson \
   python3 python3-pip python3.10-dev cmake g++ build-essential libglib2.0-dev \
   libglib2.0-dev-bin libgstreamer1.0-dev libtool m4 autoconf automake libgirepository1.0-dev libcairo2-dev \
   libgstreamer-plugins-bad1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-good1.0-dev

RUN mkdir pyds && cd pyds && \
    git clone -b v1.1.11 https://github.com/NVIDIA-AI-IOT/deepstream_python_apps.git && \
    cd deepstream_python_apps && \
    git submodule update --init 3rdparty/pybind11 && (cd 3rdparty/pybind11 && git checkout master) && \
    cd bindings && mkdir build && cd build && \
    cmake .. && make -j$(nproc) && \
    python3 -m pip install pyds-1.1.11-py3-none-linux_x86_64.whl

# ======= Install tensorrt python bindings
RUN apt install -y python3-libnvinfer=8.6.1.6-1+cuda12.0

# ======= Test
RUN wget https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-10.onnx
COPY trt-test2.py trt-test2.py

I used your Dockerfile, but reinstalling the whl reported an error, so I changed the base image to FROM nvcr.io/nvidia/deepstream:7.0-triton-multiarch.

$docker buildx build -f ds70-py310-trt-test.Dockerfile -t ds70-py310-trt-test:local .
$docker run --rm -it --gpus all ds70-py310-trt-test:local
$cd pyds/deepstream_python_apps/bindings/build/
$ make clean
$ make
$ python3 -m pip install --force-reinstall pyds-1.1.11-py3-none-linux_x86_64.whl

With this, it doesn’t crash. Could you try that?

OK, I think I know what is happening.

Force-reinstalling pyds has a side effect: it also reinstalls its dependencies, and PyGObject gets updated from 3.42.1 to 3.48.2.
After that, everything works fine.

Note that pip install --force-reinstall pyds-1.1.11-py3-none-linux_x86_64.whl is NOT possible on the samples-multiarch base image, because installing PyGObject-3.48.2 fails there (I suspect some build dependencies are missing). So previously, instead of force-reinstalling, I ran pip uninstall pyds followed by pip install pyds-1.1.11-py3-none-linux_x86_64.whl. That did NOT update PyGObject, which is why my test kept crashing.
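To catch this stale-dependency situation before the pipeline starts, a startup check can report the installed versions of the relevant distributions and flag an old PyGObject. A minimal sketch using importlib.metadata; the 3.48.2 threshold is only the version reported to work in this thread (not a documented minimum), and the distribution names, especially tensorrt when installed via apt, are assumptions.

```python
import re
from importlib import metadata

# Version reported to work in this thread (3.42.1 crashed). This is an
# observed data point, not an officially documented minimum.
KNOWN_GOOD_PYGOBJECT = (3, 48, 2)


def parse_version(text):
    """Leading numeric release components, e.g. "3.48.2" -> (3, 48, 2).

    Stops at the first dot-separated piece without leading digits; this is
    deliberately simpler than full PEP 440 parsing.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(dist_name):
    """Installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def pygobject_is_known_good(version_text, minimum=KNOWN_GOOD_PYGOBJECT):
    """True if the version is at least the one reported to work."""
    return parse_version(version_text) >= minimum


def report(dist_names=("PyGObject", "pyds", "tensorrt")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}
```

For example, at startup: if installed_version("PyGObject") returns a version for which pygobject_is_known_good is False, log a warning suggesting python3 -m pip install --upgrade pygobject.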

I just ran another test with the prebuilt pyds and an updated PyGObject, and everything works fine.
So there is no need to upgrade pybind11 in pyds; I can use the prebuilt pyds.
Here is the Dockerfile for reference.

FROM nvcr.io/nvidia/deepstream:7.0-triton-multiarch

SHELL ["/bin/bash", "-c"]

ARG DEBIAN_FRONTEND=noninteractive
ARG TZ=Etc/UTC

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt update && apt install --no-install-recommends --no-install-suggests -y \
    tzdata wget curl sed grep less git git-lfs vim # build-essential equivs

RUN /opt/nvidia/deepstream/deepstream-7.0/user_additional_install.sh

RUN apt install --no-install-recommends --no-install-suggests -y \
   python3-gi python3-dev python3-gst-1.0 python-gi-dev git meson \
   python3 python3-pip python3.10-dev cmake g++ build-essential libglib2.0-dev \
   libglib2.0-dev-bin libgstreamer1.0-dev libtool m4 autoconf automake libgirepository1.0-dev libcairo2-dev \
   libgstreamer-plugins-bad1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-good1.0-dev

RUN wget https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/releases/download/v1.1.11/pyds-1.1.11-py3-none-linux_x86_64.whl &&\
    python3 -m pip install pyds-1.1.11-py3-none-linux_x86_64.whl &&\
    python3 -m pip install --upgrade pygobject

# ======= Install tensorrt python bindings
RUN apt install -y python3-libnvinfer=8.6.1.6-1+cuda12.0

# ======= Test
RUN wget https://github.com/onnx/models/raw/main/validated/vision/classification/mobilenet/model/mobilenetv2-10.onnx
COPY trt-test2.py trt-test2.py

Glad to hear that. We will add this to the FAQ for others to refer to.

