Unable to Import PyTorch

• Hardware Platform (Jetson / GPU) GPU
• Docker Container Version 5.1-21.02-triton
• NVIDIA GPU Driver Version (valid for GPU only) 465.19.01
• Issue Type (questions, new requirements, bugs) Bug
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

docker run --rm -it --gpus all nvcr.io/nvidia/deepstream:5.1-21.02-triton
pip3 install torch
python3 -c "import torch"

I am unable to import PyTorch properly in the DeepStream Triton container. The error message I get is as follows:

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 197, in
from torch._C import * # noqa: F403
ImportError: /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKSt10shared_ptrIS0_EPSo

From what I observed during my trial and error, this seems to be due to two different installed PyTorch versions: v1.9.0 from pip, and an older version used by the PyTorch backend in Triton.
How can I solve this issue? I believe the ideal solution would be to isolate the two PyTorch versions from each other, but I need some guidance on how to do this.


I did some more testing, and I found that removing /opt/tritonserver/lib/pytorch/ from the LD_LIBRARY_PATH environment variable allows torch to be imported properly. However, this breaks the nvinferserver Python GStreamer plugin.
To reproduce the problem:

docker run --rm -it --gpus all nvcr.io/nvidia/deepstream:5.1-21.02-triton
~# pip3 install torch
~# apt-get update && apt-get install -y python3-gi python3-dev python3-gst-1.0
~# LD_LIBRARY_PATH=/opt/tritonserver/lib:/usr/src/tensorrt/lib:/opt/jarvis/lib/:/opt/kenlm/lib/:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
~# python3
>>> import sys
>>> sys.path.append('/opt/nvidia/deepstream/deepstream/lib')
>>> import pyds
>>> import torch
>>> import gi
>>> gi.require_version('Gst', '1.0')
>>> from gi.repository import GObject, Gst
>>> GObject.threads_init()
>>> Gst.init(None)
>>> Gst.ElementFactory.make("nvinfer", "infer-test") # This returns a GstElement properly
>>> Gst.ElementFactory.make("nvinferserver", "inferserver-test")

The last line fails to return a GstElement, instead giving this warning:

GStreamer-WARNING **: 03:30:16.719: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_inferserver.so': /opt/tritonserver/lib/pytorch/libtorchvision.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

At this point I'm fairly convinced that these errors are due to PyTorch version incompatibilities. A possible workaround would be to pip install the same version of PyTorch as Triton, but I can't find the version number of Triton's PyTorch.
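For reference, once torch actually imports (e.g. with the LD_LIBRARY_PATH change above), the pip side of the comparison can be checked like this; I assume the a0+<hash> suffix in the NGC builds is the git commit, which torch also exposes:

# Identify the pip-installed PyTorch build so it can be compared against the
# version listed in the Triton/NGC release notes.
import torch

print(torch.__version__)          # e.g. 1.9.0
print(torch.version.git_version)  # git commit the wheel was built from
print(torch.version.cuda)         # CUDA version the wheel targets (None for CPU-only wheels)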

The steps below work on my side, as shown in the attached screenshot:

$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -v /home/$user/:/home/$user/ -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream nvcr.io/nvidia/deepstream:5.1-21.02-devel
# apt-get update
# pip3 install torch
# pip3 install numpy

The nvinferserver plugin is only available in the Triton DeepStream container. The container you are using is the devel container.
Additionally, the last line in your screenshot, Gst.ElementFactory.make("nvinferserver", "inferserver-test"), actually fails. That command should return a proper GStreamer object like the preceding command, but instead it returned a NoneType object, indicating a failure (Gst.ElementFactory.make documentation).
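For reference, a minimal check of that return value looks like this (Gst.ElementFactory.make returns None when the element cannot be created, e.g. when the plugin's shared library fails to load):

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

Gst.init(None)

# None means the element/plugin could not be loaded; check it explicitly
# instead of relying on a later pipeline error.
element = Gst.ElementFactory.make("nvinferserver", "inferserver-test")
if element is None:
    raise RuntimeError("nvinferserver could not be created; check the plugin load warnings")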


Got it, I can reproduce the issue.
Will check and get back to you.

Yes, I agree with you, it's caused by PyTorch version incompatibilities.
After installing torch, remove "/opt/tritonserver/lib/pytorch/" from the LD_LIBRARY_PATH and torch then works; otherwise it links against the libs under /opt/tritonserver/lib/pytorch/ and fails due to the incompatibility. But after changing LD_LIBRARY_PATH, the nvinferserver plugin no longer works.
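As a small, hedged helper, something like the following prints the trimmed value for use in a shell export before starting Python (the dynamic loader only reads LD_LIBRARY_PATH at process start, so it has to be set before the interpreter launches):

# Print LD_LIBRARY_PATH with Triton's bundled PyTorch directory removed;
# use the output in `export LD_LIBRARY_PATH=...` before running python3.
import os

current = os.environ.get("LD_LIBRARY_PATH", "")
trimmed = ":".join(p for p in current.split(":")
                   if p and "/opt/tritonserver/lib/pytorch" not in p)
print(trimmed)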

According to Release Notes :: NVIDIA Deep Learning Triton Inference Server Documentation, Triton uses a dedicated PyTorch backend repo, triton-inference-server/pytorch_backend, so the incompatibility may be expected.

May I know why you need torch in the DeepStream docker?

# pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio===0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
# export LD_LIBRARY_PATH=/usr/src/tensorrt/lib:/opt/jarvis/lib/:/opt/kenlm/lib/:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
# python3 -c "import torch; print(torch.__version__)"
1.8.1+cpu

I’m currently integrating a custom Python-based tracker into my pipeline, and that tracker requires PyTorch to generate the feature vector to do its tracking.

@azy, as @mchi pointed out, it's a PyTorch version mismatch issue.
nvcr.io/nvidia/deepstream:5.1-21.02-triton is based on nvcr.io/nvidia/tritonserver:20.11-py3. Referring to that column of the Frameworks Support Matrix, PyTorch is built from 1.8.0a0+17f8c32. The latest PyTorch version you are trying, 1.9.x, is newer than Triton's prebuilt version. You need to reinstall the older torch version 1.8.0:

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
python3 -c "import torch; print(torch.__version__)"
1.8.0+cu111

If you still see the issue, maybe try:
export LD_LIBRARY_PATH=/usr/local/lib/python3.6/dist-packages/torch/lib:$LD_LIBRARY_PATH


Thanks for the reply @windy60j34, but unfortunately, the version incompatibility issues are still present.

Without changing LD_LIBRARY_PATH, I still get the same undefined symbol error as in the previous posts when I try to import torch.
If I change LD_LIBRARY_PATH, I can import torch successfully, but I cannot create an nvinferserver GStreamer element.

Interestingly, if I defer import torch until after (unsuccessfully) creating the nvinferserver element, I get a segmentation fault.

Here’s the Dockerfile and test code I’m using for your reference:
Dockerfile (263 Bytes)
deepstream-pytorch-test.py (471 Bytes)

I have also tried building the PyTorch wheel from the source code included in the nvcr.io/nvidia/pytorch:20.11-py3 NGC image, but I still get the same error after installing it on the nvcr.io/nvidia/deepstream:5.1-21.02-triton container.

Triton builds PyTorch from source and might share some static libs, so it might still have problems. Maybe you can try DeepStream 6.0 EA, which is based on nvcr.io/nvidia/tritonserver:21.02-py3; that release moved all the torch backends out of libtritonserver.so and defers loading the PyTorch binaries until PyTorch models are actually in use, so it might solve this problem. But the 6.0 EA image might be for partners only.

Yes, I think this may solve the problem. I was playing around with some of the more recent pure-Triton containers, and I could import torch properly. I wasn't able to try running Python DeepStream on them, however, as those containers use Python 3.8, but the current version of DeepStream requires Python 3.6.

I’ll give the early access release a shot if I can get it.

@windy60j34 I just got early access, but the same problem appears to still be present. From what I observe, Triton still loads the pip PyTorch libraries instead of its own if I run import torch; the loading is just delayed until the pipeline is playing instead of happening at creation time.

Is there a place I can give feedback on this?


It sounds like the pip PyTorch libraries and Triton's prebuilt PyTorch libs cannot coexist. Are you using torch models with Triton inference while also doing some extra PyTorch processing in your Python app?
Meanwhile, could you file a bug for Triton at Issues · triton-inference-server/server · GitHub?

I’m integrating a custom DeepSort tracker in my pipeline. This pipeline is written in Python and uses PyTorch to get the image feature vector.

Is it a libtorch model? A possible workaround might be to convert it to an ONNX model, then use Triton inference on the ONNX model; your pipeline script can then continue running with PyTorch.
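A rough sketch of the export step; the model, input shape, and file name below are just placeholders for your actual feature extractor:

# Export the PyTorch feature extractor to ONNX so Triton can serve it with the
# ONNX Runtime backend instead of its bundled PyTorch backend.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()  # placeholder model
dummy = torch.randn(1, 3, 224, 224)                          # placeholder input shape

torch.onnx.export(
    model,
    dummy,
    "feature_extractor.onnx",
    input_names=["input"],
    output_names=["embedding"],
    dynamic_axes={"input": {0: "batch"}, "embedding": {0: "batch"}},
    opset_version=11,
)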

Do you mean doing the feature extraction in an nvinfer block, then adapting the custom tracker to use those features?

If so, unfortunately, it isn’t trivial to make such a change, and I think it’s quite a messy workaround.

We tried changing the feature extractor from a PyTorch model to a TensorFlow model, but it appears to have the same problem. We are unable to import the Python tensorflow package within the same address space as the Triton process.

So, importing PyTorch is still needed on your side, right?

Ideally yes, I need to be able to import pytorch for my tracker.

In the meantime, we will try to explore your solution, i.e. running the embedding in an nvinfer block before using those embeddings in the nvtracker block.

However, I cannot find any reference material on how to access the embeddings tensor data from within the custom tracker library. Is there some guide I can follow?