Triton Inference Server not supporting PyTorch v1.6?

wensher.ong · November 12, 2020, 3:41pm

• Hardware Platform (Jetson / GPU)
GPU

• DeepStream Version
5.0 (image : nvcr.io/nvidia/deepstream:5.0-20.07-triton)

• JetPack Version (valid for Jetson only)
N/A

• TensorRT Version
7.0.0-1+cuda10.2

• NVIDIA GPU Driver Version (valid for GPU only)
455.32.00

• Issue Type (questions, new requirements, bugs)
I am trying to run a PyTorch model (built on torch v1.6 and torchvision v0.5) on Triton Inference Server but encountered the following issue:

E1112 14:33:37.104733 11675 model_repository_manager.cc:840] failed to load 'yolov5' version 1: Internal: load failed for libtorch model -> 'yolov5': version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at ../caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at ../caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6c (0x7fa1f293236c in /opt/tensorrtserver/lib/pytorch/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x27b4 (0x7fa24945de74 in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >) + 0x6d (0x7fa24945f7cd in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #3: <unknown function> + 0x2bdc4ff (0x7fa24a41b4ff in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #4: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x6d (0x7fa24a41ae5d in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #5: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x79 (0x7fa24a41b2f9 in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #6: <unknown function> + 0x2964d3 (0x7fa2851474d3 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #7: <unknown function> + 0x297410 (0x7fa285148410 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #8: <unknown function> + 0x290f55 (0x7fa285141f55 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #9: <unknown function> + 0x114b7d (0x7fa284fc5b7d in /opt/tensorrtserver/lib/libtrtserver.so)
frame #10: <unknown function> + 0x116195 (0x7fa284fc7195 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #11: <unknown function> + 0xbd66f (0x7fa2ec8c666f in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #12: <unknown function> + 0x76db (0x7fa2ecb996db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #13: clone + 0x3f (0x7fa2feab288f in /lib/x86_64-linux-gnu/libc.so.6)

Does Triton Server not support PyTorch v1.6?
Based on my investigation, this appears to be because, from PyTorch 1.6 onwards, torch.save or torch::jit::save saves the weights as a serialized zip file. Triton Server tries to load the PyTorch model using torch::jit::load, which only loads models saved with torch::jit::save. As a workaround to get past this stage, I had to use PyTorch v1.5 + torchvision v0.5.0 to save my model before Triton was able to read the weights.
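A quick first check when a checkpoint will not load is whether the file is a zip archive at all. The sketch below uses only the Python standard library; the helper name `is_zip_checkpoint` is hypothetical. Note that being a zip archive does not by itself tell you which PyTorch version wrote the file or which format version it carries.

```python
import zipfile
from pathlib import Path


def is_zip_checkpoint(path: str) -> bool:
    """Return True if the file at `path` is a zip archive (zip-based serialization)."""
    # zipfile.is_zipfile looks for the zip end-of-central-directory record,
    # so it works regardless of the file extension (.pt, .pth, ...).
    return zipfile.is_zipfile(path)


if __name__ == "__main__":
    import sys

    # Usage: python check_zip.py yolov5s.pt other_model.pt
    for name in sys.argv[1:]:
        kind = "zip archive" if is_zip_checkpoint(name) else "not a zip archive"
        print(f"{Path(name).name}: {kind}")
```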

Having tried the above workaround, I wonder: why do we have to downgrade our model to PyTorch v1.5 when the Docker image supports PyTorch v1.6?

Does it have something to do with TensorRT not supporting PyTorch v1.6?
Are there any working sample config files from NVIDIA for running PyTorch models on Triton Inference Server? I would appreciate it if NVIDIA could provide at least one PyTorch sample showing that it actually works with Triton Server. Please share it with us if you have one. Thanks!

• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used and other details for reproducing.)
Target model: YOLOv5
Weights/Checkpoint: yolov5s.pt (run download_weights.sh from the YOLOv5 repository)
Config files: config_yolo.zip (3.7 KB)

Credits to: YOLOv5

mchi · November 14, 2020, 2:18pm

Triton supports PyTorch v1.6; you can find the Triton support matrix at https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

You could also use the Triton NGC Docker container (see the Triton Inference Server Release Notes in the NVIDIA Deep Learning documentation).

wensher.ong · November 15, 2020, 1:57pm

Yes, I did refer to the support matrix, and 20.07 does already support PyTorch v1.6. However, I am still getting the error "load failed for libtorch model -> 'yolov5': version_ <= kMaxSupportedFileFormatVersion", thrown by libtorch_cpu (and possibly related libraries).

Digging deeper, I inspected the model checkpoints (zip-serialized) created by different versions of PyTorch using an archive manager:
PyTorch v1.6: the version file has a value of 3
PyTorch v1.5: the version file has a value of 2
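The archive-manager check above can be scripted. This is a minimal sketch using Python's zipfile module; it assumes the version record sits under a single top-level folder whose name varies between files (for example "archive/" or the model name), so it matches on the final path component. The function name `read_format_version` is hypothetical.

```python
import zipfile


def read_format_version(path: str) -> int:
    """Return the integer stored in the `version` record of a zip-serialized checkpoint."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: the top-level folder name varies, so match any entry
        # whose last path component is "version".
        candidates = [
            name for name in archive.namelist()
            if name == "version" or name.endswith("/version")
        ]
        if not candidates:
            raise ValueError(f"no version record found in {path}")
        # Prefer the shortest match, i.e. the record closest to the root.
        record = min(candidates, key=len)
        return int(archive.read(record).decode("ascii").strip())


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(f"{name}: file format version {read_format_version(name)}")
```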

I ran the following check in the Docker images nvcr.io/nvidia/deepstream:5.0-20.07-triton and nvcr.io/nvidia/deepstream:5.0.1-20.09-triton. Both returned the same result:

root@2cdf241a77b9:/opt# ls -l | grep -r "kMaxSupportedFileFormatVersion"
tensorrtserver/include/torch/caffe2/serialize/inline_container.h:constexpr uint64_t kMaxSupportedFileFormatVersion = 0x2L;
Binary file tensorrtserver/lib/pytorch/libtorch_cpu.so matches
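The failing assertion in the log is `version_ <= kMaxSupportedFileFormatVersion`, and the header above shows that this constant is 0x2 in these containers. A minimal Python sketch of the same comparison, which could serve as a pre-flight check before copying a model into the Triton model repository; the default of 2 is taken from the grep output above, and the name `check_loadable` is hypothetical.

```python
# Value of kMaxSupportedFileFormatVersion found in the container's
# inline_container.h (0x2L); other Triton builds may use a different value.
MAX_SUPPORTED_FILE_FORMAT_VERSION = 0x2


def check_loadable(file_version: int,
                   max_supported: int = MAX_SUPPORTED_FILE_FORMAT_VERSION) -> None:
    """Raise RuntimeError if a reader with this limit would reject the file."""
    # Mirrors the INTERNAL ASSERT in caffe2/serialize/inline_container.cc:
    #   version_ <= kMaxSupportedFileFormatVersion
    if file_version > max_supported:
        raise RuntimeError(
            f"Attempted to read a PyTorch file with version {file_version}, "
            f"but the maximum supported version for reading is {max_supported}."
        )
```

With the values reported in this thread, `check_loadable(2)` passes for the PyTorch v1.5 checkpoint, while `check_loadable(3)` raises for the v1.6 one.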

wensher.ong · November 17, 2020, 12:51pm

OK. I found out why. Apparently the Docker image tag on NGC is quite misleading, and I missed a clue that had been there all along.

In nvcr.io/nvidia/deepstream:5.0-20.07-triton, the bundled Triton Inference Server is actually v20.03, not v20.07:

wensher.ong@ows-workstation:~/Workspace/vision-inference$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY nvcr.io/nvidia/deepstream:5.0-20.07-triton

===============================
==   DeepStreamSDK 5.0       ==
===============================

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream-5.0/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; '/usr/bin/X11' is part of the same file system loop as '/usr/bin'.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
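The mismatch between the image tag and the "NVIDIA Release" line in the startup banner is the clue here. A small Python sketch that flags it automatically; it assumes the banner format shown above and tags shaped like `<version>-<YY.MM>-triton`, and the helper names are hypothetical.

```python
import re


def bundled_triton_release(banner: str) -> str:
    """Extract the Triton release (e.g. "20.03") printed in the container banner."""
    match = re.search(r"NVIDIA Release (\d+\.\d+)", banner)
    if match is None:
        raise ValueError("no 'NVIDIA Release' line found in banner")
    return match.group(1)


def tag_release(image: str) -> str:
    """Extract the YY.MM release from a DeepStream Triton image tag."""
    # Assumes tags like "...:5.0-20.07-triton" or "...:5.0.1-20.09-triton".
    match = re.search(r":[\d.]+-(\d+\.\d+)-triton$", image)
    if match is None:
        raise ValueError(f"unexpected tag format: {image}")
    return match.group(1)


def release_mismatch(image: str, banner: str) -> bool:
    """True if the tag's release differs from the Triton release actually bundled."""
    return tag_release(image) != bundled_triton_release(banner)
```

For example, feeding it the image name used in the command above together with the captured banner text returns True, because the tag says 20.07 while the banner says 20.03.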

In nvcr.io/nvidia/deepstream:5.0.1-20.09-triton, the bundled Triton Inference Server is also v20.03, not v20.09:

wensher.ong@ows-workstation:~/Workspace/vision-inference$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=:1  nvcr.io/nvidia/deepstream:5.0.1-20.09-triton

===============================
==   DeepStreamSDK 5.0       ==
===============================

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream-5.0/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; '/usr/bin/X11' is part of the same file system loop as '/usr/bin'.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

So the question is: will NVIDIA release a DeepStream image built together with the latest Triton Inference Server?

mchi · November 17, 2020, 1:56pm

Hi @wensher.ong,
Thanks for debugging this by yourself!

DeepStream is not released as frequently as Triton, so some version differences are expected.
Until the next DS release, there will be no Triton upgrade in DS.
Can you work with the current Triton version in the DS Docker image?

Thanks!

wensher.ong · November 17, 2020, 2:03pm

@mchi no problem. Thanks for confirming this!

May I know when the next DS release will be? I cannot work with the current Triton version in that DS Docker image: there is a layer that is supported only in PyTorch 1.6, not in PyTorch 1.5. Do you think it is possible to provide users with at least a DS Docker image with the latest Triton (supporting at least PyTorch 1.6 and the latest TRT)?

Alternatively, could you guide us on upgrading the Triton Inference Server inside the DS Docker image to the latest version? Would that be a reasonable workaround?

mchi · November 17, 2020, 2:33pm

Will check and get back to you later.

wensher.ong · November 24, 2020, 3:28am

Hi @mchi, are there any updates to this issue/request?

wensher.ong · November 29, 2020, 8:19am

For future reference, I found that this particular NVIDIA repo can generate a sample config.pbtxt for ResNet50 using deployer.py. It can help verify whether your config.pbtxt settings for a PyTorch model are correct.
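For orientation, a Triton model configuration for a TorchScript model uses `platform: "pytorch_libtorch"`, and because TorchScript does not preserve tensor names, inputs and outputs follow a naming convention like `INPUT__0` / `OUTPUT__0`. The Python sketch below renders such a config.pbtxt; it is not deployer.py's output, all tensors are assumed to be FP32, and the shapes in the usage block are hypothetical placeholders, not values from the attached config_yolo.zip.

```python
from typing import List, Sequence


def _tensor_block(prefix: str, dims_list: Sequence[Sequence[int]]) -> str:
    """Render one input/output entry per shape, named <prefix>__<index>."""
    entries: List[str] = []
    for index, dims in enumerate(dims_list):
        dims_str = ", ".join(str(d) for d in dims)
        entries.append(
            "  {\n"
            f'    name: "{prefix}__{index}"\n'
            "    data_type: TYPE_FP32\n"
            f"    dims: [ {dims_str} ]\n"
            "  }"
        )
    return ",\n".join(entries)


def libtorch_config(model_name: str, max_batch_size: int,
                    input_dims: Sequence[Sequence[int]],
                    output_dims: Sequence[Sequence[int]]) -> str:
    """Render a minimal config.pbtxt for a TorchScript model (all tensors FP32)."""
    return (
        f'name: "{model_name}"\n'
        'platform: "pytorch_libtorch"\n'
        f"max_batch_size: {max_batch_size}\n"
        f"input [\n{_tensor_block('INPUT', input_dims)}\n]\n"
        f"output [\n{_tensor_block('OUTPUT', output_dims)}\n]\n"
    )


if __name__ == "__main__":
    # Hypothetical shapes: replace with those of the model actually exported.
    print(libtorch_config("yolov5", 1, [[3, 640, 640]], [[25200, 85]]))
```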

mchi · November 29, 2020, 12:28pm

Thanks for sharing!
Confirmed: we will use Triton 20.11 (PyTorch 1.8) in a future DS release.

wensher.ong · November 30, 2020, 12:37am

Superb, @mchi! Great to hear that! Do you happen to have an estimated release date for us to look forward to?

mchi · December 2, 2020, 3:24pm

Hi @wensher.ong,
It will be in Q1 2021.

wensher.ong · December 3, 2020, 3:07am

Alright, thanks for the update! Cheers.
