Triton Inference Server not supporting PyTorch v1.6?

• Hardware Platform (Jetson / GPU)
GPU

• DeepStream Version
5.0 (image: nvcr.io/nvidia/deepstream:5.0-20.07-triton)

• JetPack Version (valid for Jetson only)
N/A

• TensorRT Version
7.0.0-1+cuda10.2

• NVIDIA GPU Driver Version (valid for GPU only)
455.32.00

• Issue Type (questions, new requirements, bugs)
I am trying to run a PyTorch model (built with torch v1.6 and torchvision v0.5) on Triton Inference Server, but I encountered the following error:

E1112 14:33:37.104733 11675 model_repository_manager.cc:840] failed to load 'yolov5' version 1: Internal: load failed for libtorch model -> 'yolov5': version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at ../caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at ../caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6c (0x7fa1f293236c in /opt/tensorrtserver/lib/pytorch/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x27b4 (0x7fa24945de74 in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >) + 0x6d (0x7fa24945f7cd in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #3: <unknown function> + 0x2bdc4ff (0x7fa24a41b4ff in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #4: torch::jit::load(std::unique_ptr<caffe2::serialize::ReadAdapterInterface, std::default_delete<caffe2::serialize::ReadAdapterInterface> >, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x6d (0x7fa24a41ae5d in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #5: torch::jit::load(std::istream&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x79 (0x7fa24a41b2f9 in /opt/tensorrtserver/lib/pytorch/libtorch_cpu.so)
frame #6: <unknown function> + 0x2964d3 (0x7fa2851474d3 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #7: <unknown function> + 0x297410 (0x7fa285148410 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #8: <unknown function> + 0x290f55 (0x7fa285141f55 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #9: <unknown function> + 0x114b7d (0x7fa284fc5b7d in /opt/tensorrtserver/lib/libtrtserver.so)
frame #10: <unknown function> + 0x116195 (0x7fa284fc7195 in /opt/tensorrtserver/lib/libtrtserver.so)
frame #11: <unknown function> + 0xbd66f (0x7fa2ec8c666f in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #12: <unknown function> + 0x76db (0x7fa2ecb996db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #13: clone + 0x3f (0x7fa2feab288f in /lib/x86_64-linux-gnu/libc.so.6)
  1. Does Triton Server not support PyTorch v1.6?
    Based on my investigation, this appears to happen because, from PyTorch 1.6 onwards, torch.save and torch::jit::save write the model as a zip-serialized file with a newer format version. Triton loads PyTorch models with torch::jit::load, which only loads models saved with torch::jit::save. As a workaround to get past this stage, I had to use PyTorch v1.5 + torchvision v0.5.0 to save my model before Triton was able to read the weights.

Having tried the above workaround: why do we have to downgrade our model to PyTorch v1.5 when the Docker image supports PyTorch v1.6?

  2. Does it have something to do with TensorRT not supporting PyTorch v1.6?
  3. Are there any working sample config files from NVIDIA for PyTorch models running on Triton Inference Server? I would appreciate it if NVIDIA could provide at least one PyTorch sample showing that it actually works with Triton server. Please share it with us if you have one. Thanks!
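No sample was shared in this thread. As a hedged illustration only, the Python sketch below lays out a minimal Triton model repository entry for a TorchScript model. It assumes Triton's conventions for the PyTorch (libtorch) backend: `platform: "pytorch_libtorch"`, a TorchScript file named `model.pt` inside a numbered version directory, and positional tensor names of the form `INPUT__0` / `OUTPUT__0`. The helper names `libtorch_config` and `write_libtorch_model`, the FP32 data type, and any shapes you pass in are assumptions to check against the model configuration documentation for your Triton release.

```python
import shutil
from pathlib import Path
from typing import Sequence


def _dims(shape: Sequence[int]) -> str:
    # Render a shape in protobuf text format, e.g. "[ 3, 640, 640 ]".
    return "[ " + ", ".join(str(d) for d in shape) + " ]"


def libtorch_config(name: str, input_dims: Sequence[int],
                    output_dims: Sequence[int], max_batch_size: int = 1) -> str:
    """Build a minimal config.pbtxt for a TorchScript model (hypothetical helper).

    Tensors are named INPUT__0 / OUTPUT__0 (assumed libtorch naming
    convention). With max_batch_size > 0, dims exclude the batch dimension.
    """
    return "\n".join([
        f'name: "{name}"',
        'platform: "pytorch_libtorch"',
        f"max_batch_size: {max_batch_size}",
        "input [",
        "  {",
        '    name: "INPUT__0"',
        "    data_type: TYPE_FP32",
        f"    dims: {_dims(input_dims)}",
        "  }",
        "]",
        "output [",
        "  {",
        '    name: "OUTPUT__0"',
        "    data_type: TYPE_FP32",
        f"    dims: {_dims(output_dims)}",
        "  }",
        "]",
        "",
    ])


def write_libtorch_model(repo: Path, name: str, torchscript_file: Path,
                         input_dims: Sequence[int], output_dims: Sequence[int],
                         version: int = 1) -> Path:
    """Create <repo>/<name>/config.pbtxt and <repo>/<name>/<version>/model.pt."""
    model_dir = repo / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(
        libtorch_config(name, input_dims, output_dims))
    # model.pt is assumed to be the default file name for pytorch_libtorch.
    shutil.copyfile(torchscript_file, version_dir / "model.pt")
    return model_dir
```

For example, `write_libtorch_model(Path("model_repository"), "yolov5", Path("yolov5s.torchscript.pt"), [3, 640, 640], [25200, 85])` would create `model_repository/yolov5/`; the file name and output shape here are placeholders that depend on how YOLOv5 was exported. Note that a correct config does not get around the error above: the TorchScript file must still use a format version that the bundled libtorch can read.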

• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
Target model: YOLOv5
Weights/Checkpoint: yolov5s.pt (run the download_weights.sh script from here)
Config files: config_yolo.zip

Credits to: YOLOv5

Triton supports PyTorch v1.6; you can find the Triton support matrix at https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html

Alternatively, you could use the Triton NGC Docker image: https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-10.html#rel_20-10

Yes, I did refer to the support matrix, and 20.07 does already support PyTorch v1.6. However, I am still getting the error "load failed for libtorch model -> 'yolov5': version_ <= kMaxSupportedFileFormatVersion", thrown by libtorch_cpu (and possibly related libraries).

Going deeper, I used an archive manager to inspect the zip-serialized model checkpoints created by different versions of PyTorch:
PyTorch v1.6: the version file has a value of 3
PyTorch v1.5: the version file has a value of 2
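The archive-manager check above can be scripted. This is a minimal sketch using only Python's standard `zipfile` module. It assumes the layout the archive manager showed: one top-level folder (whose name varies) containing a plain-text `version` record. The function name `pytorch_file_version` is hypothetical, and legacy non-zip files will raise `zipfile.BadZipFile`.

```python
import zipfile


def pytorch_file_version(path: str) -> int:
    """Read the 'version' record of a zip-serialized PyTorch/TorchScript file.

    Assumes one top-level folder (name varies) holding a plain-text
    'version' file. Raises zipfile.BadZipFile for legacy (non-zip) files.
    """
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Only "<folder>/version"; skip deeper entries with the same name.
            if len(parts) == 2 and parts[1] == "version":
                return int(zf.read(name).decode("ascii").strip())
    raise ValueError(f"{path}: no top-level 'version' record found")
```

Per the check above, this would be expected to return 3 for the checkpoint saved with PyTorch v1.6 and 2 for the one saved with v1.5; the error message states that the libtorch in the container reads at most version 2.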

I ran the following check in the Docker images nvcr.io/nvidia/deepstream:5.0-20.07-triton and nvcr.io/nvidia/deepstream:5.0.1-20.09-triton. Both returned the same result:

root@2cdf241a77b9:/opt# grep -r "kMaxSupportedFileFormatVersion"
tensorrtserver/include/torch/caffe2/serialize/inline_container.h:constexpr uint64_t kMaxSupportedFileFormatVersion = 0x2L;
Binary file tensorrtserver/lib/pytorch/libtorch_cpu.so matches
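The same comparison can be made programmatically from the header found above. This is a minimal Python sketch that assumes the installed header (/opt/tensorrtserver/include/torch/caffe2/serialize/inline_container.h in these images) matches the compiled libtorch_cpu.so; the function names are hypothetical. It mirrors the assertion that failed, `version_ <= kMaxSupportedFileFormatVersion`.

```python
import re

# Matches e.g. "constexpr uint64_t kMaxSupportedFileFormatVersion = 0x2L;"
_MAX_VERSION = re.compile(
    r"kMaxSupportedFileFormatVersion\s*=\s*(0[xX][0-9a-fA-F]+|\d+)[uUlL]*\s*;")


def max_supported_version(header_text: str) -> int:
    """Extract kMaxSupportedFileFormatVersion from inline_container.h source."""
    m = _MAX_VERSION.search(header_text)
    if m is None:
        raise ValueError("kMaxSupportedFileFormatVersion not found")
    literal = m.group(1)
    # Hex literals such as 0x2 are parsed as base 16, others as base 10.
    return int(literal, 16) if literal.lower().startswith("0x") else int(literal)


def can_load(file_version: int, header_text: str) -> bool:
    """Mirror the failing check: version_ <= kMaxSupportedFileFormatVersion."""
    return file_version <= max_supported_version(header_text)
```

With the header shown above (`0x2L`), `can_load(3, header_text)` is False, which matches the failure for the PyTorch v1.6 checkpoint, while `can_load(2, header_text)` is True.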

OK, I found out why. The Docker image tag on NGC is quite misleading, and I had missed a clue that was there all along.

In nvcr.io/nvidia/deepstream:5.0-20.07-triton, the bundled Triton Inference Server is actually release 20.03, not 20.07:

wensher.ong@ows-workstation:~/Workspace/vision-inference$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY nvcr.io/nvidia/deepstream:5.0-20.07-triton

===============================
==   DeepStreamSDK 5.0       ==
===============================

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream-5.0/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; '/usr/bin/X11' is part of the same file system loop as '/usr/bin'.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

In nvcr.io/nvidia/deepstream:5.0.1-20.09-triton, the bundled Triton Inference Server is also release 20.03, not 20.09:

wensher.ong@ows-workstation:~/Workspace/vision-inference$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=:1  nvcr.io/nvidia/deepstream:5.0.1-20.09-triton

===============================
==   DeepStreamSDK 5.0       ==
===============================

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream-5.0/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.03 (build 11042949)

Copyright (c) 2018-2019, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; '/usr/bin/X11' is part of the same file system loop as '/usr/bin'.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...
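The mismatch above can be detected from a container's startup output. This is a minimal Python sketch assuming the banner format shown in these logs ("NVIDIA Release YY.MM (build ...)") and the "-YY.MM-triton" tag pattern of the two images used here; the function names are hypothetical, and neither pattern is guaranteed to hold for other images.

```python
import re
from typing import Optional

# Banner line printed by the Triton container, e.g. "NVIDIA Release 20.03 (build 11042949)".
_BANNER = re.compile(r"NVIDIA Release (\d{2}\.\d{2})")
# DeepStream Triton image tags such as "5.0-20.07-triton" or "5.0.1-20.09-triton".
_TAG = re.compile(r"-(\d{2}\.\d{2})-triton$")


def bundled_triton_release(startup_log: str) -> Optional[str]:
    """Return the YY.MM release printed in the Triton banner, or None."""
    m = _BANNER.search(startup_log)
    return m.group(1) if m else None


def tag_release(image: str) -> Optional[str]:
    """Return the YY.MM part of a '...-YY.MM-triton' image reference, or None."""
    m = _TAG.search(image)
    return m.group(1) if m else None


def release_mismatch(image: str, startup_log: str) -> bool:
    """True when the tag's YY.MM differs from the Triton release actually bundled."""
    tag, bundled = tag_release(image), bundled_triton_release(startup_log)
    return tag is not None and bundled is not None and tag != bundled
```

For both images in this thread, the tag reports 20.07 or 20.09 while the banner reports 20.03, so `release_mismatch` would return True.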

So the question is: will NVIDIA release a DeepStream image built with the latest Triton Inference Server?

Hi @wensher.ong,
Thanks for debugging this by yourself!

DeepStream is not released as frequently as Triton, so some version differences are expected.
Until the next DS release, there will be no Triton upgrade in DS.
Can you work with the current Triton version in the DS Docker image?

Thanks!

@mchi No problem. Thanks for confirming this!

May I know when the next DS release will be? I cannot work with the current Triton version in that DS Docker image, because my model has a layer that is supported only in PyTorch 1.6, not in PyTorch 1.5. Would it be possible to provide users with at least a DS Docker image containing the latest Triton (supporting at least PyTorch 1.6 and the latest TRT)?

Otherwise, could you guide us on upgrading the Triton Inference Server in the DS Docker image to the latest release? Would that be a reasonable workaround?

Will check and get back to you later.
