TLT-Converter CUDA 11.3

Hi,

I am trying to get my models to run on the latest TRITON server image from ngc.nvidia.com. That image runs on CUDA 11.3, while tlt-converter officially only supports up to CUDA 11.1.

I have successfully used the CUDA 11.1 converter with CUDA 11.2; with CUDA 11.3, however, it sometimes gives the following error: RuntimeError: CUDA error: an illegal memory access was encountered
The other times it runs correctly, except that the maximum confidence the detections reach is 73%, whereas the exact same model on the earlier version had 100% confidence for those test cases.

Would it be possible to update tlt-converter for the newer CUDA version?
Thank you in advance.

Can you share the exact TRITON server image name?

nvcr.io/nvidia/tritonserver:21.04-py3, which is TRITON server version 2.9.0.

Can you quickly check which TensorRT version is installed inside the Docker image?

TensorRT 7.2.3.4

See Release Notes :: NVIDIA Deep Learning Triton Inference Server Documentation; could you try the Triton Inference Server release 20.11 version? Its CUDA/cuDNN/TensorRT versions match those listed in Overview — Transfer Learning Toolkit 3.0 documentation:

CUDA/cuDNN   TensorRT   Platform
10.2/8.0     7.2        cuda102-cudnn80-trt72
11.0/8.0     7.2        cuda110-cudnn80-trt72
11.1/8.0     7.2        cuda111-cudnn80-trt72
10.2/8.0     7.1        cuda102-cudnn80-trt71
11.0/8.0     7.1        cuda110-cudnn80-trt71

Yes, then it works fine; even release 21.03 works okay. However, I wanted to use some of the other backends prebuilt in the 21.04 version. I could use the older version and build the backends myself, but I thought I would check whether it was possible to have tlt-converter updated.

I will sync with the internal team on your request.

Glad to know that 21.03 is also working.

From Release Notes :: NVIDIA Deep Learning Triton Inference Server Documentation, for 21.04 it is CUDA 11.3.0 with TensorRT 7.2.3.4.

Can you share the full command you have run? The full log is also expected.

That happens once the converter has already run and the model has been loaded by TRITON. It only occurs with the converted models, and it hasn't happened for a while, so I don't have the logs.

The clearer problem is that between the 21.03 and the 21.04 versions the model performs worse.

After checking, the following version of tlt-converter works with nvcr.io/nvidia/tritonserver:21.04-py3:

CUDA/cuDNN   TensorRT   Platform
11.1/8.0     7.2        cuda111-cudnn80-trt72

So a newer version of tlt-converter should not be needed.

Sadly, that does not fix the problem. I have built the TRITON software v2.9.0 on top of l4t-base:r32.5.0, which runs CUDA 10.2, and there the models have the correct confidence of 100%.
Running the models in TRITON v2.9.0 on CUDA 11.3 only gives a confidence of 73%.

Which means that it has something to do with either the TensorRT version or the CUDA version.

What test did you run? Can you share the full command and full log?
To narrow down, how about the confidence result when you run with TLT 3.0-dp docker?

I used the exact same models within three different TRITON Docker containers, with the exact same client software and test images.

I tested 2 different models:

  • A DSSD model, which goes from 100% accuracy to 73% on individual bounding boxes.
  • A classification model, which just doesn't work anymore (completely random results), whereas before it gave 95%+ accuracy.

The models were converted using the following commands, with the matching tlt-converter for each image:

  • tlt-converter -e ./Classifier-224x224-mobilenet_v1.engine -k 12345 -d 3,224,224 -o predictions/Softmax -i nchw -m 64 ./Classifier-224x224-mobilenet_v1.etlt
  • tlt-converter -e ./DSSD-400x640-mobilenet_v1.engine -k 12345 -d 3,640,400 -o NMS ./DSSD-400x640-mobilenet_v1.etlt
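To separate converter problems from TRITON problems, the generated engines can also be exercised directly with trtexec, which ships with TensorRT. This is only a sketch; the binary path assumes the standard NVIDIA container layout, and the engine filename is the one produced above:

```shell
# Run the serialized engine outside Triton as a sanity check.
# trtexec is typically found under /usr/src/tensorrt/bin in NVIDIA containers
# (an assumption; adjust the path for your image).
/usr/src/tensorrt/bin/trtexec \
    --loadEngine=./DSSD-400x640-mobilenet_v1.engine \
    --batch=1 \
    --dumpOutput
```

If trtexec already shows different outputs between the 21.03 and 21.04 containers, the regression lies in TensorRT rather than in TRITON itself.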

And the following Docker images:

  1. A custom-built AArch64 image based on nvcr.io/nvidia/l4t-base:r32.5.0 with TRITON v2.9.0, CUDA 10.2, TensorRT 7.1
  2. nvcr.io/nvidia/tritonserver:21.03-py3 with TRITON v2.8.0, CUDA 11.2.1, TensorRT 7.2.2.3
  3. nvcr.io/nvidia/tritonserver:21.04-py3 with TRITON v2.9.0, CUDA 11.3.0, TensorRT 7.2.3.4

The images are all run using the following command: tritonserver --model-store=/Workdir/Models --strict-model-config=false --exit-on-error=false --log-verbose=false
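For reference, a typical way to launch such a container with the command above would look like the following sketch (the host model path, GPU flag, and port mappings are assumptions for illustration, not taken from the thread):

```shell
# Launch case 3 with the model repository mounted from the host.
# /Workdir/Models on the host and the port mappings are placeholders.
docker run --gpus=all --rm \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /Workdir/Models:/Workdir/Models \
    nvcr.io/nvidia/tritonserver:21.04-py3 \
    tritonserver --model-store=/Workdir/Models \
        --strict-model-config=false \
        --exit-on-error=false
```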

The two models mentioned work well in cases 1 and 2, but when using them in case 3 something happens to them.

The models were trained using the TLT 3.0-dp Docker image and run inference correctly there. The accuracies correspond with the accuracies returned by the TRITON server in cases 1 and 2.

Which log would you like to have? The converter's, the TRITON server's, or something else?

Thanks for the details. By the way, is it a must for you to run inference with Tritonserver? Also, did you ever run inference with the command tlt dssd inference or with DeepStream?

For your current regression result in tritonserver:21.04-py3, can you try to downgrade the TensorRT in it? Refer to https://developer.nvidia.com/nvidia-tensorrt-7x-download#trt722

Hi,

Yes, it is a must to run it with the TRITON server. I have run it with tlt dssd inference to test the network, but that does not give me the possibilities the TRITON server has. The same goes for DeepStream.

I will try to downgrade the TensorRT version. Thank you.

I have been trying to downgrade the TensorRT version, but I run into a problem installing the new version.

I am using the following Dockerfile:

FROM nvcr.io/nvidia/tritonserver:21.04-py3

RUN dpkg -l | grep TensorRT

RUN apt-get purge -y "libnvinfer*" libnvonnx*

WORKDIR /Workdir/

COPY nv-tensorrt-repo-ubuntu1804-cuda11.1-trt7.2.2.3-ga-20201211_1-1_amd64.deb /Workdir/

RUN dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.1-trt7.2.2.3-ga-20201211_1-1_amd64.deb

RUN apt-key add /var/nv-tensorrt-repo-cuda11.1-trt7.2.2.3-ga-20201211/7fa2af80.pub && apt update 
RUN apt install -y tensorrt

But the last step ends in the following error:

Step 8/9 : RUN apt install -y tensorrt
 ---> Running in c214e97cc9aa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 tensorrt : Depends: libnvinfer-samples (= 7.2.2-1+cuda11.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
The command '/bin/sh -c apt install -y tensorrt' returned a non-zero code: 100

I have not been able to find how to fix this, could you give me any pointers? Thanks.

Could you please try to log in to the nvcr.io/nvidia/tritonserver:21.04-py3 docker directly, then purge the old TRT and install the new TRT?

I tried that too, but I end up with the exact same problem.

The following packages have unmet dependencies:
 tensorrt : Depends: libnvinfer-samples (= 7.2.2-1+cuda11.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Any advice on what to do now? Thanks.
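One approach that often resolves this kind of unmet-dependency error is to bypass the tensorrt meta-package and install the pinned TensorRT packages directly, so apt reports which concrete package is actually blocking. A sketch, assuming the usual TensorRT 7.x .deb package names and the version string taken from the error message:

```shell
# Install the blocking dependency explicitly to surface the real conflict.
apt-get install -y libnvinfer-samples=7.2.2-1+cuda11.1
# If that still fails, pin the core TensorRT packages one by one:
apt-get install -y \
    libnvinfer7=7.2.2-1+cuda11.1 \
    libnvinfer-plugin7=7.2.2-1+cuda11.1 \
    libnvinfer-dev=7.2.2-1+cuda11.1
# apt-cache policy shows which candidate versions each enabled repo offers:
apt-cache policy libnvinfer7
```

Whether this works depends on which CUDA repositories are enabled inside the image; a newer CUDA repo can pin libnvinfer to an incompatible candidate version, which is consistent with the "held broken packages" message above.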