TAO-converted .plan model gives poor accuracy when served by Triton Inference Server

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu 20 x64, RTX 3060 12g.
• Network Type (Classification)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I’ve retrained a classification model on custom data with a variety of resolutions (I did not resize the images before training; is that necessary?). The evaluation and visualization results are all good, and most of the test data is classified correctly:

Confusion Matrix
[[408   9]
 [  1 408]]
Classification Report
                  precision    recall  f1-score   support

         bicycle       1.00      0.98      0.99       417
electric_bicycle       0.98      1.00      0.99       409

        accuracy                           0.99       826
       macro avg       0.99      0.99      0.99       826
    weighted avg       0.99      0.99      0.99       826

classification_spec.cfg:

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

I then exported it to an .etlt model file and volume-mapped it into the tritonserver:22.02-py3 docker along with tao-converter. Inside the docker, I ran this command to convert the .etlt to a .plan:

root@7c298352fe92:/models/tao-converter-x86-tensorrt8.0# ./tao-converter /models/tao-converter-x86-tensorrt8.0/models/classification/export/final_model.etlt -k tlt_encode -d 3,224,224 -o predictions/Softmax -t fp16 -e /models/ele_two_vehicle_net_tao/1/model.plan

The output is:

[INFO] [MemUsageChange] Init CUDA: CPU +458, GPU +0, now: CPU 469, GPU 3692 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 469 MiB, GPU 3692 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 623 MiB, GPU 3736 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +350, now: CPU 1565, GPU 4086 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1691, GPU 4144 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 58448
[INFO] Total Device Persistent Memory: 23211520
[INFO] Total Scratch Memory: 1024
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 70 MiB, GPU 640 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 0.408682ms to assign 4 blocks to 33 nodes requiring 44957697 bytes.
[INFO] Total Activation Memory: 44957697
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2870, GPU 4710 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2870, GPU 4718 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +22, GPU +23, now: CPU 22, GPU 23 (MiB)

I then referred to tao-toolkit-triton-apps to generate config.pbtxt and labels.txt, with this content:

name: "ele_two_vehicle_net_tao"
platform: "tensorrt_plan"
max_batch_size : 1
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [2, 1, 1]
    label_filename: "labels.txt"
  }
]
dynamic_batching { }

and

bicycle
electric-bicycle
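
The line order in labels.txt has to match the class indices used during training, and the names above differ slightly from those in the classification report (electric-bicycle vs. electric_bicycle). A minimal sketch for deriving the label list from a class-name-to-index mapping, assuming the mapping looks like the classmap.json that TAO classification training produces (that layout is an assumption, and the helper name labels_from_classmap is made up for illustration):

```python
import json


def labels_from_classmap(classmap):
    """Return class names ordered by their integer class index."""
    return [name for name, _ in sorted(classmap.items(), key=lambda kv: kv[1])]


# Hypothetical mapping for illustration; in practice it would be loaded
# from the training output, e.g. json.load(open("classmap.json")).
classmap = json.loads('{"electric_bicycle": 1, "bicycle": 0}')

# One class name per line, in index order, as labels.txt expects.
labels_text = "\n".join(labels_from_classmap(classmap)) + "\n"
print(labels_text, end="")
```

Writing labels_text to labels.txt would keep the Triton labels in the same order as the training indices.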

Using the Triton client Python sample with test images that were actually copied from the training dataset:

python3 image_client.py -m ele_two_vehicle_net_tao ~/Pictures/data/train/electric_bicycle/

I can see the inference result is pretty bad: over half of the images (200 in total) were wrongly recognized as bicycle. Could you help check:

  • why tao-converter reports "Some tactics do not have sufficient workspace memory..." even though, as far as I can see, there is still a lot of GPU memory free?

  • why is the classification accuracy low?

thanks.

Could you let download_and_convert.sh (tao-toolkit-triton-apps/download_and_convert.sh at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub) generate the TensorRT engine?

I basically followed the command there to generate the converted .plan, except that the -c and -m parameters were omitted. Could you elaborate more on this?

You mentioned that you used tritonserver:22.02-py3 to generate the .plan file.
But according to tao-toolkit-triton-apps/Dockerfile at main · NVIDIA-AI-IOT/tao-toolkit-triton-apps · GitHub , that repo uses the nvcr.io/nvidia/tritonserver:21.10-py3 docker.

This is different.

So, could you try either of the methods below?

method 1: use the 21.10 docker to generate the .plan file

method 2:

Can you run the default GitHub repo successfully without any modification?

I’m now running the Triton server docker from tao-toolkit-triton-apps.

I manually used the docker’s built-in tao-converter to do the conversion:

root@449f5d2ec140:/tao_triton# tao-converter ele_2class_classification/export/final_model.etlt -k nvidia_tlt -d 3,224,224 -o predictions/Softmax -m 16 -e /model_repository/electri_bicycle_net_tao/1/model.plan

I then manually created config.pbtxt and labels.txt with the same content as in my first post.
After restarting the docker, I can see my model was loaded correctly. Using the Triton client image_client.py again, the test result is still bad:

Of 400 electric-bicycle images copied from part of the training dataset split, 240 were wrongly recognized as bicycle.

To double-check the model’s accuracy, I ran the same 400 images through tao:

tao classification inference -e $SPECS_DIR/classification_retrain_spec.cfg \
                          -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
                          -k $KEY -b 32 -d $DATA_DOWNLOAD_DIR/split/compare_test/electric_bicycle \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

According to the output result.csv, 99% of the 400 images were correctly recognized as electric-bicycle.
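
Since the same images score well under tao classification inference, one thing worth ruling out is a mismatch between the preprocessing used in training and what the client sends to Triton. The training spec uses preprocess_mode: "caffe", which in the Keras convention means RGB to BGR conversion followed by subtracting the ImageNet channel means, with no scaling. Below is a minimal NumPy sketch of that transform, assuming the image is already decoded and resized to 224x224 (the function name caffe_preprocess is made up for illustration, and whether image_client.py applies an equivalent step depends on options not shown in this thread):

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras-style "caffe" mode.
CAFFE_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(rgb_hwc):
    """Convert an HxWx3 uint8 RGB image to a 3xHxW float32 BGR tensor.

    Assumes the image has already been resized to the network input size.
    """
    x = rgb_hwc.astype(np.float32)
    x = x[..., ::-1]                 # RGB -> BGR
    x = x - CAFFE_BGR_MEAN           # zero-center per channel, no scaling
    x = np.transpose(x, (2, 0, 1))   # HWC -> CHW, matching FORMAT_NCHW
    return np.ascontiguousarray(x)


# Pure-red dummy image, just to show the shapes and value ranges involved.
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
dummy[..., 0] = 255
tensor = caffe_preprocess(dummy)
print(tensor.shape, tensor.dtype)
```

Comparing a tensor prepared this way with what the client actually sends for the same image would show whether the two pipelines agree.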

Which docker did you launch? Can you share the output of “$ docker ps”?

The sudo docker ps output here is:

CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS          PORTS                                                           NAMES
a014d704c8e9   nvcr.io/nvidia/tao/triton-apps:21.11-py3   "/opt/tritonserver/n…"

When I start the docker with sudo bash scripts/start_server.sh, it shows something like this:

Sending build context to Docker daemon  119.2MB
Step 1/7 : FROM nvcr.io/nvidia/tritonserver:21.10-py3
 ---> 5c99e9b6586e
...
...
Successfully built f33160171d35
Successfully tagged nvcr.io/nvidia/tao/triton-apps:21.11-py3

...
...
Running the server on 0

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.10 (build 28453983)
...
...

Could you upload all the logs when you run server and client?
You can attach as files. Thanks.

The server logs (I commented out the downloading of the extra models in start_server.sh and download_and_convert.sh, since it is quite slow and got stuck there; no other changes):
triton_runing_console_logs.txt (61.9 KB)

The client output (the test images are all electric-bicycle):
triton_client_result.txt (23.3 KB)

I will check further.

Could you try to run standalone script as well?
See Inferring resnet18 classification etlt model with python - #9 by Morganh
and Inferring resnet18 classification etlt model with python - #40 by Morganh

That could take me a while. Do I need to share my .etlt model and some sample data?

Yes, if possible, you can share the tlt model, etlt model, key and some sample data.
You can send me via private message.

Hi Morgan, any updates on this? thanks.

Oh, very sorry for that, I have not tried your model yet.

As mentioned above, could you try to run standalone script as well?
See Inferring resnet18 classification etlt model with python - #9 by Morganh
and Inferring resnet18 classification etlt model with python - #40 by Morganh

So I just need to docker exec -it xxxx bash into my Triton docker (triton-apps:21.11-py3), copy in the Python samples from the TensorRT Python Samples, and then run inference against my classification model, correct? And there is no need to clone the whole TensorRT source to build and install it?

No, just log in to the tao docker and try to run the standalone Python script below. There is no need to copy other Python samples or the TRT source.
Inferring resnet18 classification etlt model with python - #9 by Morganh
or Inferring resnet18 classification etlt model with python - #40 by Morganh

My training machine and Triton server are 2 different standalone machines; the poor accuracy happens on the Triton server.

Are those Python scripts (Inferring resnet18 classification etlt model with python - #9 by Morganh) supposed to run on my training machine? This is the docker ps on my training machine:

CONTAINER ID   IMAGE                                                     COMMAND                  CREATED       STATUS       PORTS     NAMES
47f180137f6c   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             dazzling_carver
2abe608d35af   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             gallant_hawking
f3b53fb42965   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             reverent_dijkstra

After docker exec -it 47f180137f6c bash, there’s only a bin folder there:

root@47f180137f6c:/usr/src/tensorrt# ls
bin

The above-mentioned standalone scripts can run on either of your machines. You can run them as below.
$ python xxx.py

The script will run inference against a TensorRT engine. The TensorRT engine has already been generated on the Triton server. You can log in to the Triton server directly and then run the standalone inference script.

I prefer to run it in the Triton server docker. This is the docker ps on my Triton server:

CONTAINER ID   IMAGE                                      COMMAND                  CREATED      STATUS      PORTS                                                           NAMES
a014d704c8e9   nvcr.io/nvidia/tao/triton-apps:21.11-py3   "/opt/tritonserver/n…"   5 days ago   Up 5 days   0.0.0.0:8000-8002->8000-8002/tcp, :::8000-8002->8000-8002/tcp   stupefied_grothendieck

I entered the docker with sudo docker exec -it a014d704c8e9 bash and checked:

  • /usr/src
    2 folders: cudnn_samples_v8 and tensorrt

  • /usr/src/tensorrt
    1 folder: bin

Then I manually copied the single file caffe_resnet50.py into /usr/src/tensorrt and ran python3 caffe_resnet50.py, which fails with:

Traceback (most recent call last):
  File "caffe_resnet50.py", line 26, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'

P.S. The post was using a .trt file, while I only have the .etlt model.
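
The ModuleNotFoundError above means the python3 interpreter that ran the script cannot see the TensorRT Python bindings. A small sketch for checking which interpreter is in use and which modules it can import, without actually importing them; pycuda and numpy are listed only as an assumption about what a standalone inference script might need:

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
# tensorrt comes from the traceback; the other names are assumed dependencies.
for mod in ("tensorrt", "pycuda", "numpy"):
    print(mod, "->", "found" if module_available(mod) else "missing")
```

Running this inside each docker would show whether the bindings are missing from the image or only from the interpreter being used.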