TAO-converted .plan model gives poor accuracy when running in Triton server

music1913 · March 16, 2022, 5:11am

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu 20 x64, RTX 3060 12g.
• Network Type (Classification)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I’ve retrained a classification model on custom data with various resolutions (I did not resize the images before training; is that necessary?). The evaluation and visualization results are all good, and most of the test data is classified correctly:

Confusion Matrix
[[408   9]
 [  1 408]]
Classification Report
                  precision    recall  f1-score   support

         bicycle       1.00      0.98      0.99       417
electric_bicycle       0.98      1.00      0.99       409

        accuracy                           0.99       826
       macro avg       0.99      0.99      0.99       826
    weighted avg       0.99      0.99      0.99       826

classification_spec.cfg:

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
    lr: 0.01
    decay: 0.0
    momentum: 0.9
    nesterov: False
  }
}
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
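
The spec above sets preprocess_mode: "caffe". In the Keras convention that this mode name comes from, images are converted from RGB to BGR and the per-channel ImageNet means are subtracted, with no scaling to [0, 1]. Any client that feeds the exported engine would need to apply the same steps. A minimal NumPy sketch, assuming that convention holds for this model and that the image is already resized to 224x224 (the function name is illustrative, and the thread does not establish whether preprocessing is the cause of the problem):

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras "caffe" preprocessing.
# Assumption: a TAO model trained with preprocess_mode "caffe" expects the same.
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(rgb_hwc):
    """Turn an HxWx3 RGB image into a 3xHxW float32 tensor in 'caffe' style."""
    x = np.asarray(rgb_hwc, dtype=np.float32)
    x = x[..., ::-1]       # RGB -> BGR
    x = x - BGR_MEANS      # zero-center each channel, no scaling
    # HWC -> CHW, channels first as in input_image_size "3,224,224"
    return np.ascontiguousarray(x.transpose(2, 0, 1))


if __name__ == "__main__":
    dummy = np.full((224, 224, 3), 128, dtype=np.uint8)  # stand-in for a resized image
    out = caffe_preprocess(dummy)
    print(out.shape, out.dtype)
```

Comparing the tensor a client actually sends against the output of a function like this is one way to rule the input pipeline in or out.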

I then exported it to an .etlt model file and volume-mapped it into the tritonserver:22.02-py3 docker container along with tao-converter. Inside the container, I ran the following command to convert the .etlt to a .plan:

root@7c298352fe92:/models/tao-converter-x86-tensorrt8.0# ./tao-converter /models/tao-converter-x86-tensorrt8.0/models/classification/export/final_model.etlt -k tlt_encode -d 3,224,224 -o predictions/Softmax -t fp16 -e /models/ele_two_vehicle_net_tao/1/model.plan

The output is:

[INFO] [MemUsageChange] Init CUDA: CPU +458, GPU +0, now: CPU 469, GPU 3692 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 469 MiB, GPU 3692 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 623 MiB, GPU 3736 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +350, now: CPU 1565, GPU 4086 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1691, GPU 4144 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 58448
[INFO] Total Device Persistent Memory: 23211520
[INFO] Total Scratch Memory: 1024
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 70 MiB, GPU 640 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 0.408682ms to assign 4 blocks to 33 nodes requiring 44957697 bytes.
[INFO] Total Activation Memory: 44957697
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2870, GPU 4710 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2870, GPU 4718 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +22, GPU +23, now: CPU 22, GPU 23 (MiB)
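
On the "Some tactics do not have sufficient workspace memory" line in the log: in TensorRT the workspace is a per-build limit on the scratch memory handed to the builder, not the amount of free GPU memory, so the message can appear even on a mostly idle card. tao-converter exposes this limit as -w (in bytes). A minimal sketch that assembles such a command with a larger workspace; the helper name and the 2 GiB value are illustrative assumptions, and nothing is executed:

```python
import shlex


def tao_converter_args(etlt_path, key, dims, output_node, engine_path,
                       precision="fp16", workspace_bytes=2 << 30):
    """Build a tao-converter argument list (hypothetical helper, runs nothing)."""
    return [
        "./tao-converter", etlt_path,
        "-k", key,
        "-d", dims,
        "-o", output_node,
        "-t", precision,
        "-w", str(workspace_bytes),  # builder workspace limit in bytes (2 GiB here)
        "-e", engine_path,
    ]


if __name__ == "__main__":
    args = tao_converter_args(
        "final_model.etlt", "tlt_encode", "3,224,224",
        "predictions/Softmax", "model.plan",
    )
    print(shlex.join(args))
```

The message is logged at INFO level and concerns which kernels the builder may try; whether a larger workspace has any effect on accuracy is a separate question that this sketch does not answer.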

I then followed tao-toolkit-triton-apps to create config.pbtxt and labels.txt with the following content:

name: "ele_two_vehicle_net_tao"
platform: "tensorrt_plan"
max_batch_size : 1
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [2, 1, 1]
    label_filename: "labels.txt"
  }
]
dynamic_batching { }

and

bicycle
electric-bicycle
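
Since Triton maps output index i to line i of labels.txt, the label order has to match the class indices used in training. A small sketch of that check, assuming the classmap.json written by TAO is a flat name-to-index mapping (the contents shown are hypothetical, and both function names are illustrative):

```python
import json


def labels_from_classmap(classmap):
    """Return class names ordered by their integer index."""
    return [name for name, _ in sorted(classmap.items(), key=lambda kv: kv[1])]


def check_labels(classmap, label_lines, num_outputs):
    """Collect human-readable problems; an empty list means the files agree."""
    problems = []
    expected = labels_from_classmap(classmap)
    labels = [line.strip() for line in label_lines if line.strip()]
    if len(labels) != num_outputs:
        problems.append(f"{len(labels)} labels but {num_outputs} outputs")
    for idx, (want, got) in enumerate(zip(expected, labels)):
        if want != got:
            problems.append(
                f"index {idx}: classmap has {want!r}, labels.txt has {got!r}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical classmap contents; the real file is produced during training.
    classmap = json.loads('{"bicycle": 0, "electric_bicycle": 1}')
    for problem in check_labels(classmap, ["bicycle", "electric-bicycle"], num_outputs=2):
        print(problem)
```

With the two labels listed above this reports a spelling difference at index 1 (hyphen versus underscore), which is cosmetic as long as the order is right.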

Then I ran the Triton client Python sample on test images that were actually copied from the training dataset:

python3 image_client.py -m ele_two_vehicle_net_tao ~/Pictures/data/train/electric_bicycle/

The inference results are poor: of 200 images, over half were wrongly recognized as bicycle. Could you help check:

1. Why does tao-converter report "Some tactics do not have sufficient workspace memory..." when plenty of GPU memory still appears to be free?
2. Why is the classification accuracy so low?

thanks.

Morganh · March 16, 2022, 5:16pm

Could you let https://github.com/NVIDIA-AI-IOT/tao-toolkit-triton-apps/blob/main/scripts/download_and_convert.sh generate the TensorRT engine?

music1913 · March 16, 2022, 7:35pm

I basically followed the command there to generate the converted .plan, except that the -c and -m parameters were omitted. Could you elaborate on this?

Morganh · March 17, 2022, 1:12am

You mentioned that you used tritonserver:22.02-py3 to generate the .plan file.
But according to the Dockerfile in tao-toolkit-triton-apps (NVIDIA-AI-IOT/tao-toolkit-triton-apps on GitHub), the base image is nvcr.io/nvidia/tritonserver:21.10-py3.

This is different.

So, could you try either of the methods below?

method 1: try with 21.10 docker to generate .plan file

method 2:

git clone the GitHub repo above
In scripts/download_and_convert.sh of that repo, replace /tao_models/dashcamnet_model/resnet18_dashcamnet_pruned.etlt and /tao_models/dashcamnet_model/dashcamnet_int8.txt with yours. Modify “-d” and “-k” as well.

Morganh · March 17, 2022, 1:43am

Can you run the default GitHub repo successfully without any modification?

music1913 · March 17, 2022, 3:04am

I’m now running triton-server docker from tao-toolkit-triton-apps.

I manually ran the container's built-in tao-converter to do the conversion:

root@449f5d2ec140:/tao_triton# tao-converter ele_2class_classification/export/final_model.etlt -k nvidia_tlt -d 3,224,224 -o predictions/Softmax -m 16 -e /model_repository/electri_bicycle_net_tao/1/model.plan

I then manually created config.pbtxt and labels.txt with the same content as in my first post.
After restarting the container, I can see that my model was loaded correctly. Running the Triton client image_client.py again, the test results are still bad:

Of 400 electric-bicycle images copied from the training split, 240 were wrongly recognized as bicycle.

To double-check the model's accuracy, I ran the same 400 images through tao:

tao classification inference -e $SPECS_DIR/classification_retrain_spec.cfg \
                          -m $USER_EXPERIMENT_DIR/output_retrain/weights/resnet_$EPOCH.tlt \
                          -k $KEY -b 32 -d $DATA_DOWNLOAD_DIR/split/compare_test/electric_bicycle \
                          -cm $USER_EXPERIMENT_DIR/output_retrain/classmap.json

According to the output result.csv, 99% of the 400 images were correctly recognized as electric-bicycle.
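
Putting the two results side by side: 240 wrong out of 400 leaves 160 correct, against roughly 99% for tao classification inference on the same images. A gap that large with the same weights seems more consistent with a difference in the input pipeline (resize, channel order, mean subtraction) than with numerical precision, though the thread does not confirm the cause. The arithmetic, as a short sketch (the function name is illustrative):

```python
def accuracy(num_correct, num_total):
    """Fraction of images classified correctly."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return num_correct / num_total


total = 400
triton_wrong = 240  # reported above for the Triton client run
triton_acc = accuracy(total - triton_wrong, total)
print(f"Triton path: {triton_acc:.0%} correct")
```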

Morganh · March 17, 2022, 8:59am

Which docker container did you launch? Can you share the output of “$ docker ps”?

music1913 · March 17, 2022, 9:11am

The output of sudo docker ps is:

CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS          PORTS                                                           NAMES
a014d704c8e9   nvcr.io/nvidia/tao/triton-apps:21.11-py3   "/opt/tritonserver/n…"

When I start the container with sudo bash scripts/start_server.sh, it prints something like this:

Sending build context to Docker daemon  119.2MB
Step 1/7 : FROM nvcr.io/nvidia/tritonserver:21.10-py3
 ---> 5c99e9b6586e
...
...
Successfully built f33160171d35
Successfully tagged nvcr.io/nvidia/tao/triton-apps:21.11-py3

...
...
Running the server on 0

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.10 (build 28453983)
...
...

Morganh · March 17, 2022, 9:29am

Could you upload all the logs when you run server and client?
You can attach as files. Thanks.

music1913 · March 17, 2022, 9:34am

The server logs (I commented out the downloading of extra models in start_server.sh and download_and_convert.sh since it was quite slow and got stuck; no other changes):
triton_runing_console_logs.txt (61.9 KB)

The client output (the test images are all electric-bicycle):
triton_client_result.txt (23.3 KB)

Morganh · March 17, 2022, 10:00am

I will check further.

Could you try running the standalone script as well?
See Inferring resnet18 classification etlt model with python - #9 by Morganh
and Inferring resnet18 classification etlt model with python - #40 by Morganh

music1913 · March 17, 2022, 10:10am

That could take me a while. Do I need to share my .etlt model and some sample data?

Morganh · March 17, 2022, 10:13am

Yes, if possible, you can share the tlt model, etlt model, key and some sample data.
You can send me via private message.

music1913 · March 23, 2022, 1:29am

Hi Morgan, any updates on this? thanks.

Morganh · March 23, 2022, 1:53am

Oh, very sorry for that, I have not tried your model yet.

As mentioned above, could you try running the standalone script as well?
See Inferring resnet18 classification etlt model with python - #9 by Morganh
and Inferring resnet18 classification etlt model with python - #40 by Morganh

music1913 · March 23, 2022, 3:09am

I just need to docker exec -it xxxx bash into my Triton container (triton-apps:21.11-py3), copy in the Python samples from the TensorRT Python Samples, and then run inference against my classification model, correct? And there is no need to clone the whole TensorRT source to build and install it?

Morganh · March 23, 2022, 3:15am

No, just log in to the tao docker and try running the standalone Python script below. There is no need to copy other Python samples or the TRT source.
Inferring resnet18 classification etlt model with python - #9 by Morganh
or Inferring resnet18 classification etlt model with python - #40 by Morganh

music1913 · March 23, 2022, 3:42am

My training machine and Triton server are two different standalone machines; the poor accuracy happens on the Triton server.

Are those Python scripts (Inferring resnet18 classification etlt model with python - #9 by Morganh) supposed to run on my training machine? This is the docker ps output on the training machine:

CONTAINER ID   IMAGE                                                     COMMAND                  CREATED       STATUS       PORTS     NAMES
47f180137f6c   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             dazzling_carver
2abe608d35af   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             gallant_hawking
f3b53fb42965   nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3   "install_ngc_cli.sh …"   13 days ago   Up 13 days             reverent_dijkstra

After docker exec -it 47f180137f6c bash, there’s only a bin folder there:

root@47f180137f6c:/usr/src/tensorrt# ls
bin

Morganh · March 23, 2022, 4:35am

The above-mentioned standalone scripts can run on either of your machines. You can run them as below.
$ python xxx.py

The script runs inference against a TensorRT engine. The TensorRT engine was already generated on the Triton server, so you can log in to the Triton server directly and run the standalone inference script there.

music1913 · March 23, 2022, 6:36am

I prefer to run it in the Triton server container. This is the docker ps output on my Triton server:

CONTAINER ID   IMAGE                                      COMMAND                  CREATED      STATUS      PORTS                                                           NAMES
a014d704c8e9   nvcr.io/nvidia/tao/triton-apps:21.11-py3   "/opt/tritonserver/n…"   5 days ago   Up 5 days   0.0.0.0:8000-8002->8000-8002/tcp, :::8000-8002->8000-8002/tcp   stupefied_grothendieck

I entered the container with sudo docker exec -it a014d704c8e9 bash and checked:

/usr/src
2 folders of cudnn_samples_v8 and tensorrt
/usr/src/tensorrt
1 folder of bin

I then manually copied the single file caffe_resnet50.py into /usr/src/tensorrt and ran python3 caffe_resnet50.py, which failed with:

Traceback (most recent call last):
  File "caffe_resnet50.py", line 26, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'
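
The traceback means the Python interpreter in that container cannot find the TensorRT Python bindings. A quick, side-effect-free way to see which modules a standalone script would need are importable in a given container (the module list here is an assumption about what such a script imports, and the function name is illustrative):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the importer."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed dependency list for a standalone TensorRT inference script.
    needed = ["tensorrt", "pycuda", "numpy", "PIL"]
    print("missing:", missing_modules(needed) or "none")
```

Running this in both the training container and the Triton container would show which of the two is the easier place to run the script.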

PS: the post used a .trt file, while I only have the .etlt model.
