Classification accuracy dropped significantly with Triton server

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
Ubuntu PC x64, RTX 3090
• Network Type
(resnet50, 4 class private dataset, Classification)
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)

!tao info

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 2.0
toolkit_version: 3.21.11
published_date: 11/08/2021

• Training spec file(If have, please share here)
Attached
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I'm using TAO to train a resnet50, 4-class classification model on my private dataset. To simplify the question here, let's focus only on the accuracy of one class: electric_bicycle.

Within the TAO training process, the accuracy after both train and retrain looks promising (all > 0.9):

  • Evaluating the trained model (electric_bicycle precision is 0.91):



    Total params: 38,163,396
    Trainable params: 37,734,276
    Non-trainable params: 429,120


    Found 2637 images belonging to 4 classes.
    2023-06-20 07:19:43,683 [INFO] __main__: Processing dataset (evaluation): /workspace/tao-experiments/data/split/test
    Evaluation Loss: 0.6608355530383327
    Evaluation Top K accuracy: 0.9863481228668942
    Found 2637 images belonging to 4 classes.
    2023-06-20 07:19:59,059 [INFO] __main__: Calculating per-class P/R and confusion matrix. It may take a while…
    Confusion Matrix
    [[ 118   20    8   73]
     [  10  410   54   42]
     [   2   28 1051   27]
     [  20   45   45  684]]
    Classification Report
                          precision    recall  f1-score   support

              background       0.79      0.54      0.64       219
                 bicycle       0.82      0.79      0.80       516
        electric_bicycle       0.91      0.95      0.93      1108
                  people       0.83      0.86      0.84       794

                accuracy                           0.86      2637
               macro avg       0.83      0.79      0.80      2637
            weighted avg       0.86      0.86      0.85      2637
    
  • Evaluating the retrained model (electric_bicycle precision is 0.92; a cross-check sketch follows this report):



    Total params: 37,640,100
    Trainable params: 37,280,836
    Non-trainable params: 359,264


    Found 2637 images belonging to 4 classes.

    2023-06-20 11:39:11,175 [INFO] __main__: Processing dataset (evaluation): /workspace/tao-experiments/data/split/test
    Evaluation Loss: 0.6501948667332038
    Evaluation Top K accuracy: 0.9901403109594236
    Found 2637 images belonging to 4 classes.
    2023-06-20 11:39:30,047 [INFO] __main__: Calculating per-class P/R and confusion matrix. It may take a while…
    Confusion Matrix
    [[ 122   14   11   72]
     [  10  414   44   48]
     [   3   25 1053   27]
     [  25   40   32  697]]
    Classification Report
                          precision    recall  f1-score   support

              background       0.76      0.56      0.64       219
                 bicycle       0.84      0.80      0.82       516
        electric_bicycle       0.92      0.95      0.94      1108
                  people       0.83      0.88      0.85       794

                accuracy                           0.87      2637
               macro avg       0.84      0.80      0.81      2637
            weighted avg       0.86      0.87      0.86      2637
    

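As a cross-check, the per-class numbers can be re-derived from the retrained model's confusion matrix above (rows are ground truth, columns are predictions, classes in the order of the report). A minimal sketch:

    import numpy as np

    # Confusion matrix from the retrained-model evaluation above
    # (rows = ground truth, columns = predictions).
    classes = ["background", "bicycle", "electric_bicycle", "people"]
    cm = np.array([[122,  14,   11,  72],
                   [ 10, 414,   44,  48],
                   [  3,  25, 1053,  27],
                   [ 25,  40,   32, 697]])

    i = classes.index("electric_bicycle")
    recall = cm[i, i] / cm[i, :].sum()      # 1053 / 1108 ≈ 0.95
    precision = cm[i, i] / cm[:, i].sum()   # 1053 / 1140 ≈ 0.92
    print(f"electric_bicycle precision={precision:.2f}, recall={recall:.2f}")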
For further validation, at the visualize-inferences stage in TAO I used image data simply copied from the test split folder: all 1108 images with ground truth electric_bicycle. After checking the generated result.csv, I noticed that only 967 images were classified correctly, i.e. the accuracy dropped to 0.873.
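A minimal sketch of how the 967/1108 count can be tallied, assuming each row of result.csv contains the image path, the predicted label, and a confidence (the exact column layout of TAO's result.csv is an assumption here):

    import csv

    total = 0
    correct = 0
    with open("result.csv") as f:
        for row in csv.reader(f):
            label = row[1].strip()   # assumed columns: path, predicted label, confidence
            total += 1
            if label == "electric_bicycle":
                correct += 1

    print(f"{correct}/{total} correct, accuracy = {correct / total:.3f}")   # 967/1108 -> 0.873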

I then exported the model and deployed it to my Triton server. The server info is:

Sending build context to Docker daemon  1.371GB
Step 1/7 : FROM nvcr.io/nvidia/tritonserver:21.10-py3
 ---> 5c99e9b6586e
Step 2/7 : RUN wget https://nvidia.box.com/shared/static/7u2ocnwenwgrsx1yq8vv4hkfr0dg1rtm -O     /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.0.3
 ---> Using cache
 ---> c2d30fd54a9c
...
...
Successfully built f33160171d35
Successfully tagged nvcr.io/nvidia/tao/triton-apps:21.11-py3
Running the server on 0
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.10 (build 28453983)
...

Here is the model's config.pbtxt:

name: "elenet_four_classes_230620_tao"
platform: "tensorrt_plan"
max_batch_size : 16
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [4, 1, 1]
    label_filename: "labels.txt"
  }
]
dynamic_batching { }
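Before sending images, the deployed model can be sanity-checked from the client side. A minimal sketch using the tritonclient Python package (the URL/port is taken from the tao_client.py command further below and is an assumption for any other setup):

    import tritonclient.http as httpclient

    # Assumes Triton's HTTP endpoint is reachable on port 18000, as in the client command below.
    client = httpclient.InferenceServerClient(url="x.x.x.x:18000")
    model = "elenet_four_classes_230620_tao"

    print("server ready:", client.is_server_ready())
    print("model ready:", client.is_model_ready(model))
    print(client.get_model_metadata(model))   # should report input_1 [3, 224, 224] FP32
    print(client.get_model_config(model))     # should mirror the config.pbtxt above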

I tested with the same image data again (all 1108 images with ground truth electric_bicycle), using this command:

python3 tao_client.py -m elenet_four_classes_230620_tao --mode Classification -u x.x.x.x:18000 --output_path ~/Downloads/tao_triton_test_output ~/Downloads/test/electric_bicycle/

After checking tao_triton_test_output/results.txt, only 727 images were correctly classified as electric_bicycle, i.e. the accuracy is only 0.656.
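For debugging single images outside of tao_client.py, one inference can also be issued directly with tritonclient. A minimal sketch (the label order and the preprocessing are assumptions; the preprocessing in particular has to match the training spec, which is exactly what the discussion below turns on):

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="x.x.x.x:18000")
    labels = ["background", "bicycle", "electric_bicycle", "people"]   # assumed order of labels.txt

    # 'chw' must be a float32 (3, 224, 224) array preprocessed exactly as during training
    # (resize, channel order, mean/scale); a zero tensor is used here only as a placeholder.
    chw = np.zeros((3, 224, 224), dtype=np.float32)

    inp = httpclient.InferInput("input_1", [1, 3, 224, 224], "FP32")
    inp.set_data_from_numpy(chw[np.newaxis, ...])
    out = httpclient.InferRequestedOutput("predictions/Softmax")

    resp = client.infer("elenet_four_classes_230620_tao", inputs=[inp], outputs=[out])
    probs = resp.as_numpy("predictions/Softmax").reshape(-1)   # (1, 4, 1, 1) -> (4,)
    print(labels[int(np.argmax(probs))], float(probs.max()))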

Questions:

  1. Why does my test at the visualize-inferences stage show an accuracy of 0.873 (dropped from 0.9+)?
  2. Why does the Triton server test show an accuracy of only 0.656?

Attachments:
    classification_retrain_spec.cfg (1.1 KB)
    classification_spec.cfg (1.2 KB)
  1. Could you use the latest docker to run tao inference again to check the result?
    $ docker run --runtime=nvidia -it nvcr.io/nvidia/tao/tao-toolkit:4.0.1-tf1.15.5 /bin/bash
    Then,
    # classification_tf1 inference xxx

More info can be found in Image Classification (TF1) - NVIDIA Docs

You can also run standalone inference; please refer to Inferring resnet18 classification etlt model with python - #10 by jazeel.jk and Inferring resnet18 classification etlt model with python - #41 by Morganh

  2. For Triton inference, please refer to Tao-converted .plan model running in triton-server turned to bad accurate - #47 by Morganh

I followed Tao-converted .plan model running in triton-server turned to bad accurate - #47 by Morganh and changed this single line:
elif FLAGS.mode.lower() == "multitask_classification" or FLAGS.mode.lower() == "classification":
With that change, the accuracy and the inferred confidence values improved dramatically. Is this a bug? I noticed that the tao-toolkit-triton-apps repo does not apply this change.
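My best guess at why this one line matters so much: the client was preprocessing images differently from how the model was trained (e.g. caffe-style BGR mean subtraction versus torch-style scaling), and routing "classification" through the multitask_classification branch changes which of the two is applied. A rough sketch of the two common conventions, with illustrative helper names rather than the actual tao-toolkit-triton-apps code:

    import numpy as np
    from PIL import Image

    def load_chw(path, size=(224, 224)):
        """Resize and return a float32 CHW array in RGB order."""
        img = Image.open(path).convert("RGB").resize(size)
        return np.asarray(img, dtype=np.float32).transpose(2, 0, 1)   # HWC -> CHW

    def preprocess_caffe(chw):
        """Caffe-style: RGB -> BGR, subtract per-channel ImageNet means, no scaling."""
        bgr = chw[::-1, :, :].copy()
        mean = np.array([103.939, 116.779, 123.68], dtype=np.float32).reshape(3, 1, 1)
        return bgr - mean

    def preprocess_torch(chw):
        """Torch-style: scale to [0, 1], subtract ImageNet mean, divide by std (RGB order)."""
        x = chw / 255.0
        mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
        std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
        return (x - mean) / std

    # Feeding a tensor preprocessed one way into a model trained the other way
    # typically produces exactly this kind of accuracy drop: predictions still look
    # plausible on many images, but the per-class accuracy collapses.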

Yes, will apply the change.
