Wrong classification results on Triton for models trained with TLT-V2

Hi,

Problem description:

Model: ResNet18
Classes: Table_Cleaning, Others
Dataset size: 10K images (5K per class); 70% train, 10% val, 20% test

I am running image classification with a model I trained on Transfer Learning Toolkit V2. With tlt-infer I get good (unbiased) results on my test dataset, but the same model gives results completely biased toward a single class on Triton with the same test dataset.

Environment

Hardware Platform (GPU): NVIDIA V100
Triton_Container : nvcr.io/nvidia/tritonserver:20.03.1-py3
Client SDK to test triton: nvcr.io/nvidia/tritonserver:20.03-py3-clientsdk

How can I avoid the biased results on Triton?
I am using the configuration auto-generated when starting the server with the command below:
docker run --gpus all --rm -p8000:8000 -p8001:8001 -v/home/ubuntu/triton_models:/models nvcr.io/nvidia/tritonserver:20.03.1-py3 trtserver --model-store=/models --log-verbose=1 --strict-model-config=false
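For reference, since --strict-model-config=false makes Triton auto-generate the model configuration, the equivalent explicit config.pbtxt would look something like the sketch below. The output name and the 3,224,224 shape match the tlt-converter export settings used for this model; the input tensor name input_1 and the output dims are assumptions, not read from the actual engine:

```protobuf
# Sketch of an explicit config.pbtxt for the 2-class TensorRT classifier.
# Place it next to the model as triton_models/Cleaning_Model/config.pbtxt.
name: "Cleaning_Model"
platform: "tensorrt_plan"
max_batch_size: 64
input [
  {
    name: "input_1"          # assumption: actual name comes from the etlt export
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 2, 1, 1 ]        # assumption: may be [ 2 ] depending on the engine
    label_filename: "labels.txt"
  }
]
```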

Please let me know if any other details are needed.

Could you please paste a screenshot here?

A log would also be appreciated.

Hi Morganh,

Thanks for the reply.
Below are the logs:

root@ip-172-31-41-68:/workspace# /workspace/install/bin/image_client -m Cleaning_Model -c 1 -s INCEPTION /workspace/images/testing_img/ -b 64
Request 0, batch size 64
Image ‘/workspace/images/testing_img//10ch2_2.jpg’:
1 (OTHERS) = 0.995663
Image ‘/workspace/images/testing_img//10ch3_2.jpg’:
1 (OTHERS) = 0.995663
Image ‘/workspace/images/testing_img//11ch2_2.jpg’:
1 (OTHERS) = 0.995663
Image ‘/workspace/images/testing_img//11ch3_2.jpg’:
1 (OTHERS) = 0.995663
Image ‘/workspace/images/testing_img//12ch2_1.jpg’:
1 (OTHERS) = 0.995663

Could you remove “-b 64” and run it against one image? Also, you can set “-c 2” to print more classes.
For example,
$ /workspace/install/bin/image_client -m Cleaning_Model -c 2 -s INCEPTION /workspace/images/testing_img/10ch2_2.jpg

Yes Morganh,

I have also tested with these inputs (without -b and with -c 2),
but I am getting the same results.

$ /workspace/install/bin/image_client -m Cleaning_Model -c 2 -s INCEPTION /workspace/images/testing_img/10ch2_2.jpg

Please test against your training images too.

Yes, I also tested on training images, but got the same results.

Could you attach the full log from testing on training images?
Please also attach the log from when the Triton server comes up.

Yes, I will attach the logs for training images too.
I am facing the same problem in a modified DS application where I only run classification: it gives the same biased result as the Triton inference server on the same model.

According to your comments,

  1. tlt-infer: good
  2. deepstream: get biased result
  3. Triton server: get biased result

Right? For deepstream, please refer to Issue with image classification tutorial and testing with deepstream-app - #12 by Morganh; there are two ways to run classification model inference. You can give them a try.

Right.
For the DeepStream application, we modified the classification plugin (we supply a custom bbox for classification; no detector is used) and tested it with another model (Mask and no_Mask classification), also trained on TLT-V2, where it worked fine. The problem only appears with the cleaning model (classes cleaning and others) mentioned above.

I have trained the model 2-3 times with modifications to the dataset, but every time I get good results with tlt-infer on the test data and wrong results on Triton and DeepStream.

For deepstream, did you use the trt engine directly or the etlt model? Could you share all the config files?
For Triton, how did you generate the trt engine?

My DS container runs on a 2080 Ti, so for that I generated the trt engine files (int8, fp16 and fp32) using the Transfer Learning Toolkit:

!tlt-converter $USER_EXPERIMENT_DIR/export/final_model.etlt \
    -k $KEY \
    -c $USER_EXPERIMENT_DIR/export/final_model_int8_cache.bin \
    -o predictions/Softmax \
    -d 3,224,224 \
    -i nchw \
    -m 64 -t int8 \
    -e $USER_EXPERIMENT_DIR/export/final_model.trt \
    -b 64

And since the Triton server runs on a V100, I generated the trt engine files (int8, fp16 and fp32) on the V100.

To narrow down, have you run the fp16 or fp32 trt engine in deepstream or Triton? How do the results compare to the int8 engine?

I have tested fp32 and fp16 as well, but the results were the same (biased).

Thanks for the info. According to your comments, may I conclude that

  1. For the classification model (Mask and no_Mask classification),
    you get good results in tlt-infer, deepstream and Triton server.

  2. For the classification model (classes cleaning and others),
    you only get good results in tlt-infer, but get biased results in deepstream and Triton server.

  1. For mask/no_mask classification I am getting good results on tlt-infer and deepstream.
  2. For the cleaning/others model I am only getting good results in tlt-infer, not on deepstream or Triton.

Logs of the cleaning model on Triton with the training dataset:
Image ‘/workspace/images/cleaning_images//cleaning_93.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 344, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_94.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 345, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_95.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 346, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_96.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 347, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_97.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 348, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_98.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 349, batch size 1
Image ‘/workspace/images/cleaning_images//cleaning_99.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 350, batch size 1
Image ‘/workspace/images/cleaning_images//others_0.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 351, batch size 1
Image ‘/workspace/images/cleaning_images//others_1.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 352, batch size 1
Image ‘/workspace/images/cleaning_images//others_10.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 353, batch size 1
Image ‘/workspace/images/cleaning_images//others_100.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 354, batch size 1
Image ‘/workspace/images/cleaning_images//others_101.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 355, batch size 1
Image ‘/workspace/images/cleaning_images//others_102.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 356, batch size 1
Image ‘/workspace/images/cleaning_images//others_103.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138
Request 357, batch size 1
Image ‘/workspace/images/cleaning_images//others_104.jpg’:
1 (OTHERS) = 0.514862
0 (CLEANING) = 0.485138

Again, it is biased toward the OTHERS class.

It does not make sense. Why do all the images show the same prediction value for the first class?
To narrow down, could you please try the same Triton setup with your classification model (Mask and no_Mask classification) too?

I don’t know why it is giving the same prediction value.
Yes, I will update you about Mask/No_Mask.

Can you please suggest whether the problem is with the training/model, a conversion issue, or something else?

For your Triton inference result, I have no idea yet. Previously, I recall that it would at least show different prediction values when running inference against different test images.
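One thing worth ruling out, since every image gets an identical score, is a preprocessing mismatch between the client and what the model was trained with. The image_client example's -s INCEPTION flag applies roughly a (pixel / 128) - 1 scaling into [-1, 1], whereas TLT classification models trained in caffe mode typically expect BGR channel order with per-channel mean subtraction and no such scaling. The numpy sketch below contrasts the two; the mean values are the common caffe defaults, an assumption, not values read from this particular model:

```python
import numpy as np

def inception_scale(img):
    # INCEPTION-style scaling as done by the image_client example (approximate):
    # maps uint8 pixel values [0, 255] into roughly [-1, 1].
    return (img.astype(np.float32) / 128.0) - 1.0

def tlt_caffe_preprocess(img_rgb, mean=(103.939, 116.779, 123.68)):
    # Caffe-mode preprocessing assumed for TLT classification:
    # RGB -> BGR, then per-channel mean subtraction; values stay in ~[-124, 151].
    bgr = img_rgb[..., ::-1].astype(np.float32)
    return bgr - np.asarray(mean, dtype=np.float32)

# A flat gray 224x224 image shows how different the two network inputs are.
img = np.full((224, 224, 3), 128, dtype=np.uint8)
print(inception_scale(img).mean())       # 0.0
print(tlt_caffe_preprocess(img).mean())  # ~13.2
```

If the engine was trained with the caffe-style pipeline but the client feeds it INCEPTION-scaled tensors, the activations can saturate and collapse to near-constant softmax outputs, which would match the identical 0.514862 / 0.485138 scores in the log above.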