Hello,
I successfully completed the example notebook included in the TAO Toolkit CV samples to train a ResNet18 classifier on a custom dataset and export it to TensorRT. However, accuracy drops sharply, and only for some classes, when the model is deployed on Triton Server and consumed over gRPC (code and output are detailed below).
HW/SW details
• Hardware: RTX3070MaxQ
• Network Type: Classification (Resnet18)
• TAO Version: toolkit_version: 3.22.02 docker-tag: /nvidia/tao/tao-toolkit-tfv3.21.11-tf1.15.5-py3
• Triton Server: nvcr.io/nvidia/tritonserver:21.08-py3
Output of TAO inference
After training, pruning, and retraining, the model performs very well on the test set:
tao classification evaluate -e $SPECS_DIR/classification_retrain_spec.cfg -k $KEY
Found 641 images belonging to 6 classes.
2022-06-09 01:21:32,951 [INFO] __main__: Calculating per-class P/R and confusion matrix. It may take a while...
Confusion Matrix
[[184   0   0   0   0   0]
 [  0 121   0   0   0   0]
 [  0   0  38   0   0   0]
 [  0   0   0 108   1   0]
 [  0   0   0   0  93   0]
 [  0   0   0   0   0  96]]
Classification Report
                                        precision  recall  f1-score  support
0_empty_deck_not_manipulating_net            1.00    1.00      1.00      184
1_empty_deck_manipulating_net                1.00    1.00      1.00      121
2_capture_in_deck_manipulating_net           1.00    1.00      1.00       38
3_capture_in_deck_no_human_activity          1.00    0.99      1.00      109
4_capture_in_deck_classification             0.99    1.00      0.99       93
5_capture_in_deck_almost_all_classified      1.00    1.00      1.00       96
accuracy                                                       1.00      641
macro avg                                    1.00    1.00      1.00      641
weighted avg                                 1.00    1.00      1.00      641
2022-06-08 22:21:52,965 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Exporting to TensorRT
The original code calibrates and exports to INT8. I modified it to export to FP32 so that INT8 quantization cannot affect the accuracy:
tao converter $TAO_EXPERIMENTS_DOCKER_DIR/classification/export/final_model.etlt \
    -k $KEY \
    -o predictions/Softmax \
    -d 3,224,224 \
    -i nchw \
    -m 64 -t fp32 \
    -e $TAO_EXPERIMENTS_DOCKER_DIR/classification/export/final_model_fp32.trt \
    -b 64
Serving with Triton Server
The exported file final_model_fp32.trt is renamed to model.plan and served, letting Triton generate the configuration automatically. This seems to work, as no error is reported.
export TRITON_SERVER_IMAGE="nvcr.io/nvidia/tritonserver:21.08-py3"
docker run --gpus 1 --rm \
    --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v"$PWD/model_repository":/models \
    $TRITON_SERVER_IMAGE /bin/bash -c "tritonserver --model-repository=/models --strict-model-config=false --grpc-infer-allocation-pool-size=16 --log-verbose=1"
Python testcase
The model is consumed with a Python gRPC client, using the same test set of 641 images that was evaluated with the tao tool. Each image is preprocessed with Keras as suggested in this post: preprocess_input(image, mode='caffe', data_format='channels_first'). I’m not sure whether the BGR to RGB conversion is required, but it doesn’t seem to matter much: with the conversion either enabled or disabled, accuracy drops to a few percent or less for some classes, whereas it is near 100% for all classes with the TAO inference tool.
import cv2
import numpy as np
import tritonclient.grpc as grpcclient
from keras.applications.imagenet_utils import preprocess_input


class TritonImageClassifierClient:
    def __init__(self, hostname, port, model_name, input_layer_name, output_layer_name,
                 input_cell_size, input_format="FP32", input_channels=3):
        self.model_name = model_name
        self.input_layer_name = input_layer_name
        self.output_layer_name = output_layer_name
        self.input_cell_size = input_cell_size
        self.input_channels = input_channels
        self.input_format = input_format
        self.triton_client = grpcclient.InferenceServerClient(url=f"{hostname}:{port}")
        # One NCHW input with batch size 1, and one requested output.
        input_shape = [1, self.input_channels, self.input_cell_size, self.input_cell_size]
        self.inputs = [grpcclient.InferInput(self.input_layer_name, input_shape, self.input_format)]
        self.outputs = [grpcclient.InferRequestedOutput(self.output_layer_name)]

    def predict_proba(self, image):
        image = self.__preprocess_image(image)
        image = np.expand_dims(image, axis=0)  # add the batch dimension
        self.inputs[0].set_data_from_numpy(image)
        result = self.triton_client.infer(model_name=self.model_name,
                                          inputs=self.inputs,
                                          outputs=self.outputs,
                                          headers={})
        return result.as_numpy(self.output_layer_name)

    def predict(self, image):
        return np.argmax(self.predict_proba(image))

    def __preprocess_image(self, image):
        # The image arrives as read by cv2.imread, i.e. in BGR order.
        image = cv2.resize(image, (self.input_cell_size, self.input_cell_size))
        # Not sure if the conversion is needed. It doesn't seem to affect the results.
        # image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # HWC -> CHW, as float32 (the original used trt.nptype(trt.float32), which is np.float32).
        image = image.transpose([2, 0, 1]).astype(np.float32)
        image = preprocess_input(image, mode='caffe', data_format='channels_first')
        return image
from glob import glob
import os

triton_client = TritonImageClassifierClient(
    hostname="localhost",
    port=8001,
    model_name="clasificador1",
    input_layer_name="input_1",
    output_layer_name="predictions/Softmax",
    input_cell_size=224,
    input_format="FP32", input_channels=3)

img_dir = "../../data/tmp/tao-experiments/data/split/test/"
# Map the numeric prefix of each class directory to the directory name.
category_dirs = {}
for dirname in os.listdir(img_dir):
    class_idx = int(dirname.split("_")[0])
    category_dirs[class_idx] = dirname
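For reference, here is a minimal NumPy sketch of what I understand preprocess_input(..., mode='caffe', data_format='channels_first') to do for a single CHW image: reverse the channel axis and subtract fixed per-channel means, with no scaling. The names caffe_preprocess_chw and CAFFE_MEAN_BGR are mine, for illustration only. If this understanding is right, the function expects RGB input and returns BGR, so the channel order of the array handed to it matters.

```python
import numpy as np

# Per-channel means in B, G, R order, as used by Keras in 'caffe' mode.
CAFFE_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess_chw(image_chw):
    """Illustrative stand-in for caffe-mode preprocessing of one CHW image."""
    x = np.asarray(image_chw, dtype=np.float32)
    x = x[::-1, ...]  # reverse the channel axis: RGB -> BGR
    return x - CAFFE_MEAN_BGR[:, None, None]  # zero-center each channel, no scaling


# A single pure-red pixel in RGB order, shaped (channels, height, width).
rgb_pixel = np.array([255, 0, 0], dtype=np.float32).reshape(3, 1, 1)
out = caffe_preprocess_chw(rgb_pixel)
print(out.ravel())
```

Under that assumption, feeding the BGR array from cv2.imread directly would swap the channels to RGB while still subtracting the means in BGR order.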
Testcase without BGR to RGB conversion and confusion matrix:
confusion_matrix = np.zeros(shape=(6, 6))
for class_idx, dirname in category_dirs.items():
    test_images = glob(f"{img_dir}/{dirname}/*.jpeg")
    total_samples = len(test_images)
    print(f"Class {class_idx}. Category: {dirname} Total samples {total_samples}")
    for image_filename in test_images:
        image = cv2.imread(image_filename)
        predicted_class = triton_client.predict(image)
        confusion_matrix[class_idx][predicted_class] += 1
    # Caveat: predicted_class is the last prediction of the inner loop, so this is the
    # share of that class, not the diagonal entry confusion_matrix[class_idx][class_idx].
    accuracy = confusion_matrix[class_idx][predicted_class] / total_samples
    print(f" Accuracy: {accuracy}")
confusion_matrix.astype(int)
Output:
Class 0. Category: 0_empty_deck_not_manipulating_net Total samples 184
Accuracy: 0.5706521739130435
Class 1. Category: 1_empty_deck_manipulating_net Total samples 121
Accuracy: 0.71900826446281
Class 5. Category: 5_capture_in_deck_almost_all_classified Total samples 96
Accuracy: 0.052083333333333336
Class 4. Category: 4_capture_in_deck_classification Total samples 93
Accuracy: 0.7849462365591398
Class 3. Category: 3_capture_in_deck_no_human_activity Total samples 109
Accuracy: 0.9357798165137615
Class 2. Category: 2_capture_in_deck_manipulating_net Total samples 38
Accuracy: 0.9736842105263158
array([[  9,   0, 105,   0,  70,   0],
       [  0,  87,  28,   1,   5,   0],
       [  0,   0,  37,   0,   1,   0],
       [  0,   0,   3, 102,   4,   0],
       [  0,   0,  16,   4,  73,   0],
       [  5,   0,  30,   0,  61,   0]])
Testcase with BGR to RGB conversion and confusion matrix:
confusion_matrix = np.zeros(shape=(6, 6))
for class_idx, dirname in category_dirs.items():
    test_images = glob(f"{img_dir}/{dirname}/*.jpeg")
    total_samples = len(test_images)
    print(f"Class {class_idx}. Category: {dirname} Total samples {total_samples}")
    for image_filename in test_images:
        image = cv2.imread(image_filename)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        predicted_class = triton_client.predict(image)
        confusion_matrix[class_idx][predicted_class] += 1
    # Caveat: predicted_class is the last prediction of the inner loop, so this is the
    # share of that class, not the diagonal entry confusion_matrix[class_idx][class_idx].
    accuracy = confusion_matrix[class_idx][predicted_class] / total_samples
    print(f" Accuracy: {accuracy}")
confusion_matrix.astype(int)
Output:
Class 0. Category: 0_empty_deck_not_manipulating_net Total samples 184
Accuracy: 0.532608695652174
Class 1. Category: 1_empty_deck_manipulating_net Total samples 121
Accuracy: 0.5371900826446281
Class 5. Category: 5_capture_in_deck_almost_all_classified Total samples 96
Accuracy: 0.14583333333333334
Class 4. Category: 4_capture_in_deck_classification Total samples 93
Accuracy: 1.0
Class 3. Category: 3_capture_in_deck_no_human_activity Total samples 109
Accuracy: 0.8440366972477065
Class 2. Category: 2_capture_in_deck_manipulating_net Total samples 38
Accuracy: 1.0
array([[98,  0, 47,  0, 39,  0],
       [ 0, 65,  9,  0, 47,  0],
       [ 0,  0, 38,  0,  0,  0],
       [ 0,  0,  2, 92, 15,  0],
       [ 0,  0,  0,  0, 93,  0],
       [14,  0,  9,  0, 68,  5]])
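Note that the "Accuracy" lines printed by the loops above divide the count for whichever class was predicted last, so they do not always match the diagonal of the confusion matrix. As a cross-check, per-class recall can be read off the two matrices directly. This is only a sketch: per_class_recall is a helper name I made up, and the two arrays are copied from the outputs above.

```python
import numpy as np


def per_class_recall(confusion):
    """Diagonal count divided by the row total (rows are the true classes)."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


# Confusion matrices copied from the two test cases above.
without_conversion = np.array([[9, 0, 105, 0, 70, 0],
                               [0, 87, 28, 1, 5, 0],
                               [0, 0, 37, 0, 1, 0],
                               [0, 0, 3, 102, 4, 0],
                               [0, 0, 16, 4, 73, 0],
                               [5, 0, 30, 0, 61, 0]])
with_conversion = np.array([[98, 0, 47, 0, 39, 0],
                            [0, 65, 9, 0, 47, 0],
                            [0, 0, 38, 0, 0, 0],
                            [0, 0, 2, 92, 15, 0],
                            [0, 0, 0, 0, 93, 0],
                            [14, 0, 9, 0, 68, 5]])

for label, matrix in (("without BGR2RGB", without_conversion),
                      ("with BGR2RGB", with_conversion)):
    print(label, np.round(per_class_recall(matrix), 3))
```

Read this way, class 0 without the conversion is correct on only 9 of 184 images, and class 5 is correct on 0 of 96 images without the conversion and 5 of 96 with it.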
I have already checked the TensorRT versions: both Triton and the TAO toolkit use TensorRT 8.0.1.6.
Triton Server is not reporting any errors.
Any suggestions?
Thanks in advance,
Nicolás