After training, the accuracy is high (98%), but once I generate the TensorRT engine, the accuracy drops to 0.68.
When I deploy the model on DeepStream, the accuracy is also very low.
• Hardware (NVIDIA RTX A5000)
• Network Type (Classification_tf2/resnet backbone)
• Training spec file:
dataset:
  train_dataset_path: "/home/data/train"
  val_dataset_path: "/home/data/val"
  preprocess_mode: 'torch'
  num_classes: 3
  augmentation:
    enable_center_crop: False
    enable_random_crop: False
train:
  qat: False
  #checkpoint: '/workspace/tao-experiments/pretrained_classification_tf2_vefficientnet_b0'
  batch_size_per_gpu: 32
  num_epochs: 50
  optim_config:
    optimizer: 'sgd'
  lr_config:
    scheduler: 'cosine'
    learning_rate: 0.0005
    soft_start: 0.05
  reg_config:
    type: 'L2'
    scope: ['conv2d', 'dense']
    weight_decay: 0.00005
  results_dir: '/home/experiments1/train'
model:
  backbone: 'efficientnet-b0'
  input_width: 256
  input_height: 256
  input_channels: 3
evaluate:
  dataset_path: "/home/data/val"
  checkpoint: "/home/experiments1/train/efficientnet-b0_010.tlt"
  top_k: 1
  batch_size: 16
  n_workers: 8
  results_dir: '/home/experiments1/val'
export:
  checkpoint: "/home/experiments1/train/efficientnet-b0_006.tlt"
  onnx_file: '/home/experiments1/export/efficientnet-b0_006.onnx'
  results_dir: '/home/experiments1/export'
gen_trt_engine:
  onnx_file: '/home/experiments1/export/efficientnet-b0.onnx'
  trt_engine: '/home/experiments1/export/efficientnet-b0_fp32.engine'
  results_dir: '/home/experiments1/export'
  tensorrt:
    max_workspace_size: 4
    max_batch_size: 64
    data_type: "fp32"
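
As a sanity check on the spec above, here is a minimal sketch (not part of TAO) that compares the file paths each stage points at. `SPEC` and `find_mismatches` are hypothetical names introduced for illustration; the path values are copied from the spec. As written, `export` produces `efficientnet-b0_006.onnx` while `gen_trt_engine` reads `efficientnet-b0.onnx`, and `evaluate` uses checkpoint `_010` while `export` uses `_006`, so the evaluated model and the engine may not come from the same weights.

```python
# Hypothetical helper, not a TAO API. Values are copied from the spec above;
# only the keys needed for the comparison are included.
SPEC = {
    "evaluate": {"checkpoint": "/home/experiments1/train/efficientnet-b0_010.tlt"},
    "export": {
        "checkpoint": "/home/experiments1/train/efficientnet-b0_006.tlt",
        "onnx_file": "/home/experiments1/export/efficientnet-b0_006.onnx",
    },
    "gen_trt_engine": {"onnx_file": "/home/experiments1/export/efficientnet-b0.onnx"},
}


def find_mismatches(spec):
    """Return notes for pipeline stages that point at different files."""
    notes = []

    # The ONNX file written by export should be the one the engine is built from.
    exported = spec["export"]["onnx_file"]
    consumed = spec["gen_trt_engine"]["onnx_file"]
    if exported != consumed:
        notes.append(f"export writes {exported} but gen_trt_engine reads {consumed}")

    # The checkpoint that was evaluated should be the one that gets exported.
    evaluated = spec["evaluate"]["checkpoint"]
    converted = spec["export"]["checkpoint"]
    if evaluated != converted:
        notes.append(f"evaluate uses {evaluated} but export uses {converted}")

    return notes


if __name__ == "__main__":
    for note in find_mismatches(SPEC):
        print(note)
```

If the two paths in a pair are made identical, the corresponding note disappears, which makes it easier to confirm that accuracy is being compared on the same checkpoint end to end.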
• training file:
docker run -it --rm --runtime=nvidia --gpus '"device=1,2,3,4"' --ipc=host --pid=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /home:/home \
    nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0 \
    classification_tf2 train -e /home/experiment_spec.yaml --gpus 4 --gpu_index 0 1 2 3
• export file:
docker run -it --rm --runtime=nvidia --gpus '"device=9"' --ipc=host --pid=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /home:/home \
    nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0 \
    classification_tf2 export -e /home/experiment_spec.yaml --gpus 1
• trt engine generator file:
docker run -it --rm --runtime=nvidia --gpus '"device=7"' --ipc=host --pid=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /home:/home \
    nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy \
    classification_tf2 gen_trt_engine -e /experiment_spec.yaml
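
Because the spec sets `preprocess_mode: 'torch'`, any code that feeds the engine (including a DeepStream config) has to reproduce that normalization. The sketch below shows the Keras-style 'torch' convention: scale pixels to [0, 1], then normalize each RGB channel with the ImageNet mean and standard deviation. The function names are hypothetical, and the second function assumes an inference pipeline that only supports one shared scale factor plus per-channel offsets; the shared `std=0.224` is an illustrative choice, not a value taken from this post.

```python
# ImageNet statistics used by Keras-style 'torch' preprocessing (RGB order).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def torch_preprocess_pixel(rgb):
    """Normalize one 8-bit RGB pixel: scale to [0, 1], subtract mean, divide by std."""
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb, MEAN, STD))


def single_scale_approximation(rgb, std=0.224):
    """Approximate the same normalization as scale * (x - offset).

    Assumption: the pipeline accepts one scale for all channels, so the three
    per-channel std values are replaced by a single shared value.
    """
    scale = 1.0 / (255.0 * std)               # one scale factor for every channel
    offsets = tuple(255.0 * m for m in MEAN)  # per-channel offsets in 0-255 units
    return tuple(scale * (v - o) for v, o in zip(rgb, offsets))


if __name__ == "__main__":
    pixel = (200, 120, 40)
    print("exact:      ", torch_preprocess_pixel(pixel))
    print("approximate:", single_scale_approximation(pixel))
```

Comparing the two outputs for a few pixels shows how far a single-scale setup drifts from the training-time normalization; a completely different scale or missing offsets would produce a much larger gap.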