Problem: Inference results from DeepStream and local inference do not match (using the same PNG images)
When I measure what percentage of predictions match between the .engine and .pth models, only 26% of 180k images agree.
How I reproduce results: I save images after they pass through streammux, so they are 416x416 and in .png format. For each image I also save the bounding-box coordinates where YoloV4 detected objects. To test predictions, I download the images and their bounding-box coordinates, crop each object out of its image using those coordinates, and run the resulting crop through the .pth model.
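Roughly, the comparison loop looks like this (the file layout, metadata format, and the load_pth_model helper are simplified placeholders, not my exact code):

import json
import cv2
import torch

model = load_pth_model()  # placeholder: same loading code as in the export section below
model.eval()

matches, total = 0, 0
for frame_path, meta_path in frame_meta_pairs:  # placeholder: pairs of saved .png and metadata files
    frame = cv2.cvtColor(cv2.imread(frame_path), cv2.COLOR_BGR2RGB)
    for obj in json.load(open(meta_path)):       # one entry per YoloV4 detection
        x1, y1, x2, y2 = obj["bbox"]             # box coordinates in the 416x416 frame
        crop = frame[y1:y2, x1:x2]
        tensor = test_transforms(image=crop)["image"].unsqueeze(0)  # test_transforms defined below
        with torch.no_grad():
            local_pred = model(tensor).argmax(-1).item()
        matches += int(local_pred == obj["class_id"])  # class id DeepStream assigned
        total += 1
print(f"agreement: {matches / total:.2%}")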
Version: DeepStream 5.1
Model training: I train EfficientNetB0 locally with PyTorch and use the following transformations for loading data (we are training on 128 classes):
import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.HorizontalFlip(p=0.5),
        A.VerticalFlip(p=0.5),
        A.RandomGamma(gamma_limit=(75, 90), p=0.8),
        A.GridDropout(ratio=0.47, p=0.6),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
Local model inference: I run inference with the following preprocessing:
test_transforms = A.Compose(
    [
        A.Resize(height=224, width=224),
        A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        ToTensorV2(),
    ]
)
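Note that A.Normalize with these arguments first divides by 255 (max_pixel_value defaults to 255.0), so per channel it computes (x/255 - mean_c)/std_c, i.e. an affine map with a different scale for every channel. A quick check of that equivalence:

import numpy as np

x = np.array([128.0, 128.0, 128.0])   # one RGB pixel in 0..255
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

normalize_style = (x / 255.0 - mean) / std          # what A.Normalize computes
affine_style = x / (255.0 * std) - mean / std       # same map written as scale * x - offset
print(np.allclose(normalize_style, affine_style))   # True: per-channel scale 1/(255*std_c)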
Steps I take to export the model:
- Convert trained model to .onnx:
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

model = efficientnet_b0(pretrained=False)
pt_model = torch.load(path_to_torch_model, map_location=torch.device('cpu'))
n_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(n_features, classes)
model.load_state_dict(pt_model)
model = nn.Sequential(model, nn.Softmax(-1))
dummy_input = torch.randn(batch_size, 3, 224, 224)
torch.onnx.export(model, dummy_input, path_to_onnx, verbose=False,
                  input_names=['input_names'], output_names=['output_names'],
                  export_params=True)
I checked that the converted ONNX model gives the same results as the PyTorch model.
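The check itself is a straightforward output comparison, along these lines (assuming onnxruntime is installed; the input name matches the export call above):

import numpy as np
import onnxruntime as ort

dummy = torch.randn(batch_size, 3, 224, 224)
with torch.no_grad():
    torch_out = model(dummy).numpy()             # model from the export snippet above

session = ort.InferenceSession(path_to_onnx)
onnx_out = session.run(None, {'input_names': dummy.numpy()})[0]
print(np.abs(torch_out - onnx_out).max())        # should be tiny (~1e-6) in FP32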
- Export the .onnx model to an engine file with the following command:
docker container run --gpus all --rm --volume $(pwd):/workspace/ --volume $(pwd):/data/ --workdir /workspace/ nvcr.io/nvidia/tensorrt:21.02-py3 trtexec --explicitBatch --onnx=best_23.onnx --saveEngine=efficientnet.engine --fp16 --workspace=4096
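One thing I am not sure about: --fp16 builds the engine in half precision, which on its own introduces small numeric differences versus the FP32 ONNX model. If Polygraphy is available (it is pip-installable and bundled with TensorRT releases), something like polygraphy run best_23.onnx --trt --onnxrt --fp16 should show how far the TensorRT FP16 outputs drift from ONNX Runtime, to rule precision out as the cause.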
DeepStream configuration:
RTSP stream → Streammux (reshaping to 416x416) → YoloV4 (bounding boxes) → Classification
DeepStream classification config:
[property]
gpu-id=0
offsets=103.53;116.28;123.675
net-scale-factor=0.01735207357279195
labelfile-path=…/classifier/labels.txt
model-engine-file=…/classifier/efficientnet.engine
infer-dims=3;224;224
network-mode=2
network-type=1
num-detected-classes=128
interval=0
classifier-threshold=0
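As far as I understand, nvinfer preprocesses every pixel as net-scale-factor * (x - offset_c): a single scalar scale with per-channel offsets, applied in the channel order given by model-color-format (which, if I read the docs correctly, defaults to RGB when the key is absent, as here). My offsets equal 255*(0.406, 0.456, 0.485), i.e. the ImageNet means in BGR order, and the scale factor is 1/(255*0.226), a single averaged std. A sketch of the difference against the Albumentations pipeline (assuming my reading of nvinfer is correct):

import numpy as np

x = np.array([128.0, 128.0, 128.0])    # one pixel in 0..255, in the offsets' channel order

# DeepStream side: scalar scale, per-channel offsets (values from the config above)
net_scale = 0.01735207357279195
offsets = np.array([103.53, 116.28, 123.675])
deepstream_style = net_scale * (x - offsets)

# Local side: per-channel scale and offset (RGB order, as in test_transforms)
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
albumentations_style = (x / 255.0 - mean) / std

print(deepstream_style)       # differs from the line below both in per-channel
print(albumentations_style)   # scale and, if the color order is wrong, in channel order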
Questions:
- How can I make the preprocessing used during training in Python match DeepStream's preprocessing at inference? I suspect the Albumentations resize uses a different interpolation than DeepStream's scaler.
- Are there other mistakes that I can't see?