Description
Hello,
I am trying to calibrate an ONNX model to INT8 precision using the TensorRT Python API (with IInt8EntropyCalibrator2). The resulting engine file is 1/4 the size of the original ONNX model, with only ~2% accuracy loss; however, during calibration I get these warnings:
I then run inference with the INT8 engine file in DeepStream 6.4, but the model is still slow, almost as slow as the original FP32 ONNX model. Could the warnings above be the cause, or is something wrong in my DeepStream config file?
Thank you!
Environment
TensorRT Version: 8.6.1
GPU Type: RTX 4060
Nvidia Driver Version: 555.22
CUDA Version: 12.4
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.2.1
Baremetal or Container (if container which image + tag):
Relevant Files
This is the Python script I use for calibrating to INT8:
import torch
import tensorrt as trt
import pandas as pd
from pathlib import Path
from polygraphy.backend.trt import Calibrator, CreateConfig, EngineFromNetwork, Profile, NetworkFromOnnxPath, TrtRunner, SaveEngine
from torch.utils.data import DataLoader
from torchvision import transforms
from oml.datasets.base import DatasetQueryGallery
from oml.utils.dataframe_format import check_retrieval_dataframe_format


def dataloader(loader, input_name, dtype):
    for i, images in enumerate(loader):
        yield {input_name: images['input_tensors'].to(dtype=dtype).numpy()}


def calibrate(img_size, batch_size, dtype, model_name, input_name):
    profile = Profile()
    profile.add(name=input_name,
                min=(batch_size, 3, img_size[0], img_size[1]),
                opt=(batch_size, 3, img_size[0], img_size[1]),
                max=(batch_size, 3, img_size[0], img_size[1]))

    onnx_path = model_name
    df = pd.read_csv("oml_dataset.csv")
    dataset_root = Path("data")
    check_retrieval_dataframe_format(df=df, dataset_root=dataset_root)
    val_df = df[df['split'] == 'validation']

    transform = transforms.Compose([
        transforms.Resize(img_size, interpolation=3),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], inplace=True)
    ])
    val_dataset = DatasetQueryGallery(val_df, dataset_root=dataset_root, transform=transform)
    loader = DataLoader(val_dataset, batch_size=batch_size)

    calibrator = Calibrator(
        data_loader=dataloader(loader, input_name, dtype),
        cache=onnx_path.replace(".onnx", "_calibration.cache").split('/')[-1],
        batch_size=batch_size
    )

    engine = EngineFromNetwork(
        network=NetworkFromOnnxPath(onnx_path),
        config=CreateConfig(
            int8=True,
            calibrator=calibrator,
            profiles=[profile],
            profiling_verbosity=trt.ProfilingVerbosity.DETAILED,
            sparse_weights=False),
    )

    engine_path = onnx_path.replace(".onnx", ".engine").split('/')[-1]
    build_engine = SaveEngine(engine=engine, path=engine_path)
    build_engine()


if __name__ == "__main__":
    img_size = (256, 128)
    batch_size = 32
    input_name = 'input'
    dtype = torch.float32
    model_name = 'vit_reid_embed_512_fp32_acc_8509.onnx'
    calibrate(img_size, batch_size, dtype, model_name, input_name)
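One detail worth noting, derived purely from the string logic in the script above: the engine filename the script writes does not match the model-engine-file name in my DeepStream config, so DeepStream may be silently rebuilding its own engine from the onnx-file at startup (nvinfer falls back to building from the ONNX when the configured engine file is not found). A minimal check of the naming:

```python
# Reproduce the filename derivation from the calibration script above to see
# exactly which files it writes to the working directory.
onnx_path = 'vit_reid_embed_512_fp32_acc_8509.onnx'

engine_path = onnx_path.replace(".onnx", ".engine").split('/')[-1]
cache_path = onnx_path.replace(".onnx", "_calibration.cache").split('/')[-1]

print(engine_path)  # vit_reid_embed_512_fp32_acc_8509.engine
print(cache_path)   # vit_reid_embed_512_fp32_acc_8509_calibration.cache

# The DeepStream config instead expects this name; since the script never
# produces it, the file may be absent and nvinfer would rebuild at startup.
deepstream_engine = 'vit_reid_embed_512_fp32_acc_8509.onnx_b32_gpu0_int8.engine'
assert engine_path != deepstream_engine
```

Renaming the saved engine to match model-engine-file (or pointing the config at the script's output) would rule this out.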
This is the Deepstream config file:
[property]
gpu-id=0
model-color-format=0
onnx-file=/app/V2/Pipeline/Models/ReId/vit_reid_embed_512_fp32_acc_8509.onnx
model-engine-file=/app/V2/Pipeline/Models/ReId/vit_reid_embed_512_fp32_acc_8509.onnx_b32_gpu0_int8.engine
int8-calib-file=/app/V2/Pipeline/Models/ReId/vit_reid_embed_512_fp32_acc_8509_calibration.cache
network-mode=1
batch-size=32
interval=0
gie-unique-id=2
process-mode=2
network-type=100
output-tensor-meta=1
infer-dims=3;256;128
tensor-meta-pool-size=256
scaling-filter=2
operate-on-class-ids=1
net-scale-factor=0.017354
offsets=123.675000;116.280000;103.530000
maintain-aspect-ratio=0
symmetric-padding=0
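For reference, the net-scale-factor and offsets above are meant to approximate the torchvision Normalize(mean, std) used during calibration, assuming DeepStream's preprocessing formula y = net-scale-factor * (pixel - offset) on 0-255 input, with per-channel offsets but a single scalar scale:

```python
# Sanity-check that the DeepStream preprocessing parameters match the
# torchvision Normalize used in the calibration script, assuming DeepStream
# computes y = net-scale-factor * (pixel - offset) on 0-255 pixel values.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

offsets = [round(m * 255, 6) for m in mean]
print(offsets)  # [123.675, 116.28, 103.53] -> matches offsets= in the config

# DeepStream takes a single scalar scale, so the per-channel stds are averaged:
avg_std = sum(std) / len(std)
scale = 1 / (255 * avg_std)
print(round(scale, 6))  # 0.017352 -> close to net-scale-factor=0.017354
```

The ~0.0000002 difference in the scale is negligible; the larger approximation is that a single scale replaces the three per-channel stds, which introduces a small preprocessing mismatch but should not affect speed.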
This is a link to the ONNX model: Dropbox