TensorRT INT8 calibration python API

Harry-S · September 7, 2022, 2:50pm

Hello,

I would like to quantize several standard ONNX models with INT8 calibration, using images in JPEG/JPG format, and then get the validation results (Top1 and Top5 accuracy). To do that, I looked at the NVIDIA/TensorRT GitHub repo and found this here.

According to that repo, we can generate a calibrated engine from an EfficientNet ONNX model using JPEG/JPG images by running build_engine.py. After that, we can run inference and get the validation results of the INT8 calibrated engine by running eval_gt.py. It works great.

Question 1

Can I use the same scripts to first build an INT8-calibrated engine and then run the validation for any classification model, for example resnet18, squeezenet, etc.?

Question 2

I would like to do exactly the same thing for detection models using this repo here, which is dedicated to EfficientDet. Is it possible to use it for other models, for example yolov5?

Question 3

If I cannot use these scripts for other models:

How can I do a calibration that meets my requirements: it takes standard ONNX models, calibrates them with a standard dataset, either ImageNet (classification) or COCO (detection), in JPEG/JPG image format, and produces the validation results (Top1 and Top5 accuracy) without adding or modifying ONNX layers as suggested here? I just want something like the Python scripts above.

I have already read all your documentation here and all your samples here. All I need is how to do the following:

Implementation of the calibrator class, which is not clear.

Quantization of standard ONNX models with INT8 calibration using JPEG/JPG images

Getting the validation results

Maybe you could send me an annotated image that shows what to put here and how to modify the code so that it is compatible with my models, and how to implement the calibrator class for different models. A video would be great, or online documentation, but it has to match my request because I have already seen all the documentation.
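For readers with the same question about the calibrator class: in the TensorRT Python API a calibrator is typically a subclass of trt.IInt8EntropyCalibrator2 that implements get_batch_size(), get_batch(), read_calibration_cache() and write_calibration_cache(), where get_batch() returns None once the data is exhausted. The sketch below covers only the file-handling half of that job, with no TensorRT dependency. The class name CalibrationBatcher and its next_batch() method are hypothetical, and image decoding, preprocessing and the copy to device memory are deliberately left out.

```python
import os


class CalibrationBatcher:
    """Groups JPEG/JPG files into fixed-size batches for INT8 calibration.

    Hypothetical helper: a real calibrator would decode and preprocess each
    batch and hand device pointers to TensorRT from its get_batch() method.
    """

    EXTENSIONS = (".jpg", ".jpeg")

    def __init__(self, image_dir, batch_size):
        # Keep only JPEG/JPG files, sorted so that runs are reproducible.
        self.files = sorted(
            os.path.join(image_dir, name)
            for name in os.listdir(image_dir)
            if name.lower().endswith(self.EXTENSIONS)
        )
        self.batch_size = batch_size
        self.position = 0

    def get_batch_size(self):
        return self.batch_size

    def next_batch(self):
        """Return the next full batch of paths, or None when exhausted."""
        end = self.position + self.batch_size
        if end > len(self.files):
            return None  # an incomplete trailing batch is dropped
        batch = self.files[self.position:end]
        self.position = end
        return batch
```

A calibrator's get_batch() could call next_batch(), preprocess the returned files into one contiguous array, copy it to the GPU and return the device pointer, or return None when next_batch() does.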

Thank you in advance,

Best regards,
Harry

SivaRamaKrishnaNV · September 8, 2022, 3:28am

Dear @Harry-S,
The INT8 calibrator code and evaluation code look generic. Please test with your model and let us know if you see any issues.

Harry-S · September 8, 2022, 11:49am

Hello @SivaRamaKrishnaNV ,

Thank you very much for your reply.

So I can confirm from this that I can use the EfficientNet repo here for other standard classification models, as well as the EfficientDet repo here for other standard detection models.

I have already used the EfficientNet repo for resnet18, with 500 images for the calibration as NVIDIA recommends here. I did not choose 500 random images; I chose the ones that, I suppose, mlcommons used for calibration here.
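The two selection strategies mentioned here (a fixed list of file names versus a random subset) can be sketched as follows. The function name, the default of 500 images and the seed are illustrative assumptions, not part of the NVIDIA sample scripts.

```python
import random


def select_calibration_images(all_images, count=500, seed=0, listed=None):
    """Pick calibration images either from an explicit list or at random."""
    if listed is not None:
        # Keep only listed names that exist in the dataset, in list order.
        available = set(all_images)
        return [name for name in listed if name in available][:count]
    # Otherwise draw a reproducible random subset.
    rng = random.Random(seed)
    return rng.sample(sorted(all_images), min(count, len(all_images)))
```

Fixing the seed makes the random subset repeatable, so two engines built on different days are calibrated on the same images.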

However, after doing the calibration and running the validation script, I get this result:

resnet18

Top1 : 66.918
Top5 : 87.354
which is a large drop in accuracy compared with the full-precision resnet18 here.
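For reference, Top1 and Top5 figures like the ones above are usually computed as the percentage of validation images whose ground-truth label is among the k highest-scoring classes. A minimal sketch on toy data (three classes, so k=2 stands in for Top5; the scores and labels are made up for illustration):

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy model outputs: one row of class scores per image.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(f"Top1: {top1:.3f}  Top2: {top2:.3f}")
```

Running the same function on the FP32 and INT8 outputs for an identical image list gives a like-for-like accuracy drop.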

Question 1 :

Is it normal to have a 3% drop in accuracy from full-precision FP32 to INT8?

Question 2 :

If this is not normal, what should I use for calibration? Only 500 images or more, and which images?

Question 3 :

I also have another question about the --calib_preprocessor option here when calibrating and the --preprocessor option here when validating. By default it is V2, so how should I change it for other standard models?

Best regards,
Harry

Harry-S · September 13, 2022, 7:23am

Any news?

SivaRamaKrishnaNV · September 13, 2022, 4:02pm

Dear @Harry-S,
We don’t have perf numbers for EfficientNet to confirm. However, I remember seeing a 2-3% drop in a few object detection models in the past.
The calibration images should cover all activation ranges. You may choose them randomly or try increasing the number of images.
The preprocessor here is specific to EfficientNet. If you are asking in general about any model, you need to write a separate preprocessing function for each model, based on the operations it expects.
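As an illustration of a per-model preprocessing function, the sketch below applies the mean/std normalization commonly used with ImageNet-trained classifiers such as resnet18. The constants are the widely used ImageNet statistics, but whether a given ONNX model expects them is an assumption to check against its documentation. Resizing and cropping are omitted, and plain Python lists are used only to keep the example dependency-free; real code would normally use NumPy.

```python
# Commonly used ImageNet statistics in RGB order (assumption: verify per model).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def preprocess_imagenet(pixels_hwc):
    """Convert HWC RGB pixels (0-255) to normalized CHW floats."""
    height, width = len(pixels_hwc), len(pixels_hwc[0])
    chw = []
    for c in range(3):
        plane = [
            [
                (pixels_hwc[y][x][c] / 255.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c]
                for x in range(width)
            ]
            for y in range(height)
        ]
        chw.append(plane)
    return chw


def preprocess_scale_only(pixels_hwc):
    """Alternative for models that only expect 0-1 scaling, then HWC -> CHW."""
    height, width = len(pixels_hwc), len(pixels_hwc[0])
    return [
        [[pixels_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(3)
    ]


# Hypothetical registry, mirroring the idea of a --preprocessor option.
PREPROCESSORS = {"imagenet": preprocess_imagenet, "scale": preprocess_scale_only}
```

Whatever function is chosen, the same one has to be used for calibration and for validation, otherwise the calibrated ranges will not match the data seen at inference.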

Harry-S · September 13, 2022, 4:10pm

Hello @SivaRamaKrishnaNV ,

Thank you for your reply.

I tried to run the object detection sample, which does not need image pre-processing. It works fine with EfficientDet-D0 (see below). I am now working out how I can find the Top1 accuracy from the mAP.

However, using the yolov5n model ends with an error. Please have a look below to get a clear view of the error with yolov5n.

EfficientDet results:

loading annotations into memory...
Done (t=1.86s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(495840, 7)
0/495840
DONE (t=10.34s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=133.74s).
Accumulating evaluation results...
DONE (t=36.88s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.482
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.328
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.110
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.360
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.531
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671

yolov5 error:

Traceback (most recent call last):
  File "/path/to/TensorRT/samples/python/efficientdet/eval_coco.py", line 79, in <module>
    main(args)
  File "/path/to/TensorRT/samples/python/efficientdet/eval_coco.py", line 42, in main
    detections = trt_infer.infer(batch, scales, args.nms_threshold)
  File "/path/to/TensorRT/samples/python/efficientdet/infer.py", line 123, in infer
    boxes = outputs[1]
IndexError: list index out of range

Question:

For classification models (which need the pre-processing option), do you know how to change the pre-processing function, or do you have an example I can use to implement my own pre-processing functions?

NOTE:

I am using the eval_coco.py to run the validation and get this.

Thank you in advance.

Harry

Harry-S · September 14, 2022, 1:40pm

Hello @SivaRamaKrishnaNV ,

Actually, the EfficientDet scripts are not general for all object detection networks, because apparently they use the “automl” library for the validation, which supports only EfficientNet and EfficientDet models.

However, I would like to get a simple example of how to quantize a standard object detection model (yolov5 for example) with INT8 calibration using TensorRT, and then run the validation on the COCO dataset to get the accuracy or the mAP.

EDIT:

I think you can calibrate yolov5 with INT8 calibration using the build_engine.py script, but you cannot validate it using eval_coco.py from the EfficientDet scripts here.
Could you please confirm that?

Thank you in advance :)

Harry

AastaLLL · September 29, 2022, 7:08am

Hi,

Sorry for the late update.

Did you get YOLOv5 working?
The sample might not be that general, since each DNN can have its own architecture and output layer names.
But the calibration procedure should be similar.
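To make the output-layout point concrete: the traceback earlier in the thread fails at boxes = outputs[1], which means the EfficientDet sample expects several output tensors while the YOLOv5 engine returned fewer. A YOLOv5 ONNX export commonly exposes a single tensor whose rows are [cx, cy, w, h, objectness, class scores...]; that layout is an assumption here and should be checked against the actual model. Under that assumption, a decoding step could look like this sketch (function name and threshold are illustrative):

```python
def decode_yolov5_rows(rows, conf_threshold=0.25):
    """Turn raw rows [cx, cy, w, h, objectness, class scores...] into detections."""
    detections = []
    for row in rows:
        cx, cy, w, h, objectness = row[:5]
        class_scores = row[5:]
        # Best class and its combined confidence.
        class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
        score = objectness * class_scores[class_id]
        if score < conf_threshold:
            continue
        # Center/size -> corner coordinates (x1, y1, x2, y2).
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        detections.append((box, score, class_id))
    return detections
```

The decoded boxes still need non-maximum suppression and rescaling to the original image size before they can be evaluated.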

Thanks.

Harry-S · September 29, 2022, 8:55am

Hello @AastaLLL ,

Thank you very much for your reply.

I have done the calibration the same way as EfficientDet was done here.

However, I am still stuck on how to run YOLOv5 inference using TensorRT and get the mAP (mean Average Precision), because it is different from EfficientDet as you told me, and for EfficientDet they used the automl library from Google, which is not compatible with YOLOv5.

I can’t confirm whether the calibration is good, though, because I can’t get the mAP to see the accuracy of the network in INT8.

1) So for now, I have:

YOLOv5 engine, quantized to INT8 with calibration, using the EfficientDet scripts (not sure if it is well calibrated).

2) What I would like to have:

I would like to have a script to run inference on YOLOv5 and get the mAP using TensorRT.
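One framework-independent route to the mAP is to write the detections in the COCO results format and evaluate them with pycocotools (COCO.loadRes followed by COCOeval with the 'bbox' type), which prints the same kind of summary as the EfficientDet log earlier in the thread. A minimal sketch of the conversion step, assuming detections are already in original-image pixel coordinates; the function name and the category_ids argument are hypothetical, and the full class-index to COCO category id mapping (COCO ids are not contiguous) has to be supplied by the caller:

```python
def to_coco_results(image_id, detections, category_ids):
    """Convert (box_xyxy, score, class_index) tuples to COCO result dicts."""
    results = []
    for (x1, y1, x2, y2), score, class_index in detections:
        results.append({
            "image_id": image_id,
            "category_id": category_ids[class_index],
            # COCO boxes are [x, y, width, height].
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        })
    return results
```

The per-image lists can be concatenated and dumped with json.dump to a single file that is then passed to the evaluation.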

I have found this script here in the official YOLOv5 repo, which runs inference using TensorRT if we pass the option --weights YOLOv5.engine.
So I thought maybe I could use the calibrated YOLOv5 engine that I already built with the EfficientDet script.

However, I am facing version conflicts between the Torch and Torchvision Python packages on my Jetson Orin.

Jetson AGX Orin:

I am using Jetpack 5.0.1 DP.
I have installed Torch (PyTorch) with CUDA from here.
- I looked here to see which version is compatible with my Jetpack version and found 1.13.
  However, I have not found a Torchvision version here that is compatible with the Torch v1.13 I installed, because there is no 1.13 version yet. The latest version is 1.12.

NOTE:

I have tried a lot of versions to see if I can resolve this version conflict, but I did not succeed.

So if you could tell me how to resolve this version conflict with the official YOLOv5 repo on Jetson Orin, that would be great. Or, if you have any scripts or ideas on how to run YOLOv5 and get the mAP from an already calibrated engine.

Thank you very much.

Harry

AastaLLL · October 3, 2022, 5:54am

Hi,

1. You can find the Torch and corresponding TorchVision version below:

2. To modify eval_coco.py for the YOLOv5 model,
you can replace the code that calculates the bounding boxes with the official YOLOv5 implementation below:

github.com

ultralytics/yolov5/blob/master/detect.py#L122


      
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup
    seen, windows, dt = 0, [], (Profile(), Profile(), Profile())
    for path, im, im0s, vid_cap, s in dataset:
        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            pred = model(im, augment=augment, visualize=visualize)

        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)
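The NMS step in the excerpt above goes through YOLOv5's non_max_suppression helper, which in turn calls torchvision.ops.nms. When porting only the box post-processing into eval_coco.py, a plain-Python greedy NMS can stand in for it. The sketch below is class-agnostic and unoptimized, and the threshold default is an illustrative assumption; it is not the YOLOv5 implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.45):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For per-class suppression, the same function can be applied separately to the boxes of each class.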

Thanks.

Harry-S · October 3, 2022, 7:17am

Hello @AastaLLL ,

Thank you very much for the suggestion. I will try it and let you know about the result :)

Harry

Harry-S · October 3, 2022, 9:25am

Hello again @AastaLLL ,

I have installed the new Jetpack 5.0.2 on my Jetson AGX Orin because there is no Torch with CUDA for my previous Jetpack version, 5.0.1 DP.
After that, I installed PyTorch with CUDA from here; it is version 1.13, the only choice for this Jetpack.

After that, I cloned the YOLOv5 repo and installed the latest version of Torchvision, because I couldn’t find the right version for my Torch + CUDA build here in the matrix.

So to make it clear, I have installed:

Torch + CUDA from here - version 1.13
Torchvision from here - version 1.13.1

Now when running the YOLOv5 val.py script I get the error below:

(venv_yolov5) usr@ubuntu:/media/usr/B21F-F81E/ORIN/yolov5/yolov5$ python val.py --weights yolov5s.pt --data coco128.yaml --img 640
/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
val: data=/media/usr/B21F-F81E/ORIN/yolov5/yolov5/data/coco128.yaml, weights=['yolov5s.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.2-183-gc98128f Python-3.8.10 torch-1.13.0a0+08820cb0.nv22.07 CUDA:0 (Orin, 30536MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt to yolov5s.pt...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14.1M/14.1M [00:00<00:00, 22.5MB/s]

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

Dataset not found ⚠️, missing paths ['/media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/images/train2017']
Downloading https://ultralytics.com/assets/coco128.zip to coco128.zip...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.66M/6.66M [00:00<00:00, 36.4MB/s]
Dataset download success ✅ (3.8s), saved to /media/usr/B21F-F81E/ORIN/yolov5/datasets
Downloading https://ultralytics.com/assets/Arial.ttf to /home/usr/.config/Ultralytics/Arial.ttf...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 755k/755k [00:00<00:00, 33.3MB/s]
val: Scanning '/media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128 [00:00<00:00, 2962.29it/s]
val: New cache created: /media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/labels/train2017.cache
                 Class     Images  Instances          P          R      mAP50   mAP50-95:   0%|          | 0/4 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "val.py", line 406, in <module>
    main(opt)
  File "val.py", line 379, in main
    run(**vars(opt))
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "val.py", line 219, in run
    preds = non_max_suppression(preds,
  File "/media/usr/B21F-F81E/ORIN/yolov5/yolov5/utils/general.py", line 923, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40, in nms
    _assert_has_ops()
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/extension.py", line 33, in _assert_has_ops
    raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

So we can see that it detects my Jetson, since it says (Orin, 30536MiB), so Torch + CUDA is successfully installed.
However, when importing Torchvision in Python, I get the warning message below:

(venv_yolov5) usr@ubuntu:/media/usr/B21F-F81E/ORIN/yolov5/yolov5$ python
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
>>>

Question:

I believe that Torchvision is not the right version, or it is not installed the right way on my Jetson.
So my question is how to install Torchvision with CUDA, or how to install it the right way, with a version that is compatible with the Torch + CUDA version on my Jetson AGX Orin.
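The RuntimeError above asks to compare torch.__version__ and torchvision.__version__ against the compatibility matrix. NVIDIA builds report long strings such as 1.13.0a0+08820cb0.nv22.07, so it can help to reduce both to major.minor before looking them up. The helper names below are hypothetical, and the matrix itself must be filled in from whichever compatibility table applies to the installed build; the single entry shown in the usage note is only an example.

```python
import re


def major_minor(version):
    """Extract (major, minor) from strings such as '1.13.0a0+08820cb0.nv22.07'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_listed_pair(torch_version, torchvision_version, matrix):
    """Check a torch/torchvision pair against a user-supplied compatibility matrix."""
    return matrix.get(major_minor(torch_version)) == major_minor(torchvision_version)
```

For example, with matrix = {(1, 12): (0, 13)}, is_listed_pair("1.12.0", "0.13.1", matrix) is True, while a torch version missing from the matrix returns False and signals that the pair needs to be checked by hand.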

Thank you in advance @AastaLLL

Harry

AastaLLL · October 4, 2022, 3:13am

Hi,

Did you install TorchVision v0.13.1 (1.13.1 is mentioned above)?

We are going to give it a try.
Will share more information with you later.

Thanks.

Harry-S · October 4, 2022, 7:54am

Hello,

Yes indeed, I have installed TorchVision v0.13.1 and Torch (pyTorch) v1.13 + CUDA as Nvidia suggested.
Below is my pip list:

I will be waiting for your results :)

Thank you

Harry-S · October 4, 2022, 1:49pm

Hello again @AastaLLL ,

I ran the validation on Google Colab to see if it works there and which Torch and Torchvision versions are used. Below are the results from Google Colab:

So according to this, I think they have both Torch and Torchvision with CUDA.

So I must install Torchvision with CUDA on my Jetson as well. Maybe this will help you :)

Harry

AastaLLL · October 5, 2022, 4:00am

Hi,

Do you mean that Torch + TorchVision works when TorchVision is built with CUDA support?
Thanks.

Harry-S · October 5, 2022, 7:28am

Hello @AastaLLL ,

Yes, I am using only Torch with CUDA support. However, Torchvision does not have CUDA support; perhaps this causes the issue?
I did not find any tutorial on how to install Torchvision with CUDA support for Jetson devices.

I tried to run the yolov5 validation script in the Google Colab notebook provided by yolov5, printed the versions of Torch and Torchvision, and found that both have CUDA support.

I am pretty sure that the issue comes from Torchvision, because I installed Torch with CUDA support according to the official NVIDIA tutorial and it detects my GPU correctly:

I think that I need to install Torchvision with CUDA support the right way on my Jetson AGX Orin. What do you think?

Thank you

Harry

Harry-S · October 5, 2022, 12:29pm

Hello @AastaLLL ,

I think I have resolved the problem.

Today, 05/10/2022, NVIDIA uploaded a new version of Torch with CUDA support that is compatible with Jetpack 5.0.2.
So I installed this latest version and built Torchvision from source here.

After doing that, I think I have both Torch and TorchVision with CUDA support.

I ran the val.py script from yolov5 and it worked. Now I will try to run it using a TensorRT engine; I hope there will be no issues. I will keep you updated.

Thank you very much :)

Harry

AastaLLL · October 6, 2022, 2:29am

Hi,

Thanks for the testing.
It’s good to know it works now.

Harry-S · October 6, 2022, 7:58am

Hello @AastaLLL ,

It works now. The solution is to install the latest version of Torch with CUDA support from NVIDIA and build TorchVision from source.

However, I get very bad results with the yolov5 engine quantized to INT8 and calibrated with the EfficientDet scripts here.

Results

So I think the calibration did not go well for yolov5. What do you think?

Question

Could you please tell me how to do the INT8 calibration correctly for yolov5, using JPEG/JPG images from the COCO dataset like the EfficientDet scripts do?
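One thing worth checking, offered as a hypothesis rather than a confirmed cause: calibration only produces useful ranges if the calibration images go through the same preprocessing the model sees at inference. The detect.py excerpt quoted earlier in the thread scales pixels with im /= 255, and YOLOv5 typically resizes with aspect-ratio-preserving padding (letterbox) to a square input. The sketch below computes that geometry under these assumptions; the function names and the 640 target size are illustrative.

```python
def letterbox_params(width, height, target=640):
    """Scale and padding that fit an image into a target x target square."""
    # Single scale factor so that the aspect ratio is preserved.
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Remaining space is split evenly between the two sides.
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def scale_pixel(value):
    """Map a 0-255 pixel value to the 0.0-1.0 range."""
    return value / 255.0
```

If the calibration batches were instead fed in the 0-255 range or with a different resize policy, the calibrated activation ranges would not match inference inputs, which could explain a large accuracy loss.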

NOTE:

I get the warning below when generating the engine (maybe this could help you).
I also found this thread here; maybe the problem is not the calibration but TensorRT!

[TRT] [W]  - Subnormal FP16 values detected. 
[TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[TRT] [W] Weights [name=Conv_195 + PWN(PWN(Sigmoid_196), Mul_197).weight] had the following issues when converted to FP16:
[TRT] [W]  - Subnormal FP16 values detected. 
[TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[TRT] [W] Weights [name=Conv_195 + PWN(PWN(Sigmoid_196), Mul_197).weight] had the following issues when converted to FP16:
[TRT] [W]  - Subnormal FP16 values detected. 
[TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[TRT] [W] Weights [name=Conv_195 + PWN(PWN(Sigmoid_196), Mul_197).weight] had the following issues when converted to FP16:
[TRT] [W]  - Subnormal FP16 values detected. 
[TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[TRT] [W] Weights [name=Conv_198.weight] had the following issues when converted to FP16:
[TRT] [W]  - Subnormal FP16 values detected. 
[TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[TRT] [W] Weights [name=Conv_198.weight] had the following issues when converted to FP16:

Thank you very much for your help @AastaLLL :)

Harry
