TensorRT INT8 calibration python API

Hello,

I would like to quantize several standard ONNX models with INT8 calibration using JPEG/JPG images, and then get the validation results (Top-1 and Top-5 accuracy). To do that, I looked at the NVIDIA/TensorRT GitHub repo and found this here.

According to that repo, we can generate a calibrated engine from EfficientNet ONNX model using JPEG/JPG image format by running build_engine.py. After that, we can do inference and have the validation result of the INT8 calibrated engine by running eval_gt.py. And it works great.

Question 1

Can I use the same scripts to first generate an INT8-calibrated engine and then run the validation for any classification model, for example resnet18, squeezenet, etc.?

Question 2

I would like to do exactly the same thing for detection models using this repo here, which is dedicated to EfficientDet. Is it possible to use it for other models, for example yolov5?

Question 3

If I cannot use these scripts for other models:

How can I do a calibration that matches my request? That means it takes standard ONNX models, calibrates with a standard dataset, either ImageNet (classification) or COCO (detection), in JPEG/JPG image format, and gets the validation results (Top-1 and Top-5 accuracy) without adding or modifying ONNX layers as suggested here. I just want something like the Python scripts above.

Please note that I have already read all your documentation here and all your samples here. All I need is how to do the following:

  • Implementing the calibrator class, which is not clear to me.
  • Quantizing standard ONNX models with INT8 calibration using JPEG/JPG images.
  • Getting the validation results.
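For the calibrator item, a minimal sketch of the data-feeding half may help. In the TensorRT Python API, a calibrator such as `trt.IInt8EntropyCalibrator2` is subclassed and overrides `get_batch_size()`, `get_batch(names)`, `read_calibration_cache()` and `write_calibration_cache(cache)`; `get_batch` returns a list of device pointers, or `None` once the data runs out. The class below is a hypothetical helper (its name and parameters are assumptions, not part of TensorRT or its samples) that only handles listing JPEG/JPG files and cutting them into whole batches. Preprocessing and the host-to-device copy are left out because they depend on the model and on the CUDA binding in use.

```python
import os
from typing import Iterator, List


class CalibrationBatcher:
    """Yield fixed-size batches of JPEG/JPG paths for INT8 calibration.

    Hypothetical helper for illustration; not part of TensorRT.
    """

    EXTENSIONS = (".jpg", ".jpeg")

    def __init__(self, image_dir: str, batch_size: int, max_images: int = 500):
        names = sorted(
            name for name in os.listdir(image_dir)
            if name.lower().endswith(self.EXTENSIONS)
        )
        # Keep only whole batches, since a calibrator hands TensorRT a
        # buffer of the same size on every get_batch() call.
        usable = min(len(names), max_images) // batch_size * batch_size
        self.paths = [os.path.join(image_dir, name) for name in names[:usable]]
        self.batch_size = batch_size

    def __len__(self) -> int:
        """Number of batches that will be produced."""
        return len(self.paths) // self.batch_size

    def __iter__(self) -> Iterator[List[str]]:
        for start in range(0, len(self.paths), self.batch_size):
            yield self.paths[start:start + self.batch_size]
```

A calibrator subclass would typically hold an iterator over such a batcher, and in `get_batch` load and preprocess the next list of paths, copy the result to a preallocated device buffer, and return that buffer's address.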

Maybe you could send me an annotated image that shows what to put here, how to modify the code so that it is compatible with my models, and how to implement the calibrator class for a different model. A video or online documentation would also be great, but it has to match my request, because I have already seen all the documentation.

Thank you in advance,

Best regards,
Harry

Dear @Harry-S,
The INT8 calibrator code and evaluation code look generic. Please test with your model and let us know if you see any issues.


Hello @SivaRamaKrishnaNV ,

Thank you very much for your reply.

So I can conclude from this that I can use the EfficientNet repo here for other standard classification models, and the EfficientDet repo here for other standard detection models.

I have already used the EfficientNet repo for resnet18, with 500 images for the calibration as NVIDIA recommends here. I did not choose 500 random images; I chose the ones that, I suppose, mlcommons used for their calibration here.

However, after doing the calibration and run the validation script I have this as result:

resnet18

Top1 : 66.918
Top5 : 87.354
which is a large drop in accuracy compared with the full-precision resnet18 here.
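For reference, Top-1 and Top-5 figures like the ones above are usually computed as the percentage of validation images whose ground-truth label is among the k highest-scoring classes. The sketch below shows that definition in plain Python; the function name and argument layout are assumptions for illustration and are not taken from eval_gt.py.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)
```

Running the same function over the FP32 and the INT8 engine outputs, on the same images with the same preprocessing, gives a like-for-like comparison of the drop.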

Question 1 :

  • Is a 3% accuracy drop from full-precision FP32 to INT8 normal?

Question 2 :

  • If this is not normal, what should I use for calibration? Only 500 images, or more? And which images?

Question 3 :

I also have another question about the --calib_preprocessor option here used when calibrating and the --preprocessor option here used when validating. By default it is V2, so how should I change it for other standard models?

Best regards,
Harry

Any news?

Dear @Harry-S,
We don’t have perf numbers for EfficientNet to confirm. But I remember seeing a 2-3% drop in a few object detection models in the past.
The calibration images should cover all activation ranges. You may choose them randomly or try increasing the number of images.
The preprocessor here is specific to EfficientNet. If you are asking about any model in general, you need to write a separate preprocessing function for each model, based on the operations it expects.
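As an illustration of what a model-specific preprocessing function can look like, here is a minimal sketch of the per-channel normalization commonly used with torchvision-style ImageNet classifiers such as resnet18. Whether a given ONNX model expects exactly these statistics, RGB order and NCHW layout depends on how it was trained and exported, so treat those as assumptions to verify. The resize and crop steps are omitted, and plain lists are used only to keep the example dependency-free; an array library would be the usual choice.

```python
from typing import List, Sequence

# Widely used ImageNet statistics for torchvision-style models (RGB order).
# Assumption: the target model was trained with this normalization.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_hwc_to_chw(
    pixels: Sequence[Sequence[Sequence[int]]],
    mean: Sequence[float] = IMAGENET_MEAN,
    std: Sequence[float] = IMAGENET_STD,
) -> List[List[List[float]]]:
    """Convert an HxWx3 image of 0-255 RGB values to CxHxW normalized floats."""
    height, width = len(pixels), len(pixels[0])
    return [
        [
            [(pixels[y][x][c] / 255.0 - mean[c]) / std[c] for x in range(width)]
            for y in range(height)
        ]
        for c in range(3)
    ]
```

The same function has to be applied during calibration and during validation; a mismatch between the two is a common source of accuracy loss that is unrelated to quantization itself.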

Hello @SivaRamaKrishnaNV ,

Thank you for your reply.

I tried running the object detection sample, which does not need an image pre-processing option. It works fine with EfficientDet-D0 (see below). I am now working out how to get the Top-1 accuracy from the mAP.

However, running it with the yolov5n model ends with an error. Please see the traceback below for the details.


EfficientDet results:

loading annotations into memory...
Done (t=1.86s)
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(495840, 7)
0/495840
DONE (t=10.34s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=133.74s).
Accumulating evaluation results...
DONE (t=36.88s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.311
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.482
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.328
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.110
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.360
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.274
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.449
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.174
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.531
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.671

yolov5 error:

Traceback (most recent call last):
  File "/path/to/TensorRT/samples/python/efficientdet/eval_coco.py", line 79, in <module>
    main(args)
  File "/path/to/TensorRT/samples/python/efficientdet/eval_coco.py", line 42, in main
    detections = trt_infer.infer(batch, scales, args.nms_threshold)
  File "/path/to/TensorRT/samples/python/efficientdet/infer.py", line 123, in infer
    boxes = outputs[1]
IndexError: list index out of range
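
The IndexError above means the engine exposes fewer output tensors than infer.py indexes. A plausible reason, stated here as an assumption, is that the EfficientDet sample expects separate detection outputs (count, boxes, scores, classes) produced by an NMS stage inside the engine, whereas a YOLOv5 ONNX export commonly has a single raw output of shape (batch, N, 5 + num_classes) whose rows are [cx, cy, w, h, objectness, class scores...]. Under that assumption, a minimal decoding sketch looks like this (the function name is hypothetical):

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, score, class_id)
Detection = Tuple[float, float, float, float, float, int]


def decode_yolo_rows(rows: Sequence[Sequence[float]],
                     conf_threshold: float = 0.001) -> List[Detection]:
    """Turn raw rows [cx, cy, w, h, objectness, class scores...] into boxes.

    Assumes the single-output YOLOv5 layout described above; check the
    actual output shape of your ONNX model before relying on it.
    """
    detections = []
    for row in rows:
        cx, cy, w, h, objectness = row[:5]
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=lambda i: class_scores[i])
        score = objectness * class_scores[class_id]
        if score < conf_threshold:
            continue
        # Convert center/size to corner coordinates.
        detections.append(
            (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id)
        )
    return detections
```

The decoded boxes are still in the coordinates of the resized network input and have not been through non-maximum suppression, so both steps remain before any mAP computation.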

Question:

For classification models (which need the pre-processing option), do you know how to change the pre-processing function, or do you have an example that I can use to implement my own pre-processing functions?

NOTE:

I am using the eval_coco.py to run the validation and get this.

Thank you in advance.

Harry

Hello @SivaRamaKrishnaNV ,

Actually, the EfficientDet scripts are not general enough for all object detection networks, because they apparently use the "automl" library for the validation, which supports only EfficientNet and EfficientDet models.

However, I would like a simple example of how to quantize a standard object detection model (yolov5, for example) with INT8 calibration using TensorRT, and then run the validation on the COCO dataset to get the accuracy or the mAP.

EDIT:

I think you can calibrate yolov5 in INT8 with the build_engine.py script, but you cannot validate it with the eval_coco.py from the EfficientDet scripts here.
Could you please confirm that?

Thank you in advance :)

Harry

Hi,

Sorry for the late update.

Did you get YOLOv5 working?
The sample might not be that general, since each DNN can have its own architecture and output layer names.
But the calibration procedure should be similar.

Thanks.


Hello @AastaLLL ,

Thank you very much for your reply.

I have done the calibration the same way as EfficientDet was done here.

However, I am still stuck on how to run YOLOv5 inference using TensorRT and get the mAP (mean Average Precision), because it is different from EfficientDet, as you told me, and the EfficientDet scripts use automl from Google, which is not compatible with YOLOv5.

I can’t confirm though, if the calibration is good because I can’t have the mAP to see the accuracy of the network in INT8.

1) So for now, I have:

  • A YOLOv5 engine, quantized to INT8 with calibration, using the EfficientDet scripts (not sure if it is well calibrated).

2) What I would like to have:

  • A script to run inference on YOLOv5 and get the mAP using TensorRT.
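
For the mAP part, the steps between decoded boxes and a COCO evaluation can be sketched roughly as follows: per-class greedy non-maximum suppression, then conversion to the COCO results format, where each entry has image_id, category_id, bbox as [x, y, width, height] and score. The function names are hypothetical, boxes are assumed to be (x1, y1, x2, y2, score, class_id) tuples already scaled back to original image coordinates, and the mapping from model class index to COCO category id is supplied by the caller because COCO category ids are not contiguous.

```python
from typing import Dict, List, Sequence, Tuple

Detection = Tuple[float, float, float, float, float, int]


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections: Sequence[Detection],
        iou_threshold: float = 0.6) -> List[Detection]:
    """Greedy NMS: keep the best box, drop same-class boxes overlapping it."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(det[5] != k[5] or iou(det[:4], k[:4]) <= iou_threshold
               for k in kept):
            kept.append(det)
    return kept


def to_coco_results(image_id: int, detections: Sequence[Detection],
                    category_ids: Dict[int, int]) -> List[dict]:
    """Format detections as COCO result entries (bbox is x, y, width, height)."""
    return [
        {
            "image_id": image_id,
            "category_id": category_ids[cls],
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        }
        for x1, y1, x2, y2, score, cls in detections
    ]
```

The entries for all images can then be written to a JSON file and scored with pycocotools (`COCO.loadRes` followed by `COCOeval` with the "bbox" type), which prints the same AP/AR table as the EfficientDet run above.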

I have found this script here in the official YOLOv5 repo, which runs inference using TensorRT when given the option --weights YOLOv5.engine.
So I thought I could pass it the calibrated YOLOv5 engine that I already built with the EfficientDet script.

However, I am facing a version conflict between Torch and Torchvision on my Jetson Orin.

Jetson AGX Orin:

  • I am using the Jetpack 5.0.1 DP
  • I have installed Torch (pytorch) from here with CUDA.
    • I looked here to see which version is compatible with my Jetpack version, and I found 1.13.
      However, I have not found a Torchvision version here that is compatible with the Torch v1.13 I installed, because there is no 1.13 version yet. The latest version is 1.12.

NOTE:

I have tried many versions to resolve this conflict, but I did not succeed.

So it would be great if you could tell me how to resolve this version conflict for the official YOLOv5 repo on Jetson Orin, or if you have any scripts or ideas on how to run YOLOv5 and get the mAP from an already calibrated engine.

Thank you very much.

Harry

Hi,

1. You can find the Torch and corresponding TorchVision version below:

2. To modify eval_coco.py for the YOLOv5 model,
you can replace the code that calculates the bounding boxes with the official YOLOv5 implementation below:

Thanks.


Hello @AastaLLL ,

Thank you very much for the suggestion. I will try it and let you know about the result :)

Harry

Hello again @AastaLLL ,

I have installed the new Jetpack 5.0.2 on my Jetson AGX Orin, because there is no Torch with CUDA for my previous version, Jetpack 5.0.1 DP.
After that, I installed PyTorch with CUDA from here. It is version 1.13; I have no other choice for this Jetpack.

After that, I cloned the YOLOv5 repo and installed the latest version of Torchvision because I couldn’t find the right version for my version of Torch + CUDA here in the matrix.

So to make it clear, I have installed:

  • Torch + CUDA from here - version 1.13
  • Torchvision from here - version 1.13.1

Now when running the YOLOv5 val.py script I have this error below:

(venv_yolov5) usr@ubuntu:/media/usr/B21F-F81E/ORIN/yolov5/yolov5$ python val.py --weights yolov5s.pt --data coco128.yaml --img 640
/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
val: data=/media/usr/B21F-F81E/ORIN/yolov5/yolov5/data/coco128.yaml, weights=['yolov5s.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, max_det=300, task=val, device=, workers=8, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=False, project=runs/val, name=exp, exist_ok=False, half=False, dnn=False
YOLOv5 🚀 v6.2-183-gc98128f Python-3.8.10 torch-1.13.0a0+08820cb0.nv22.07 CUDA:0 (Orin, 30536MiB)

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:00<00:00, 22.5MB/s]

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients

Dataset not found ⚠️, missing paths ['/media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/images/train2017']
Downloading https://ultralytics.com/assets/coco128.zip to coco128.zip...
100%|██████████| 6.66M/6.66M [00:00<00:00, 36.4MB/s]
Dataset download success ✅ (3.8s), saved to /media/usr/B21F-F81E/ORIN/yolov5/datasets
Downloading https://ultralytics.com/assets/Arial.ttf to /home/usr/.config/Ultralytics/Arial.ttf...
100%|██████████| 755k/755k [00:00<00:00, 33.3MB/s]
val: Scanning '/media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/labels/train2017' images and labels...126 found, 2 missing, 0 empty, 0 corrupt: 100%|██████████| 128/128 [00:00<00:00, 2962.29it/s]
val: New cache created: /media/usr/B21F-F81E/ORIN/yolov5/datasets/coco128/labels/train2017.cache
                 Class     Images  Instances          P          R      mAP50   mAP50-95:   0%|          | 0/4 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "val.py", line 406, in <module>
    main(opt)
  File "val.py", line 379, in main
    run(**vars(opt))
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "val.py", line 219, in run
    preds = non_max_suppression(preds,
  File "/media/usr/B21F-F81E/ORIN/yolov5/yolov5/utils/general.py", line 923, in non_max_suppression
    i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 40, in nms
    _assert_has_ops()
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/extension.py", line 33, in _assert_has_ops
    raise RuntimeError(
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

So we can see that it detects my Jetson, since it says (Orin, 30536MiB), so Torch + CUDA is successfully installed.
However, when importing Torchvision in Python, I get the warning message below:

(venv_yolov5) usr@ubuntu:/media/usr/B21F-F81E/ORIN/yolov5/yolov5$ python
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
/home/usr/Documents/venv_yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
>>>

Question:

I believe that Torchvision is either not the right version or not installed the right way on my Jetson.
So my question is how to install Torchvision with CUDA, or how to install it the right way with a version that is compatible with the Torch + CUDA version on my Jetson AGX Orin.
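
As the RuntimeError above suggests, a first step is to compare torch.__version__ with torchvision.__version__. The sketch below parses version strings like the ones in the log and checks them against a few pairings from the upstream pytorch/vision compatibility table. The function names are hypothetical, and there is an important caveat: NVIDIA's Jetson wheels are pre-release builds (1.13.0a0 in the log above), so the pairing NVIDIA recommends for a given Jetpack may differ from the upstream table and should take precedence. The error message also notes that the same failure can come from problems compiling torchvision from source, which a version check cannot detect.

```python
import re
from typing import Optional, Tuple

# A few (torch -> torchvision) pairings from the upstream compatibility
# table. Assumption: vendor pre-release builds may not follow this table.
TORCH_TO_TORCHVISION = {
    (1, 10): (0, 11),
    (1, 11): (0, 12),
    (1, 12): (0, 13),
    (1, 13): (0, 14),
}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.13.0a0+08820cb0.nv22.07'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def expected_torchvision(torch_version: str) -> Optional[Tuple[int, int]]:
    """Upstream torchvision (major, minor) for a torch version, if listed."""
    return TORCH_TO_TORCHVISION.get(major_minor(torch_version))


def versions_match(torch_version: str, torchvision_version: str) -> bool:
    """True only if the pair appears in the table above."""
    return expected_torchvision(torch_version) == major_minor(torchvision_version)
```

With both packages installed, calling `versions_match(torch.__version__, torchvision.__version__)` gives a quick first indication before looking at how torchvision was built.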

Thank you in advance @AastaLLL

Harry

Hi,

Did you install TorchVision v0.13.1 (1.13.1 is mentioned above)?

We are going to give it a try.
Will share more information with you later.

Thanks.


Hello,

Yes indeed, I have installed TorchVision v0.13.1 and Torch (pyTorch) v1.13 + CUDA as Nvidia suggested.
Below is my pip list:

image

I will be waiting for your results :)

Thank you

Hello again @AastaLLL ,

I ran the validation on Google Colab to see whether it works there and which Torch and Torchvision versions are used. The results from Google Colab are below:

image

So I think and according to this they have both Torch and Torchvision with CUDA.

So I must install Torchvision with CUDA on my Jetson as well. Maybe this will help you :)

Harry