TensorRT 8.0.3 ImageNet ResNet model INT8 conversion: identical output for different inputs after calibration

Description

I am trying to convert my ResNet100 classification model to INT8. Using my inference and validation code, I confirmed that both FP32 and FP16 engines give accurate results. INT8, however, gave me a lot of trouble.

After carefully following image_batcher.py and build_engine.py in TensorRT/samples/python/efficientnet at master · NVIDIA/TensorRT · GitHub, I made the batcher use preprocessing steps identical to those at inference time, and I validated the ranges of the preprocessed input tensors; they looked correct. The program generated a calibration file from my directory of 5000 images (the sample output below used fewer) and eventually saved an engine to disk. However, when I examined the feature vectors, different input images produced identical (and incorrect) outputs, unlike the same code in FP32 and FP16 modes. My evaluation ended up with 0 accuracy.
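To illustrate the sanity check I ran, here is a minimal stdlib-only sketch (the function and values are illustrative, not taken from the sample code): feed the same pixels through the calibration path and the inference path and confirm the normalized ranges match.

```python
def preprocess(pixels):
    # Illustrative normalization: scale [0, 255] pixel values to [-1, 1].
    # Both the batcher and the inference code must apply the same transform.
    return [(p / 127.5) - 1.0 for p in pixels]

sample = [0, 64, 128, 255]
calib_out = preprocess(sample)   # stand-in for the image_batcher output
infer_out = preprocess(sample)   # stand-in for the inference preprocessing
assert calib_out == infer_out
print(min(calib_out), max(calib_out))  # -1.0 1.0
```

In my case the two paths agreed and the values fell in the expected range, so I ruled out a preprocessing mismatch.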

My ONNX model has a dynamic batch axis, so I added an optimization profile with batch sizes ranging from 1 to 256. The batch size used during calibration made no difference.

Environment

TensorRT Version : 8.0.3
GPU Type : Tesla T4
Nvidia Driver Version : 450.119.03
CUDA Version : In container cuda-11.5
CUDNN Version :
Operating System + Version : ec2 instance
Python Version (if applicable) : 3.8.10
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) : Container nvcr.io/nvidia/tensorrt:21.11-py3

Steps To Reproduce

Here is my customization of TensorRT/build_engine.py at master · NVIDIA/TensorRT · GitHub:

        log.info("Network Description")
        self.batch_size = args.calib_batch_size
        for input in inputs:
            log.info("Input '{}' with shape {} and dtype {}".format(input.name, input.shape, input.dtype))
            # Dynamic batch axis: profile covers batch sizes 1 (min), 128 (opt), 256 (max)
            profile = self.builder.create_optimization_profile()
            profile.set_shape(input.name, (1, 3, input.shape[2], input.shape[3]),
                                          (128, 3, input.shape[2], input.shape[3]),
                                          (256, 3, input.shape[2], input.shape[3]))
            self.config.add_optimization_profile(profile)
        for output in outputs:
            log.info("Output '{}' with shape {} and dtype {}".format(output.name, output.shape, output.dtype))
        assert self.batch_size > 0
        self.builder.max_batch_size = 256

Here is the output from that build script:

[TensorRT] INFO: [MemUsageChange] Init CUDA: CPU +321, GPU +0, now: CPU 342, GPU 252 (MiB)
[TensorRT] WARNING: onnx2trt_utils.cpp:362: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
INFO:EngineBuilder:Network Description
INFO:EngineBuilder:Input 'input.1' with shape (-1, 3, 224, 224) and dtype DataType.FLOAT
INFO:EngineBuilder:Output '1333' with shape (-1, 512) and dtype DataType.FLOAT
INFO:EngineBuilder:Building int8 Engine in /training/models/resnet100.trt
/training/docker-tensorrt-workbench/build_engine.py:209: DeprecationWarning: Use build_serialized_network instead.
  with self.builder.build_engine(self.network, self.config) as engine, open(engine_path, "wb") as f:
[TensorRT] INFO: [MemUsageSnapshot] Builder begin: CPU 592 MiB, GPU 254 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +507, GPU +220, now: CPU 1101, GPU 474 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +114, GPU +52, now: CPU 1215, GPU 526 (MiB)
[TensorRT] WARNING: Calibration Profile is not defined. Running calibration with Profile 0
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 22528
[TensorRT] INFO: Total Device Persistent Memory: 0
[TensorRT] INFO: Total Scratch Memory: 0
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1738, GPU 1002 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 1739, GPU 1010 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1738, GPU 994 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1738, GPU 978 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 1738 MiB, GPU 978 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1738, GPU 986 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1738, GPU 994 (MiB)
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 1738 MiB, GPU 1902 MiB
[TensorRT] INFO: Starting Calibration.
INFO:EngineBuilder:Calibrating image 4 / 40
[TensorRT] INFO:   Calibrated batch 0 in 3.5102 seconds.
INFO:EngineBuilder:Calibrating image 8 / 40
[TensorRT] INFO:   Calibrated batch 1 in 3.51987 seconds.
INFO:EngineBuilder:Calibrating image 12 / 40
[TensorRT] INFO:   Calibrated batch 2 in 3.54024 seconds.
INFO:EngineBuilder:Calibrating image 16 / 40
[TensorRT] INFO:   Calibrated batch 3 in 3.54137 seconds.
INFO:EngineBuilder:Calibrating image 20 / 40
[TensorRT] INFO:   Calibrated batch 4 in 3.5381 seconds.
INFO:EngineBuilder:Calibrating image 24 / 40
[TensorRT] INFO:   Calibrated batch 5 in 3.53881 seconds.
INFO:EngineBuilder:Calibrating image 28 / 40
[TensorRT] INFO:   Calibrated batch 6 in 3.56045 seconds.
INFO:EngineBuilder:Calibrating image 32 / 40
[TensorRT] INFO:   Calibrated batch 7 in 3.56059 seconds.
INFO:EngineBuilder:Calibrating image 36 / 40
[TensorRT] INFO:   Calibrated batch 8 in 3.55971 seconds.
INFO:EngineBuilder:Calibrating image 40 / 40
[TensorRT] INFO:   Calibrated batch 9 in 3.56199 seconds.
INFO:EngineBuilder:Finished calibration batches
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1746, GPU 1886 (MiB)
[TensorRT] INFO:   Post Processing Calibration data in 38.0515 seconds.
[TensorRT] INFO: Calibration completed in 78.6834 seconds.
[TensorRT] INFO: Writing Calibration Cache for calibrator: TRT-8003-EntropyCalibration2
INFO:EngineBuilder:Writing calibration cache data to: calibration.cache
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1781, GPU 734 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1781, GPU 742 (MiB)
[TensorRT] WARNING: Detected invalid timing cache, setup a local cache instead
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] INFO: Total Host Persistent Memory: 243872
[TensorRT] INFO: Total Device Persistent Memory: 53171200
[TensorRT] INFO: Total Scratch Memory: 512
[TensorRT] INFO: [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 148 MiB, GPU 910 MiB
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1870, GPU 828 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1870, GPU 836 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1870, GPU 820 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1869, GPU 804 (MiB)

Here is my generated calibration cache:

TRT-8003-EntropyCalibration2
input.1: 7f800000
1334: 7f800000
(Unnamed Layer* 1) [Constant]_output: 3bd7b8b7
(Unnamed Layer* 2) [Shuffle]_output: 3bd7b8b7
929: 7f800000
930: 7f800000
1337: 7f800000
(Unnamed Layer* 6) [Constant]_output: 3b0efec6
(Unnamed Layer* 7) [Shuffle]_output: 3b0efec6
934: 7f800000
1340: 7f800000
1343: 7ab458b8
939: 7f800000
940: 7f800000
1346: 7f800000
(Unnamed Layer* 14) [Constant]_output: 3ad4d621
(Unnamed Layer* 15) [Shuffle]_output: 3ad4d621
944: 7f800000
1349: 7a90e807
947: 7f800000
948: 7f800000
1352: 7f800000
(Unnamed Layer* 21) [Constant]_output: 3a969a81
(Unnamed Layer* 22) [Shuffle]_output: 3a969a81
952: 7f800000
1355: 7b035e84
955: 7f800000
956: 7f800000
1358: 7f800000
(Unnamed Layer* 28) [Constant]_output: 3b0ee760
(Unnamed Layer* 29) [Shuffle]_output: 3b0ee760
960: 7f800000
1361: 7a4e9ef7
1364: 7a0ed897
965: 7a805efb
966: 79ff7b20
1367: 7a0bd9a8
(Unnamed Layer* 36) [Constant]_output: 3acaee93
(Unnamed Layer* 37) [Shuffle]_output: 3acaee93
970: 79a74866
1370: 7988f33e
973: 7a24191d
974: 79b4ce9c
1373: 7984a58b
(Unnamed Layer* 43) [Constant]_output: 3aab3324
(Unnamed Layer* 44) [Shuffle]_output: 3aab3324
978: 78901743
1376: 760b50ec
981: 760b50ec
982: 761d6dcb
1379: 75e783b4
(Unnamed Layer* 50) [Constant]_output: 3a8b7a92
(Unnamed Layer* 51) [Shuffle]_output: 3a8b7a92
986: 750782a1
1382: 3ad03e59
989: 3bbd7be6
990: 3aff2390
1385: 3b75658c
(Unnamed Layer* 57) [Constant]_output: 3a9a2797
(Unnamed Layer* 58) [Shuffle]_output: 3a9a2797
994: 3b367975
1388: 3b578819
997: 3c0d3eb2
998: 3b7a7d0e
1391: 3b92bad5
(Unnamed Layer* 64) [Constant]_output: 3a9f9490
(Unnamed Layer* 65) [Shuffle]_output: 3a9f9490
1002: 3b2019b6
1394: 3b81d8d4
1005: 3c50a850
1006: 3ba35a12
1397: 3bc35d71
(Unnamed Layer* 71) [Constant]_output: 3af3ad53
(Unnamed Layer* 72) [Shuffle]_output: 3af3ad53
1010: 3b2e08b5
1400: 3b8c3ecb
1013: 3c5b1b0f
1014: 3b954245
1403: 3ba2990d
(Unnamed Layer* 78) [Constant]_output: 3aca1921
(Unnamed Layer* 79) [Shuffle]_output: 3aca1921
1018: 3b1c5c5f
1406: 3bcbac74
1021: 3c5ee144
1022: 3b990fbf
1409: 3baa9825
(Unnamed Layer* 85) [Constant]_output: 3b05ee7a
(Unnamed Layer* 86) [Shuffle]_output: 3b05ee7a
1026: 3b207e7b
1412: 3b8a2d2d
1029: 3c6ec711
1030: 3ba839f0
1415: 3b90c2e7
(Unnamed Layer* 92) [Constant]_output: 3abfafff
(Unnamed Layer* 93) [Shuffle]_output: 3abfafff
1034: 3b12b5ec
1418: 3b84d275
1037: 3c828356
1038: 3b62060b
1421: 3b82c46e
(Unnamed Layer* 99) [Constant]_output: 3ad0594a
(Unnamed Layer* 100) [Shuffle]_output: 3ad0594a
1042: 3b23cf3f
1424: 3bc19cb1
1045: 3c9c3911
1046: 3b89f6de
1427: 3b8b7f01
(Unnamed Layer* 106) [Constant]_output: 3aa2b279
(Unnamed Layer* 107) [Shuffle]_output: 3aa2b279
1050: 3b0d6831
1430: 3bbff987
1053: 3cb5e90b
1054: 3b989c5e
1433: 3b7f3220
(Unnamed Layer* 113) [Constant]_output: 3aacd63c
(Unnamed Layer* 114) [Shuffle]_output: 3aacd63c
1058: 3b186342
1436: 3ba70bdc
1061: 3cde4ad0
1062: 3bd0bc14
1439: 3bf217bf
(Unnamed Layer* 120) [Constant]_output: 3ad4cfe5
(Unnamed Layer* 121) [Shuffle]_output: 3ad4cfe5
1066: 3bf217bf
1442: 3b567412
1445: 3b1ea081
1071: 3b9bc123
1072: 3ab5bc2e
1448: 3b6c1ffb
(Unnamed Layer* 128) [Constant]_output: 3a8a53d7
(Unnamed Layer* 129) [Shuffle]_output: 3a8a53d7
1076: 3b0f86e4
1451: 3ae20f9c
1079: 3b9b02ba
1080: 3b61be4c
1454: 3b88f7e0
(Unnamed Layer* 135) [Constant]_output: 3a7a4331
(Unnamed Layer* 136) [Shuffle]_output: 3a7a4331
1084: 3ab1e180
1457: 3ac3fe16
1087: 3bbc7fbb
1088: 3b401555
1460: 3b5af4f0
(Unnamed Layer* 142) [Constant]_output: 3a8e7af2
(Unnamed Layer* 143) [Shuffle]_output: 3a8e7af2
1092: 3af870e7
1463: 3ac71e86
1095: 3bcb401c
1096: 3b603f99
1466: 3bd4708f
(Unnamed Layer* 149) [Constant]_output: 3a84c7f1
(Unnamed Layer* 150) [Shuffle]_output: 3a84c7f1
1100: 3bdb0064
1469: 3b16dfcb
1103: 3be5579d
1104: 3b74ecb7
1472: 3ba9e62c
(Unnamed Layer* 156) [Constant]_output: 3a6ebd68
(Unnamed Layer* 157) [Shuffle]_output: 3a6ebd68
1108: 3baefd5f
1475: 3b1ca833
1111: 3be97e83
1112: 3b8e5559
1478: 3bb9d60f
(Unnamed Layer* 163) [Constant]_output: 3a8852d7
(Unnamed Layer* 164) [Shuffle]_output: 3a8852d7
1116: 3b895b6f
1481: 3b35a312
1119: 3bd62fe7
1120: 3b84fbed
1484: 3baef164
(Unnamed Layer* 170) [Constant]_output: 3afc2d20
(Unnamed Layer* 171) [Shuffle]_output: 3afc2d20
1124: 3baef164
1487: 3b81bb3c
1127: 3c0d8b7d
1128: 3b8e2fbd
1490: 3ba869e3
(Unnamed Layer* 177) [Constant]_output: 3a82e480
(Unnamed Layer* 178) [Shuffle]_output: 3a82e480
1132: 3ae0018f
1493: 3b33bcab
1135: 3be3b76a
1136: 3b8dc703
1496: 3b96fe57
(Unnamed Layer* 184) [Constant]_output: 3a8c7e8e
(Unnamed Layer* 185) [Shuffle]_output: 3a8c7e8e
1140: 3b1ec9e1
1499: 3b515594
1143: 3c02dfd2
1144: 3b92e7f4
1502: 3b941d08
(Unnamed Layer* 191) [Constant]_output: 3a8f4047
(Unnamed Layer* 192) [Shuffle]_output: 3a8f4047
1148: 3b1e490d
1505: 3b08736e
1151: 3c194e1c
1152: 3b65e9cf
1508: 3ba6edda
(Unnamed Layer* 198) [Constant]_output: 3a94b4c9
(Unnamed Layer* 199) [Shuffle]_output: 3a94b4c9
1156: 3af0949b
1511: 3aec2aab
1159: 3c19bd98
1160: 3b5d276d
1514: 3b61ef4c
(Unnamed Layer* 205) [Constant]_output: 3a90d257
(Unnamed Layer* 206) [Shuffle]_output: 3a90d257
1164: 3ac0786d
1517: 3aff5b5b
1167: 3bf868e0
1168: 3b262f60
1520: 3b8cca4b
(Unnamed Layer* 212) [Constant]_output: 3a83b99b
(Unnamed Layer* 213) [Shuffle]_output: 3a83b99b
1172: 3ab3616a
1523: 3ac8cd82
1175: 3c299b8e
1176: 3b14345f
1526: 3b832ddd
(Unnamed Layer* 219) [Constant]_output: 3a8d6282
(Unnamed Layer* 220) [Shuffle]_output: 3a8d6282
1180: 3aaa8079
1529: 3ac64a56
1183: 3c1b556f
1184: 3b08705c
1532: 3b434171
(Unnamed Layer* 226) [Constant]_output: 3aa7d55d
(Unnamed Layer* 227) [Shuffle]_output: 3aa7d55d
1188: 3ab8e3b8
1535: 3b0429e5
1191: 3c06cd05
1192: 3b4151d5
1538: 3b810394
(Unnamed Layer* 233) [Constant]_output: 3a9977d9
(Unnamed Layer* 234) [Shuffle]_output: 3a9977d9
1196: 3aa41975
1541: 3acf2a74
1199: 3c25753f
1200: 3b2ee3d3
1544: 3b740030
(Unnamed Layer* 240) [Constant]_output: 3a800fad
(Unnamed Layer* 241) [Shuffle]_output: 3a800fad
1204: 3ad16109
1547: 3b1a54fe
1207: 3c24e6fc
1208: 3b31d579
1550: 3b6d1c5f
(Unnamed Layer* 247) [Constant]_output: 3a810c04
(Unnamed Layer* 248) [Shuffle]_output: 3a810c04
1212: 3a98343e
1553: 3b006418
1215: 3c2aba7e
1216: 3b258b3c
1556: 3b61e917
(Unnamed Layer* 254) [Constant]_output: 3a833126
(Unnamed Layer* 255) [Shuffle]_output: 3a833126
1220: 3ab9ad44
1559: 3b346df2
1223: 3c0c8b8f
1224: 3b279bc7
1562: 3b7c5105
(Unnamed Layer* 261) [Constant]_output: 3a859747
(Unnamed Layer* 262) [Shuffle]_output: 3a859747
1228: 3ac08109
1565: 3b2a6c65
1231: 3c4d5550
1232: 3b2f53ac
1568: 3b49f838
(Unnamed Layer* 268) [Constant]_output: 3a96c0f3
(Unnamed Layer* 269) [Shuffle]_output: 3a96c0f3
1236: 3aa7961d
1571: 3b2b98bd
1239: 3c3e7f44
1240: 3b324498
1574: 3b715eae
(Unnamed Layer* 275) [Constant]_output: 3a899b80
(Unnamed Layer* 276) [Shuffle]_output: 3a899b80
1244: 3abc6eae
1577: 3b50fa3f
1247: 3c4f6214
1248: 3b3df409
1580: 3b6e2524
(Unnamed Layer* 282) [Constant]_output: 3a9bc15c
(Unnamed Layer* 283) [Shuffle]_output: 3a9bc15c
1252: 3a9968a7
1583: 3b45157d
1255: 3c51d5c1
1256: 3b33bfa2
1586: 3b7c72dc
(Unnamed Layer* 289) [Constant]_output: 3a8b47e3
(Unnamed Layer* 290) [Shuffle]_output: 3a8b47e3
1260: 3ac21a64
1589: 3b461591
1263: 3c557fe0
1264: 3b255ca6
1592: 3b72b93c
(Unnamed Layer* 296) [Constant]_output: 3aaa84ce
(Unnamed Layer* 297) [Shuffle]_output: 3aaa84ce
1268: 3a949169
1595: 3b999bf0
1271: 3c5c96a2
1272: 3b2b422f
1598: 3b92f3aa
(Unnamed Layer* 303) [Constant]_output: 3a92430a
(Unnamed Layer* 304) [Shuffle]_output: 3a92430a
1276: 3a8f6092
1601: 3b8d44dc
1279: 3c729db1
1280: 3b42c812
1604: 3ba398b1
(Unnamed Layer* 310) [Constant]_output: 3a9426dd
(Unnamed Layer* 311) [Shuffle]_output: 3a9426dd
1284: 3acf4f80
1607: 3b566ea9
1287: 3c5d79a7
1288: 3b2e0069
1610: 3b8acd90
(Unnamed Layer* 317) [Constant]_output: 3a938d19
(Unnamed Layer* 318) [Shuffle]_output: 3a938d19
1292: 3ade2fe6
1613: 3bc06970
1295: 3c673039
1296: 3b29fe0a
1616: 3b71492c
(Unnamed Layer* 324) [Constant]_output: 3a774f86
(Unnamed Layer* 325) [Shuffle]_output: 3a774f86
1300: 3ae6a4b3
1619: 3b9b0e4b
1303: 3c7e875d
1304: 3b8354c6
1622: 3b6ce6b1
(Unnamed Layer* 331) [Constant]_output: 3ae74fa8
(Unnamed Layer* 332) [Shuffle]_output: 3ae74fa8
1308: 3b1469cb
1625: 3ad35bc4
1628: 3a41e752
1313: 3b195750
1314: 3afa5103
1631: 3bafb5d5
(Unnamed Layer* 339) [Constant]_output: 3ace4739
(Unnamed Layer* 340) [Shuffle]_output: 3ace4739
1318: 3ac85277
1634: 3ae1731b
1321: 3b3926ee
1322: 3b39e7e9
1637: 3b935125
(Unnamed Layer* 346) [Constant]_output: 3b0ad1c8
(Unnamed Layer* 347) [Shuffle]_output: 3b0ad1c8
1326: 3ad43734
1640: 3b0e8937
1329: 3b551758
1330: 3b54c765
1331: 3b54c765
(Unnamed Layer* 363) [Shuffle]_output: 3b54c765
(Unnamed Layer* 364) [Fully Connected]_output: 3b14cb0e
1332: 3b14cb0e
(Unnamed Layer* 374) [Shuffle]_output: 3b14cb0e
(Unnamed Layer* 375) [Scale]_output: 3cb503e3
1333: 3cb503e3

I don’t know how to decipher this.
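Treating each hex value as the big-endian bit pattern of an IEEE-754 float32 (my assumption about the cache format, based on the values looking like float bit patterns), many entries, including input.1 and the output tensor 1333's neighbors, decode to +inf. A minimal stdlib sketch:

```python
import struct

def decode(hexval: str) -> float:
    """Interpret a calibration-cache entry as a big-endian IEEE-754 float32."""
    return struct.unpack('!f', bytes.fromhex(hexval))[0]

print(decode('7f800000'))  # inf  (the value recorded for input.1 and many activations)
print(decode('3bd7b8b7'))  # a small finite scale, roughly 0.0066
```

If that interpretation is right, an infinite per-tensor scale would seem consistent with the engine producing the same (garbage) output for every input, but I don't know what caused it.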

Hi, please refer to the links below on performing inference in INT8.

Thanks!

I read those two articles. I am asking how the Python example at TensorRT/samples/python/efficientnet at master · NVIDIA/TensorRT · GitHub differs from your two links, and what can be inferred from my description of the problem.

Hi,

Could you please try the latest TRT 8.2 release? If you still face this issue, we recommend posting it on Issues · NVIDIA/TensorRT · GitHub to get better help.

Thank you.