I am working on converting an ONNX model to TensorRT. I successfully converted the ONNX model to TRT in FP16 and it runs on DeepStream. However, when I tried to convert the ONNX model in INT8 precision mode, it only generated the engine file without a calibration cache. I have already implemented the INT8 calibrator with a write_calibration_cache function. Can anyone tell me how to save the calibration cache? Thanks.
Please check the sample link to validate whether any step is missing.
@AakankshaS Thanks. I replaced build_engine() with build_cuda_engine(), although I do not know why build_engine() does not enter calibration while build_cuda_engine() does. With build_cuda_engine(), the conversion enters the INT8 calibration pipeline, but the following error occurs.
[TensorRT] VERBOSE: Total Host Persistent Memory: 306992
[TensorRT] VERBOSE: Total Device Persistent Memory: 265933824
[TensorRT] VERBOSE: Total Weight Memory: 0
[TensorRT] VERBOSE: Builder timing cache: created 141 entries, 1340 hit(s)
[TensorRT] VERBOSE: Engine generation completed in 94.061 seconds.
[TensorRT] VERBOSE: Calculating Maxima
[TensorRT] INFO: Starting Calibration.
2020-08-04 11:43:37 - ImageCalibrator - INFO - Calibration images pre-processed: 8/200
(8, 3, 608, 608)
Traceback (most recent call last):
File "createEngine.py", line 180, in
File "createEngine.py", line 167, in main
File "createEngine.py", line 124, in build_engine
engine = builder.build_cuda_engine(network)
RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details)
Process finished with exit code 1
I have correctly configured build_cuda_engine() by setting the IBuilderConfig after checking the TensorRT Python documentation, but now I encounter the same error as with build_engine().
@AakankshaS I just solved the problem by modifying the return value of get_batch() to be a list, i.e. returning [output], and by using build_engine() with the IBuilderConfig set. It is running now. Thanks.