Description
I have my model in ONNX format, and I am trying to create a .engine file on the Jetson Xavier NX platform. I have verified that the model works fine in a desktop environment using onnxruntime (sketched below). However, when I convert the model to a .engine file with trtexec using the following command, I get bad results:
./trtexec --onnx=resnetUnknown.onnx --int8 --saveEngine=resnetUnknown_batch5.engine --verbose
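For reference, the desktop sanity check with onnxruntime mentioned above is essentially the following (the random input is only a stand-in for my real preprocessed data):

import numpy as np
import onnxruntime as ort

# Run one batch through the original ONNX model on the desktop.
sess = ort.InferenceSession("resnetUnknown.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
batch = np.random.rand(5, 1, 224, 224).astype(np.float32)  # stand-in for real data
outputs = sess.run(None, {input_name: batch})
print(outputs[0])  # predictions here look correct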
My best guess is that I need to supply a calibration file so the INT8 calibration matches my data, but I can't find useful information on how the data in that file should be formatted. My guess is that it is a binary file containing a raw sequence of floats: since my input data is 5x1x224x224, each batch would contribute 250880 float values, appended sequentially in binary format. I do this with a Python script.
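Concretely, the script does something like this (load_calibration_batches is a placeholder for my own data loading and preprocessing):

import numpy as np

# load_calibration_batches() is a placeholder that yields float32 arrays
# of shape (5, 1, 224, 224), preprocessed the same way as at inference time.
with open("calibration_data.bin", "wb") as f:
    for batch in load_calibration_batches():
        assert batch.dtype == np.float32 and batch.shape == (5, 1, 224, 224)
        f.write(batch.tobytes())  # 250880 float32 values per batch, written sequentially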
Then I try to build the .engine file with the following command:
./trtexec --onnx=resnetUnknown.onnx --int8 --saveEngine=resnetUnknown_batch5.engine --verbose --calib=calibration_data.bin
and I get the following error:
[08/11/2023-16:02:41] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 939, GPU 3127 (MiB)
[08/11/2023-16:02:41] [V] [TRT] Total per-runner device persistent memory is 0
[08/11/2023-16:02:41] [V] [TRT] Total per-runner host persistent memory is 1728
[08/11/2023-16:02:41] [V] [TRT] Allocated activation device memory of size 36126720
[08/11/2023-16:02:41] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +34, now: CPU 0, GPU 38 (MiB)
[08/11/2023-16:02:41] [V] [TRT] Calculating Maxima
[08/11/2023-16:02:41] [I] [TRT] Starting Calibration.
[08/11/2023-16:02:41] [E] Error[2]: [calibrator.cu::absTensorMax::141] Error Code 2: Internal Error (Assertion memory != nullptr failed. memory must be valid if nbElem != 0)
[08/11/2023-16:02:41] [V] [TRT] Trying to load shared library libcudnn.so.8
[08/11/2023-16:02:41] [V] [TRT] Loaded shared library libcudnn.so.8
[08/11/2023-16:02:41] [V] [TRT] Trying to load shared library libcudnn.so.8
[08/11/2023-16:02:41] [V] [TRT] Loaded shared library libcudnn.so.8
[08/11/2023-16:02:41] [E] Error[1]: [convolutionRunner.cpp::executeConv::462] Error Code 1: Cudnn (CUDNN_STATUS_BAD_PARAM)
[08/11/2023-16:02:41] [E] Error[3]: [engine.cpp::~Engine::306] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/engine.cpp::~Engine::306, condition: mObjectCounter.use_count() == 1. Destroying an engine object before destroying objects it created leads to undefined behavior.
)
[08/11/2023-16:02:41] [E] Error[2]: [calibrator.cpp::calibrateEngine::1181] Error Code 2: Internal Error (Assertion context->executeV2(&bindings[0]) failed. )
[08/11/2023-16:02:41] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[08/11/2023-16:02:41] [E] Engine could not be created from network
[08/11/2023-16:02:41] [E] Building engine failed
[08/11/2023-16:02:41] [E] Failed to create engine from model or file.
[08/11/2023-16:02:41] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502]
Now, if I instead build the .engine file with the following command (without the --int8 flag):
./trtexec --onnx=resnetUnknown.onnx --saveEngine=resnetUnknown_batch5.engine --verbose --calib=calibration_data.bin
The optimization then runs through, but it seems to me that the --calib file is not being used in the process. When I test the resulting .engine file, it gives me bad predictions (just like in the first case).
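For completeness, this is roughly how I test the generated .engine file (a minimal sketch using pycuda; I assume binding 0 is the input and binding 1 is the output, and the random input again stands in for real data):

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("resnetUnknown_batch5.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

batch = np.random.rand(5, 1, 224, 224).astype(np.float32)  # stand-in for real data
output = np.empty(tuple(engine.get_binding_shape(1)), dtype=np.float32)

d_input = cuda.mem_alloc(batch.nbytes)
d_output = cuda.mem_alloc(output.nbytes)
cuda.memcpy_htod(d_input, batch)
context.execute_v2([int(d_input), int(d_output)])
cuda.memcpy_dtoh(output, d_output)
print(output)  # these are the predictions that come out wrong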
Environment
TensorRT Version: 8.5.2
GPU Type: NVIDIA Jetson Xavier NX (JetPack 5.1.1)
CUDA Version: 11.4.19
CUDNN Version: 8.6.0
Relevant Files
How can I share files privately? I don’t want to do it publicly.