I created a custom ONNX model along with the corresponding TensorRT plugins, and I can successfully convert the ONNX model to a TensorRT engine in both FP32 and FP16 mode.
But when I try running INT8 calibration directly on the model, I run into the following error:
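For context, the FP32/FP16 build goes through the standard TensorRT 6 ONNX parser path, roughly like the sketch below (simplified; `model.onnx` and `libmy_plugins.so` are placeholder names, not my actual files):

```python
# Simplified sketch of the FP32/FP16 build (TensorRT 6 API).
# "libmy_plugins.so" and "model.onnx" are placeholder names.
import ctypes
import tensorrt as trt

ctypes.CDLL("libmy_plugins.so")          # load the custom plugin library
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
     builder.create_network(EXPLICIT_BATCH) as network, \
     trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
    builder.max_workspace_size = 1 << 30  # 1 GiB workspace
    builder.fp16_mode = True              # drop this line for the FP32 build
    engine = builder.build_cuda_engine(network)
    assert engine is not None
```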
[2020-02-28 05:23:22 ERROR] FAILED_ALLOCATION: std::exception
[2020-02-28 05:23:22 ERROR] Requested amount of memory (18446744065119617096 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[2020-02-28 05:23:22 ERROR] /home/jenkins/workspace/TensorRT/helpers/rel-6.0/L1_Nightly/build/source/rtSafe/resources.h (57) - OutOfMemory Error in CpuMemory: 0
[2020-02-28 05:23:22 ERROR] FAILED_ALLOCATION: std::exception
[2020-02-28 05:23:22 ERROR] Requested amount of memory (18446744065119617096 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[2020-02-28 05:23:22 ERROR] /home/jenkins/workspace/TensorRT/helpers/rel-6.0/L1_Nightly/build/source/rtSafe/resources.h (57) - OutOfMemory Error in CpuMemory: 0
[2020-02-28 05:23:22 ERROR] FAILED_ALLOCATION: std::exception
[2020-02-28 05:23:22 ERROR] Requested amount of memory (18446744065119617096 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at
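For reference, my calibration setup follows the usual IInt8EntropyCalibrator2 pattern, roughly like this sketch (simplified; the batch source, shapes, and cache file name are placeholders, not my exact code):

```python
# Simplified sketch of the INT8 calibrator (TensorRT 6 Python API).
# The batch source and cache file name are placeholders.
import os
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, cache_file="calib.cache"):
        super().__init__()
        self.batches = iter(batches)      # iterable of contiguous float32 arrays
        self.cache_file = cache_file
        self.current = next(self.batches)
        self.device_input = cuda.mem_alloc(self.current.nbytes)

    def get_batch_size(self):
        return self.current.shape[0] if self.current is not None else 1

    def get_batch(self, names):
        if self.current is None:
            return None                   # signals the end of calibration
        cuda.memcpy_htod(self.device_input, self.current)
        self.current = next(self.batches, None)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # If a cache already exists, TensorRT uses it and skips calibration.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# In the build script:
#   builder.int8_mode = True
#   builder.int8_calibrator = Calibrator(my_batches)
```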
The weird thing is that calibration succeeds if I feed the program a fake cache table, one generated by calibrating only a subset of the whole model (say, the backbone). I presume that works because TensorRT skips running the calibration batches entirely when read_calibration_cache() returns an existing cache.
My machine has 64 GB of memory, so I am confused about why this allocation exception happens. Note that the requested size (18446744065119617096 bytes) is just under 2^64, which looks like a negative size underflowing to a huge unsigned value rather than a genuine allocation request.
Can anyone help? Thanks.
If you need more information, please let me know.