CPU RAM consumption during TensorRT calibration process

Hello,

Problem description:
I’m using the TensorRT (TRT) C++ API to run inference on an FP32 CNN model (YoloV3) that was developed and trained using TensorFlow and the TensorRT Python API.

I wanted to convert it to an INT8 model using the TRT calibrator.

I have a set of ~9000 typical pictures that I’m using for the quantization process.
I divided them into batches of 60 pictures each.

Everything works fine, but I noticed that during the buildCudaEngine call, which performs the calibration using all the previously created batches, CPU RAM consumption keeps increasing until all of my RAM is occupied, which crashes the entire PC.
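
For context, the build step is wired up roughly like this (a minimal sketch of the TensorRT 4 C++ API calls, not my exact code; gLogger and MyInt8Calibrator are placeholders for my logger and calibrator implementation):

```cpp
#include <NvInfer.h>

// Sketch only: MyInt8Calibrator stands in for my IInt8EntropyCalibrator
// implementation, and the network is assumed to be already populated.
nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
nvinfer1::INetworkDefinition* network = builder->createNetwork();
// ... parse / populate the YoloV3 network here ...

MyInt8Calibrator calibrator(/* list of batch files, 60 pictures per batch */);

builder->setMaxBatchSize(60);
builder->setMaxWorkspaceSize(1 << 30);
builder->setInt8Mode(true);
builder->setInt8Calibrator(&calibrator);

// Calibration runs inside buildCudaEngine(), which calls getBatch() repeatedly.
nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);
```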

I have 24 GB of RAM, but because of this behavior I cannot use more than ~9000 pictures for the calibration process, which limits the accuracy of my calibration table.

Is there any way to control this issue?

My platform:
Linux distro and version - Linux-x86_64, Ubuntu 16.04
GPU type - GeForce GTX 1080
NVIDIA driver version - 396.26
CUDA version - release 9.0, V9.0.252
cuDNN version - 7.1.4
Python version - 3.5.2
TensorFlow version - 1.8
TensorRT version - 4.0.1.6

We are triaging and will keep you updated.

Hello,

To help us debug, can you share a repro that demonstrates the OOM issue you are seeing?
Per engineering, the TRT calibrator does not use host memory.

Hello,
Problem solved.

Thanks to your statement that the TRT calibrator does not use host memory, I realized that the problem was probably in my implementation of the getBatch method.

I checked my getBatch implementation again and found that I did not release the host memory that was allocated when loading the current batch from disk, before copying it to GPU memory.

After I added the host memory release for each batch and re-ran the calibration process via the buildCudaEngine API, the problem was solved.
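
For anyone hitting the same symptom, the corrected getBatch looks roughly like this (an illustrative sketch, not my exact code; loadBatchFromDisk and the member names are hypothetical placeholders):

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>
#include <cstdlib>

class MyInt8Calibrator : public nvinfer1::IInt8EntropyCalibrator
{
public:
    bool getBatch(void* bindings[], const char* names[], int nbBindings) override
    {
        if (mCurrentBatch >= mNumBatches)
            return false;  // no more calibration batches

        // Load the next batch from disk into a newly allocated host buffer
        // (loadBatchFromDisk is a hypothetical helper).
        float* hostBatch = loadBatchFromDisk(mCurrentBatch, mBatchSizeInBytes);

        // Copy the batch into the pre-allocated device buffer handed to TensorRT.
        cudaMemcpy(mDeviceInput, hostBatch, mBatchSizeInBytes, cudaMemcpyHostToDevice);

        // This release was the missing piece: without it every batch leaked its
        // host buffer, so CPU RAM kept growing during buildCudaEngine().
        std::free(hostBatch);

        bindings[0] = mDeviceInput;
        ++mCurrentBatch;
        return true;
    }

    // getBatchSize(), readCalibrationCache(), writeCalibrationCache() omitted.

private:
    float* loadBatchFromDisk(int batchIndex, size_t sizeInBytes);  // hypothetical
    void*  mDeviceInput{nullptr};
    size_t mBatchSizeInBytes{0};
    int    mCurrentBatch{0};
    int    mNumBatches{0};
};
```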

Thanks,