RAM leak when running deep learning inference

Hello,

I’m having a problem with RAM usage when running inference on the DRIVE AGX Developer Kit E3550 platform.

The platform is flashed with Ubuntu 16.04 and has the following components installed:
- CUDA 10.0
- OpenCV 3.4.0
- Tensorflow 1.11.0
- Keras 2.2.4

I am using a U-Net model implemented in the Keras framework and tried to run inference on the platform. I should note that I have already run the same inference successfully on my host machine with an NVIDIA GeForce GTX 1050 4GB GDDR5 graphics card.

I’m using the command:

tegrastats

to watch global RAM usage on the platform. Before executing the program, the usage was 930/24746MB. On the first try the program executed well. Just like on my host machine, the program should free its memory via the garbage collector; to ensure the garbage collector runs, I call into the gc library explicitly (see the sketch below). But unlike on my host machine, where the memory is freed completely, the platform shows a usage of 4029/24746MB at the end of the program. I tried to kill every Python process, but the usage stayed the same. When I then tried to run the program a second time, I got an error:

Segmentation fault

I also tried to run some CUDA 10.0 samples on the platform, but got the same error again, with the monitor, keyboard, and mouse completely frozen. The only way I found to make the platform function again is to reset it. After resetting, the memory usage is back to 930/24746MB.
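
For completeness, the cleanup at the end of my script looks roughly like this (a simplified sketch; unet.h5 and the input array are placeholders for my actual model file and data):

import gc
import numpy as np
from keras import backend as K
from keras.models import load_model

# Placeholder model file and input; the real script loads the trained U-Net and real images.
model = load_model('unet.h5')
images = np.random.rand(1, 256, 256, 3).astype('float32')

predictions = model.predict(images)

# Explicit cleanup at the end of the script.
del predictions
del model
K.clear_session()  # drop the TF graph/session held by Keras
gc.collect()       # force Python's garbage collector to run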

I should also mention that the model requires about 3GB of GPU memory to run.
I could not resolve this problem; could you please give me some directions?

Hi,

We cannot guarantee that every TensorFlow workload will work on our platforms, since TensorFlow is not optimized for them and is known to have memory bugs.
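
One thing you can try is to stop TensorFlow from pre-allocating most of the GPU memory up front; on Tegra the GPU shares physical RAM with the CPU, so a greedy allocation shows up in tegrastats. A minimal sketch for TF 1.x with Keras, with no guarantee that it fixes the leak:

import tensorflow as tf
from keras import backend as K

# Ask TF to grow GPU memory on demand instead of grabbing it all at start-up.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))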

  • Fabian