RAM leak when running deep learning inference

Hello,

I’m having a problem with RAM usage when running inference on the DRIVE AGX Developer Kit E3550 platform.

The platform is flashed with Ubuntu 16.04 and has the following components installed:
- CUDA 10.0
- OpenCV 3.4.0
- Tensorflow 1.11.0
- Keras 2.2.4

I am using a U-Net model implemented in the Keras framework and tried to run inference on the platform. I should note that I have already run the same inference successfully on my host machine with an NVIDIA GeForce GTX 1050 4GB GDDR5 graphics card.

I’m using the command:

tegrastats

to watch global RAM usage on the platform. Before executing the program, the usage was 930/24746MB. On the first try the program executed well. Just like on my host machine, the program should free its memory via the garbage collector; to ensure the garbage collector runs, I call into the gc library explicitly (see the sketch below). But unlike on my host machine, where the memory is freed completely, the platform shows a usage of 4029/24746MB at the end of the program. I tried to kill every Python process, but the usage stayed the same. When I then tried to run the program a second time, I got an error:

Segmentation fault

I also tried to run some CUDA 10.0 samples on the platform, but got the same error again, with the monitor, keyboard, and mouse completely frozen. The only way I found to make the platform function again is to reset it. After resetting, the memory usage is back to 930/24746MB.
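
For completeness, the cleanup at the end of my script looks roughly like this (a simplified sketch; unet.h5 and the input array are placeholders for my actual model file and data):

import gc
import numpy as np
from keras import backend as K
from keras.models import load_model

# Placeholder model file and input; the real script loads the trained U-Net and real images.
model = load_model('unet.h5')
images = np.random.rand(1, 256, 256, 3).astype('float32')

predictions = model.predict(images)

# Explicit cleanup at the end of the script.
del predictions
del model
K.clear_session()  # drop the TF graph/session held by Keras
gc.collect()       # force Python's garbage collector to run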

I should also mention that the model requires about 3GB of GPU memory to run.
I could not resolve this problem; could you please give me some directions?

Hi,

We cannot guarantee that every TensorFlow workload will work on our platforms, since TensorFlow is not optimized for them and is known to have memory bugs.
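
One thing you can try is to stop TensorFlow from pre-allocating most of the GPU memory up front; on Tegra the GPU shares physical RAM with the CPU, so a greedy allocation shows up in tegrastats. A minimal sketch for TF 1.x with Keras, with no guarantee that it fixes the leak:

import tensorflow as tf
from keras import backend as K

# Ask TF to grow GPU memory on demand instead of grabbing it all at start-up.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))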

  • Fabian