TensorFlow-GPU using high system memory, which is the bottleneck

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04): Jetson TX2, JetPack 4.5.1-b17
  • TensorFlow installed from (source or binary): source (TensorFlow 2.6.0/2.3.1)
  • TensorFlow version (use command below): TensorFlow 2.6.0 with CUDA 10.2
  • CUDA/cuDNN version: CUDA Toolkit 10.2 / cuDNN 8.0
  • System memory: 8 GB

I use TensorFlow with CUDA for object inference. It uses as much as 2 GB of system memory, which is a bottleneck for me. I am using C++-based inference code.

I tried the options below, but they did not help:

tensorflow::SessionOptions options;
// Limit the GPU memory pool and let it grow on demand.
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.2);
options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_force_gpu_compatible(true);
// Shrink the op-scheduling thread pools to a single thread each.
options.config.set_inter_op_parallelism_threads(1);
options.config.set_intra_op_parallelism_threads(1);
options.config.set_use_per_session_threads(false);
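
For reference, here is how these options feed into session creation (a minimal sketch with the standard TensorFlow C++ session API; graph loading is omitted):

#include "tensorflow/core/public/session.h"

tensorflow::Session* session = nullptr;
TF_CHECK_OK(tensorflow::NewSession(options, &session));
// session->Create(graph_def) and session->Run(...) follow as usual.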

I opened a ticket with TensorFlow, but they say it is CUDA that is taking the huge amount of memory. When I run the same code without CUDA, on the CPU only, memory usage is reduced by 2.4 GB.

Let me know how to overcome this issue.

Hi,

This is a known issue with the TensorFlow implementation.

Our suggestion is to use another, more edge-friendly framework such as TensorRT.
In general, TensorFlow occupies 2x~3x the memory of TensorRT.
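
As a concrete starting point, here is a minimal sketch of building a TensorRT engine from an ONNX model with a capped builder workspace ("model.onnx" is a placeholder; this uses the TensorRT 8 C++ API):

#include <iostream>
#include "NvInfer.h"
#include "NvOnnxParser.h"

class Logger : public nvinfer1::ILogger {
  void log(Severity severity, const char* msg) noexcept override {
    if (severity <= Severity::kWARNING) std::cerr << msg << std::endl;
  }
} gLogger;

nvinfer1::ICudaEngine* BuildEngine() {
  nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(gLogger);
  auto* network = builder->createNetworkV2(
      1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
  auto* parser = nvonnxparser::createParser(*network, gLogger);
  parser->parseFromFile("model.onnx", static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

  nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
  // Cap the scratch memory the builder may use for tactic selection (64 MB here).
  config->setMaxWorkspaceSize(64ULL << 20);

  return builder->buildEngineWithConfig(*network, *config);
}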

Thanks.

Thanks for the reply. We will definitely try TensorRT. I have one follow-up question:
When we posted this issue to TensorFlow, they pointed out that the CUDA libraries themselves are 2.3 GB in size, hence this memory consumption. Does that mean all the CUDA libraries need to be loaded into memory at once to make the GPU functional?

Hi,

The answer should be yes.
TensorFlow uses cuDNN to implement its GPU inference functions, which requires a lot of memory.
TensorRT has a similar problem, but to a lesser degree thanks to its optimizations.

However, in our latest TensorRT release (v8.0 in JetPack 4.6), we provide an alternative that lets users deploy a model with cuBLAS instead of cuDNN to save memory.
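
In code, this corresponds to the tactic-source API added in TensorRT 8.0 (a sketch; config is the nvinfer1::IBuilderConfig used when building the engine):

// Drop cuDNN from the builder's tactic sources and keep cuBLAS.
nvinfer1::TacticSources tactics = config->getTacticSources();
tactics &= ~(1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUDNN));
tactics |= 1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS);
config->setTacticSources(tactics);

With trtexec, the equivalent flag should be --tacticSources=-CUDNN,+CUBLAS.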

Thanks.