Difference in memory usage across GPU models during TensorFlow C++ inference

Hello everyone

I train and freeze a TensorFlow graph in Python and run inference with the TensorFlow C++ API on Windows 10.
During testing, I noticed that GPU memory usage differs by GPU model.

I tested 5 GPUs: 1060 (6 GB), 1080 Ti, 1660 Ti, 2070, and 2080 Ti.
The test method is simple.
First, I mount a single GPU and install the driver.
Then I run my C++ inference code.
I use the same code and model (frozen graph .pb file) on every GPU.
Finally, I check the difference in GPU memory usage before and after running the C++ inference code.

I use the TensorFlow GPU option allow_growth=true.
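For reference, my inference code sets this through tensorflow::SessionOptions, roughly like this (a simplified sketch, not my full code; MakeSession is just an illustrative wrapper):

```cpp
#include <memory>

#include "tensorflow/core/public/session.h"

std::unique_ptr<tensorflow::Session> MakeSession() {
  // Enable allow_growth on the GPU options of the session config.
  tensorflow::SessionOptions options;
  options.config.mutable_gpu_options()->set_allow_growth(true);
  return std::unique_ptr<tensorflow::Session>(tensorflow::NewSession(options));
}
```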

I ran the test a few times on each GPU.
The average memory usage of TensorFlow is as follows:

1060 : 397 MB
1080Ti : 481 MB
1660Ti : 621 MB
2070 : 644 MB
2080Ti : 712 MB

This happens even with a simple 2-layer fully connected graph for the MNIST example.

Why does this differ?
Is it caused by the NVIDIA configuration or by TensorFlow?

I want memory usage to stay small regardless of the GPU model.
Please help me.

There are several factors contributing to the overall TF memory footprint, including:

  1. The CUDA driver creates per-core contexts in device memory with space for things like stack memory and thread-local storage. This overhead will increase linearly with the number of CUDA cores on the GPU. (This is likely the dominant reason the 1080Ti has a larger memory footprint than the 1060.)
  2. cuDNN and cuBLAS provide optimized kernels for convolution and matrix multiplication routines. Some of these require additional workspace allocations and are only available on some architectures (for example, Turing provides Tensor Core kernels, Pascal does not).
  3. The TensorFlow allocator grabs chunks of memory at a time, so falling slightly over a memory limit may result in a larger increase in allocated space than expected.

In general, allow_growth is not a good way to limit overall device memory consumption because it does not provide any information about what footprint is acceptable. Without that info, the framework will optimize for speed (by choosing fast but memory-hungry algorithms). A better option is to set the per_process_gpu_memory_fraction config option.
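For example, in the C++ API (which you are using for inference) the fraction is set on the same GPU options; a minimal sketch, with 0.3 as a purely illustrative value:

```cpp
#include <memory>

#include "tensorflow/core/public/session.h"

std::unique_ptr<tensorflow::Session> MakeSession() {
  tensorflow::SessionOptions options;
  // Cap TensorFlow at ~30% of device memory (illustrative value; tune it to
  // the smallest fraction that still fits your model and its workspaces).
  options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.3);
  return std::unique_ptr<tensorflow::Session>(tensorflow::NewSession(options));
}
```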

Thank you for the detailed explanation.

And I have an additional question.
Regarding point 2), can I disable the optimized kernels for the convolution and matrix multiplication routines in cuDNN and cuBLAS?
I ask because I use a Turing-architecture GPU.

Is there any way to configure cuDNN or cuBLAS so that the optimized kernels are not used?
I tried to find a way, but I couldn't.

TensorFlow will attempt to optimize for speed within an allowed memory footprint. You can reduce that memory footprint by setting the per_process_gpu_memory_fraction instead of allow_growth.
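Concretely, in a frozen-graph C++ setup it might look roughly like this (a sketch; "frozen_graph.pb" and the 0.3 fraction are placeholders):

```cpp
#include <memory>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

int main() {
  // Limit the per-process GPU memory footprint (placeholder fraction).
  tensorflow::SessionOptions options;
  options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.3);

  std::unique_ptr<tensorflow::Session> session(tensorflow::NewSession(options));

  // Load the frozen graph ("frozen_graph.pb" is a placeholder path).
  tensorflow::GraphDef graph_def;
  tensorflow::Status status = tensorflow::ReadBinaryProto(
      tensorflow::Env::Default(), "frozen_graph.pb", &graph_def);
  if (!status.ok()) return 1;

  status = session->Create(graph_def);
  if (!status.ok()) return 1;

  // ... run inference with session->Run(...) as before ...
  return 0;
}
```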