Reduce TensorFlow GPU usage

I want to reduce TensorFlow's GPU memory usage, so I tried the following:

from imageai.Prediction.Custom import CustomImagePrediction
import tensorflow as tf
import cv2

# Cap TensorFlow at ~19% of GPU memory (~1.5 GB on the TX2)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.192)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

prediction = CustomImagePrediction()
prediction.setModelTypeAsDenseNet()
prediction.setModelPath("Densenet.h5")
prediction.setJsonPath("model_class.json")
prediction.loadModel(num_objects=5)
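For reference, the 0.192 fraction is relative to the total memory TensorFlow sees. A small sketch (the helper name is mine; 7846 MB is what tegrastats reports as total RAM on this TX2, which the CPU and GPU share) shows how a MB budget maps to the fraction:

```python
# Sketch: derive per_process_gpu_memory_fraction from a memory budget.
# The helper name and the 7846 MB default are assumptions; 7846 MB is
# the total RAM tegrastats reports on this TX2 (shared CPU/GPU memory).
def memory_fraction(budget_mb, total_mb=7846):
    """Fraction of total memory corresponding to a budget in MB."""
    return budget_mb / total_mb

print(round(memory_fraction(1506), 3))  # the 1506 MB TF reports -> ~0.192
```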

I got:

2019-05-08 10:42:12.682317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-08 10:42:12.682515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.31GiB
2019-05-08 10:42:12.682572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-08 10:42:13.556847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-08 10:42:13.556958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-08 10:42:13.556995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-08 10:42:13.557176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-08 10:42:54.853030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-08 10:42:54.853191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-08 10:42:54.853229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-08 10:42:54.853252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-08 10:42:54.853500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

I closed the session, then ran the script again:

2019-05-08 10:56:09.342301: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-08 10:56:09.342495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 3.83GiB

It seems like some block of memory is not released.
When I ran the model right after boot, I had "lfb ~1600x4MB"; by the third run it was down to "lfb 622x4MB".

What's the best way to restrict the imageai library to use less RAM, or only the amount of RAM we declare?

Thank you

Hi,

This issue comes from TensorFlow.

By default, TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object.
So please exit the Python interpreter if you want the memory to be freed.
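If exiting the interpreter each time is not practical, one workaround is to run the TensorFlow work in a child process, so the OS reclaims all of its memory when the child exits. This is only a sketch; `run_model` is a hypothetical name, and placeholder logic stands in for the real prediction call:

```python
# Sketch: isolate the TF workload in a child process; all of the child's
# memory (including GPU allocations) is reclaimed when it exits.
import multiprocessing as mp

def run_model(image_path, results):
    # Import TF and load the model *inside* the child, so the parent
    # process never holds GPU memory. Placeholder logic stands in for
    # the real prediction call.
    results.put("prediction for " + image_path)

if __name__ == "__main__":
    q = mp.Queue()
    child = mp.Process(target=run_model, args=("test.jpg", q))
    child.start()
    print(q.get())   # "prediction for test.jpg"
    child.join()     # memory is freed here, when the child exits
```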

You can find more detail in this issue:
[url]https://github.com/tensorflow/tensorflow/issues/17048[/url]

Thanks.

Hi,
I did exit the Python interpreter (Ctrl+C); even after that, some block of memory is not released.

Even if I close everything on the display, the TX2 is still consuming RAM 2175/7846 MB (lfb 657x4MB).

I'm not sure what is consuming this much memory!

I have attached the current processes.

Hi,

I tried this but could not reproduce the memory issue you are seeing.
May I know the memory difference before/after running TensorFlow?

A possible cause is that the memory occupied by TF is GPU memory.
It may not be returned to the global memory directly.
Sometimes it is kept in a system memory pool to accelerate the next GPU allocation.

You can check the buffered memory amount with this command:

$ sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages

Here are my measurements:

------------ Before running the Python script that loads the model -----------

~/tegrastats
RAM 1051/7846MB (lfb 1507x4MB) CPU [9%@345,1%@345,0%@343,12%@345,2%@345,12%@345] EMC_FREQ 24%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@40C MCPU@40C GPU@39C PLL@40C Tboard@37C Tdiode@39.75C PMIC@100C thermal@39.8C VDD_IN 1715/2042 VDD_CPU 304/451 VDD_GPU 152/158 VDD_SOC 381/472 VDD_WIFI 95/71 VDD_DDR 248/371

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
5312

------------ While the model is loaded ----------

~/tegrastats
RAM 3478/7846MB (lfb 816x4MB) CPU [14%@346,0%@345,1%@345,21%@345,14%@345,13%@344] EMC_FREQ 17%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@42C MCPU@42C GPU@41.5C PLL@42C Tboard@39C Tdiode@42C PMIC@100C thermal@42.1C VDD_IN 1791/1899 VDD_CPU 304/314 VDD_GPU 152/173 VDD_SOC 457/467 VDD_WIFI 95/152 VDD_DDR 287/300

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
4688

The Python script says:
2019-05-16 13:37:52.007770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-16 13:37:52.008012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 5.52GiB
2019-05-16 13:37:52.008099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:37:53.617940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:37:53.618049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:37:53.618083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:37:53.618319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-16 13:38:32.591647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:38:32.591822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:38:32.591862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:38:32.591904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:38:32.592027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

---------- After closing the Python script and terminal ----------

~/tegrastats
RAM 2105/7846MB (lfb 1012x4MB) CPU [4%@343,0%@345,0%@345,13%@345,9%@345,17%@345] EMC_FREQ 18%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@42C MCPU@42C GPU@41C PLL@42C Tboard@39C Tdiode@41.5C PMIC@100C thermal@41.6C VDD_IN 1715/1708 VDD_CPU 304/304 VDD_GPU 228/159 VDD_SOC 457/457 VDD_WIFI 38/73 VDD_DDR 267/256

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
248736

------------ Re-running the Python script ----------

~/tegrastats
RAM 3488/7846MB (lfb 809x4MB) CPU [12%@345,0%@345,0%@345,14%@345,9%@345,9%@346] EMC_FREQ 22%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@43C MCPU@43C GPU@42.5C PLL@43C Tboard@40C Tdiode@42.75C PMIC@100C thermal@42.8C VDD_IN 1715/1698 VDD_CPU 304/304 VDD_GPU 152/162 VDD_SOC 457/457 VDD_WIFI 57/49 VDD_DDR 267/256

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
0

The Python script says:
2019-05-16 13:46:47.998747: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-16 13:46:47.998936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.41GiB
2019-05-16 13:46:47.998991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:46:48.895423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:46:48.895538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:46:48.895604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:46:48.895795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-16 13:47:30.693801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:47:30.693950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:47:30.693984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:47:30.694012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:47:30.694136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

Hi AastaLLL,

"It may not be returned to the global memory directly." What do I do in this case?

It seems like around 400 lfb blocks (~1.6 GB) are missing!

Hi,

This is a Jetson-specific mechanism.

You won't see the GPU memory go back to the "available memory" immediately.
Sometimes it is held in a memory buffer for the next allocation.

But this won't cause an issue.
Once the system is running out of memory, the buffer releases everything back to "available memory".
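To put a number on how much the pool is holding, the page counter shown earlier in this thread can be converted to MiB. This assumes 4 KiB pages, which matches the TX2; a sample value is hard-coded here in place of a live read:

```python
# Convert the nvmap page-pool counter to MiB (assumes 4 KiB pages).
# 248736 is a sample value; on the device you would read it with:
#   sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
pages = 248736
print(f"{pages * 4 // 1024} MiB buffered in the nvmap page pool")
```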

Thanks.

Hi AastaLLL,

Below is my code to reduce the GPU memory consumption:

from imageai.Prediction.Custom import CustomImagePrediction
import tensorflow as tf
import cv2

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.192)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

prediction = CustomImagePrediction()
prediction.setModelTypeAsDenseNet()
prediction.setModelPath("Densenet.h5")
prediction.setJsonPath("model_class.json")
prediction.loadModel(num_objects=5)

Since imageAI is built on TensorFlow, it creates another session of its own as well.

Is there an efficient way to create the TensorFlow session once and make imageAI use the limited GPU memory?
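One approach worth trying (a sketch using TF 1.x and Keras APIs; I haven't verified it against imageAI's internals) is to register the restricted session as the Keras backend session before loading the model, so imageAI's Keras layer reuses it instead of opening a second, unrestricted one. The setup is wrapped in a try/except so the snippet degrades gracefully where TF 1.x is not available:

```python
# Sketch (TF 1.x + Keras): share one memory-capped session with imageAI
# by registering it as the Keras backend session before loadModel().
GPU_FRACTION = 0.192  # same ~1.5 GB budget as in the code above

try:
    import tensorflow as tf
    from keras import backend as K

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=GPU_FRACTION)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
    K.set_session(sess)  # imageAI (via Keras) should reuse this session
    # ...then create CustomImagePrediction() and loadModel() as before.
except (ImportError, AttributeError):
    sess = None  # TF 1.x not available in this environment
```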

Hi,

Could you check whether decreasing the workspace size helps?

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 20,  # 1 MiB workspace
    precision_mode='FP16',
    minimum_segment_size=50
)

If not, it’s recommended to use pure TensorRT instead.

Thanks.

Hi AastaLLL,

Sorry for the late reply.

Thank you for this information. I’ll give it a try.