Reduce TensorFlow GPU usage

I want to reduce TensorFlow's GPU memory usage, so I tried the following:

from imageai.Prediction.Custom import CustomImagePrediction
import tensorflow as tf
import cv2

# Cap TensorFlow at ~19% of GPU memory (~1.5 GB on the TX2)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.192)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

prediction = CustomImagePrediction()
prediction.setModelTypeAsDenseNet()
prediction.setModelPath("Densenet.h5")
prediction.setJsonPath("model_class.json")
prediction.loadModel(num_objects=5)
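For reference, the 0.192 fraction is relative to the total memory TensorFlow sees. A small sketch (the helper name is mine; 7846 MB is what tegrastats reports as total RAM on this TX2, which the CPU and GPU share) shows how a MB budget maps to the fraction:

```python
# Sketch: derive per_process_gpu_memory_fraction from a memory budget.
# The helper name and the 7846 MB default are assumptions; 7846 MB is
# the total RAM tegrastats reports on this TX2 (shared CPU/GPU memory).
def memory_fraction(budget_mb, total_mb=7846):
    """Fraction of total memory corresponding to a budget in MB."""
    return budget_mb / total_mb

print(round(memory_fraction(1506), 3))  # the 1506 MB TF reports -> ~0.192
```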

I got:

2019-05-08 10:42:12.682317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-08 10:42:12.682515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.31GiB
2019-05-08 10:42:12.682572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-08 10:42:13.556847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-08 10:42:13.556958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-08 10:42:13.556995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-08 10:42:13.557176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-08 10:42:54.853030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-08 10:42:54.853191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-08 10:42:54.853229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-08 10:42:54.853252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-08 10:42:54.853500: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

I closed the session, then ran the script again:

2019-05-08 10:56:09.342301: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-08 10:56:09.342495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 3.83GiB

It seems like some block of memory is not released.
When I ran the model right after boot, I had "lfb ~1600x4MB"; by the third run it was down to "lfb 622x4MB".

What's the best way to restrict the imageai library to use less RAM, or only the amount of RAM we declare?

Thank you

Hi,

This issue comes from TensorFlow.

By default, TensorFlow allocates GPU memory for the lifetime of the process, not the lifetime of the session object.
So please exit the Python interpreter if you want the memory to be freed.
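If exiting the interpreter each time is not practical, one workaround is to run the TensorFlow work in a child process, so the OS reclaims all of its memory when the child exits. This is only a sketch; `run_model` is a hypothetical name, and placeholder logic stands in for the real prediction call:

```python
# Sketch: isolate the TF workload in a child process; all of the child's
# memory (including GPU allocations) is reclaimed when it exits.
import multiprocessing as mp

def run_model(image_path, results):
    # Import TF and load the model *inside* the child, so the parent
    # process never holds GPU memory. Placeholder logic stands in for
    # the real prediction call.
    results.put("prediction for " + image_path)

if __name__ == "__main__":
    q = mp.Queue()
    child = mp.Process(target=run_model, args=("test.jpg", q))
    child.start()
    print(q.get())   # "prediction for test.jpg"
    child.join()     # memory is freed here, when the child exits
```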

You can find more detail in this issue:
[url]https://github.com/tensorflow/tensorflow/issues/17048[/url]

Thanks.

Hi,
I did exit the Python interpreter (Ctrl+C); even after that, some block of memory is not released.

Even if I close everything on the display, the TX2 is still consuming RAM 2175/7846 MB (lfb 657x4MB).

I'm not sure what is consuming this much memory!

I have attached the current processes.

Hi,

I tried this but could not reproduce the memory issue you are seeing.
May I know the memory difference before/after running TensorFlow?

A possible cause is that the memory occupied by TF is GPU memory.
It may not be returned to the global memory directly.
Sometimes it is kept in a system memory pool to accelerate the next GPU allocation.

You can check the buffered memory amount with this command:

$ sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages

Here are my measurements:

------------ Before running the Python script that loads the model -----------

~/tegrastats
RAM 1051/7846MB (lfb 1507x4MB) CPU [9%@345,1%@345,0%@343,12%@345,2%@345,12%@345] EMC_FREQ 24%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@40C MCPU@40C GPU@39C PLL@40C Tboard@37C Tdiode@39.75C PMIC@100C thermal@39.8C VDD_IN 1715/2042 VDD_CPU 304/451 VDD_GPU 152/158 VDD_SOC 381/472 VDD_WIFI 95/71 VDD_DDR 248/371

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
5312

------------ While the model is loaded ----------

~/tegrastats
RAM 3478/7846MB (lfb 816x4MB) CPU [14%@346,0%@345,1%@345,21%@345,14%@345,13%@344] EMC_FREQ 17%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@42C MCPU@42C GPU@41.5C PLL@42C Tboard@39C Tdiode@42C PMIC@100C thermal@42.1C VDD_IN 1791/1899 VDD_CPU 304/314 VDD_GPU 152/173 VDD_SOC 457/467 VDD_WIFI 95/152 VDD_DDR 287/300

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
4688

The Python script says:
2019-05-16 13:37:52.007770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-16 13:37:52.008012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 5.52GiB
2019-05-16 13:37:52.008099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:37:53.617940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:37:53.618049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:37:53.618083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:37:53.618319: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-16 13:38:32.591647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:38:32.591822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:38:32.591862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:38:32.591904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:38:32.592027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

---------- After closing the Python script and terminal ----------

~/tegrastats
RAM 2105/7846MB (lfb 1012x4MB) CPU [4%@343,0%@345,0%@345,13%@345,9%@345,17%@345] EMC_FREQ 18%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@42C MCPU@42C GPU@41C PLL@42C Tboard@39C Tdiode@41.5C PMIC@100C thermal@41.6C VDD_IN 1715/1708 VDD_CPU 304/304 VDD_GPU 228/159 VDD_SOC 457/457 VDD_WIFI 38/73 VDD_DDR 267/256

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
248736

------------ Re-running the Python script ----------

~/tegrastats
RAM 3488/7846MB (lfb 809x4MB) CPU [12%@345,0%@345,0%@345,14%@345,9%@345,9%@346] EMC_FREQ 22%@102 GR3D_FREQ 0%@140 APE 150 MTS fg 0% bg 0% BCPU@43C MCPU@43C GPU@42.5C PLL@43C Tboard@40C Tdiode@42.75C PMIC@100C thermal@42.8C VDD_IN 1715/1698 VDD_CPU 304/304 VDD_GPU 152/162 VDD_SOC 457/457 VDD_WIFI 57/49 VDD_DDR 267/256

sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
0

The Python script says:
2019-05-16 13:46:47.998747: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2019-05-16 13:46:47.998936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 4.41GiB
2019-05-16 13:46:47.998991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:46:48.895423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:46:48.895538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:46:48.895604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:46:48.895795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-05-16 13:47:30.693801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-16 13:47:30.693950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-16 13:47:30.693984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-16 13:47:30.694012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-16 13:47:30.694136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1506 MB memory) → physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)

Hi AastaLLL,

"It may not be returned to the global memory directly." What do I do in this case?

It seems like around 400 lfb blocks (~1.6 GB) are missing!

Hi,

This is a Jetson-specific mechanism.

You won't see the GPU memory go back to the "available memory" immediately.
Sometimes it is held in a memory buffer for the next allocation.

But this won't cause an issue.
Once the system is running out of memory, the buffer releases everything back to "available memory".
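To put a number on how much the pool is holding, the page counter shown earlier in this thread can be converted to MiB. This assumes 4 KiB pages, which matches the TX2; a sample value is hard-coded here in place of a live read:

```python
# Convert the nvmap page-pool counter to MiB (assumes 4 KiB pages).
# 248736 is a sample value; on the device you would read it with:
#   sudo cat /sys/kernel/debug/nvmap/pagepool/page_pool_available_pages
pages = 248736
print(f"{pages * 4 // 1024} MiB buffered in the nvmap page pool")
```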

Thanks.

Hi AastaLLL,

Below is my code to reduce the GPU memory consumption:

from imageai.Prediction.Custom import CustomImagePrediction
import tensorflow as tf
import cv2

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.192)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

prediction = CustomImagePrediction()
prediction.setModelTypeAsDenseNet()
prediction.setModelPath("Densenet.h5")
prediction.setJsonPath("model_class.json")
prediction.loadModel(num_objects=5)

Since imageAI is built on TensorFlow, it creates another session of its own as well.

Is there an efficient way to create the TensorFlow session once and make imageAI use the limited GPU memory?
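One approach worth trying (a sketch using TF 1.x and Keras APIs; I haven't verified it against imageAI's internals) is to register the restricted session as the Keras backend session before loading the model, so imageAI's Keras layer reuses it instead of opening a second, unrestricted one. The setup is wrapped in a try/except so the snippet degrades gracefully where TF 1.x is not available:

```python
# Sketch (TF 1.x + Keras): share one memory-capped session with imageAI
# by registering it as the Keras backend session before loadModel().
GPU_FRACTION = 0.192  # same ~1.5 GB budget as in the code above

try:
    import tensorflow as tf
    from keras import backend as K

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=GPU_FRACTION)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
    K.set_session(sess)  # imageAI (via Keras) should reuse this session
    # ...then create CustomImagePrediction() and loadModel() as before.
except (ImportError, AttributeError):
    sess = None  # TF 1.x not available in this environment
```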

Hi,

Could you check whether decreasing the workspace size helps?

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 20,  # 1 MiB workspace
    precision_mode='FP16',
    minimum_segment_size=50
)

If not, it’s recommended to use pure TensorRT instead.

Thanks.

Hi AastaLLL,

Sorry for the late reply.

Thank you for this information. I’ll give it a try.