Memory Leak Jetson Nano

Hello,

I noticed pretty high memory usage when executing anything with Keras on my Jetson Nano.

For a while now, I have noticed that just calling model.load("xyz.model") (without actually using the model or doing anything else, just this one line of code) consistently takes an additional 35% of memory (on top of the roughly 30% baseline usage) until I shut down the program.
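Roughly, the whole script is nothing more than the sketch below (the model file name is a placeholder, and the psutil lines are just one way of showing where I read the memory numbers from):

```python
import os
import psutil                         # only used here to read the process RSS
from keras.models import load_model   # TensorFlow backend, as in the log below

process = psutil.Process(os.getpid())
print("RSS after importing Keras: %.1f MiB" % (process.memory_info().rss / 2**20))

# Just loading the model -- it is never used afterwards
model = load_model("xyz.model")       # placeholder file name

print("RSS after loading:         %.1f MiB" % (process.memory_info().rss / 2**20))
```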

This isn’t normal, right? Does anybody have an idea what the problem could be?

I read that somebody had similar issues when their paths were set incorrectly: https://stackoverflow.com/questions/35757151/why-does-this-keras-model-require-over-6gb-of-memory

echo $PATH

"/usr/local/cuda-10.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

echo $LD_LIBRARY_PATH

"/usr/local/cuda-10.0/lib64"

The terminal output doesn’t help me at all, but maybe somebody sees something I am missing:

Using TensorFlow backend.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2020-01-31 14:58:17.405250: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2020-01-31 14:58:17.412536: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x2372c770 executing computations on platform Host. Devices:
2020-01-31 14:58:17.412596: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2020-01-31 14:58:17.564130: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2020-01-31 14:58:17.564716: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x2027edc0 executing computations on platform CUDA. Devices:
2020-01-31 14:58:17.564774: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2020-01-31 14:58:17.566070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.86GiB freeMemory: 427.32MiB
2020-01-31 14:58:17.566137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-01-31 14:58:36.056005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-31 14:58:36.129107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2020-01-31 14:58:36.129256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2020-01-31 14:58:36.167318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 57 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

Hi,

Based on your log, your Keras backend framework is TensorFlow.
It’s known that TensorFlow occupies a lot of memory on the Jetson.
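If what you see is TensorFlow’s default behaviour of reserving GPU memory up front (rather than a real leak), you can ask it to allocate on demand instead. Since the Nano’s GPU and CPU share the same physical memory, that reservation also shows up as used system memory. A minimal sketch for the TF 1.x API shown in your log, to be run before the model is loaded:

```python
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # optional hard cap, adjust as needed
K.set_session(tf.Session(config=config))
```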

By the way, how did you install TensorFlow in your environment?
It’s recommended to use our package, which can be found here:
https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html#prereqs

Thanks.

Okay, thank you, good to know. But this high memory usage only started a few days ago and did not occur before.
Yes, I installed it the recommended way. I have no idea what could be the reason for the change.

Hi Team,
We see a similar memory leak when using Keras, TensorRT, and Darknet.
We initially suspected the issue was specific to Darknet, but we see the same problem across the other frameworks as well.
The issue started to occur after we upgraded to JetPack 4.3.
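To illustrate the pattern, this is roughly the kind of loop in which we watch resident memory keep climbing (model file and input shape are placeholders; psutil is only used for the readout):

```python
import os
import numpy as np
import psutil
from keras.models import load_model

process = psutil.Process(os.getpid())
model = load_model("our_model.h5")                        # placeholder model file
dummy = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder input shape

for i in range(1000):
    model.predict(dummy)
    if i % 100 == 0:
        print("iteration %d: RSS %.1f MiB" % (i, process.memory_info().rss / 2**20))
```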

Thanks
Siva

Hi,

We recently found a small memory leak in cuDNN. Please check this topic for more information:
https://forums.developer.nvidia.com/t/tensorrt-6-memory-leak/112142/

Do you think this is the same issue as yours?
If not, could you share a simple reproducible source with us?

Thanks.