Python code using TensorFlow and CUDA on a Jetson TX2 is getting killed (logs below)

Could you please recommend what should be done based on the logs below:

2019-08-02 14:09:08.128243: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
WARNING: Logging before flag parsing goes to stderr.
2019-08-02 14:09:27.986443: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-02 14:09:28.036358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.036555: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.02
pciBusID: 0000:00:00.0
2019-08-02 14:09:28.036653: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-02 14:09:28.036792: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-02 14:09:28.036910: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-02 14:09:28.073906: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-02 14:09:28.113267: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-02 14:09:28.139226: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-02 14:09:28.217861: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-02 14:09:28.218581: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.219103: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.219248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-02 14:09:28.241159: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-08-02 14:09:28.242110: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x8efae50 executing computations on platform Host. Devices:
2019-08-02 14:09:28.242170: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-08-02 14:09:28.335767: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.336613: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x8eefe50 executing computations on platform CUDA. Devices:
2019-08-02 14:09:28.337567: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2019-08-02 14:09:28.344185: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.345441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.02
pciBusID: 0000:00:00.0
2019-08-02 14:09:28.345867: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-02 14:09:28.346247: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-02 14:09:28.346485: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-02 14:09:28.346846: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-02 14:09:28.347169: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-02 14:09:28.347463: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-02 14:09:28.347712: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-02 14:09:28.349084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.350704: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:28.351229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-02 14:09:28.351946: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-02 14:09:33.653791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-02 14:09:33.653950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-02 14:09:33.654000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-02 14:09:33.655143: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:33.656047: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero
2019-08-02 14:09:33.656393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1700 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
W0802 14:09:34.108056 548108099600 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py:1354: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Killed

FYI:
CUDA test passed (version 10.0) using the samples
TensorFlow version 1.14.0
OpenCV version 3.4.0
Total available space locally on the Jetson = 1.4-1.8 GB
The video I am working on is 3.6 GB and is placed on an external HDD

That's the out-of-memory (OOM) kill. Even if you have swap, some operations require physical RAM, but swap may still help, since the operations that don't need physical RAM can be swapped out. I couldn't tell you how to change the number of threads in Python, but using fewer threads often means less memory.

To confirm this, you could install htop ("sudo apt-get install htop") and watch memory as your program runs. You'll see memory use climbing until the kill hits near the limit.
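
If you'd rather log it from inside the script, here is a minimal sketch (my own illustration, assuming a Linux /proc filesystem as on the Jetson) that prints the process's resident memory; the function name is just for illustration, call it periodically from your main loop:

# Minimal sketch: print this process's resident memory (VmRSS) from /proc.
# Linux-only, which is fine for Jetson/L4T.
def print_rss():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS"):
                print(line.strip())  # e.g. "VmRSS:  1234567 kB"
                break

print_rss()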

Yes, I already did that. The GPU memory is overflowing, hence the kill.
I would like to know how I can reduce the number of threads on my system for this code.

I doubt the number of threads is the issue though…
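
If you do still want to experiment with it, TensorFlow 1.x exposes its thread-pool sizes through ConfigProto. A minimal sketch (my illustration; I haven't verified it makes a difference on the TX2):

import tensorflow as tf

# Cap TensorFlow's internal thread pools (TF 1.x API).
config = tf.ConfigProto(
    intra_op_parallelism_threads=2,  # threads used within a single op
    inter_op_parallelism_threads=2)  # threads used to run independent ops
session = tf.Session(config=config)

This mainly affects CPU-side memory, so it may not help much with a GPU OOM on its own.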

Without seeing the code it's really hard to say, but I would start by looking at the batch size if this is a DL model. That is typically the #1 cause of OOM kills.
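
For illustration only (hypothetical, since the code hasn't been shared): in a Keras-style setup the batch size is just the batch_size argument to fit(), and lowering it directly lowers per-step memory. The model and data below are stand-ins:

import numpy as np
import tensorflow as tf

# Stand-in data and model; replace with your own. The point is only that
# batch_size is the knob that controls how much memory each step needs.
x = np.random.rand(256, 32).astype("float32")
y = np.random.randint(0, 2, size=(256,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Try e.g. 8 instead of 32/64; smaller batches use less GPU memory per step.
model.fit(x, y, batch_size=8, epochs=1)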

Yes, it is a deep learning model, and when I checked in htop it showed all 6 GPUs being filled and the memory crossing the maximum limit.
Can you guide me through how I can reduce the batch size of my DL model?

Hi,

The TX2 only has one GPU.
To reduce memory usage, you can try setting the per_process_gpu_memory_fraction option:

import tensorflow as tf

# Cap the fraction of GPU memory TensorFlow is allowed to allocate.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Thanks.

Hi,
I'm having the same problem. I'm trying to convert a model trained with TensorFlow to ONNX with tf2onnx, and then convert it to TensorRT. Excuse me, but I'm not an expert; I don't know where I have to put this:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, …)