TensorFlow Memory Error

Hello, I seem to be hitting an error when running TensorFlow-based models. From general Google searches it looks like a GPU memory issue, but none of the fixes that worked for other architectures have helped here.

I am running the TX2 with the latest JetPack (3.1):
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21
Cuda compilation tools, release 8.0, V8.0.72
bazel-0.5.1

import tensorflow as tf
tf.__version__
'1.2.1'

and this is the error:

2017-08-09 11:02:04.230621: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:879] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2017-08-09 11:02:04.230736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.24GiB
2017-08-09 11:02:04.230790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-09 11:02:04.230815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-09 11:02:04.230856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-08-09 11:02:04.230891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:642] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2017-08-09 11:02:05.303591: E tensorflow/stream_executor/cuda/cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-09 11:02:05.303663: E tensorflow/stream_executor/cuda/cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-09 11:02:05.303694: F tensorflow/core/kernels/conv_ops.cc:671] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)

I have tried previous versions of JetPack as well as TensorFlow with the same error, and have tried completely separate TensorFlow models.

Hi,

Please try enabling the following option:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

Thanks.

Nope, no dice unfortunately, with much the same error.

I also previously tried:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
...

With various values, both high and low, and received the same error.
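For reference, that fraction is applied against the total memory TensorFlow reports (7.67 GiB in the log above), so the values tried map to rough absolute caps; a quick sketch, with the total hard-coded from the log:

```python
# Rough absolute cap implied by per_process_gpu_memory_fraction,
# using the 7.67 GiB "Total memory" from the TensorFlow log above.
TOTAL_GIB = 7.67

def cap_gib(fraction):
    """GPU memory TensorFlow would limit itself to, in GiB."""
    return round(TOTAL_GIB * fraction, 2)

for f in (0.2, 0.4, 0.8):
    print(f, "->", cap_gib(f), "GiB")
# 0.4 -> 3.07 GiB, close to the 3.24 GiB reported free in the first log
```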

Hi,

Two things we want to confirm first:

1. Did you follow the steps below to build TensorFlow on your TX2?

2. Does this error also occur with MNIST?

We want to confirm first whether this is a resource issue (memory, …) or a framework issue (build architecture, …).
Thanks.

  1. For the first build I followed that, but using the JetsonHacks scripts from GitHub (basically those instructions); for TF 1.2, however, I followed Andrey1984’s instructions in this post: https://devtalk.nvidia.com/default/topic/1016294/tensorflow-1-2-0-gpu-on-tx2/

and the build prompts on this https://gist.github.com/csarron/a265280010faeecae3e8c204c5749a67

  2. No, I haven't seen this error on MNIST or other examples. (Simple tests, such as the one at the bottom of the link you posted, run correctly with no errors.)

Hi,

NUMA is for multi-GPU setups.
It looks like your model wants to enable multi-GPU options.
But NUMA is turned off (by default) when building, and the TX2 only has one GPU.

Could you try disabling the related option in your source and check again?

I rebuilt TensorFlow following the instructions you provided, disabling the NUMA node. It didn't work; I even tried with the original and the additional config options.

python detect.py h786poj.jpg weights.npz out3.jpg
[[ 0.40392157  0.44313725  0.4627451  ...,  0.17254902  0.17647059
   0.18431373]
 [ 0.42352941  0.45882353  0.47058824 ...,  0.17647059  0.18431373
   0.19215686]
 [ 0.44705882  0.47058824  0.47843137 ...,  0.18431373  0.19215686
   0.20392157]
 ..., 
 [ 0.58431373  0.58431373  0.59215686 ...,  0.54117647  0.5254902
   0.50588235]
 [ 0.58823529  0.57647059  0.56470588 ...,  0.52941176  0.51764706
   0.49411765]
 [ 0.59215686  0.56862745  0.54117647 ...,  0.5254902   0.51372549
   0.49411765]]
2017-08-14 12:41:15.428733: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:856] ARM has no NUMA node, hardcoding to return zero
2017-08-14 12:41:15.428852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.59GiB
2017-08-14 12:41:15.428947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-08-14 12:41:15.428998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-08-14 12:41:15.429024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-08-14 12:41:15.554341: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-08-14 12:41:15.554406: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
2017-08-14 12:41:15.555134: I tensorflow/compiler/xla/service/service.cc:198] XLA service 0x30c2d40 executing computations on platform Host. Devices:
2017-08-14 12:41:15.555184: I tensorflow/compiler/xla/service/service.cc:206]   StreamExecutor device (0): <undefined>, <undefined>
2017-08-14 12:41:15.555849: I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices
2017-08-14 12:41:15.555894: I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 4 visible devices
2017-08-14 12:41:15.556610: I tensorflow/compiler/xla/service/service.cc:198] XLA service 0x3112ff0 executing computations on platform CUDA. Devices:
2017-08-14 12:41:15.556652: I tensorflow/compiler/xla/service/service.cc:206]   StreamExecutor device (0): NVIDIA Tegra X2, Compute Capability 6.2
2017-08-14 12:41:18.518643: E tensorflow/stream_executor/cuda/cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-14 12:41:18.518718: E tensorflow/stream_executor/cuda/cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-14 12:41:18.518749: F tensorflow/core/kernels/conv_ops.cc:671] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Aborted (core dumped)

Hi,

Thanks for testing. It looks more complicated than a build issue.
Could you tell me how to reproduce it? Are you using public GitHub code, or could you share your source?

Thanks.

Hey,

Yeah, sure. It's using this as standard:

https://github.com/matthewearl/deep-anpr

Happy to share the weights if needed. Training works correctly; the issue only appears when running detect.py.

Hi,

Yes, please share the weights with us.
Thanks.

Hi,

If you have a TensorFlow x86 environment, could you also give it a try there?
Thanks.

Here are the weights on Google Drive:

https://drive.google.com/file/d/0B3TAQ6gwtBNmZVVDTnAySHB3eU0/view?usp=sharing

Thanks.

We will try to reproduce this issue, and update more information to you later.

Hi,

Good news!
I can run the deep_anpr sample with this whl (JetPack 3.1):

sudo pip install tensorflow-1.3.0rc0-cp27-cp27mu-linux_aarch64.whl
sudo reboot
cd [deep_anpr folder]
./detect.py in.jpeg weights.npz out.jpg

Thanks.

Hmm, it hasn't worked on this Jetson; there is a slightly different error in the last line:

2017-08-21 10:07:38.406485: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:879] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2017-08-21 10:07:38.406598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.67GiB
2017-08-21 10:07:38.406654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-08-21 10:07:38.406683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-08-21 10:07:38.406708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-08-21 10:07:38.406740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:657] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2017-08-21 10:07:39.100041: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-08-21 10:07:39.100116: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-08-21 10:07:39.100146: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

Weirdly enough, our other Jetson works following the same install instructions and versions. I think our next step will be to reflash the non-working Jetson and reinstall from scratch to make sure nothing was done differently.

Hi,

The link in comment #14 is built with cuDNNv6. Please flash device with JetPack3.1.
Thanks.
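One way to double-check which cuDNN a device actually has is to parse the version macros from /usr/include/cudnn.h (the same defines quoted at the top of the thread). A small sketch, run here against a pasted sample rather than the real header:

```python
import re

def cudnn_version(header_text):
    """Parse (major, minor, patch) from the contents of cudnn.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        parts.append(int(m.group(1)) if m else None)
    return tuple(parts)

# Sample copied from the top of the thread; on the device you would
# read /usr/include/cudnn.h instead.
sample = """\
#define CUDNN_MAJOR 6
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 21
"""
print(cudnn_version(sample))  # -> (6, 0, 21)
```

If this prints a major version other than 6, the wheel and the installed cuDNN don't match.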

Reflashed the Jetson and installed TensorFlow with that .whl, and it worked!

Thanks for the help.

Hi AastaLLL,

I have the same issue with https://github.com/igul222/improved_wgan_training
when running gan_mnist.py:

python gan_mnist.py

I got this…

nvidia@tegra-ubuntu:~/improved_wgan_training$ python gan_mnist.py 
Uppercase local vars:
	BATCH_SIZE: 50
	CRITIC_ITERS: 5
	DIM: 64
	ITERS: 200000
	LAMBDA: 10
	MODE: wgan-gp
	OUTPUT_DIM: 784
2017-10-17 05:31:07.423028: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-10-17 05:31:07.423174: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 2.72GiB
2017-10-17 05:31:07.423231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-10-17 05:31:07.423259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-10-17 05:31:07.423286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py:175: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2017-10-17 05:31:12.753831: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-10-17 05:31:12.753904: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-10-17 05:31:12.753941: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

I tried #14, but it isn't working because I already have the same version installed:

nvidia@tegra-ubuntu:~/tensorflow-tx2$ sudo pip install tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl
The directory '/home/nvidia/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/nvidia/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: tensorflow==1.3.0 from file:///home/nvidia/tensorflow-tx2/tensorflow-1.3.0-cp27-cp27mu-linux_aarch64.whl in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: protobuf>=3.3.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: tensorflow-tensorboard<0.2.0,>=0.1.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: wheel in /usr/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: backports.weakref>=1.0rc1 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: numpy>=1.11.0 in /usr/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: mock>=2.0.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow==1.3.0)
Requirement already satisfied: setuptools in /usr/local/lib/python2.7/dist-packages (from protobuf>=3.3.0->tensorflow==1.3.0)
Requirement already satisfied: werkzeug>=0.11.10 in /usr/local/lib/python2.7/dist-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow==1.3.0)
Requirement already satisfied: html5lib==0.9999999 in /usr/local/lib/python2.7/dist-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow==1.3.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python2.7/dist-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow==1.3.0)
Requirement already satisfied: bleach==1.5.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow-tensorboard<0.2.0,>=0.1.0->tensorflow==1.3.0)
Requirement already satisfied: pbr>=0.11 in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.3.0)
Requirement already satisfied: funcsigs>=1; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow==1.3.0)

Hi,

Could you run this command and share the result with us?

ll /home/nvidia/.cache/pip/

Please remember to flash the TX2 with JetPack 3.1; this wheel file is built against the JetPack 3.1 package.

Thanks.