CUDA_ERROR_LAUNCH_FAILED error when running TensorFlow mnist example

Hello!
I recently got the Jetson TX2 developer kit, and I would really like to use TensorFlow on it. I followed the JetsonHacks installation tutorials and had no problems during the install. I tried both this:

and this:

I also have a fresh, full install of JetPack 3.1 on the board. I have tried TensorFlow with both Python 2 and Python 3.

The issue only seems to occur with more sophisticated models that use convolutional neural networks. If you are familiar with the TensorFlow examples: “mnist_softmax.py” runs without a problem, but “mnist_deep.py” always outputs this in the terminal:

nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$ ./mnist_deep.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpyJvseo
2017-12-02 00:56:23.092487: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-02 00:56:23.092610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.76GiB
2017-12-02 00:56:23.092659: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-12-02 00:56:23.092684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y
2017-12-02 00:56:23.092710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
step 0, training accuracy 0.04
step 100, training accuracy 0.86
step 200, training accuracy 0.96
step 300, training accuracy 0.94
step 400, training accuracy 0.86
step 500, training accuracy 0.92
step 600, training accuracy 0.96
step 700, training accuracy 0.96
step 800, training accuracy 0.96
step 900, training accuracy 1
2017-12-02 00:57:17.461035: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.461146: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.461188: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
Traceback (most recent call last):
  File "./mnist_deep.py", line 177, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./mnist_deep.py", line 169, in main
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 541, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4085, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: No algorithm worked!
     [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]]
     [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
 
Caused by op u'conv1/Conv2D', defined at:
  File "./mnist_deep.py", line 177, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./mnist_deep.py", line 138, in main
    y_conv, keep_prob = deepnn(x)
  File "./mnist_deep.py", line 64, in deepnn
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
  File "./mnist_deep.py", line 106, in conv2d
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 397, in conv2d
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access
 
NotFoundError (see above for traceback): No algorithm worked!
     [[Node: conv1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](reshape/Reshape, conv1/Variable/read)]]
     [[Node: Mean_1/_7 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_79_Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
 
2017-12-02 00:57:17.738653: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738769: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738800: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
2017-12-02 00:57:17.738824: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x3372070: CUDA_ERROR_LAUNCH_FAILED
nvidia@tegra-ubuntu:~/Desktop/tensorflow-r1.3/tensorflow/examples/tutorials/mnist$

Keep in mind that I have only run the unmodified sample files from TensorFlow, so I assume they work as intended and that the error lies somewhere in my setup.

If any of you have any idea of what the fault might be, please let me know!

Thanks in advance.
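
The traceback above points at line 169 of mnist_deep.py, where the whole of mnist.test.images is fed through the network in a single call. One way to narrow the problem down is to evaluate the test set in smaller chunks and see whether the failure still appears. The sketch below is only an illustration of that idea, not a confirmed fix: `batched_accuracy` and `eval_batch` are hypothetical names that are not part of TensorFlow, and the batch size of 500 is an arbitrary choice.

```python
def batched_accuracy(eval_batch, images, labels, batch_size=500):
    """Average accuracy over a dataset, evaluated in chunks.

    eval_batch is a hypothetical callable (not a TensorFlow API) that takes a
    slice of images and the matching slice of labels and returns the accuracy
    on that slice as a number between 0 and 1.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = len(images)
    if total == 0:
        raise ValueError("no examples to evaluate")

    weighted_sum = 0.0
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        acc = eval_batch(images[start:end], labels[start:end])
        # Weight each chunk by its size so a short final chunk counts correctly.
        weighted_sum += float(acc) * (end - start)
    return weighted_sum / total
```

In mnist_deep.py, `eval_batch` could wrap the tensor that is evaluated at line 169, feeding `x`, `y_` and `keep_prob: 1.0` for each chunk, for example `batched_accuracy(lambda xs, ys: acc_tensor.eval(feed_dict={x: xs, y_: ys, keep_prob: 1.0}), mnist.test.images, mnist.test.labels)`, where `acc_tensor` stands for whatever that tensor is called in your copy of the script.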


Hi,

We can run ./mnist_deep.py successfully.
Our environment is JetPack 3.1 + cuDNN v7 + this TF wheel.

Could you try this setup on your side as well?

nvidia@tegra-ubuntu:/media/nvidia/NVIDIA/tensorflow/tensorflow/examples/tutorials/mnist$ python mnist_deep.py 
Extracting /tmp/tensorflow/mnist/input_data/train-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/train-labels-idx1-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-images-idx3-ubyte.gz
Extracting /tmp/tensorflow/mnist/input_data/t10k-labels-idx1-ubyte.gz
Saving graph to: /tmp/tmpX6a0Wf
2017-12-04 02:44:00.519511: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:879] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2017-12-04 02:44:00.519625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.19GiB
2017-12-04 02:44:00.519671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-04 02:44:00.519697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-04 02:44:00.519722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-04 02:44:00.519752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:657] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
step 0, training accuracy 0.08
step 100, training accuracy 0.9
step 200, training accuracy 0.84
step 300, training accuracy 0.92
step 400, training accuracy 0.96
step 500, training accuracy 0.92
step 600, training accuracy 0.98
step 700, training accuracy 0.92
step 800, training accuracy 0.98
step 900, training accuracy 0.96
step 1000, training accuracy 0.9
step 1100, training accuracy 0.98
step 1200, training accuracy 1
step 1300, training accuracy 0.94
step 1400, training accuracy 0.96
step 1500, training accuracy 0.98
step 1600, training accuracy 0.96
step 1700, training accuracy 0.96
step 1800, training accuracy 1
step 1900, training accuracy 0.94
step 2000, training accuracy 1
step 2100, training accuracy 0.94
step 2200, training accuracy 0.98
step 2300, training accuracy 1
step 2400, training accuracy 1
step 2500, training accuracy 0.98
step 2600, training accuracy 0.98
step 2700, training accuracy 0.96
step 2800, training accuracy 0.98
step 2900, training accuracy 1
step 3000, training accuracy 0.98
step 3100, training accuracy 0.96
step 3200, training accuracy 1
step 3300, training accuracy 0.98
step 3400, training accuracy 0.98
step 3500, training accuracy 1
step 3600, training accuracy 0.98
step 3700, training accuracy 1
step 3800, training accuracy 0.98
step 3900, training accuracy 0.96
step 4000, training accuracy 0.98
step 4100, training accuracy 0.98
step 4200, training accuracy 0.96
step 4300, training accuracy 0.96
step 4400, training accuracy 1
step 4500, training accuracy 0.94
step 4600, training accuracy 1
step 4700, training accuracy 1
step 4800, training accuracy 0.96
step 4900, training accuracy 1
step 5000, training accuracy 1
step 5100, training accuracy 0.96
step 5200, training accuracy 0.98
step 5300, training accuracy 1
step 5400, training accuracy 0.98
step 5500, training accuracy 0.98
step 5600, training accuracy 1
step 5700, training accuracy 1
step 5800, training accuracy 0.98
step 5900, training accuracy 1
step 6000, training accuracy 1
step 6100, training accuracy 1
step 6200, training accuracy 0.98
step 6300, training accuracy 1
step 6400, training accuracy 0.98
step 6500, training accuracy 1
step 6600, training accuracy 1
step 6700, training accuracy 1
step 6800, training accuracy 0.98
step 6900, training accuracy 1
step 7000, training accuracy 1
step 7100, training accuracy 1
step 7200, training accuracy 1
step 7300, training accuracy 1
step 7400, training accuracy 1
step 7500, training accuracy 1
step 7600, training accuracy 0.98
step 7700, training accuracy 1
step 7800, training accuracy 0.98
step 7900, training accuracy 0.98
step 8000, training accuracy 0.98
step 8100, training accuracy 1
step 8200, training accuracy 1
step 8300, training accuracy 1
step 8400, training accuracy 1
step 8500, training accuracy 1
step 8600, training accuracy 0.98
step 8700, training accuracy 1
step 8800, training accuracy 0.98
step 8900, training accuracy 0.98
step 9000, training accuracy 1
step 9100, training accuracy 1
step 9200, training accuracy 1
step 9300, training accuracy 1
step 9400, training accuracy 1
step 9500, training accuracy 1
step 9600, training accuracy 0.98
step 9700, training accuracy 1
step 9800, training accuracy 1
step 9900, training accuracy 1
step 10000, training accuracy 1
step 10100, training accuracy 0.98
step 10200, training accuracy 1
step 10300, training accuracy 1
step 10400, training accuracy 1
https://github.com/peterlee0127/tensorflow-tx2
step 10500, training accuracy 1
step 10600, training accuracy 1
step 10700, training accuracy 1
step 10800, training accuracy 1
...

Thanks

Hello.

Thanks for the reply. It seems the only difference between our setups is that you have cuDNN v7, while I have version 6, which comes with JetPack 3.1. I will try to update and see if that makes a difference. Thank you!
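
To confirm which cuDNN version is actually installed before and after updating, the version macros in the cuDNN header can be read directly. This is a minimal sketch: the header path `/usr/include/cudnn.h` is an assumption and may differ on a given JetPack install, and the helper names are made up for illustration. It relies only on the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines that cuDNN 6 and 7 keep in that header.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cuDNN header text, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def read_cudnn_version(path="/usr/include/cudnn.h"):
    """Read the header at `path` (an assumed location); None if it is missing."""
    try:
        with open(path) as header:
            return parse_cudnn_version(header.read())
    except IOError:
        return None
```

Calling `read_cudnn_version()` on the board should then show whether the header belongs to a 6.x or a 7.x release, provided the path assumption holds.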

Hello again.

I have now flashed the Jetson with JetPack 3.1 again. I then installed TensorRT by following the Debian install guide at the link you provided, installed the wheel file, and tried to run “mnist_deep.py”. Unfortunately, it gives me the same error as in my first post.

Just a few hours ago, the repo with the wheel file for TensorFlow 1.3 was updated, and there is now a wheel file for TensorFlow 1.4. I’ll try that one too, but I’d still like to get to the bottom of this issue.

Any other ideas on what might be wrong?

Thanks

Hi,

Which TensorFlow wheel are you using?
Have you tried the wheel file mentioned in comment #2?
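
A quick way to answer the question about the installed wheel is to ask Python which version of the package it sees. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library; `installed_version` is a hypothetical helper name. On the older interpreters that ship with JetPack 3.1, `pip show tensorflow` reports the same information.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, `installed_version("tensorflow")` returns the version of whichever wheel is currently installed in that interpreter, which helps when Python 2 and Python 3 each have their own copy.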

Thanks.