TensorFlow crashes with strange error

While training a deep learning script I got the strange error below. I suspect it is a hardware problem, but I am not sure. Do you have any idea?

System

  • OS: Debian Jessie
  • TensorFlow: newest version, installed from source
  • Bazel version: 1.22.0
  • Python version: 3.4
  • CUDA/cuDNN version: 10.0/7
  • GPU: Tesla K10.G1.8GB

Error Message
tensorflow/stream_executor/cuda/cuda_driver.cc:184] Check failed: is_host_ptr == points_to_host_memory (0 vs. 1)dst pointer is not actually on GPU: 0x4304b80500
Fatal Python error: Aborted

Thread 0x00007fd5a3fff700 (most recent call first):
File "/usr/lib/python3.4/threading.py", line 290 in wait
File "/usr/lib/python3.4/queue.py", line 167 in get
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.4/threading.py", line 920 in _bootstrap_inner
File "/usr/lib/python3.4/threading.py", line 888 in _bootstrap

Thread 0x00007fd5a9127700 (most recent call first):
File "/usr/lib/python3.4/threading.py", line 290 in wait
File "/usr/lib/python3.4/queue.py", line 167 in get
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/summary/writer/event_file_writer.py", line 159 in run
File "/usr/lib/python3.4/threading.py", line 920 in _bootstrap_inner
File "/usr/lib/python3.4/threading.py", line 888 in _bootstrap

Current thread 0x00007fd5e0e54700 (most recent call first):
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1410 in _call_tf_sessionrun
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1322 in _run_fn
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1337 in _do_call
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1331 in _do_run
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1155 in _run
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 932 in run
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/session_manager.py", line 296 in prepare_session
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 639 in create_session
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 862 in create_session
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 1193 in _create_session
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 1188 in __init__
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 717 in __init__
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 997 in __init__
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 576 in MonitoredTrainingSession
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1403 in _train_with_estimator_spec
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158 in _train_model_default
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124 in _train_model
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358 in train
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/tpu/tpu_estimator.py", line 2733 in train
File "run_squad.py", line 1215 in main
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/absl/app.py", line 251 in _run_main
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/absl/app.py", line 300 in run
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 40 in run
File "run_squad.py", line 1283 in <module>

I am having the exact same issue now.

Did you find a solution?

I am having the same problem as well.

I got rid of this error by loading a different frozen graph. My guess is that some layer in the original graph needs its tensors to be placed on the CPU during the pre-processing stage, but I am not totally sure.
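
For anyone triaging the same crash, it may help to first confirm where in the program the abort happens. Below is a minimal sketch, assuming the dump has been saved as plain text, that extracts the frames of the "Current thread" section from a dump in the format shown in the question. The names current_thread_frames, FRAME_RE and SAMPLE are my own and not part of TensorFlow, and SAMPLE is an abridged excerpt of the dump above, not a full log.

```python
import re

# Matches frame lines in the format shown in the question, e.g.
#   File "/path/session.py", line 932 in run
# Curly quotes are accepted too, since forum software often converts them.
FRAME_RE = re.compile(
    r'File ["“](?P<path>[^"”]+)["”], line (?P<line>\d+) in ?(?P<func>.*)'
)


def current_thread_frames(dump_text):
    """Return (path, line, func) tuples for the 'Current thread' section."""
    frames = []
    in_current = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Current thread"):
            in_current = True
            continue
        if in_current:
            if not line:  # a blank line ends the section
                break
            match = FRAME_RE.match(line)
            if match:
                frames.append(
                    (match.group("path"),
                     int(match.group("line")),
                     match.group("func").strip())
                )
    return frames


# Abridged excerpt of the dump from this thread (innermost frame first).
SAMPLE = '''Thread 0x00007fd5a3fff700 (most recent call first):
File "/usr/lib/python3.4/threading.py", line 290 in wait

Current thread 0x00007fd5e0e54700 (most recent call first):
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1410 in _call_tf_sessionrun
File "/opt/lampp/htdocs/venv/lib/python3.4/site-packages/tensorflow/python/training/session_manager.py", line 296 in prepare_session
File "run_squad.py", line 1215 in main
'''

frames = current_thread_frames(SAMPLE)
innermost = frames[0]
# Frames outside site-packages belong to the user's own script.
user_frames = [f for f in frames if "site-packages" not in f[0]]

print(innermost[2])    # _call_tf_sessionrun
print(user_frames[0])  # ('run_squad.py', 1215, 'main')
```

In the dump from the question, the current thread passes through prepare_session and MonitoredTrainingSession, which suggests the abort happens while the session is being created and initialized rather than in a later training step. That would be consistent with the guess that something in the loaded graph is the trigger, though it does not prove it.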