TensorFlow 1.7 on JetPack 3.2 unstable

I am running into stability issues using TensorFlow 1.7 on JetPack 3.2.

I can boot up, create a virtualenv with all my dependencies, and install TensorFlow 1.7 from the wheel, and everything works fine. The problems start after a reboot: when I reactivate the virtualenv and run the program again, it fails with an unknown error. Here’s an example:

(DeepSpeaker) nvidia@tegra-ubuntu:~/MultimodalID/DeepSpeaker$ python DeepSpeaker.py 
/home/nvidia/.virtualenvs/DeepSpeaker/lib/python3.5/site-packages/pydub/utils.py:165: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2018-05-22 00:26:15.575820: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-05-22 00:26:15.575941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.75GiB
2018-05-22 00:26:15.575990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-05-22 00:26:17.443806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-22 00:26:17.443881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-05-22 00:26:17.443907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-05-22 00:26:17.444078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5178 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-05-22 00:26:17.881770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-05-22 00:26:17.881873: E tensorflow/core/common_runtime/direct_session.cc:167] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Traceback (most recent call last):
  File "DeepSpeaker.py", line 105, in <module>
    startsec=0, endsec=12, num_clips=9)
  File "/home/nvidia/MultimodalID/DeepSpeaker/experiments/inference_pipe.py", line 48, in run_twophase_inference
    sess2 = tf.Session(graph=g2)
  File "/home/nvidia/.virtualenvs/DeepSpeaker/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1509, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/nvidia/.virtualenvs/DeepSpeaker/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 638, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/nvidia/.virtualenvs/DeepSpeaker/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

My workaround is to pip uninstall tensorflow and then reinstall the wheel; after that the program runs normally.
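
For anyone hitting the same traceback: the failure above occurs while a second tf.Session is being created (sess2 = tf.Session(graph=g2)). Below is a minimal sketch of creating two sessions that share one explicit GPU config, using the TensorFlow 1.x API. The function name make_two_sessions and the toy constants are hypothetical, and turning on allow_growth is only something that may be worth trying. It is not confirmed anywhere in this thread to fix the error.

```python
def make_two_sessions():
    """Create two graphs and one session per graph, sharing one GPU config.

    Hypothetical helper for illustration; assumes the TensorFlow 1.x API
    (tf.Session, tf.ConfigProto). Not a confirmed fix for the error above.
    """
    # Imported lazily so that defining this function does not touch CUDA.
    import tensorflow as tf

    # Assumption: let TensorFlow grow its GPU allocation on demand instead of
    # reserving most of the memory when the first session is created.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True

    g1, g2 = tf.Graph(), tf.Graph()
    with g1.as_default():
        a = tf.constant(1.0, name="a")  # toy op standing in for the first model
    with g2.as_default():
        b = tf.constant(2.0, name="b")  # toy op standing in for the second model

    # Both sessions are built from the same config object.
    sess1 = tf.Session(graph=g1, config=config)
    sess2 = tf.Session(graph=g2, config=config)
    return (sess1, a), (sess2, b)
```

To try it, call make_two_sessions(), run sess1.run(a) and sess2.run(b), and close both sessions afterwards. If the second session still fails after a reboot, the config is probably not the cause.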

Hi,

Could you try to reproduce this issue directly on the TX2 image, without a virtualenv?
Thanks

Hi AastaLLL,

Now I’m thinking that it may be a virtualenv issue. I can now work around the problem by copying my old virtualenv packages into a new virtualenv. Unfortunately, I don’t have time to try to reproduce it without virtualenv; installing all the dependencies for this project was non-trivial. But I will keep you updated as best I can as I learn more about this bug.

However, after getting it to run in the new virtualenv, I could go back to the old virtualenv and run it perfectly fine… Very bizarre.
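
Since the behaviour seems to depend on which virtualenv is active, one way to compare the old and new environments is to print where the interpreter and the tensorflow package are actually being picked up from. This is a minimal, standard-library-only sketch; the helper names find_module_origin and describe_env are hypothetical, and it only reports paths without importing TensorFlow or touching CUDA.

```python
import importlib.util
import sys


def find_module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    find_spec only locates a top-level module; it does not import it.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def describe_env():
    """Collect basic facts about the active Python environment."""
    # Classic virtualenv sets sys.real_prefix; venv and newer virtualenv
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    in_virtualenv = hasattr(sys, "real_prefix") or sys.prefix != base_prefix
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "tensorflow_origin": find_module_origin("tensorflow"),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_env().items()):
        print("{}: {}".format(key, value))
```

Running this in each virtualenv, before and after a reboot, and comparing the output would show whether both environments resolve tensorflow to the same installed copy.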

I’m also encountering the same kind of error (‘Unable to create a second session’) with TensorFlow 1.8. The TensorFlow issue is tracked on GitHub: “CUDA cannot create more than one session”, Issue #19482, tensorflow/tensorflow.

Have you tried any solution?

Hi, ibeckermayer

Thanks for your feedback.
Good to know it works now.

Hi, davidnet

Could you open a new topic describing your issue?
Thanks.