TensorFlow fails to create a session, and a performance issue with Docker

Hi,

We are using the Jetson TX2 (L4T R28.2) platform with TensorFlow (1.7.0 with CUDA 9; it also happens with 1.7.1 with CUDA 8) to run real-time object detection.

We have two issues that we need help investigating:

  1. From time to time, on startup of our application, TensorFlow fails to create a session, with an error like this:

2018-05-30 14:01:50.824196: E tensorflow/core/common_runtime/direct_session.cc:167] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Process DetectionProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/unit/mp/UnitProcess.py", line 23, in run
    raise e
  File "/opt/unit/mp/UnitProcess.py", line 18, in run
    self.work()
  File "/opt/unit/vision/detection/detection_process.py", line 62, in work
    net = Factory.createObjectDetector(self._net_params, self.logger)
  File "/opt/unit/vision/detection/factory.py", line 17, in createObjectDetector
    return MobileNet(params, logger)
  File "/opt/unit/vision/detection/mobile_net.py", line 121, in __init__
    self.build(tracking_params["inference_graph"], tracking_params["label_map"])
  File "/opt/unit/vision/detection/mobile_net.py", line 172, in build
    config=config)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1509, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 638, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

There is plenty of free memory before our application starts (6 GB), but simple GPU stress testing shows that when this issue occurs, something is wrong with the memory: even a simple script fails with memory errors, and only setting gpu_usage_fraction in TensorFlow to a low value makes it work:

2018-06-07 05:53:02.401007: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-06-07 05:53:02.402026: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-07 05:53:02.402026: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-07 05:53:02.402113: E tensorflow/stream_executor/event.cc:40] could not create CUDA event: CUDA_ERROR_UNKNOWN
Segmentation fault (core dumped)

For us it looks like something gets “stuck” in the memory.
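
For reference, a minimal sketch of the low-memory-fraction workaround described above. It assumes the TensorFlow 1.x API, where the corresponding option is per_process_gpu_memory_fraction. The helper names (check_fraction, create_session) and the 0.3 default are illustrative and are not taken from the original application:

```python
def check_fraction(fraction):
    """Validate a GPU memory fraction; (0, 1] is the usual range."""
    fraction = float(fraction)
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1], got %r" % fraction)
    return fraction


def create_session(fraction=0.3):
    """Create a TF 1.x session limited to a fraction of GPU memory.

    0.3 is an arbitrary illustrative default, not a tuned value.
    """
    # Imported lazily so check_fraction can be used without TensorFlow.
    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = check_fraction(fraction)
    return tf.Session(config=config)
```

This only limits how much memory TensorFlow requests up front; it does not explain why the larger request fails.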

  2. We are trying to move our code to run in Docker with GPU access on the Jetson TX2. We managed to run it, but detection throughput is about half of what we had before (from 23 FPS down to 11 FPS).
    We are following this guide: https://github.com/Technica-Corporation/Tegra-Docker
    We use JetPack 3.2.
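
To compare the native and Docker runs on equal terms, the same timing loop could be used in both environments. This is only a sketch: measure_fps is a hypothetical helper, and step stands in for whatever callable runs one detection on one frame.

```python
import time


def measure_fps(step, iterations=100, warmup=10):
    """Return the average calls per second of step() over a timed loop."""
    # Warm-up calls are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")
```

Running it with identical inputs inside and outside the container would help separate a container overhead from differences in the surrounding pipeline.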

Hi,

We are not sure whether something went wrong when compiling TensorFlow from source.
We recommend using a TensorFlow package from one of the following links, since these have been tested:
https://devtalk.nvidia.com/default/topic/1031300
https://github.com/peterlee0127/tensorflow-nvJetson

Usually, a failure to create a session is caused by an incompatibility between the CUDA driver and the TensorFlow library.
Please give the suggested wheel a try and let us know the result.

We don’t have much experience with the Docker setup you shared.
In general, please check that the CPU/GPU clocks have been maximized:

sudo ./jetson_clocks.sh

By the way, here is a good real-time object detection sample for your reference:
https://github.com/GustavZ/realtime_object_detection

Thanks.

Hi AastaLLL,

  1. We moved to JetPack 3.1/L4T R28.1 and it fixed the problem. The exact same wheel, with the same CUDA and cuDNN versions, works on R28.1 and fails on R28.2.

It is important to note that session creation does not always fail. It happens only sometimes, and when it does, it happens at program start.
Because of that, we don’t believe it is a CUDA/TF incompatibility, but rather an issue with the R28.2 drivers.

  2. We use both jetson_clocks.sh and nvpmodel -m 0; neither helped.
    We also tried chroot instead of Docker and got the same performance.

Regarding the real-time object detection sample, we already applied some of the tricks from that repo, and we get similar performance without Docker.

  2. After further investigation, we found that the performance loss was caused by other issues unrelated to Docker. After fixing those, we get similar performance with Docker,
    at least on JetPack 3.1.

Good to know this.
Thanks for the feedback. : )

Hi AastaLLL,

Any update on the JetPack 3.2 issue? Even though the Docker issue was resolved, the issue with JetPack 3.2 still exists, and we had to revert to JetPack 3.1, which means we can’t upgrade beyond CUDA 8.

We tried many versions of TF, both compiled by us and by others, and it still happens. It is important to note that this issue is hard to reproduce: sometimes it happens, and sometimes it does not happen for a long time.

In terms of code to reproduce, just starting a TF session with GPU usage set to 1.0 will probably trigger it. From our investigation, it seems that even after the process that used the GPU has died, the GPU memory is not freed, even though the Linux free command reports plenty of memory left.

This can also happen right after a reboot. Our hunch is that something is wrong with the GPU drivers, as this doesn’t happen on JetPack 3.1 with the exact same CUDA 8, cuDNN 6.1 and TF versions and binaries.
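
A rough sketch of how the reproduction described above could be scripted, so that each attempt runs in a fresh process and the failure rate can be counted. The child snippet assumes the TensorFlow 1.x API and maps "GPU usage 1.0" to per_process_gpu_memory_fraction = 1.0, which is an assumption; count_failures is a hypothetical helper name.

```python
import subprocess
import sys

# Child program: create one session that asks for the full memory fraction,
# then close it. Any exception or crash gives a non-zero exit status.
CHILD = (
    "import tensorflow as tf\n"
    "c = tf.ConfigProto()\n"
    "c.gpu_options.per_process_gpu_memory_fraction = 1.0\n"
    "tf.Session(config=c).close()\n"
)


def count_failures(runs, child_code=CHILD):
    """Run child_code in `runs` fresh interpreters; count non-zero exits."""
    failures = 0
    for _ in range(runs):
        # A segmentation fault shows up as a negative return code,
        # which is also counted as a failure here.
        if subprocess.call([sys.executable, "-c", child_code]) != 0:
            failures += 1
    return failures
```

Calling count_failures(20) right after boot and again after the application has exited would show whether the failure rate differs between the two states.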

Hi,

Actually, we have not seen this issue.

TensorFlow works correctly in our environment.
In case it is related to the wheel file, here are some packages we have tested, for your reference:
https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-8-wheel-with-jetpack-3-2-/

Thanks.

Hi,

We tried to use the Tensorflow 1.7 wheel file from:
https://devtalk.nvidia.com/default/topic/1031300/jetson-tx2/tensorflow-1-8-wheel-with-jetpack-3-2-/

And we have the exact same issue.
We also tried upgrading to JetPack 3.2.1; the exact same issue still occurs.

Please note that it might take a few reboots of the Jetson before this issue happens.

Hi,

Have you set this configuration in TensorFlow?

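import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
session = tf.Session(config=config)  # pass any other Session arguments as needed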

By default, TensorFlow allocates all the available memory, and this will fail in an iGPU environment, where the host and GPU share the same physical memory.
Please check this topic for more details:
https://devtalk.nvidia.com/default/topic/1029742/jetson-tx2/tensorflow-1-6-not-working-with-jetpack-3-2/

Thanks.

Hi AastaLLL,

Yes, we have tried it before, and it still crashes from time to time.

The thread you mentioned is similar to our issue, and I think there is a bigger issue with JetPack 3.2.

We run the exact same code (CUDA 8, TensorFlow 1.7, our exact same application) on JetPack 3.1 and JetPack 3.2. On 3.1 the failure never happens; on 3.2 it happens from time to time.

The issue even happens after a reboot. It seems like the memory gets “stuck”: even though the free command reports available memory, the GPU can’t allocate it.
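
One way to record what the kernel reports at the moment of a failure is to snapshot /proc/meminfo just before creating the session. This is a sketch only: parse_meminfo and snapshot are hypothetical helper names, and whether these counters reflect what CUDA can actually allocate on the TX2 is an open assumption.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Most entries are in kB; a few (e.g. HugePages_Total) are counts.
            values[key.strip()] = int(fields[0])
    return values


def snapshot(path="/proc/meminfo"):
    """Return the memory counters most relevant here, if present."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    keys = ("MemTotal", "MemFree", "MemAvailable", "CmaTotal", "CmaFree")
    return {k: info[k] for k in keys if k in info}
```

Logging snapshot() on both successful and failed starts would show whether the two cases differ in anything the free command can see.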

Also note that the thread you linked has posts from several authors, and the fix only “solved” it for one of them. For example, one user wrote:

It seems I was a bit too quick to celebrate. Sometimes the same error presents itself. It's intermittent and very difficult to pinpoint.

At first I got it after a reboot. Later after attempting to free up memory to fit a larger model.

There is certainly a memory allocation problem with the drivers of 3.2 (and also 3.2.1)

Hi,

We apologize if our earlier explanation was unclear.

1.
There is one thing we want to clarify first:
A TensorFlow wheel depends on the JetPack version.
You cannot use a wheel built on JetPack 3.1 in a JetPack 3.2 environment, especially since the two differ in CUDA version.
Please check that your TensorFlow package was built against JetPack 3.2.

2.
We do have a memory allocation issue in CUDA, and it will be fixed in the next release.
https://devtalk.nvidia.com/default/topic/1033209/jetson-tx2/general-question-about-jetsons-gpu-cpu-shared-memory-usage/2
In short, CUDA doesn’t allow a single large memory allocation of more than 4 GB.
This issue can be avoided with the config.gpu_options.allow_growth setting.
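
If a fixed memory fraction is used instead of allow_growth, the fraction could be chosen so that the requested amount stays under the 4 GB limit mentioned above. The sketch below is only an illustration: fraction_under_limit is a hypothetical helper, the 0.9 headroom is arbitrary, and it assumes the limit applies to TensorFlow's single up-front allocation, which has not been confirmed here.

```python
GIB = 1024 ** 3


def fraction_under_limit(total_bytes, limit_bytes=4 * GIB, headroom=0.9):
    """Largest memory fraction whose size stays below headroom * limit_bytes.

    total_bytes is the memory the fraction is taken from (assumed here to be
    the total shared memory); the result is capped at 1.0.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return min(1.0, headroom * limit_bytes / float(total_bytes))
```

The result would be assigned to config.gpu_options.per_process_gpu_memory_fraction in place of enabling allow_growth.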

Since the failure occurs when creating tf.Session, we don’t think your issue is related to memory.
Instead, we suspect a driver/CUDA incompatibility between the OS and the TensorFlow package.

Could you help us check the TensorFlow build environment first?

Thanks.
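
A small sketch of what such an environment check could collect on the device. The /etc/nv_tegra_release path and the "R28 (release), REVISION: 2.0" line format are assumptions about L4T, the helper names are hypothetical, and the TensorFlow calls assume the 1.x API:

```python
import re


def parse_l4t_release(text):
    """Extract a string like 'R28.2.0' from nv_tegra_release-style text."""
    match = re.search(r"R(\d+)\s*\(release\),\s*REVISION:\s*([\d.]+)", text)
    if match is None:
        return None
    return "R{}.{}".format(match.group(1), match.group(2))


def describe_environment(release_path="/etc/nv_tegra_release"):
    """Collect the L4T release and basic facts about the installed wheel."""
    # Imported lazily so the parser can be used without TensorFlow.
    import tensorflow as tf

    with open(release_path) as f:
        l4t = parse_l4t_release(f.read())
    return {
        "l4t": l4t,
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }
```

Posting the output of describe_environment() from both the JetPack 3.1 and JetPack 3.2 setups would make the two environments easier to compare.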