TensorFlow 1.14 is not working on an RTX 3090 inside a Docker container (Ubuntu 18.04, CUDA 10.0, Python 2)

Description

TensorFlow 1.14 is not working with an RTX 3090 inside a CUDA 10.0 Docker container.
I wrote a reinforcement learning program, and it works in the Docker container I created on a GTX 1080 Ti.
I moved to a new PC with an RTX 3090, but there it doesn't work.
The program takes about 20 minutes to start, and then the errors below are shown.
On the GTX 1080 Ti PC it starts without a long wait.

Note that some CUDA 10.0 samples do work on the RTX 3090 PC inside the Docker container.

The error output is as follows:
2020-12-01 08:12:06.722138: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-12-01 08:12:06.722611: I tensorflow/stream_executor/stream.cc:4838] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not memzero GPU location; source: 0x7f1872ffbd20
2020-12-01 08:12:06.722627: I tensorflow/stream_executor/stream.cc:315] did not allocate timer: 0x7f1872ffbd30
2020-12-01 08:12:06.722633: I tensorflow/stream_executor/stream.cc:1839] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not enqueue 'start timer': 0x7f1872ffbd30
2020-12-01 08:12:06.722642: I tensorflow/stream_executor/stream.cc:1851] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not enqueue 'stop timer': 0x7f1872ffbd30
2020-12-01 08:12:06.722651: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
Aborted (core dumped)

Environment

GPU Type: GTX 1080 Ti (works), RTX 3090 (doesn't work)
Nvidia Driver Version: 418.40.04 (1080 Ti PC), 455.45.01 (3090 PC)
CUDA Version: 10.1 (1080 Ti PC host), 11.1 (3090 PC host), 10.0 (Docker)
CUDNN Version: 7
Operating System + Version: Ubuntu 18.04 (1080 Ti PC), Ubuntu 20.04 (3090 PC), Ubuntu 18.04 (Docker)
Python Version (if applicable): 2.7
TensorFlow Version (if applicable): 1.14
Baremetal or Container (if container, which image + tag): container, built on the base image nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

Hi @asobod11138,
I suggest you raise this concern on the CUDA forum; they should be able to help you better there.

Thanks!


Thanks! I moved this question to the CUDA forum.

I am wondering if TensorFlow 1.14 might not contain native code (SASS) for the RTX 3000 series of cards (Ampere generation), so it hits just-in-time (JIT) compilation of PTX in the CUDA driver on the host machine.

This would explain the long startup times.

I've previously seen crashes in JIT-compiled code due to driver bugs, and in one case an overly complex kernel ran out of (stack?) memory while being JIT-compiled.
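One way to check this hypothesis is to inspect which GPU architectures a library actually embeds using cuobjdump. The library path below is an assumption for a typical TF 1.14 pip install; adjust it to wherever TensorFlow's GPU code lives on your system:

```shell
# Hypothetical path; adjust to the actual TensorFlow shared library.
TF_LIB=/usr/local/lib/python2.7/dist-packages/tensorflow/libtensorflow_framework.so

# List embedded native code (SASS); if no sm_8x entry appears,
# there is no native Ampere code and the driver must JIT from PTX.
cuobjdump --list-elf "$TF_LIB"

# List embedded PTX; if no PTX is present either,
# the library cannot run on an RTX 3090 at all.
cuobjdump --list-ptx "$TF_LIB"
```

The JIT hypothesis fits if the output shows only older sm_xx targets plus PTX: the driver then recompiles that PTX for sm_86 at load time, which matches the 20-minute startup.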

Is upgrading to TensorFlow 1.15 an option? From what I gather, it has Ampere support.


This behavior isn't surprising. You might wish to use a TF container that has already been built with software versions that support Ampere. Such containers are available on NGC, and you can check the release notes to see which software versions each one contains.
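For reference, pulling and running an NGC TensorFlow container might look like the sketch below. The tag is only an example; consult the NGC release notes for a release built against CUDA 11.x (Ampere support):

```shell
# Example tag only; pick one from the NGC TensorFlow release notes
# whose CUDA version supports Ampere (CUDA 11.1 or newer).
docker pull nvcr.io/nvidia/tensorflow:20.12-tf1-py3

# Run with GPU access (requires the NVIDIA Container Toolkit on the host).
docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:20.12-tf1-py3
```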


Thank you very much!

It seemed like a good solution, so I tried it, but I couldn't solve my problem because of Python 2.
I am actually using ROS Melodic, so my program needs to run on Python 2.

I tried to find a container with Python 2; the latest one is 20.01-tf1-py2, released on 01/28/2020.

I suspected it wouldn't support Ampere, and indeed the same problem occurred when I tried it.

Is there any container supporting Ampere with TensorFlow 1 and Python 2?


[Edit to add more information]
I said "the same problem", but it is not exactly the same.
The long wait at startup is the same, but the program does run.

There isn't one on NGC that I know of. Python 2 support was dropped a while ago. You may find one elsewhere on the web, or you can try building your own.


Alright. Thank you very much for your kindness.

Hi @Robert_Crovella, thanks for answering the question. Would you mind elaborating on why this behavior isn't surprising?
I am in a very similar situation and use a cache so the JIT doesn't rerun on every execution, which seems to work fine (also TF 1.14, inside Docker with CUDA 10, host driver 460.39 with an RTX 3090):

export CUDA_CACHE_DISABLE=0
export CUDA_CACHE_MAXSIZE=2147483647

The compilation works and the program processes the data. However, the outputs are seemingly random numbers (which isn't the case when running the same model on a 2080 without Docker). Conceptually, should the JIT compilation work, or is running CUDA 10 inside a container in principle not supported on Ampere? (Based on the documentation I've seen so far, I thought it would be.)
Switching to a different base container would mean a lot of work in our case, hence I'm trying to understand the problem better first. More information would be very appreciated.

I wouldn't be able to say anything specific about this case. The reason I find it unsurprising that something went wrong in the originally stated case is that the OP was running a container that was never designed to support an Ampere GPU and was never tested on one before it was released. Bugs are always possible in any software, and in my experience such situations (software running on hardware it has never been tested on) are a more likely place for them.

Theoretically, CUDA has a "forward-compatibility path" that involves JIT compilation of PTX, as you already know. If all the libraries in use contain PTX, and the compilation settings are correct, this should allow a binary to run on a newer GPU. Indeed, the fact that compilation proceeds and the runtime appears to work without any errors thrown by the CUDA runtime suggests to me that it is working in some fashion. The actual problem in your case may lie somewhere else, or this forward-compatibility mechanism may not be working correctly here for some reason (a bug, etc.). Bugs are always possible.
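For context on that path: whether JIT forward compatibility is available at all is decided when the binaries are compiled. A minimal nvcc sketch (architectures chosen only as an example) that embeds both native code for an older GPU and PTX for forward JIT compilation:

```shell
# Embed sm_61 SASS (Pascal, e.g. GTX 1080 Ti) plus compute_61 PTX.
# On an RTX 3090 (sm_86) the driver has no matching SASS, so it
# JIT-compiles the embedded PTX at load time -- the slow first start
# described earlier in this thread.
nvcc -gencode arch=compute_61,code=sm_61 \
     -gencode arch=compute_61,code=compute_61 \
     -o app app.cu
```

If a library is built with only `code=sm_61` (no `code=compute_61`), there is no PTX to JIT and it cannot run on Ampere at all.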