TensorFlow 1.14 is not working on an RTX 3090 inside an Ubuntu 18.04 / CUDA 10.0 Docker container with Python 2

Description

TensorFlow 1.14 is not working with an RTX 3090 inside a CUDA 10.0 Docker container.
I wrote a reinforcement learning program, and it runs fine in the Docker container I built on a GTX 1080 Ti.
I moved to a new PC with an RTX 3090, but there it does not work.
The program takes about 20 minutes to start, and then the errors below are printed.
On the GTX 1080 Ti PC it starts without any long wait.

Note that some CUDA 10.0 samples do run on the RTX 3090 PC inside the Docker container.

The error messages are as follows:
2020-12-01 08:12:06.722138: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2020-12-01 08:12:06.722611: I tensorflow/stream_executor/stream.cc:4838] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not memzero GPU location; source: 0x7f1872ffbd20
2020-12-01 08:12:06.722627: I tensorflow/stream_executor/stream.cc:315] did not allocate timer: 0x7f1872ffbd30
2020-12-01 08:12:06.722633: I tensorflow/stream_executor/stream.cc:1839] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not enqueue 'start timer': 0x7f1872ffbd30
2020-12-01 08:12:06.722642: I tensorflow/stream_executor/stream.cc:1851] [stream=0x555d1dba6960,impl=0x555d1dd5fe80] did not enqueue 'stop timer': 0x7f1872ffbd30
2020-12-01 08:12:06.722651: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
Aborted (core dumped)
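
For reference, a minimal script of the kind that triggers the failure (a sketch, not my actual program; any op that goes through cuBLAS, such as a matrix multiply, exercises the same path) looks like this:

```python
# Minimal sketch (not the actual program): a single GPU matmul goes through
# cuBLAS and hits the same CUBLAS_STATUS_EXECUTION_FAILED path shown above.
import tensorflow as tf

a = tf.random_normal([1024, 1024])
b = tf.random_normal([1024, 1024])
c = tf.matmul(a, b)            # executed by cuBLAS on the GPU

with tf.Session() as sess:
    print(sess.run(tf.reduce_sum(c)))
```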

Environment

GPU Type: GTX 1080 Ti (works), RTX 3090 (doesn't work)
Nvidia Driver Version: 418.40.04 (1080 Ti PC), 455.45.01 (3090 PC)
CUDA Version: 10.1 (1080 Ti PC host), 11.1 (3090 PC host), 10.0 (Docker)
cuDNN Version: 7
Operating System + Version: Ubuntu 18.04 (1080 Ti PC), Ubuntu 20.04 (3090 PC), Ubuntu 18.04 (Docker)
Python Version (if applicable): 2.7
TensorFlow Version (if applicable): 1.14
Baremetal or Container (if container which image + tag): container, built from the base image nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

Hi @asobod11138,
I suggest you raise this concern on the CUDA forum; they should be able to help you better there.

Thanks!


Thanks! I moved this question to the CUDA forum.

I wonder whether TensorFlow 1.14 might simply not contain native code for the RTX 3000 series of cards (the Ampere generation, compute capability 8.6), in which case the CUDA driver on the host machine has to JIT (just-in-time) compile the embedded PTX at startup.

This would explain the long startup times.
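
If that is what is happening, one thing worth trying (a sketch, not a verified fix; the environment variables are standard CUDA driver settings, but the size and path values below are only illustrative) is enlarging and persisting the driver's JIT cache so the compilation only happens on the first run:

```python
# Sketch: enlarge the CUDA driver's JIT cache so the PTX-to-SASS compilation
# only happens on the first run. Size and path values are illustrative.
import os

os.environ["CUDA_CACHE_DISABLE"] = "0"           # keep the JIT cache enabled
os.environ["CUDA_CACHE_MAXSIZE"] = "2147483648"  # ~2 GiB; the default is much smaller
os.environ["CUDA_CACHE_PATH"] = "/workspace/.nv/ComputeCache"  # e.g. a mounted volume

import tensorflow as tf  # import only after the variables are set

with tf.Session() as sess:        # initialising the GPU context triggers the JIT
    sess.run(tf.constant(0.0))
```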

I've previously seen crashes in JIT-compiled code due to driver bugs, and in one case due to an overly complex kernel being JIT-compiled and running out of (stack?) memory in the process.

Is upgrading to TensorFlow 1.15 an option? From what I gather, it has Ampere support.
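
As a quick check of the theory (a sketch using the standard TF 1.x API), you can ask TensorFlow which GPU it sees and what compute capability it reports; an RTX 3090 should report 8.6, which CUDA 10.0 predates:

```python
# Quick TF 1.x sanity check: is a GPU visible, and what compute capability
# does it report? (An RTX 3090 reports 8.6.)
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow:", tf.__version__)
print("GPU available:", tf.test.is_gpu_available(cuda_only=True))

for dev in device_lib.list_local_devices():
    if dev.device_type == "GPU":
        # physical_device_desc includes the card name and compute capability
        print(dev.name, "->", dev.physical_device_desc)
```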


This behavior isn't surprising. You might wish to use a TF container that has already been built with software versions that support Ampere. Such a container is available on NGC, and you can check the release notes to see which software versions it contains.


Thank you very much!

That seemed like a good solution, so I tried it, but it didn't solve my problem because of Python 2.
I am using ROS Melodic, so my program has to run on Python 2.

So I looked for a container with Python 2; the latest one is 20.01-tf1-py2, which was released on 01/28/2020.

I suspected it wouldn't support Ampere, and as expected the same problem occurred when I tried it.

Is there any container supporting Ampere with TensorFlow 1 and Python 2?


[Edit to add more information]
I said "the same problem", but it isn't exactly the same:
the long wait before the program starts is the same, but the program does run.

There isn't one in NGC that I know of; Python 2 support was dropped a while ago. You may find one somewhere else on the web, or you can try building your own.


Alright. Thank you very much for your kindness.