Can't identify the cuda device

Description

I am trying to convert my TensorFlow model to TensorRT, but when I run the conversion script it fails with an error about an unidentified CUDA device.

Environment

TensorRT Version: 6.0
GPU Type: 1 GPU, Tesla T4
CUDA Version: 10.0
CUDNN Version: 7.6.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 2.1 (GPU supported)

Relevant Files

Resource used to convert to TensorRT: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html

Steps To Reproduce

(1) Any TensorFlow model
(2) Run the conversion script, which can be found at the link mentioned above.
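For reference, a minimal sketch of what such a conversion script typically looks like for a TensorFlow 2.1 SavedModel, following the `TrtGraphConverterV2` pattern from the TF-TRT user guide. The function name `convert_saved_model`, the directory arguments, and the FP16 default are illustrative assumptions, not the exact script that was run here.

```python
def convert_saved_model(input_dir, output_dir, precision_mode="FP16"):
    """Convert a TensorFlow 2.x SavedModel with TF-TRT and save the result.

    input_dir and output_dir are placeholder paths chosen by the caller.
    """
    # Imported lazily so this file can be loaded on a machine without TensorFlow.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Start from the default parameters and override only the precision mode.
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
        precision_mode=precision_mode
    )
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=params,
    )
    converter.convert()         # replace supported subgraphs with TensorRT ops
    converter.save(output_dir)  # write the converted SavedModel
    return output_dir

# Example call (hypothetical paths):
# convert_saved_model("saved_model_dir", "saved_model_trt")
```

Sharing the script in roughly this form, together with the exact paths and parameters used, makes the failure easier to reproduce.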

Tried
export CUDA_VISIBLE_DEVICES=0

Error log
Can't identify the cuda device. Running on device 0 Segmentation fault (core dumped)

Thanks in advance.

Hi,
Can you try upgrading your NVIDIA driver?
Also, please make sure the installed driver version meets or exceeds the minimum version required by your CUDA Toolkit version.
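A quick way to check this is to compare the driver version reported by `nvidia-smi` against the minimum for the toolkit. The sketch below is only illustrative: the helper names are made up, and the minimum Linux driver versions in the table are taken from the CUDA Toolkit release notes as I recall them, so please verify them against the current release notes.

```python
import subprocess

# Minimum Linux x86_64 driver versions per CUDA Toolkit release.
# Assumption: values as listed in the CUDA release notes; verify before relying on them.
MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}

def parse_version(text):
    """Turn a dotted version string such as '440.33.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def driver_supports(driver_version, cuda_version):
    """Return True if the driver meets the minimum for the given CUDA version."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_version]

def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.splitlines()[0].strip()

# Example on a machine with an NVIDIA driver installed:
# print(driver_supports(installed_driver_version(), "10.0"))
```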

To avoid system dependency issues, I would recommend using NGC containers.
https://www.nvidia.com/en-in/gpu-cloud/containers/

Thanks

It was working a few days ago, but now it suddenly shows this error.
Please check the versions listed above; everything was a fresh installation.

I also tried on a new system (instance):
CUDA: 10.2
cuDNN: 7.6.5
TensorRT: 6.0
TensorFlow: 2.1 (GPU supported)
Python 3.6
Ubuntu 18.04

This time I got the following output, and the process ended with "Killed":

2020-04-06 09:24:13.076327: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1592] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-04-06 09:24:13.076465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-06 09:24:13.076522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-06 09:24:13.077169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
Killed
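The "Cannot dlopen some GPU libraries" warning means at least one CUDA library could not be loaded, but this excerpt does not show which one. A minimal sketch for checking that with `ctypes` follows. The list of sonames is an assumption based on the stock TensorFlow 2.1 pip wheel, which is built against CUDA 10.1, cuDNN 7 and TensorRT 6; with CUDA 10.0 or 10.2 installed, `libcudart.so.10.1` may not be present. The name `find_missing` is made up for illustration.

```python
import ctypes

# Shared libraries the TensorFlow 2.1 GPU wheel tries to open.
# Assumption: stock pip wheel (CUDA 10.1, cuDNN 7, TensorRT 6).
EXPECTED_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
    "libnvinfer.so.6",
    "libnvinfer_plugin.so.6",
]

def find_missing(libs=EXPECTED_LIBS, loader=ctypes.CDLL):
    """Return the library names that the dynamic loader cannot open."""
    missing = []
    for name in libs:
        try:
            loader(name)  # resolved via LD_LIBRARY_PATH and the ld.so cache
        except OSError:
            missing.append(name)
    return missing

# Example:
# print(find_missing())
```

Any name it reports usually points to a CUDA, cuDNN or TensorRT version that differs from what the TensorFlow build expects, or to a missing `LD_LIBRARY_PATH` entry.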

Hi,

This seems similar to the issues below.
Please refer to these links in case they help:
https://github.com/tensorflow/tensorflow/issues/35968
https://github.com/tensorflow/tensorflow/issues/34287

Thanks

There are no issues with loading the dynamic libraries,
and the second link didn't help :(

Hi,

Could you please share the repro script or complete error log file so that we can help better?
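One way to capture the complete log is to run the script through a small wrapper that writes stdout and stderr to a single file. This is only a sketch: `run_and_capture`, `convert.py` and `conversion.log` are placeholder names.

```python
import subprocess

def run_and_capture(cmd, log_path):
    """Run cmd, write its combined stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode

# Example (placeholder script and log names):
# import sys
# code = run_and_capture([sys.executable, "convert.py"], "conversion.log")
# print("exit code:", code)
```

On Linux, a negative exit code means the process was terminated by a signal: -9 is SIGKILL (shown by the shell as "Killed") and -11 is SIGSEGV ("Segmentation fault"). Including that value with the log helps narrow things down.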

Thanks