Segmentation fault in TensorFlow 2.0 Object Detection API

Hi,

I installed the GPU version of TensorFlow 2.0 using the commands given at https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_10

I tried to install NVIDIA driver 418, but driver 430 gets installed instead, and nvidia-smi reports CUDA 10.1 rather than 10.0 for some reason.

How can I install NVIDIA driver 418 and CUDA 10.0, which are compatible with TF 2.0?

I tried to use the same driver versions, but in the TF Object Detection API I get a segmentation fault.

Can you please help me install the NVIDIA 418 driver with CUDA 10.0, which is compatible with TF 2.0?

The driver is backward compatible with older toolkit versions, so a 430 driver will work fine.
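To make the compatibility point concrete: the "CUDA Version" field in the nvidia-smi header is the highest CUDA version the installed driver supports, not the toolkit that is installed, so seeing 10.1 there does not rule out CUDA 10.0. Below is a minimal sketch that parses that header line and compares the driver against a per-toolkit minimum. The helper names and the MIN_DRIVER table are illustrative, and the minimum driver numbers are my recollection of NVIDIA's CUDA release notes, so treat them as assumptions to verify.

```python
import re

# Minimum Linux driver per CUDA toolkit. ASSUMPTION: values recalled from
# NVIDIA's CUDA release notes; check the notes for your exact release.
MIN_DRIVER = {"10.0": (410, 48), "10.1": (418, 39), "10.2": (440, 33)}


def parse_smi_header(line):
    """Return ((driver_major, driver_minor), max_cuda) from the nvidia-smi header."""
    m = re.search(
        r"Driver Version:\s*(\d+)\.(\d+).*?CUDA Version:\s*(\d+\.\d+)", line
    )
    if m is None:
        raise ValueError("not an nvidia-smi header line")
    return (int(m.group(1)), int(m.group(2))), m.group(3)


def driver_supports(driver, toolkit):
    """True if the driver meets the assumed minimum for the given toolkit."""
    return driver >= MIN_DRIVER[toolkit]


# Header line as posted in this thread.
header = "| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |"
driver, max_cuda = parse_smi_header(header)
print("driver:", driver, "max CUDA supported by driver:", max_cuda)
print("can run a CUDA 10.0 toolkit:", driver_supports(driver, "10.0"))
```

Under those assumed minimums, a 430.50 driver clears the bar for both CUDA 10.0 and 10.1, which is consistent with the log showing libcudart.so.10.0 loading successfully.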

The linked instructions on tensorflow.org install CUDA 10.0 to /usr/local/cuda-10.0. They also symlink /usr/local/cuda to /usr/local/cuda-10.0 if the former does not already exist. It sounds like you might have a previous installation of CUDA 10.1 on the system. Make sure your LD_LIBRARY_PATH variable includes /usr/local/cuda-10.0/lib64, and perhaps update the /usr/local/cuda symlink to point to /usr/local/cuda-10.0.
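A quick way to see which CUDA installation the paths currently resolve to is sketched below. It only inspects the symlink and LD_LIBRARY_PATH and changes nothing. The function name cuda_path_report and its parameters are hypothetical, and the default paths assume the standard /usr/local layout described above.

```python
import os


def cuda_path_report(env=None, cuda_link="/usr/local/cuda",
                     wanted="/usr/local/cuda-10.0"):
    """Report whether the CUDA symlink and LD_LIBRARY_PATH resolve to `wanted`."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    wanted_real = os.path.realpath(wanted)

    link_exists = os.path.lexists(cuda_link)
    link_target = os.path.realpath(cuda_link) if link_exists else None

    def under_wanted(path):
        # Resolve symlinks so that /usr/local/cuda/lib64 counts when the
        # /usr/local/cuda link points at the wanted installation.
        real = os.path.realpath(path)
        return real == wanted_real or real.startswith(wanted_real + os.sep)

    return {
        "symlink_target": link_target,
        "symlink_ok": link_exists and link_target == wanted_real,
        "ld_library_path": entries,
        "ld_path_ok": any(under_wanted(p) for p in entries),
    }


for key, value in cuda_path_report().items():
    print(f"{key}: {value}")
```

If symlink_ok is False, repointing the link (for example with ln -sfn) is the usual fix; if ld_path_ok is False, add /usr/local/cuda-10.0/lib64 to LD_LIBRARY_PATH in the shell that launches Python, since the loader reads the variable at process start.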

I used the script below to update all the paths on the system:
https://github.com/phohenecker/switch-cuda/blob/master/switch-cuda.sh

But even after this I still see that nvidia-smi reports CUDA 10.1, not CUDA 10.0.

Can you please share the sequence of commands/actions to update LD_LIBRARY_PATH and the symlink paths?

Sharing the terminal log for reference.
For the Object Detection API:
$ python3 object_detection_tutorial.py
2019-11-18 10:20:33.510150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-18 10:20:33.642854: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:33.643298: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-18 10:20:33.668233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-18 10:20:33.917598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-18 10:20:34.105162: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-18 10:20:34.165999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-18 10:20:34.476055: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-18 10:20:34.678114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-18 10:20:35.231845: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-18 10:20:35.232203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.233159: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.233885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-18 10:20:35.249826: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-18 10:20:35.415423: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-11-18 10:20:35.430816: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12187bd0 executing computations on platform Host. Devices:
2019-11-18 10:20:35.431022: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-18 10:20:35.550434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.550976: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x121ba9c0 executing computations on platform CUDA. Devices:
2019-11-18 10:20:35.551005: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-11-18 10:20:35.551213: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.551610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-18 10:20:35.551662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-18 10:20:35.551698: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-18 10:20:35.551731: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-18 10:20:35.551752: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-18 10:20:35.551785: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-18 10:20:35.551815: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-18 10:20:35.551838: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-18 10:20:35.551931: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.552368: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.552748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-18 10:20:35.570675: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-18 10:20:35.582211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-18 10:20:35.582281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-18 10:20:35.582301: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-18 10:20:35.588437: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.589494: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-18 10:20:35.590709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3436 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-18 10:20:54.375827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Segmentation fault (core dumped)

For Nvidia-smi:

$ nvidia-smi
Mon Nov 18 10:26:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   55C    P3    N/A /  N/A |    365MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1874      G   /usr/lib/xorg/Xorg                           243MiB |
|    0      2018      G   /usr/bin/gnome-shell                          61MiB |
|    0      2934      G   /snap/pycharm-community/163/jbr/bin/java       2MiB |
|    0      3271      G   ...uest-channel-token=15008582713786260033    56MiB |
+-----------------------------------------------------------------------------+

I'm running into the exact same error when running the same file.

In my case the GPU is not visible to TensorFlow; tf.test.is_gpu_available() returns False.

So I removed the driver and tried to reinstall it, but I get the error below:

sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.2.24-1+cuda10.0 \
    libcudnn7-dev=7.6.2.24-1+cuda10.0

[sudo] password for sumanh:
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
cuda-10-0 : Depends: cuda-runtime-10-0 (>= 10.0.130) but it is not going to be installed
Depends: cuda-demo-suite-10-0 (>= 10.0.130) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Any idea how to fix this?

I reinstalled the OS, and now with a fresh installation of TF 2.1 and CUDA 10.2 I get the warnings below:
2020-01-10 15:27:40.101443: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-10 15:27:40.101504: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-10 15:27:40.101512: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Suman.h, TensorFlow's pip packages require CUDA == 10.1, cuDNN >= 7.6, and TensorRT == 6.0, so a CUDA 10.2 installation does not match. See https://www.tensorflow.org/install/gpu#software_requirements.
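One way to check these requirements without starting TensorFlow is to ask the dynamic loader for each library directly. The sketch below is only an illustration: probe_libraries and the REQUIRED list are hypothetical names, the cuDNN and TensorRT sonames are the ones that appear in the logs in this thread, and libcudart.so.10.1 is an assumption derived from the CUDA 10.1 requirement.

```python
import ctypes

# Sonames to probe. libcudnn.so.7, libnvinfer.so.6 and libnvinfer_plugin.so.6
# appear in the logs above; libcudart.so.10.1 is an ASSUMPTION based on the
# stated CUDA 10.1 requirement.
REQUIRED = [
    "libcudart.so.10.1",
    "libcudnn.so.7",
    "libnvinfer.so.6",
    "libnvinfer_plugin.so.6",
]


def probe_libraries(names):
    """Map each soname to True if the dynamic loader can open it."""
    found = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # uses the same search path as any dlopen call
            found[name] = True
        except OSError:
            found[name] = False
    return found


for name, ok in probe_libraries(REQUIRED).items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

A missing libnvinfer entry only matters if you want TensorRT, as the warning itself says; a missing libcudart or libcudnn entry would keep TensorFlow from using the GPU at all.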