My environment:
CentOS 7.4
Python 3.6.10
tensorflow==1.15.0
NVIDIA driver 396.37 (Tesla P100)
CUDA 9.2
cuDNN 9.2-linux-x64-v7.6.5.32
Problem:
I use a dual-GPU system; the device names are "/device:XLA_CPU:0" and "/device:XLA_CPU:1".
When I checked nvidia-smi and the training results, they differed from those on a Windows PC with the same environment.
So I'm trying to run on just one GPU.
When I execute the code using with K.tf.device('/device:XLA_GPU:1'): the following error occurs:
2020-02-28 21:47:15.586539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-28 21:47:16.538678: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:3b:00.0
2020-02-28 21:47:16.539295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:86:00.0
2020-02-28 21:47:16.539406: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcudart.so.10.0’; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.539470: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcublas.so.10.0’; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.539518: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcufft.so.10.0’; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.539565: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcurand.so.10.0’; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.539612: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusolver.so.10.0’; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.539658: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library ‘libcusparse.so.10.0’; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64::/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib:/opt/python/lib:/APP/enhpc/mpi/openmpi-gcc/lib:/usr/local/cuda-9.0/lib64:/usr/local/cuda-9.0/lib::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2020-02-28 21:47:16.542597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-28 21:47:16.542621: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at Install TensorFlow with pip for how to download and setup the required libraries for your platform.
Skipping registering GPU devices…
2020-02-28 21:47:16.543187: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-02-28 21:47:16.553819: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-02-28 21:47:16.555020: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xcc9c680 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-02-28 21:47:16.555042: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-02-28 21:47:16.832426: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xccff6c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-02-28 21:47:16.832490: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-02-28 21:47:16.832506: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2020-02-28 21:47:16.832875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-28 21:47:16.832889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
2020-02-28 21:47:16.913283: I tensorflow/compiler/jit/xla_compilation_cache.cc:238] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2020-02-28 21:47:16.914574: F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: CUDA driver version is insufficient for CUDA runtime version
2020-02-28 21:47:16.914588: F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: CUDA driver version is insufficient for CUDA runtime version
2020-02-28 21:47:16.914574: F tensorflow/stream_executor/cuda/cuda_driver.cc:175] Check failed: err == cudaSuccess || err == cudaErrorInvalidValue Unexpected CUDA error: CUDA driver version is insufficient for CUDA runtime version
Aborted (core dumped)
Attempted solutions:
- Reinstalled CUDA and the driver.
- Added the following environment variables to ~/.bashrc:
export PATH=/usr/local/cuda-9.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64:$LD_LIBRARY_PATH
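To see whether the libraries named in the "Could not load dynamic library" warnings can be found with the current LD_LIBRARY_PATH, a minimal sketch like the one below might help. It uses only ctypes from the standard library. The helper name check_libs is hypothetical (not part of TensorFlow), and the library list is copied from the warnings in the log above.

```python
import ctypes

# Library names copied from the dso_loader warnings in the log above.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def check_libs(names):
    """Return {name: bool}, True if the dynamic loader can open the library."""
    results = {}
    for name in names:
        try:
            # CDLL uses the same search path (LD_LIBRARY_PATH, ldconfig cache)
            # that applies to the current process.
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, found in check_libs(CUDA_LIBS).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Running this in the same shell that launches training shows which of the .so.10.0 files the loader cannot resolve. It does not check the driver version.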
I am not entirely sure how to use the XLA GPU device with TensorFlow. What should I do?