Configuring multiple versions of TensorRT and TensorFlow on a shared HPC cluster; TF-TRT Warning: Cannot dlopen some TensorRT libraries

We use Bright Computing for provisioning nodes on RHEL 9 and have CUDA 11.7 and CUDA 11.8 available as modules, as well as cuDNN 8.5 for CUDA 11.7 and cuDNN 8.8 for CUDA 11.8. I also created a module for cutensor-cuda11.7.

We also have various modules for Python, e.g., mamba with Python 3.11 and Anaconda Python 3.9.10. TensorFlow 2.11.0 was installed via pip with --user.

nvidia-smi reports driver version 520.61.05 (CUDA version 11.8), and the GPUs are NVIDIA RTX A6000s.
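For reference, a CUDA 11.7 session is set up roughly like this (the module names below are guesses inferred from the library paths in the logs; check module avail for the exact names on your site):

# Hypothetical module names, inferred from the paths in the logs below
module load cuda11.7/toolkit/11.7.1
module load cudnn8.5-cuda11.7
module load cutensor-cuda11.7/1.3.1.3
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"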

What’s the reason for TF not finding the GPUs?

2023-03-30 11:54:34.772791: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-30 11:54:35.566539: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/cutensor-cuda11.7/1.3.1.3/lib/11:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.7/toolkit/11.7.1/targets/x86_64-linux/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64

2023-03-30 11:54:35.566613: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/cutensor-cuda11.7/1.3.1.3/lib/11:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.7/toolkit/11.7.1/targets/x86_64-linux/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64

2023-03-30 11:54:35.566627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
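The libnvinfer warnings by themselves are harmless unless TF-TRT is actually used: the TF 2.11 wheel dlopens the TensorRT 7 sonames (libnvinfer.so.7), while TensorRT 8.x ships libnvinfer.so.8, so the lookup fails even with TensorRT 8 installed. A quick check (sketch):

# TF 2.11 asks the loader for the TensorRT 7 soname by name
python -c "import ctypes; ctypes.CDLL('libnvinfer.so.7'); print('found')" \
  || echo "libnvinfer.so.7 not resolvable; expect the TF-TRT warning"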

>>> print(tf.__version__)

2.11.0
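For what it's worth, the TF 2.11 wheels are built against CUDA 11.2 and cuDNN 8.1 and will load any compatible 11.x/8.x at runtime; the wheel's own record can be checked like this (sketch using the public tf.sysconfig API):

python -c "import tensorflow as tf; info = tf.sysconfig.get_build_info(); print(info['cuda_version'], info['cudnn_version'])"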
python mnist.py

2023-03-30 11:46:44.803644: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

2023-03-30 11:46:45.605164: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/cutensor-cuda11.7/1.3.1.3/lib/11:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.7/toolkit/11.7.1/targets/x86_64-linux/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64

2023-03-30 11:46:45.605449: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/cutensor-cuda11.7/1.3.1.3/lib/11:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.7/toolkit/11.7.1/targets/x86_64-linux/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64

2023-03-30 11:46:45.605462: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

2023-03-30 11:46:47.410968: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/cutensor-cuda11.7/1.3.1.3/lib/11:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda11.7/toolkit/11.7.1/targets/x86_64-linux/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64

2023-03-30 11:46:47.411008: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.

Skipping registering GPU devices...
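The decisive failure here is the missing libcudnn.so.8: none of the directories on the LD_LIBRARY_PATH shown above contain cuDNN, so TF skips GPU registration. A quick confirmation and the likely fix (the cudnn module name is hypothetical; check module avail cudnn):

# Confirm cuDNN is not on the library path, then load the module
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i cudnn || echo "no cuDNN on path"
module load cudnn8.5-cuda11.7   # hypothetical module name
python -c "import ctypes; ctypes.CDLL('libcudnn.so.8'); print('cuDNN resolvable')"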

But at least with CUDA 11.8 the GPU is found (note that the log wording below differs from the 2.11.0 run above, which suggests this environment picked up a different TensorFlow build):

python mnist.py

2023-03-30 12:04:52.263547: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.

To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

2023-03-30 12:04:53.184678: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT

2023-03-30 12:04:55.144931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 46672 MB memory: -> device: 0, name: NVIDIA RTX A6000, pci bus id: 0000:c1:00.0, compute capability: 8.6

Epoch 1/10

2023-03-30 12:04:56.798351: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

2023-03-30 12:04:56.967906: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1547d7dfd390 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

2023-03-30 12:04:56.967960: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): NVIDIA RTX A6000, Compute Capability 8.6

2023-03-30 12:04:56.971514: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.

2023-03-30 12:04:57.084710: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8801

2023-03-30 12:04:57.094134: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:530] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.

Searched for CUDA in the following directories:

./cuda_sdk_lib

/usr/local/cuda-11.8

/usr/local/cuda

.

You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

2023-03-30 12:04:57.094329: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc

2023-03-30 12:04:57.094586: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc

2023-03-30 12:04:57.094610: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: libdevice not found at ./libdevice.10.bc

[[{{node StatefulPartitionedCall_2}}]]

2023-03-30 12:04:57.111179: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc

2023-03-30 12:04:57.111354: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc

2023-03-30 12:04:57.155357: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc

2023-03-30 12:04:57.155587: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc

2023-03-30 12:04:57.171499: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc

2023-03-30 12:04:57.171675: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
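For the libdevice failures, the warning above spells out the fix: point XLA at the CUDA toolkit root, which contains nvvm/libdevice. On a module-based cluster that root is not /usr/local/cuda; a sketch, with the 11.8 toolkit path assumed by analogy with the 11.7 path in the earlier logs:

# Assumed toolkit root (the 11.7 one in the logs is
# /cm/shared/apps/cuda11.7/toolkit/11.7.1; adjust for your 11.8 module)
CUDA_ROOT=/cm/shared/apps/cuda11.8/toolkit/11.8.0
ls "$CUDA_ROOT/nvvm/libdevice/libdevice.10.bc"   # the file XLA is looking for
export XLA_FLAGS="--xla_gpu_cuda_data_dir=$CUDA_ROOT"
python mnist.py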

Hi,
Please check the links below, as they might answer your concerns.

Thanks!

Nothing there about multiple versions. Any other specific suggestions?

Hi @rk3199 ,

We are checking on this. Will update you on the same.

Thanks

Hi @rk3199 ,
Did you try completely removing CUDA and reinstalling it?

Thanks

No, as this is on a cluster and CUDA is loaded as a module.

Lower versions of Python, e.g., 3.7, do not generate this error/warning.
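One thing worth checking before blaming the Python version itself: each interpreter has its own site-packages, so 3.7 and 3.9 may simply be importing different TensorFlow builds. A sketch (interpreter names hypothetical):

# Compare what each interpreter actually imports
python3.7 -c "import tensorflow as tf; print(tf.__version__, tf.__file__)"
python3.9 -c "import tensorflow as tf; print(tf.__version__, tf.__file__)"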

Hi @rk3199 ,
Can you please share the TensorRT version you are using?
Also, could you try an upgrade and let us know if the issue is still there?

Thanks

pip list | grep -i tensorrt
WARNING: Ignoring invalid distribution -ensorflow (/path/to/me/.local/lib/python3.9/site-packages)
nvidia-tensorrt 8.4.3.1
tensorrt 8.6.1
tensorrt-bindings 8.6.1
tensorrt-dispatch 8.6.0
tensorrt-lean 8.6.0
tensorrt-libs 8.6.1
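(As an aside, the "Ignoring invalid distribution -ensorflow" warning usually means an interrupted pip operation left behind a stray directory whose name starts with "~"; removing it silences the warning. A sketch, assuming the site-packages path from the warning:)

# pip renames a package dir to ~name while (un)installing; a leftover one
# triggers the "invalid distribution" warning and is safe to delete
ls -d ~/.local/lib/python3.9/site-packages/~* 2>/dev/null
rm -rf ~/.local/lib/python3.9/site-packages/~ensorflow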

Upgrade what? We use modules, so I can install specific versions.


Hi, I have the same problem using Python 3.9.4 with CUDA 11.5 and the matching cuDNN on an HPC cluster. Any solution so far?
