Description
Recently my builds have started throwing a library load error, even though I have made no changes on my end. I suspect something has changed in the NVIDIA dependencies, but I cannot find any differences.
Steps
0. Host environment:
Driver: 510.108.03
GPU: RTX 3070 Mobile
Docker: 20.10.12
I’ve also reproduced this on multiple machines.
1. Installation environment (Dockerfile):
This is a simplified version of my environment. It follows the instructions in: Installation Guide :: NVIDIA Deep Learning TensorRT Documentation
FROM nvidia/cudagl:11.4.2-devel-ubuntu18.04
ENV DEBIAN_FRONTEND=noninteractive
# setup image
RUN echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | tee /etc/apt/sources.list.d/cuda-repo.list && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt -qq update --fix-missing && apt upgrade -y && apt install -y \
    wget \
    git \
    sudo \
    net-tools \
    build-essential \
    gdb \
    tmux
# install trt
ADD nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb /tmp/.
RUN cd /tmp && \
    dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb && \
    apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505/*.pub && \
    apt-get update && \
    apt-get install -y \
        libnvinfer8=8.2.5-1+cuda11.4 \
        libnvinfer-plugin8=8.2.5-1+cuda11.4 \
        libnvparsers8=8.2.5-1+cuda11.4 \
        libnvonnxparsers8=8.2.5-1+cuda11.4 \
        libnvinfer-bin=8.2.5-1+cuda11.4 \
        libnvinfer-dev=8.2.5-1+cuda11.4 \
        libnvinfer-plugin-dev=8.2.5-1+cuda11.4 \
        libnvparsers-dev=8.2.5-1+cuda11.4 \
        libnvonnxparsers-dev=8.2.5-1+cuda11.4 \
        libnvinfer-samples=8.2.5-1+cuda11.4 \
        libnvinfer-doc=8.2.5-1+cuda11.4
2. Build the Docker image:
docker build -t trt_failure .
3. Run the container:
nvidia-docker run -it --privileged --env="DISPLAY" trt_failure /bin/bash
4. Build and run the sample inside the container:
cd /usr/src/tensorrt/samples/sampleMNIST && sudo make -j4 && /usr/src/tensorrt/bin/sample_mnist
This aborts with the following error:
Could not load library libcublasLt.so.10. Error: libcublasLt.so.10: cannot open shared object file: No such file or directory
It should not be loading any CUDA 10 libraries at all. ldd shows that the sample binary and all of its libraries are linked only against CUDA 11, so I’m not sure where this CUDA 10 dependency is coming from.
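A note on why ldd may not show this: ldd lists only the libraries recorded as link-time dependencies, so a library that is opened at runtime with dlopen() would not appear in its output. One way to see what the loader is actually asked for is glibc's LD_DEBUG=libs trace. The following is a minimal sketch, assuming a glibc-based image and the usual "find library=<name>" trace lines; requested_libs is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Print the distinct library names the dynamic loader was asked to find,
# reading an LD_DEBUG=libs trace from stdin.
# Assumption: glibc logs each lookup as
#   "<pid>:  find library=<soname> [0]; searching"
requested_libs() {
    sed -n 's/.*find library=\([^ ]*\).*/\1/p' | LC_ALL=C sort -u
}

# Intended use inside the container (the trace goes to stderr, hence 2>&1):
#   LD_DEBUG=libs /usr/src/tensorrt/bin/sample_mnist 2>&1 | requested_libs | grep cublas
```

If libcublasLt.so.10 shows up in that list while ldd does not mention it, the request is most likely coming from a runtime load rather than from the link step. The full trace (without the filter) can also help narrow down when the lookup happens.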
gdb backtrace:
Could not load library libcublasLt.so.10. Error: libcublasLt.so.10: cannot open shared object file: No such file or directory
Thread 1 "sample_mnist" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007fffdb3727f1 in __GI_abort () at abort.c:79
#2 0x00007fff894876b6 in ?? () from /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
#3 0x00007fff8945bcab in cudnnCreate () from /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
#4 0x00007fffde702227 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#5 0x00007fffde7029a6 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#6 0x00007fffddb4700b in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#7 0x00007fffddb4c84c in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#8 0x00007fffdde16da1 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#9 0x00007fffdde1c222 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#10 0x00007fffdde1cb18 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#11 0x000055555540b3cf in nvinfer1::IBuilder::buildSerializedNetwork(nvinfer1::INetworkDefinition&, nvinfer1::IBuilderConfig&) ()
#12 0x00005555554086c6 in SampleMNIST::build() ()
#13 0x0000555555409f12 in main ()
This shows that libcudnn_ops_infer.so.8 is trying to load the libcublasLt.so.10 library, for whatever reason.
Out of curiosity, I tried installing the libcublasLt.so.10 libraries alongside. With that, the library is found, but a different error is produced, likely because the versions are incompatible. I’ve also tried symlinking the 10 library to the 11.4 one, but that didn’t work either.
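Since the backtrace points at libcudnn_ops_infer.so.8, one thing that may be worth checking is which CUDA build of cuDNN apt actually installed: the Dockerfile above pins the libnvinfer packages but not cuDNN. The following is a minimal sketch, assuming the cuDNN package is named libcudnn8 and that its version string carries a +cudaX.Y suffix like the TensorRT packages do; cuda_tag and has_cuda_tag are hypothetical helper names.

```shell
#!/bin/sh
# Extract the CUDA build tag (e.g. "cuda11.4") from a Debian package
# version such as "8.2.5-1+cuda11.4". Prints nothing if there is no tag.
cuda_tag() {
    printf '%s\n' "$1" | sed -n 's/.*+\(cuda[0-9][0-9.]*\)$/\1/p'
}

# Succeed only when the version string carries the wanted CUDA tag.
has_cuda_tag() {
    [ "$(cuda_tag "$1")" = "$2" ]
}

# Intended use inside the container (package name is an assumption):
#   ver=$(dpkg-query -W -f='${Version}\n' libcudnn8)
#   has_cuda_tag "$ver" cuda11.4 || echo "libcudnn8 is built for: $(cuda_tag "$ver")"
```

If the reported tag is not a CUDA 11 one, that would be a plausible explanation for the libcublasLt.so.10 lookup, and pinning the cuDNN package to a CUDA 11 build in the apt-get install step would be the next thing to try. This is a guess based on the backtrace, not a confirmed cause.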