TensorRT 8.2.5 could not load library libcublasLt.so.10

Description

Recently my builds started throwing a library load error, although I have made no changes on my end. I suspect something has changed in the NVIDIA dependencies, but I cannot find any differences.

Steps

0. Host environment:

Driver: 510.108.03
GPU: RTX 3070 Mobile
Docker: 20.10.12

I’ve also reproduced this on multiple machines.

1. Installation environment (Dockerfile):

This is a simplified version of my environment, following the instructions from: Installation Guide :: NVIDIA Deep Learning TensorRT Documentation

FROM nvidia/cudagl:11.4.2-devel-ubuntu18.04
ENV DEBIAN_FRONTEND=noninteractive

# setup image
RUN echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | tee /etc/apt/sources.list.d/cuda-repo.list && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub

RUN apt -qq update --fix-missing && apt upgrade -y && apt install -y \
    wget \
    git \
    sudo \
    net-tools \
    build-essential \
    gdb \
    tmux

# install trt
ADD nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb /tmp/.
RUN cd /tmp && \
    dpkg -i nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505_1-1_amd64.deb && \
    apt-key add /var/nv-tensorrt-repo-ubuntu1804-cuda11.4-trt8.2.5.1-ga-20220505/*.pub && \
    apt-get update && \
    apt-get install -y \
        libnvinfer8=8.2.5-1+cuda11.4 \
        libnvinfer-plugin8=8.2.5-1+cuda11.4 \
        libnvparsers8=8.2.5-1+cuda11.4 \
        libnvonnxparsers8=8.2.5-1+cuda11.4 \
        libnvinfer-bin=8.2.5-1+cuda11.4 \
        libnvinfer-dev=8.2.5-1+cuda11.4 \
        libnvinfer-plugin-dev=8.2.5-1+cuda11.4 \
        libnvparsers-dev=8.2.5-1+cuda11.4 \
        libnvonnxparsers-dev=8.2.5-1+cuda11.4 \
        libnvinfer-samples=8.2.5-1+cuda11.4 \
        libnvinfer-doc=8.2.5-1+cuda11.4

2. Build docker:

docker build -t trt_failure .

3. Run docker:

nvidia-docker run -it --privileged --env="DISPLAY" trt_failure /bin/bash

4. Build and run sample in docker:

cd /usr/src/tensorrt/samples/sampleMNIST && sudo make -j4 && /usr/src/tensorrt/bin/sample_mnist

This throws an exception:

Could not load library libcublasLt.so.10. Error: libcublasLt.so.10: cannot open shared object file: No such file or directory

It should not be loading CUDA 10 libraries at all. ldd shows that the sample binary and all of its libraries are linked only against CUDA 11, so I’m not sure where this CUDA 10 dependency is coming from.
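For anyone repeating the ldd check, here is a minimal sketch of a filter for it. The helper name find_cuda10_links is hypothetical, and the paths in the usage comments are the ones from this post. Note that an empty result does not rule out a library opened at run time (for example with dlopen), which ldd cannot see.

```shell
# Hypothetical helper: read `ldd` output on stdin and print any line that
# references a CUDA 10 cuBLAS library. Prints nothing if there is none.
find_cuda10_links() {
    grep -E 'libcublas(Lt)?\.so\.10' || true
}

# Usage inside the container (paths taken from this post):
# ldd /usr/src/tensorrt/bin/sample_mnist | find_cuda10_links
# ldd /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8 | find_cuda10_links
```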

gdb backtrace:

Could not load library libcublasLt.so.10. Error: libcublasLt.so.10: cannot open shared object file: No such file or directory

Thread 1 "sample_mnist" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fffdb3727f1 in __GI_abort () at abort.c:79
#2  0x00007fff894876b6 in ?? () from /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
#3  0x00007fff8945bcab in cudnnCreate () from /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8
#4  0x00007fffde702227 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#5  0x00007fffde7029a6 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#6  0x00007fffddb4700b in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#7  0x00007fffddb4c84c in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#8  0x00007fffdde16da1 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#9  0x00007fffdde1c222 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#10 0x00007fffdde1cb18 in ?? () from /usr/lib/x86_64-linux-gnu/libnvinfer.so.8
#11 0x000055555540b3cf in nvinfer1::IBuilder::buildSerializedNetwork(nvinfer1::INetworkDefinition&, nvinfer1::IBuilderConfig&) ()
#12 0x00005555554086c6 in SampleMNIST::build() ()
#13 0x0000555555409f12 in main ()

This shows that libcudnn_ops_infer.so.8 is trying to load libcublasLt.so.10, for whatever reason.

Out of curiosity, I tried installing the libcublasLt.so.10 libraries alongside. The libraries are then found, but a different error is produced, likely because the versions are incompatible. I’ve also tried symlinking the 10 library to the 11.4 one, but that didn’t work.

Is this reproducible for you? Is my TRT installation correct? Any other ideas?

Hi,

Looks like your CUDA was not set up correctly.
Could you please take steps from the following.

To avoid setup-related issues, we recommend using the TensorRT NGC container.

Thank you.

It previously worked because the install would automatically pull in the latest cuDNN, which would normally be a CUDA 11 build. However, the latest cuDNN release was only for CUDA 10.2, so that build was now being installed.
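A quick way to confirm this kind of mismatch is to look at the CUDA suffix of the installed cuDNN package version. The sketch below assumes the version string ends in a "+cudaX.Y" tag, as the TensorRT packages above do. The helper name cuda_suffix is hypothetical, and the version in the comment is only an illustrative shape, not a real release number.

```shell
# Hypothetical helper: print the CUDA tag of a Debian version string,
# e.g. "8.x.y.z-1+cuda10.2" -> "cuda10.2". Prints an empty line if the
# version has no '+' suffix.
cuda_suffix() {
    case "$1" in
        *+*) printf '%s\n' "${1##*+}" ;;
        *)   printf '\n' ;;
    esac
}

# Usage inside the container (assumes libcudnn8 is installed):
# installed=$(dpkg-query -W -f='${Version}' libcudnn8)
# [ "$(cuda_suffix "$installed")" = "cuda11.4" ] || \
#     echo "cuDNN was built for $(cuda_suffix "$installed")" >&2
```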

Pinning fixes the issue:

RUN echo '\
Package: libcudnn8\n\
Pin: origin\n\
Pin-Priority: 999\n\
\n\
Package: libcudnn8-dev\n\
Pin: origin\n\
Pin-Priority: 999' >> /etc/apt/preferences.d/cudnn.pref

A better fix would be to not add the CUDA repos to the sources list, but I need them.
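Another option, which I have not tested here, is to install an explicit cuDNN version whose suffix matches the CUDA version of the image instead of relying on pin priorities. This is only a sketch: pick_version is a hypothetical helper, and it assumes `apt-cache madison` prints lines of the form "package | version | source".

```shell
# Hypothetical helper: read `apt-cache madison <pkg>` output on stdin and
# print the first listed version that ends in "+<tag>" (e.g. tag=cuda11.4).
pick_version() {
    awk -F'|' -v tag="+$1" '
        { gsub(/[ \t]/, "", $2) }   # strip padding around the version field
        length($2) >= length(tag) &&
        substr($2, length($2) - length(tag) + 1) == tag { print $2; exit }
    '
}

# Usage in the Dockerfile RUN step (after apt-get update):
# ver=$(apt-cache madison libcudnn8 | pick_version cuda11.4)
# apt-get install -y "libcudnn8=$ver" "libcudnn8-dev=$ver"
```

If no version matches, the helper prints nothing, so it is worth checking that $ver is non-empty before calling apt-get install.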
