Nvidia Driver 390.87 + CUDA, Ubuntu 16.04 docker container, Python3. Host RHEL7.5

dmitro.biz · October 28, 2018, 5:40am

Dear Nvidia Support and Engineers!

We have a problem with CUDA installed in a Docker container (Ubuntu 16.04 + Nvidia driver 390.87 + CUDA) when the container is launched on a RHEL 7 host with Nvidia driver 390.87.

This setup has been tested and works properly when the container is launched on an Ubuntu 16.04 host with Nvidia driver 390.87.

Since our SmartCameras application was designed to run in a Docker container with driver 390.87 + CUDA,
we installed the application, the driver, CUDA and cuDNN in a Docker image based on Ubuntu 16.04:

## FROM DOCKERFILE
## NVIDIA driver
RUN add-apt-repository ppa:graphics-drivers \
    && apt-get update -y\
    && DEBIAN_FRONTEND=noninteractive apt-get install nvidia-390 -yq \
    && DEBIAN_FRONTEND=interactive apt-mark hold nvidia-390 \
    && rm -rf /var/lib/apt/lists/*

## CUDA
COPY cuda/*.deb /opt/
RUN dpkg -i /opt/cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb \
    && apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub \
    && apt-get update -y\
    && apt-get install cuda -y \
    && rm -f /opt/cuda-repo-ubuntu1604* \
    && rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
## export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
## export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

## CUDNN
COPY cudnn/*.deb /opt/
RUN dpkg -i /opt/libcudnn7_7.0.5.15-1+cuda9.1_amd64.deb \
    && dpkg -i /opt/libcudnn7-dev_7.0.5.15-1+cuda9.1_amd64.deb \
    && cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/ \
    && rm -f /opt/libcudnn*.deb

On the Ubuntu 16.04 host machine we install only Nvidia driver 390.87. This setup works fine, but only on an Ubuntu 16.04 host with Nvidia driver 390.87.

But if the host machine runs RedHat 7 + Nvidia driver 390.87, the application running in the Docker container fails with an error, as if CUDA were not available on the host machine, or the Nvidia driver were not installed or not accessible.

I have attached the part of the Dockerfile that installs the driver, CUDA and cuDNN into the Docker image.
I have also attached screenshots of the nvidia-smi output from the host machine and from the Docker container, which show that the driver is visible in both cases.
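
A working nvidia-smi does not necessarily mean that every device node a CUDA process needs is visible inside the container. As far as I know, nvidia-smi can run without /dev/nvidia-uvm, while the CUDA runtime cannot. The following is a minimal sketch for ruling that out, to be run inside the container. The list of expected nodes is an assumption for a machine with a single GPU at index 0, and the function name is made up for illustration; neither comes from the original post.

```python
import os

# Device nodes a CUDA process typically needs. This list is an assumption
# for a single-GPU machine (GPU index 0), not taken from the original post.
EXPECTED_NODES = ["/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"]


def missing_device_nodes(nodes=EXPECTED_NODES, exists=os.path.exists):
    """Return the subset of `nodes` that is not present on this system."""
    return [path for path in nodes if not exists(path)]


if __name__ == "__main__":
    missing = missing_device_nodes()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected NVIDIA device nodes are present.")
```

If a node is reported missing in the container but present on the host, the problem is more likely in how the devices are passed to the container than in the CUDA installation itself.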

Output of SmartCameras with the error:

2018-10-26 19:58:57,270 ERROR FaceDetectorProcess PID 398: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/mxnet/symbol/symbol.py", line 1512, in simple_bind
    ctypes.byref(exe_handle)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:58:57] src/storage/storage.cc:114: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: unknown error

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276938) [0x7f6029e8d938]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276d48) [0x7f6029e8dd48]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bbfa6) [0x7f602c4d2fa6]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bec32) [0x7f602c4d5c32]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bf022) [0x7f602c4d6022]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2396be1) [0x7f602bfadbe1]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x241fbe3) [0x7f602c036be3]
[bt] (7) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2420bfc) [0x7f602c037bfc]
[bt] (8) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2425420) [0x7f602c03c420]
[bt] (9) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x242ff8a) [0x7f602c046f8a]

What do you recommend we do so that CUDA becomes available to the application?

Thanks!
Cortica, SmartCameras DevOps Team.

dmitro.biz · October 29, 2018, 12:54pm

Today we resolved our issue with the help of nvidia-docker (NVIDIA/nvidia-docker on GitHub).

We pulled and ran it on RedHat 7.5; after that, our application inside the Ubuntu-based Docker container started correctly.
Nvidia CUDA works well.

Thank you!

Please close this question.

GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs

If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

Add the package repositories

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo |
sudo tee /etc/yum.repos.d/nvidia-docker.repo

Install nvidia-docker2 and reload the Docker daemon configuration

sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

Test nvidia-smi with the latest official CUDA image

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
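
The nvidia-smi test above confirms that the driver is visible in the container, but the original failure came from a CUDA call, so it can be worth checking the CUDA driver API directly as well. The sketch below loads libcuda.so.1 with ctypes and calls cuInit and cuDeviceGetCount, without depending on MXNet. The function name and the return convention are made up for illustration, and it assumes libcuda.so.1 is on the container's library search path.

```python
import ctypes


def cuda_device_count(lib_name="libcuda.so.1"):
    """Return (count, None) on success or (None, reason) on failure."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError as exc:
        return None, "cannot load {}: {}".format(lib_name, exc)

    # cuInit(0) must succeed before any other driver API call.
    status = libcuda.cuInit(0)
    if status != 0:
        return None, "cuInit failed with CUresult {}".format(status)

    count = ctypes.c_int(0)
    status = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if status != 0:
        return None, "cuDeviceGetCount failed with CUresult {}".format(status)
    return count.value, None


if __name__ == "__main__":
    count, reason = cuda_device_count()
    if reason is None:
        print("CUDA devices visible:", count)
    else:
        print("CUDA not usable:", reason)
```

Run inside the application container (started with --runtime=nvidia), a non-zero device count suggests the CUDA driver API is reachable, while a failure reason narrows down whether the library is missing or initialization itself fails.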
