Nvidia Driver 390.87 + CUDA, Ubuntu 16.04 docker container, Python3. Host RHEL7.5

Dear Nvidia Support and Engineers!

We have a problem with CUDA installed in a Docker container (Ubuntu 16.04 + NVIDIA driver 390.87 + CUDA) when the container is launched on a RHEL 7 host with NVIDIA driver 390.87.

This setup has been tested and works properly when the container is launched on an Ubuntu 16.04 host with NVIDIA driver 390.87.

Since our SmartCameras application was designed to work with driver 390.87 + CUDA and to run in a Docker container,
we deployed the application, the driver, and the CUDA and cuDNN libraries in a Docker image based on Ubuntu 16.04:

## FROM DOCKERFILE
## NVIDIA driver
RUN add-apt-repository ppa:graphics-drivers \
    && apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive apt-get install nvidia-390 -yq \
    && DEBIAN_FRONTEND=interactive apt-mark hold nvidia-390 \
    && rm -rf /var/lib/apt/lists/*

## CUDA
COPY cuda/*.deb /opt/
RUN dpkg -i /opt/cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb \
    && apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub \
    && apt-get update -y \
    && apt-get install cuda -y \
    && rm -f /opt/cuda-repo-ubuntu1604* \
    && rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
## export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
## export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

## CUDNN
COPY cudnn/*.deb /opt/
RUN dpkg -i /opt/libcudnn7_7.0.5.15-1+cuda9.1_amd64.deb \
    && dpkg -i /opt/libcudnn7-dev_7.0.5.15-1+cuda9.1_amd64.deb \
    && cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/ \
    && rm -f /opt/libcudnn*.deb

On the Ubuntu 16.04 host machine we install only NVIDIA driver 390.87. This setup works fine, but only on an Ubuntu 16.04 host with NVIDIA driver 390.87.

However, if the host machine is RHEL 7 with NVIDIA driver 390.87, the application running in the Docker container fails with an error, as if CUDA were not available on the host machine, or the NVIDIA driver were not installed or not accessible.

I have attached the part of the Dockerfile that installs the driver, CUDA, and cuDNN into the Docker image.
I have also attached screenshots of the nvidia-smi output from the host machine and from the Docker container, which show that the driver is visible in both cases.
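
Since nvidia-smi succeeding does not by itself show that everything CUDA needs is reachable, one thing that may be worth comparing between the working Ubuntu host and the RHEL host is which NVIDIA device nodes the container can see and which kernel module version the host reports. The following is a minimal sketch, not part of our setup: the function names and the `EXPECTED_NODES` list are illustrative assumptions, and the assumption that the CUDA runtime also needs `/dev/nvidia-uvm` should be checked against the driver documentation.

```python
import glob
import re
from pathlib import Path

# Assumption: nodes the CUDA runtime usually opens on a single-GPU machine.
# nvidia-smi is understood to get by without /dev/nvidia-uvm.
EXPECTED_NODES = ("/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm")


def parse_kernel_module_version(text):
    """Extract the driver version from the content of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def missing_device_nodes(present, expected=EXPECTED_NODES):
    """Return the expected device nodes that are not in the 'present' collection."""
    return [node for node in expected if node not in present]


if __name__ == "__main__":
    proc_file = Path("/proc/driver/nvidia/version")
    version = None
    if proc_file.exists():
        version = parse_kernel_module_version(proc_file.read_text())
    print("kernel module version:", version)

    present = set(glob.glob("/dev/nvidia*"))
    print("device nodes present:", sorted(present))
    print("device nodes missing:", missing_device_nodes(present))
```

Run inside the container on both hosts; a difference in the missing-node list or in the reported version would be a concrete lead to follow up on.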

  • Output of SmartCameras with the error:

    2018-10-26 19:58:57,270 ERROR FaceDetectorProcess PID 398: Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/mxnet/symbol/symbol.py", line 1512, in simple_bind
        ctypes.byref(exe_handle)))
      File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 146, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [19:58:57] src/storage/storage.cc:114: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: unknown error

    Stack trace returned 10 entries:
    [bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276938) [0x7f6029e8d938]
    [bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276d48) [0x7f6029e8dd48]
    [bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bbfa6) [0x7f602c4d2fa6]
    [bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bec32) [0x7f602c4d5c32]
    [bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bf022) [0x7f602c4d6022]
    [bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2396be1) [0x7f602bfadbe1]
    [bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x241fbe3) [0x7f602c036be3]
    [bt] (7) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2420bfc) [0x7f602c037bfc]
    [bt] (8) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2425420) [0x7f602c03c420]
    [bt] (9) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x242ff8a) [0x7f602c046f8a]
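
To tell whether the failure comes from MXNet or from the CUDA driver layer underneath it, it can help to call the driver API directly, without any framework. The sketch below is an assumption-laden probe, not something taken from our application: `probe_cuda_driver` is a hypothetical helper name, and it relies only on the `cuInit` and `cuDeviceGetCount` entry points of `libcuda.so.1`. It uses `str.format` so that it runs on the Python 3.5 shown in the traceback.

```python
import ctypes


def probe_cuda_driver(library_name="libcuda.so.1"):
    """Try to initialise the CUDA driver API; return (ok, message)."""
    try:
        libcuda = ctypes.CDLL(library_name)
    except OSError as exc:
        return False, "cannot load {}: {}".format(library_name, exc)

    # cuInit(0) has to succeed before any other driver API call.
    status = libcuda.cuInit(0)
    if status != 0:  # 0 is CUDA_SUCCESS
        return False, "cuInit failed with CUresult {}".format(status)

    count = ctypes.c_int(0)
    status = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if status != 0:
        return False, "cuDeviceGetCount failed with CUresult {}".format(status)
    return True, "{} CUDA device(s) visible".format(count.value)


if __name__ == "__main__":
    ok, message = probe_cuda_driver()
    print("OK" if ok else "FAIL", message)
```

If this probe already fails inside the container on the RHEL host, the problem is below MXNet; if it succeeds, the framework or the CUDA runtime libraries in the image are the more likely place to look.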

What do you recommend we do so that CUDA becomes available to the application?

Thanks!
Cortica, SmartCameras DevOps Team.

Today we resolved our issue with the help of the project linked below.

We pulled and ran it on RHEL 7.5, and after that our application inside the Ubuntu-based Docker container started correctly.
NVIDIA CUDA works well.

Thank you!

Please close this question.

https://github.com/NVIDIA/nvidia-docker

If you have nvidia-docker 1.0 installed, remove it and all existing GPU containers first

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

Add the package repositories

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

Install nvidia-docker2 and reload the Docker daemon configuration

sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

Test nvidia-smi with the latest official CUDA image

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
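
If the test command above complains that the `nvidia` runtime is unknown, a quick check is whether the runtime is registered in the Docker daemon configuration. This is a small sketch under the assumption that the nvidia-docker2 package registers the runtime in `/etc/docker/daemon.json`; it does not cover a runtime passed to dockerd on the command line, and `has_nvidia_runtime` is a hypothetical helper name.

```python
import json
from pathlib import Path


def has_nvidia_runtime(daemon_config_text):
    """Return True if the daemon.json content lists a runtime named 'nvidia'."""
    config = json.loads(daemon_config_text)
    return "nvidia" in config.get("runtimes", {})


if __name__ == "__main__":
    path = Path("/etc/docker/daemon.json")  # assumed default location
    try:
        print("nvidia runtime registered:", has_nvidia_runtime(path.read_text()))
    except (OSError, ValueError) as exc:
        # OSError: file missing or unreadable; ValueError: file is not valid JSON.
        print("could not check {}: {}".format(path, exc))
```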