Dear Nvidia Support and Engineers!
We have a problem with CUDA installed in a Docker container (Ubuntu 16.04 + NVIDIA driver 390.87 + CUDA) when the container is launched on a RHEL 7 host with NVIDIA driver 390.87.
The same setup works properly when the container is launched on an Ubuntu 16.04 host with NVIDIA driver 390.87.
Our SmartCameras application was designed to run in a Docker container with driver 390.87 + CUDA,
so we deployed the application, the driver, CUDA, and cuDNN into a Docker image based on Ubuntu 16.04:
## FROM DOCKERFILE
## NVIDIA driver
RUN add-apt-repository ppa:graphics-drivers \
    && apt-get update -y \
    && DEBIAN_FRONTEND=noninteractive apt-get install nvidia-390 -yq \
    && DEBIAN_FRONTEND=interactive apt-mark hold nvidia-390 \
    && rm -rf /var/lib/apt/lists/*
## CUDA
COPY cuda/*.deb /opt/
RUN dpkg -i /opt/cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb \
    && apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub \
    && apt-get update -y \
    && apt-get install cuda -y \
    && rm -f /opt/cuda-repo-ubuntu1604* \
    && rm -rf /var/lib/apt/lists/*
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
## export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
## export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
## CUDNN
COPY cudnn/*.deb /opt/
RUN dpkg -i /opt/libcudnn7_7.0.5.15-1+cuda9.1_amd64.deb \
    && dpkg -i /opt/libcudnn7-dev_7.0.5.15-1+cuda9.1_amd64.deb \
    && cp /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/ \
    && rm -f /opt/libcudnn*.deb
On the Ubuntu 16.04 host machine we install only NVIDIA driver 390.87. This setup works fine, but only on an Ubuntu 16.04 host with NVIDIA driver 390.87.
However, if the host machine is Red Hat 7 with NVIDIA driver 390.87, the application running in the Docker container fails with an error, as if CUDA were not available on the host machine, or the NVIDIA driver were not installed or not accessible.
I have attached the part of the Dockerfile that installs the driver, CUDA, and cuDNN into the Docker image.
I have also attached screenshots of the nvidia-smi output from the host machine and from the Docker container, which show that the driver is visible in both cases.
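Beyond nvidia-smi, one check that may help narrow this down is to compare the kernel module version visible inside the container with the version suffix of the libcuda library installed in the image. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the file /proc/driver/nvidia/version is assumed to have its usual "Kernel Module  <version>" layout, and the driver libraries are assumed to live in /usr/lib/x86_64-linux-gnu with the usual libcuda.so.<version> naming. It is written to run on the Python 3.5 interpreter already present in the image.

```python
import glob
import os
import re


def parse_kernel_module_version(text):
    """Extract the driver version from the contents of /proc/driver/nvidia/version.

    Returns None if no "Kernel Module  <version>" entry is found.
    """
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def find_libcuda_versions(lib_dir="/usr/lib/x86_64-linux-gnu"):
    """Return the version suffixes of libcuda.so.<version> files in lib_dir.

    lib_dir is an assumption; adjust it if the driver libraries live elsewhere.
    """
    versions = set()
    for path in glob.glob(os.path.join(lib_dir, "libcuda.so.*")):
        suffix = os.path.basename(path)[len("libcuda.so."):]
        # Skip the libcuda.so.1 soname link; keep full versions such as 390.87.
        if re.match(r"^\d+\.\d+(\.\d+)?$", suffix):
            versions.add(suffix)
    return sorted(versions)


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as handle:
            kernel_version = parse_kernel_module_version(handle.read())
    except OSError:
        kernel_version = None
    print("kernel module version:", kernel_version)
    print("libcuda versions in image:", find_libcuda_versions())
```

If the two values differ, or the kernel module version cannot be read at all inside the container, that would point at the host driver not being exposed to the container rather than at CUDA itself.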
SmartCameras output with the error:
2018-10-26 19:58:57,270 ERROR FaceDetectorProcess PID 398: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/mxnet/symbol/symbol.py", line 1512, in simple_bind
    ctypes.byref(exe_handle)))
  File "/usr/local/lib/python3.5/dist-packages/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [19:58:57] src/storage/storage.cc:114: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: unknown error
Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276938) [0x7f6029e8d938]
[bt] (1) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x276d48) [0x7f6029e8dd48]
[bt] (2) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bbfa6) [0x7f602c4d2fa6]
[bt] (3) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bec32) [0x7f602c4d5c32]
[bt] (4) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x28bf022) [0x7f602c4d6022]
[bt] (5) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2396be1) [0x7f602bfadbe1]
[bt] (6) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x241fbe3) [0x7f602c036be3]
[bt] (7) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2420bfc) [0x7f602c037bfc]
[bt] (8) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x2425420) [0x7f602c03c420]
[bt] (9) /usr/local/lib/python3.5/dist-packages/mxnet/libmxnet.so(+0x242ff8a) [0x7f602c046f8a]
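To separate the driver problem from MXNet, a small check that calls the CUDA driver API directly from inside the container may be useful. This is a minimal sketch, assuming libcuda.so.1 is on the loader path in the container. The helper name check_cuda_driver is hypothetical, while cuInit and cuDeviceGetCount are CUDA driver API entry points.

```python
import ctypes


def check_cuda_driver(lib_name="libcuda.so.1"):
    """Initialise the CUDA driver API and count the visible devices.

    Returns (code, detail). code is None if the library cannot be loaded,
    0 (CUDA_SUCCESS) if both calls succeed, or the CUresult value of the
    first call that failed.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        return None, "could not load {}: {}".format(lib_name, exc)

    # cuInit must be called before any other driver API function.
    code = lib.cuInit(0)
    if code != 0:
        return code, "cuInit failed"

    count = ctypes.c_int(0)
    code = lib.cuDeviceGetCount(ctypes.byref(count))
    if code != 0:
        return code, "cuDeviceGetCount failed"

    return 0, "{} device(s) visible".format(count.value)


if __name__ == "__main__":
    print(check_cuda_driver())
```

Running this in the same container on both the Ubuntu 16.04 host and the RHEL 7 host should show whether the failure already occurs at driver initialisation, before MXNet is involved.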
What do you recommend we do so that CUDA becomes available to the application?
Thanks!
Cortica, SmartCameras DevOps Team.