I am working on a device called ZF ProAI, which uses an NVIDIA Xavier SoC (8-core CPU @ 2.1 GHz, Volta GPU with 4 TPCs) running the Linux tegra-ubuntu 4.14.78-rt44-tegra OS.
The hardware ships with this OS preinstalled and with CUDA 10.1 for AI development.
A standalone Python application for object detection works fine on this hardware (retinanet_resnet50_fpn model, Python 3.7, Conda environment).
Now I want to containerize this application, but I could not find an exactly matching base image on Docker Hub, so I built my Docker image from an approximately matching one:
```dockerfile
# Dockerfile
FROM nvidia/cuda:11.2.1-base-ubuntu18.04
```
When I run a container from this image, the following error shows up.
The use of the `--gpus all` option is explained in How to Use the GPU within a Docker Container.
```
nvidia@tegra-ubuntu:~$ docker run --gpus all gpu-nvidia-test
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380:
starting container process caused: process_linux.go:545: container init caused:
Running hook #0:: error running hook: exit status 1, stdout: , stderr:
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
ERRO error waiting for container: context canceled
```
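One thing worth checking when `--gpus all` fails at the hook stage is whether the Docker daemon knows about an `nvidia` runtime at all. This is a minimal stdlib-only sketch, assuming the usual daemon config path on the host; it is only one piece of the puzzle, since the hook also depends on the NVIDIA container toolkit itself:

```python
# Sketch: check whether an "nvidia" runtime is registered in the Docker
# daemon config. The config path is the common default; your setup may differ.
import json
import os

def nvidia_runtime_registered(config_path="/etc/docker/daemon.json"):
    """Return True if the daemon config declares an 'nvidia' runtime."""
    if not os.path.exists(config_path):
        return False
    try:
        with open(config_path) as f:
            config = json.load(f)
    except (OSError, ValueError):
        # Unreadable or malformed config: treat as not registered.
        return False
    return "nvidia" in config.get("runtimes", {})

if __name__ == "__main__":
    print("nvidia runtime registered:", nvidia_runtime_registered())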
After many unsuccessful tries, I created a simple Python application to check whether the Docker container can use the GPU at all.
```python
# Simple Python application
import torch
import time

while True:
    print("gpu usage =", torch.cuda.is_available())  # prints True if the GPU is available
    time.sleep(1)
```
I then stopped using the `--gpus all` option and instead mounted the resources the application needs into the Docker container as volumes.
```
# Container creation using volume mounts
nvidia@tegra-ubuntu:~$ sudo docker run \
    -v '/usr/local:/usr/local' \
    -v '/usr/lib:/usr/lib' \
    -v '/usr/share:/usr/share' \
    -e LD_LIBRARY_PATH='/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' \
    -e Path='/usr/local/cuda-10.1/bin' \
    -it gpu-nvidia-test
```
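To see which of the mounted resources are actually visible from inside the container, a small stdlib-only check can be run there. This is a sketch under assumptions: the `/dev/nvhost-*` and `/dev/nvmap` device-node names are typical for Jetson/Tegra boards, and the CUDA path matches the host install above:

```python
# Sketch: report whether typical Tegra GPU device nodes and the CUDA
# toolkit directory are visible (e.g. from inside a container).
# Device-node names and the CUDA path are assumptions for a Jetson/Tegra setup.
import glob
import os

def visible_gpu_resources(cuda_home="/usr/local/cuda-10.1"):
    """Return a dict mapping resource name -> whether it is present."""
    return {
        "cuda_toolkit": os.path.isdir(cuda_home),
        "nvhost_devices": bool(glob.glob("/dev/nvhost-*")),
        "nvmap_device": os.path.exists("/dev/nvmap"),
    }

if __name__ == "__main__":
    for name, present in visible_gpu_resources().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

Note that volume mounts alone do not expose device nodes; if the device entries show up as MISSING inside the container, they would still need to be passed in (e.g. via `--device`).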
Even after mounting, the container still cannot use the GPU; the Python application shows the following error:
```
/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization:
CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env
variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
(Triggered internally at ../c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
gpu usage = False
gpu usage = False
gpu usage = False
gpu usage = False
```
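One mismatch worth ruling out here is the toolkit version: the base image targets CUDA 11.2 while the host ships CUDA 10.1, and a PyTorch wheel built against a different CUDA than the one mounted in can fail exactly like this. A hedged diagnostic sketch (the `version.txt` path is an assumption; CUDA 10.x toolkits ship it, newer ones use `version.json`):

```python
# Sketch: compare the CUDA version PyTorch was built against with the
# toolkit mounted from the host, to spot a mismatch.
import os
import re

def host_cuda_version(cuda_home="/usr/local/cuda-10.1"):
    """Parse the toolkit version from version.txt, if present; else None."""
    path = os.path.join(cuda_home, "version.txt")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        m = re.search(r"(\d+\.\d+)", f.read())
    return m.group(1) if m else None

def torch_cuda_version():
    """CUDA version torch was built for; None for CPU-only builds or no torch."""
    try:
        import torch
        return torch.version.cuda
    except ImportError:
        return None

if __name__ == "__main__":
    print("host toolkit:", host_cuda_version())
    print("torch built for:", torch_cuda_version())
```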
Can someone help me with this problem? Thank you in advance.