Jetson TX2: unable to access or use GPU from Docker

Hi All,

Docker runs normally when the GPU is not used.

But when I use the option --runtime nvidia, an error appears.

The device is an embedded board, so it cannot easily be reflashed.

Is there a way to resolve the error without reflashing?

  1. Issue

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

  2. Steps to reproduce the issue

docker run --runtime nvidia --network host -it nvcr.io/nvidia/l4t-base:r32.2.1

  3. Information to attach (optional if deemed irrelevant)

    1. sudo nvidia-container-cli -k -d /dev/tty info

      
      -- WARNING, the following logs are for debugging purposes only --
      I0907 08:05:09.483501 28389 nvc.c:372] initializing library context (version=1.5.0, build=4699c1b8b4991b6d869ea403e109291653bb040b)
      I0907 08:05:09.483604 28389 nvc.c:346] using root /
      I0907 08:05:09.483619 28389 nvc.c:347] using ldcache /etc/ld.so.cache
      I0907 08:05:09.483630 28389 nvc.c:348] using unprivileged user 65534:65534
      I0907 08:05:09.483673 28389 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
      I0907 08:05:09.483865 28389 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
      W0907 08:05:09.483985 28389 nvc.c:254] failed to detect NVIDIA devices
      I0907 08:05:09.484438 28390 nvc.c:274] loading kernel module nvidia
      E0907 08:05:09.485108 28390 nvc.c:276] could not load kernel module nvidia
      I0907 08:05:09.485168 28390 nvc.c:292] loading kernel module nvidia_uvm
      E0907 08:05:09.485433 28390 nvc.c:294] could not load kernel module nvidia_uvm
      I0907 08:05:09.485486 28390 nvc.c:301] loading kernel module nvidia_modeset
      E0907 08:05:09.485807 28390 nvc.c:303] could not load kernel module nvidia_modeset
      I0907 08:05:09.486481 28391 driver.c:101] starting driver service
      E0907 08:05:09.486915 28391 driver.c:168] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
      I0907 08:05:09.487231 28389 driver.c:203] driver service terminated successfully
      nvidia-container-cli: initialization error: driver error: failed to process request
      
      
    2. cat /etc/nv_tegra_release

      
      R32 (release), REVISION: 2.1, GCID: 16294929, BOARD: t186ref, EABI: aarch64, DATE: Tue Aug 13 04:45:36 UTC 2019
      
      
    3. docker -v

      
      Docker version 18.09.7, build 2d0083d
      
      
    4. dpkg -l '*nvidia*'

      
      Desired=Unknown/Install/Remove/Purge/Hold
      | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
      |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
      ||/ Name                               Version                Architecture           Description
      +++-==================================-======================-======================-==========================================================================
      un  libgldispatch0-nvidia              <none>                 <none>                 (no description available)
      ii  libnvidia-container-tools          1.5.0-1                arm64                  NVIDIA container runtime library (command-line tools)
      ii  libnvidia-container0:arm64         0.9.0~beta.1           arm64                  NVIDIA container runtime library
      ii  libnvidia-container1:arm64         1.5.0-1                arm64                  NVIDIA container runtime library
      un  nvidia-304                         <none>                 <none>                 (no description available)
      un  nvidia-340                         <none>                 <none>                 (no description available)
      un  nvidia-384                         <none>                 <none>                 (no description available)
      un  nvidia-common                      <none>                 <none>                 (no description available)
      ii  nvidia-container-runtime           3.5.0-1                arm64                  NVIDIA container runtime
      un  nvidia-container-runtime-hook      <none>                 <none>                 (no description available)
      ii  nvidia-container-toolkit           1.5.1-1                arm64                  NVIDIA container runtime hook
      un  nvidia-cuda-dev                    <none>                 <none>                 (no description available)
      un  nvidia-libopencl1-dev              <none>                 <none>                 (no description available)
      un  nvidia-prime                       <none>                 <none>                 (no description available)
      
      

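To see only the failures in a debug log like the one above, the warning and error lines can be filtered out. This is a minimal sketch, assuming the log keeps the prefix format shown above (a severity letter I, W or E followed by the date). The function name nvc_log_problems and the log path /tmp/nvc.log used below are made up for illustration.

```shell
#!/bin/sh
# Print only the warning (W) and error (E) lines of an nvidia-container-cli
# debug log read from stdin.
# Assumption: each log line starts with a severity letter followed by a
# four-digit MMDD date, as in the output quoted in this thread.
nvc_log_problems() {
    grep -E '^[[:space:]]*[WE][0-9]{4} '
}
```

For example, after writing the log to a file with `sudo nvidia-container-cli -k -d /tmp/nvc.log info`, running `nvc_log_problems < /tmp/nvc.log` should leave only lines such as "failed to detect NVIDIA devices", "could not load kernel module nvidia" and the libnvidia-ml.so.1 load failure.
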
Hi,

Since r32.2.1 is quite old, would you mind reflashing your device?
Although we do have the container for that release, the beta version of the NVIDIA runtime plugin is only available starting from r32.3.1.

Thanks.
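
To check whether a board is at or above the r32.3.1 release mentioned above, the version can be read from /etc/nv_tegra_release and compared. This is only a sketch: it assumes the file has the same layout as the output posted in this thread and that a `sort` with `-V` (version sort, as in GNU coreutils) is available. The function names l4t_version and version_at_least are hypothetical helpers, not part of any NVIDIA tool.

```shell
#!/bin/sh
# Turn a line like
#   # R32 (release), REVISION: 2.1, GCID: ...
# (read from stdin) into "32.2.1".
# Assumption: the layout matches the /etc/nv_tegra_release output in this thread.
l4t_version() {
    sed -n 's/^.*R\([0-9][0-9]*\) (release), REVISION: \([0-9.]*\),.*$/\1.\2/p' | head -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on "sort -V" to order dotted version strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

On the board, `version_at_least "$(l4t_version < /etc/nv_tegra_release)" 32.3.1 || echo "L4T older than r32.3.1"` would print the message for the r32.2.1 release shown in this thread.
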

Hi, thank you for the reply.

It was difficult to reflash everything, so I reinstalled only the SDK components.
As a result, I was able to use the GPU in the container without any errors.
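
For anyone repeating this, one way to confirm the result before starting a container is to check that Docker lists the nvidia runtime. This is a rough sketch, assuming `docker info` prints a "Runtimes:" line that names the registered runtimes. The function name has_nvidia_runtime is made up for illustration.

```shell
#!/bin/sh
# Succeed if the "Runtimes:" line of `docker info` output (read from stdin)
# contains the word "nvidia".
# Assumption: docker info prints a line such as " Runtimes: nvidia runc".
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}
```

Running `docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"` and then repeating the `docker run --runtime nvidia ...` command from the first post should show whether the reinstall took effect.
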