Cicc not in nvcr.io/nvidia/l4t-base:r32.3.1

cicc can be found on the latest JetPack host at /usr/local/cuda/nvvm/bin, but it’s nowhere to be found in nvcr.io/nvidia/l4t-base:r32.3.1

Unfortunately this means OpenCV fails with sh: 1: cicc: not found when building GpuMat. It seems /usr/local/cuda/nvvm is only available at runtime with --runtime nvidia.
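To confirm whether a given environment can actually see cicc, something like the following can be run both on the host and inside the container. This is only a minimal sketch: `find_cicc` is a made-up helper name, and the default directory is simply the host path quoted above.

```python
import os
import shutil

# Directory where cicc lives on the JetPack host, per the post above.
DEFAULT_NVVM_BIN = "/usr/local/cuda/nvvm/bin"


def find_cicc(extra_dirs=(DEFAULT_NVVM_BIN,)):
    """Return the path to an executable cicc, or None if it cannot be found.

    find_cicc is a hypothetical helper, not part of any NVIDIA tooling.
    """
    # First try the normal PATH lookup.
    found = shutil.which("cicc")
    if found:
        return found
    # Then check the extra directories explicitly.
    for directory in extra_dirs:
        candidate = os.path.join(directory, "cicc")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_cicc()
    print(path if path else "cicc: not found")
```

If this prints a path on the host but "cicc: not found" during docker build, the files are most likely being supplied by the runtime mount rather than by the image itself.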

Hi, do you have any workaround for this? I’m facing the same problem trying to compile the jetson-inference model inside that same image (nvcr.io/nvidia/l4t-base:r32.3.1).

Thank you.
Best regards.

So basically because of the bind mounting approach Nvidia chose for --runtime nvidia, those files are only available at runtime, not at docker build. You can set the default runtime to be nvidia so those files are mounted at build time, but for various reasons I would very very strongly recommend against doing that (I can’t emphasize that enough).

Instead, my approach to fixing this was to install cuda inside the image during the build stage and remove it before the build is complete, in the same layer so as not to increase image size. Please see the Dockerfile and build script in the docker branch of my opencv repo here for an example of how to do it. If you need a base l4t image with apt sources enabled, you can use these images here as bases (or build the linked Dockerfile yourself).
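The "install, build, and remove in the same layer" idea boils down to chaining everything into a single RUN instruction, so the temporary packages never end up in a committed layer. The sketch below only renders such an instruction as text. It is not the Dockerfile from the repo mentioned above; the package names are the ones quoted later in this thread, and `./build_opencv.sh` is a placeholder for whatever actually builds your project.

```python
# Package names as quoted later in this thread; adjust to your CUDA version.
CUDA_BUILD_DEPS = [
    "cuda-compiler-10-2",
    "cuda-minimal-build-10-2",
    "cuda-libraries-dev-10-2",
]


def render_run(build_deps, build_cmd):
    """Render one Dockerfile RUN instruction that installs, builds, and purges."""
    pkgs = " ".join(build_deps)
    steps = [
        "apt-get update",
        "apt-get install -y --no-install-recommends " + pkgs,
        build_cmd,
        # Purging in the same RUN keeps the build deps out of the final layer.
        "apt-get purge -y --autoremove " + pkgs,
        "rm -rf /var/lib/apt/lists/*",
    ]
    # Join with " \" line continuations and "&&" so any failing step aborts the layer.
    return "RUN " + " \\\n && ".join(steps)


if __name__ == "__main__":
    # ./build_opencv.sh is a hypothetical build command, used for illustration only.
    print(render_run(CUDA_BUILD_DEPS, "./build_opencv.sh"))
```

Splitting the install and the purge across two RUN instructions would defeat the purpose, since the first layer would still carry the full CUDA toolchain.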

Hi, thanks for the info. I just gave it a quick look and saw that as part of the cuda installation part you install these 3 apt packages (amongst others):

    cuda-compiler-10-2 \
    cuda-minimal-build-10-2 \
    cuda-libraries-dev-10-2 

But they are not available inside the l4t-base:r32.3.1 image after apt update (apt install autocomplete does not suggest any of them)… Did you have to add some alternative sources?

Thank you.

EDIT: Just saw that the base image of the one you pointed me to specifically adds the missing apt sources I was referring to, so problem solved so far. Thanks.

Hi, I’m getting a core dump importing cv2 from one of your images: docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0
Am I missing something?
I’m running JetPack 4.3 on jetson nano.

drakorg@drakorg-desktop:~/workspace/nspi/docker_/images/jetson-nano/jp-r32.3.1.-cv-4.3.0-nspi-base$ docker pull docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0
jp-r32.3.1-cv-4.3.0: Pulling from mdegans/tegra-opencv
Digest: sha256:dd1bf2da56d18f497bf0882645d2120b0765fe53680615fdeebcf38dcfea5c4c
Status: Image is up to date for mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0
docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0
drakorg@drakorg-desktop:~/workspace/nspi/docker_/images/jetson-nano/jp-r32.3.1.-cv-4.3.0-nspi-base$ docker run -it --rm docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0 bash
root@da7ca7f218a4:/usr/local/src/build_opencv# python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
Segmentation fault (core dumped)
root@da7ca7f218a4:/usr/local/src/build_opencv#

You must add --runtime nvidia (to docker run) to access the gpu and various runtime libraries. Without that it’ll unceremoniously segfault. Thank you for testing, btw!

edit: once you import cv2, you can check that it’s working like this:

>>> cv2.cuda.printCudaDeviceInfo(0)
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***

Device count: 1

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3964 MBytes (4156911616 bytes)
  GPU Clock Speed:                               0.92 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1
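Because a missing --runtime nvidia shows up as a hard segfault rather than a Python exception, it can help to run the import in a child process and inspect the exit status. The following is a rough sketch under that assumption; `probe` and `describe_exit` are illustrative names, and `cv2.cuda.getCudaEnabledDeviceCount()` is only meaningful on an OpenCV build with CUDA support.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "crashed ({})".format(name)
    return "failed (exit code {})".format(returncode)


def probe(snippet):
    """Run a Python snippet in a child interpreter so a crash cannot take us down."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return describe_exit(result.returncode), result.stdout.strip()


if __name__ == "__main__":
    status, out = probe("import cv2; print(cv2.cuda.getCudaEnabledDeviceCount())")
    print("import cv2:", status)
    if status == "ok":
        print("CUDA devices visible to OpenCV:", out)
    elif status.startswith("crashed"):
        print("hint: was the container started with --runtime nvidia?")
```

A "crashed (SIGSEGV)" status matches the symptom described above, while a plain "failed" status points to an ordinary import problem instead.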

Hi, and thank you for taking the time to produce these images.
I followed your advice but I’m getting a different kind of error when doing so …

drakorg@drakorg-desktop:~/workspace/nspi/docker_/images/jetson-nano/detection_server$ docker run --runtime nvidia -it --rm docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0 bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
drakorg@drakorg-desktop:~/workspace/nspi/docker_/images/jetson-nano/detection_server$ sudo docker run --runtime nvidia -it --rm docker.io/mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0 bash
[sudo] password for drakorg:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.

I tried both as my regular user (which has docker privileges) and also as root, but getting the same error on both occasions.

I just tested:

 sudo docker run -it --rm --runtime nvidia mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0

… and it works on my nano. I noticed your hostname is drakorg@drakorg-desktop. You aren’t accidentally trying this on your host, are you? If so, it happens to me all the time. Too many terminals.

Hi, err, no, I don’t have docker inside the container if that’s what you mean.
The error is produced at the host when trying to run the image … I’m not able to instantiate the image when adding the --runtime nvidia.

Are you under JetPack 4.3 or 4.4 DP?
I’ve a bad feeling there’s something odd about my docker/nvidia-docker installation … it’s pretty much a fresh installation, but it wouldn’t hurt reflashing everything and starting from a fresh 4.3 if this problem still persists.

I mean are you running that on your x86 desktop or is drakorg-desktop your Nano’s hostname? It’s easy to type in the wrong terminal window if you’re using ssh all the time. Just making sure we’re not accidentally on the wrong architecture.

I just ran that command ssh’ed into my nano, which is on 4.3 at the moment. It’s a new flash. My other devices are at 4.4. I would recommend using 4.4 since there have been some important improvements to the Nvidia Docker runtime, but it’s up to you (e.g. cudnn is broken in 4.3, I think).

something odd about my docker/nvidia-docker installation

Possibly. One of the reasons I would recommend 4.4 is because some possible issues are fixed. As a side note, I’d recommend not putting a user in the docker group since it gives that user root privileges without a password or any sort of logging. You can easily remount / and shred everything with the --privileged flag, for example. Sudo is more typing but requires some interactive confirmation. It’s up to you and your individual situation.

No, drakorg-desktop is indeed the hostname of the nano. I just left it at the default suggested hostname after the nth reflash. I would have loved to move to 4.4, but when I did I found that dlib’s performance for face recognition went from 500ms per frame in JetPack 4.3 to 2 full seconds on 4.4. I opened an issue in the module’s tracker (https://github.com/ageitgey/face_recognition/issues/1130) where I explain the problem in more detail. That basically made me stay away from 4.4 so far. Also, 4.4 has some issue with the OpenGL drivers that prevents me from redirecting X11 sessions via ssh, whereas in 4.3 it works just fine (as long as I do not apply the recommended patches … if I do, it breaks the mesa drivers and triggers the same problem I face in a fresh install of 4.4 DP). I was hoping for all of this to disappear in the 4.4 final release, so my plan was to stay on 4.3 until then at least.

Note taken on the risk of putting the user in the docker group. I’ll keep it in mind for deployment.

Re 4.4 problems. Gotcha. In this case just keep in mind cudnn may or may not work on 4.3. Only the runtime library itself is mounted inside (by the nvidia runtime) and not a bunch of other files that may be needed. It’s untested. The image itself however should run like this:

[anzu@nanew] -- [~]
 $ sudo docker run -it --rm --runtime nvidia mdegans/tegra-opencv:jp-r32.3.1-cv-4.3.0
root@399732a85548:/usr/local/src/build_opencv# python3
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.cuda.printCudaDeviceInfo(0)
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***

Device count: 1

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3964 MBytes (4156911616 bytes)
  GPU Clock Speed:                               0.92 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 10.0, CUDA Runtime Version = 10.0, NumDevs = 1

Please let me know if you try those commands and don’t get similar output.

I will reflash to get a fresh 4.3 installation and try again tomorrow … I’m quite confident that it will work, unless of course you have some tweaks on your system that I’m not aware of. I’ll post my results here as soon as I try it out.

Re: tweaks, I don’t think so. I just use that device for testing 4.3, but it’s possible since there are some extras like DeepStream installed. Please let me know if there is a problem, tho.

Hi, sorry for the delay. Checking in just to say that it worked like a charm with a fresh install of 4.3. I must have had something messed up with docker in the installation I was using. I still haven’t been able to put more time into it, but I wanted to tell you that everything is working fine now, at least up to that part. I can hardly wait to resume.

Thank you a lot.
Best regards,
Eduardo.

Glad it works! Please post here or report an issue on Github if you find any broken functionality. I know not all the tests pass for OpenCV on Tegra, so YMMV.

Hi, could you please elaborate on why you recommend against setting the default runtime to nvidia?
Thank you.
Best regards.
Eduardo

Got a few reasons, but one is because it would be used for every image built, not just Nvidia’s, and that might lead to odd behaviour. The bind mounting approach replaces a lot of stuff in the image and it breaks assumptions about repeatability. For example, some configure script could find a dependency at build time and then your built image won’t run on a generic aarch64 machine or on another Tegra machine without --runtime nvidia.

On the other hand, temporarily installing the deps, building the thing, and purging the dep that will be mounted at runtime anyway forces this step to be explicit so it won’t happen accidentally and lead to really strange results.

Another issue is security re: the principle of least privilege. The approach --runtime nvidia takes has some element of risk and it’s better to avoid that if not necessary, imo. It’s like how you can use --privileged all the time, when it’s often better to pass only a --device or two.
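For anyone unsure whether their host already has nvidia set as the default runtime, the setting lives under the `default-runtime` key of Docker’s daemon.json. The snippet below is a small sketch that reads it. It assumes the usual /etc/docker/daemon.json location, and `default_runtime` is just an illustrative helper name.

```python
import json


def default_runtime(daemon_json_path="/etc/docker/daemon.json"):
    """Return the configured default runtime, or None if none is set."""
    try:
        with open(daemon_json_path) as handle:
            config = json.load(handle)
    except FileNotFoundError:
        # No daemon.json at all means Docker falls back to its built-in default.
        return None
    return config.get("default-runtime")


if __name__ == "__main__":
    runtime = default_runtime()
    if runtime == "nvidia":
        print("default runtime is nvidia: every docker build will get the bind mounts")
    else:
        print("default runtime:", runtime or "not set (Docker's built-in default)")
```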