OSError: libcurand.so.10: cannot open shared object file: No such file or directory

aktaseren91 · October 13, 2021, 10:08am

I am using a Jetson Nano (SD card version). The system is running L4T r32.5. The output of uname -a is below:

uname -a
Linux 97823c5 4.9.201-l4t-r32.5 #1 SMP PREEMPT Thu May 6 13:07:24 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

I prepared a Dockerfile for my deep learning algorithm, which requires CUDA-enabled PyTorch >= 1.0 and Python >= 3.6. The Dockerfile itself works, but the Nano device gives the error below.

    import torch
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 188, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.6/dist-packages/torch/__init__.py", line 141, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory
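
The traceback shows the failure coming from ctypes.CDLL being unable to locate libcurand.so.10, before any PyTorch code runs. A minimal sketch for checking this directly inside the container, without importing torch, might look like the following. The helper name check_shared_libs and every library name other than libcurand.so.10 are assumptions chosen for illustration; they are not taken from this thread.

```python
import ctypes


def check_shared_libs(names):
    """Try to dlopen each library and report whether it could be loaded."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the dynamic loader cannot find or open the file.
            results[name] = False
    return results


if __name__ == "__main__":
    # libcurand.so.10 is the library named in the traceback; the other
    # names are assumed examples of CUDA 10 libraries.
    libs = ["libcurand.so.10", "libcudart.so.10.2", "libcublas.so.10"]
    for lib, ok in check_shared_libs(libs).items():
        print(f"{lib}: {'found' if ok else 'MISSING'}")
```

If every CUDA library is reported missing, that would suggest the GPU libraries are not visible in the container at all, rather than a single file being absent from the image.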

I looked at similar issues reported on NVIDIA's blogs. Most of them suggest that the L4T release version should match the NVIDIA PyTorch container, based on the JetPack version installed. Since I have L4T r32.5, I have been pulling the container below in my Dockerfile.

FROM nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.6-py3

However, I am still getting the error. Do you have any suggestions to solve this issue?

dusty_nv · October 13, 2021, 7:16pm

Hi @aktaseren91, did you start your container by passing the --runtime nvidia flag to the docker run command?

aktaseren91 · October 14, 2021, 5:00am

Hi @dusty_nv, thank you for the quick response.

I manage deployments to my devices remotely with the Balena Cloud platform. As suggested in the Balena documentation, I pushed the Dockerfile with the command below.

balena push

As you suggested, I also added the --runtime nvidia flag to that command just now. However, it didn’t work.

dusty_nv · October 14, 2021, 1:22pm

Hi @aktaseren91, are you able to import torch when using just the nvcr.io/nvidia/l4t-pytorch:r32.5.0-pth1.6-py3 base container?
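
One way to answer this from inside the base container is a small probe that attempts the import and reports what happened. This is only a sketch, and the function name probe_torch is hypothetical. It catches OSError as well as ImportError, since the traceback earlier in the thread surfaces a missing shared library as OSError.

```python
def probe_torch():
    """Attempt to import torch and summarise the outcome as a dict."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # ImportError: torch is not installed.
        # OSError: a shared library such as libcurand could not be loaded.
        return {"imported": False, "error": f"{type(exc).__name__}: {exc}"}
    return {
        "imported": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(probe_torch())
```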

I’m unfamiliar with the Balena platform, so you may want to contact their support. Are you otherwise using the normal JetPack image from NVIDIA, or Balena OS?

aktaseren91 · October 18, 2021, 11:41am

@dusty_nv, unfortunately I am not able to import torch when I run it. I researched the issue and realized that the NVIDIA Container Runtime is not included in BalenaOS. Therefore, I planned to install it and then configure the daemon.json file so that I can go ahead with my Dockerfile.
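
For reference, on a standard Docker setup the NVIDIA runtime is typically registered in /etc/docker/daemon.json under a "runtimes" entry. The sketch below merges such an entry into an existing configuration dictionary. It assumes the nvidia-container-runtime binary is already installed and on the PATH. Whether this approach applies to BalenaOS at all is not established in this thread, and the function name with_nvidia_runtime is hypothetical.

```python
import copy
import json


def with_nvidia_runtime(config, make_default=False):
    """Return a copy of a Docker daemon config with an 'nvidia' runtime entry."""
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    # Assumes the nvidia-container-runtime binary is installed on the host.
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    if make_default:
        # Optional: use the NVIDIA runtime without passing --runtime nvidia.
        merged["default-runtime"] = "nvidia"
    return merged


if __name__ == "__main__":
    existing = {"log-driver": "json-file"}  # placeholder for current settings
    print(json.dumps(with_nvidia_runtime(existing, make_default=True), indent=4))
```

The Docker daemon has to be restarted after the file changes before the new runtime is picked up.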

However, I am struggling to install the NVIDIA Container Runtime, even though Balena says that BalenaOS has Docker version 19.03.23. BalenaOS appears to be a Yocto Linux based host OS with JetPack 4.5 (L4T version r32.5.0). As you suggested for another similar error reported on the NVIDIA blog, I looked at https://nvidia.github.io/nvidia-container-runtime/ . In this repository, I couldn’t find whether BalenaOS is officially supported or not. Am I missing anything here?

dusty_nv · October 18, 2021, 6:00pm

Hi @aktaseren91, you seem to have found the issue - I would get in touch with Balena about how you are supposed to properly install the NVIDIA Container Runtime on BalenaOS, such that you can run GPU-accelerated containers.

aktaseren91 · October 19, 2021, 9:17am

Thanks a lot for the brainstorming. The Balena blog guided me on this just now. They suggested a repo titled jetson nano cuda sample. Similar to that guide repo, I also came across Balena Hub, which is full of completed projects. Some of them are similar to what I am doing with NVIDIA devices.

For example, the ROS2 pose estimation example project was built with the CUDA, PyTorch, and OpenCV libraries. It is a little complex, but I tried it and it works very well.

David_T · October 19, 2021, 4:57pm

Thanks @dusty_nv – we’ll make sure aktaseren91 is taken care of over on our Forums. :-)

dusty_nv · October 19, 2021, 6:36pm

Perfect, thanks @aktaseren91 @David_T!
