Hi @pramodhrachuri, the official NVIDIA wheels for PyTorch on Jetson began releasing with JetPack 4.6.1 - I’m not sure if those wheels work on previous versions of JetPack-L4T. However, you can find other PyTorch wheels for JetPack 4.x at the top of this thread.
Hi, I am getting the following error when trying to import torch.
python3 -c "import torch"
OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
I am trying to use yolov7 with torch 1.9.0 within Docker.
I grabbed this wheel:
And added these installs in my Dockerfile based on feedback from this thread:
RUN apt-get update && apt-get install -y libopenblas-base libopenmpi-dev libomp-dev openmpi-bin
I am on python 3.6.9.
Any suggestions would be appreciated. Thank you!
Hi @gaversano, which version of JetPack are you running? Can you run this command below to show the versions of the OpenMPI libraries in your container?
find / -name 'libmpi*'
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20
That PyTorch 1.9 wheel was built against JetPack 4.x / Ubuntu 18.04, so it expects the MPI version to be libmpi.so.20
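As a complement to the find command, one way to check whether the dynamic loader can actually resolve those libraries is to try opening them from Python. This is only a minimal sketch: the helper name can_load is made up for illustration, and the sonames are the ones quoted in this thread.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Same failure class as the "cannot open shared object file" error
        # raised when importing torch.
        return False


if __name__ == "__main__":
    # Sonames taken from the error message and the find output above.
    for name in ("libmpi.so.20", "libmpi_cxx.so.20"):
        print(name, "->", "found" if can_load(name) else "MISSING")
```

If a file shows up in the find output but still fails to load here, the library directory may not be on the loader's search path inside the container.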
Hey I am running JetPack 4.5 (?) on Ubuntu 18.04.6 LTS.
# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, EABI: aarch64, DATE: Fri Jul 9 16:05:07 UTC 2021
This is all I get so I suspect there may have been an issue with the MPI installation?
OK - is that from inside the container? What container are you using, and what is its base container?
You could try purging/removing the MPI packages in the container first in case there are dangling references, and then re-install them.
Yes, inside the container. My container is based on this (with the changes described above).
OK, it appears that container is based on nvcr.io/nvidia/l4t-ml:r32.4.4-py3; however, you are on L4T R32.5, so it should instead be nvcr.io/nvidia/l4t-ml:r32.5.0-py3. That said, I’m not sure whether that’s related to your issue. Are you trying to use that existing container, just with PyTorch 1.9 instead of 1.7?
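The version check described here can be sketched in a few lines of Python. This is a hedged illustration rather than an official tool: the function names are hypothetical, the input is the release string posted earlier in the thread (normally the first line of /etc/nv_tegra_release), and comparing only the rMAJOR.MINOR part of the tag is an assumption based on the r32.5.0 suggestion above.

```python
import re


def parse_l4t_release(line):
    """Extract (major, revision) from an L4T release line, e.g. (32, '5.2')."""
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", line)
    if match is None:
        raise ValueError("unrecognized release line: %r" % line)
    return int(match.group(1)), match.group(2)


def tag_matches_l4t(tag, major, revision):
    """Compare the rMAJOR.MINOR prefix of a container tag with the host L4T version."""
    tag_match = re.match(r"r(\d+)\.(\d+)", tag)
    if tag_match is None:
        return False
    host_minor = int(revision.split(".")[0])
    return (int(tag_match.group(1)), int(tag_match.group(2))) == (major, host_minor)


line = ("# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, "
        "EABI: aarch64, DATE: Fri Jul 9 16:05:07 UTC 2021")
major, revision = parse_l4t_release(line)
print("L4T R%d.%s" % (major, revision))                  # L4T R32.5.2
print(tag_matches_l4t("r32.4.4-py3", major, revision))   # False
print(tag_matches_l4t("r32.5.0-py3", major, revision))   # True
```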
I would try this:
RUN apt-get purge -y libopenmpi-dev libopenmpi* openmpi-bin && \
    apt-get install -y libopenmpi-dev openmpi-bin
Then check in the log that it actually installed the OpenMPI packages and verify with find / -name 'libmpi*' again.
I made a very simple test with PyTorch and numpy:
I created a very large numpy array and multiplied it by 2, ten times.
I did the same thing with a PyTorch tensor.
PyTorch operations are much faster than numpy operations, BUT:
The first time I multiply my PyTorch tensor by 2, it takes about 2 s to perform the operation. The subsequent operations are really fast (about 0.1 s, versus more than 1 s with numpy).
Why is the first PyTorch operation so slow?
Hi Alain, the first time you use a PyTorch tensor on GPU, it takes extra time to initialize the CUDA context and load the kernels. PyTorch loads a lot of libraries that only get used when the first operation is performed. Other operations afterwards should be faster.
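Because of that one-time startup cost, benchmarks usually report the first call separately from the rest. Below is a minimal sketch of such a timing helper. The name benchmark and its sync parameter are hypothetical. On GPU one would typically pass torch.cuda.synchronize as sync, since CUDA kernels are launched asynchronously and the clock would otherwise stop before the work has finished. The demo uses a plain CPU workload so that it runs without PyTorch.

```python
import time


def benchmark(fn, runs=10, sync=None):
    """Time fn() `runs` times; return (first_call_ms, mean_ms_of_later_calls)."""
    if runs < 2:
        raise ValueError("need at least 2 runs to separate warm-up from steady state")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued asynchronous work before stopping the clock
        timings.append((time.perf_counter() - start) * 1000.0)
    warmup, steady = timings[0], timings[1:]
    return warmup, sum(steady) / len(steady)


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs anywhere.
    data = list(range(100000))
    first_ms, steady_ms = benchmark(lambda: [x * 2 for x in data], runs=10)
    print("first call: %.2f ms, later calls (mean): %.2f ms" % (first_ms, steady_ms))
```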
Many thanks for the explanation. It’s really interesting. PyTorch is really fast.
I will think about the routines I can convert to PyTorch. I will have to manage OpenCV, PyCUDA, numpy, Pillow and PyTorch.
Is it possible to convert a PyTorch tensor to a numpy array (and back) with low latency? Is it easy to convert a PyTorch tensor to a Pillow image (and back)?
Is there a link between PyTorch and OpenCV?
I am asking many questions, but if I have to do many conversions between tensors and classical arrays, maybe I will lose the time I gain with PyTorch?
Hi @easybob, you can use torch.from_numpy() to convert numpy arrays to tensors, and tensor.numpy() to go the other way. IIRC the tensors need to be on CPU, so if the tensors are already on GPU you may need to call .cpu() on them first before converting them to numpy. It’s in that CPU<->GPU transfer that you may encounter additional latency, so you will want to minimize it if your application is latency-sensitive.
See here for functions from torchvision for converting tensors to/from PIL images: https://pytorch.org/vision/stable/transforms.html#conversion-transforms
I don’t believe there’s an explicit link between PyTorch and OpenCV, other than you can easily convert tensors to/from numpy arrays, and OpenCV cv2 Python module works with numpy arrays as well.
That’s great Dusty. I will start to work on a Pytorch version of JetsonSky.
I guess this will be interesting. Just have to work hard now.
Have a nice day.
Hello, I wonder if you can tell me why I had problems when I used pip to install torchprofile==0.0.1 on my Jetson Nano. It told me that I need to install torch>=1.4 and that pip cannot find a suitable torch version, but I had already installed PyTorch 1.4.0 and torchvision 0.5.0 following the instructions on this page, which I can verify with pip list. My JetPack version is 4.2.0. The error messages are as follows:
Collecting torch>=1.4 (from torchprofile==0.0.1)
Could not find a version that satisfies the requirement torch>=1.4 (from torchprofile==0.0.1) (from versions: )
No matching distribution found for torch>=1.4 (from torchprofile==0.0.1)
I made a small test program to see whether PyTorch / torchvision can bring interesting things to JetsonSky.
Here is the test program:
import numpy as np
import cv2
from PIL import Image, ImageFilter
from torchvision import transforms as T
from torchvision.transforms import functional as F
import time

Image_Test = '/home/alain/Work/Python/Pytorch/Images/4K_base.tif'
Save_image_Pillow = '/home/alain/Work/Python/Pytorch/Images/4K_result_PILLOW.jpg'
Save_image_OpenCV = '/home/alain/Work/Python/Pytorch/Images/4K_result_OpenCV.jpg'
Save_image_Pytorch = '/home/alain/Work/Python/Pytorch/Images/4K_result_Pytorch.jpg'

img_PIL = Image.open(Image_Test)
img_OpenCV = cv2.imread(Image_Test,cv2.IMREAD_COLOR)

for i in range(7) :
    start_time = time.perf_counter()
    img_PIL_Blur = img_PIL.filter(ImageFilter.GaussianBlur(radius=3))
    stop_time = time.perf_counter()
    print(" Exec Pillow : ",(stop_time-start_time)*1000)
img_PIL_Blur = img_PIL_Blur.save(Save_image_Pillow)
print("")

for i in range(7) :
    start_time = time.perf_counter()
    img_OpenCV_Blur = cv2.GaussianBlur(img_OpenCV,(11,11),cv2.BORDER_DEFAULT,)
    stop_time = time.perf_counter()
    print(" Exec OpenCV : ",(stop_time-start_time)*1000)
cv2.imwrite(Save_image_OpenCV, img_OpenCV_Blur, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
print("")

image_tensor = F.to_tensor(img_PIL)
image_tensor = image_tensor.to('cuda')
transform = T.GaussianBlur(kernel_size=(11, 11), sigma=(1, 2))
for i in range(7) :
    start_time = time.perf_counter()
    image_tensor_blur = transform(image_tensor)
    stop_time = time.perf_counter()
    print(" Exec Pytorch : ",(stop_time-start_time)*1000)
imag_Tensor_to_PIL = F.to_pil_image(image_tensor_blur, 'RGB')
imag_Tensor_to_PIL = imag_Tensor_to_PIL.save(Save_image_Pytorch)
Very simple program: I load a 4K image and apply a Gaussian blur filter (7 times) using Pillow, OpenCV and torchvision to see which library is fastest.
I have tested this program with my laptop and the AGX Orin. Here are the results :
4K image Gaussian Blur (PC windows i7-8750H + GTX1060 6GB) in ms
Exec Pillow : 151.3914000000227
Exec Pillow : 119.82139999997798
Exec Pillow : 119.87319999997226
Exec Pillow : 121.0879999999861
Exec Pillow : 119.32099999995671
Exec Pillow : 119.62960000005296
Exec Pillow : 123.94199999999955

Exec OpenCV : 10.811900000021524
Exec OpenCV : 7.164100000011331
Exec OpenCV : 7.870700000012221
Exec OpenCV : 8.089600000005248
Exec OpenCV : 8.753699999999753
Exec OpenCV : 7.0836000000440436
Exec OpenCV : 7.431500000052438

Exec Pytorch : 1506.0558999999785
Exec Pytorch : 51.52200000003404
Exec Pytorch : 66.1642000000029
Exec Pytorch : 20.14099999996688
Exec Pytorch : 62.205199999993965
Exec Pytorch : 12.609800000006999
Exec Pytorch : 57.274299999960476
4K image Gaussian Blur (Nvidia Jetson AGX Orin 64GB) in ms
Exec Pillow : 353.89680600019346
Exec Pillow : 304.88726800012955
Exec Pillow : 318.98665799963055
Exec Pillow : 313.87024899959215
Exec Pillow : 311.64901400006784
Exec Pillow : 323.0330470005356
Exec Pillow : 322.66202499977226

Exec OpenCV : 44.65785199954553
Exec OpenCV : 25.533990999974776
Exec OpenCV : 26.57884699965507
Exec OpenCV : 21.29485499972361
Exec OpenCV : 22.111289000349643
Exec OpenCV : 21.87230700019427
Exec OpenCV : 23.919809999824793

Exec Pytorch : 969.1214759996001
Exec Pytorch : 87.68033299929812
Exec Pytorch : 35.250306999841996
Exec Pytorch : 37.09012400031497
Exec Pytorch : 27.185547000044608
Exec Pytorch : 30.973184999311343
Exec Pytorch : 26.01031999984116
We can see that the i7-8750H CPU is faster than the Orin CPU, but the Orin GPU is faster than the GTX 1060 6GB.
Anyway, I am a bit disappointed when I compare OpenCV (CPU) with torchvision (GPU).
Did I do something wrong in my test program?
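The comparison can be made a little more concrete by averaging the timings posted above while leaving out the first run of each series as warm-up. The sketch below copies the posted values rounded to two decimals; the helper name steady_mean is made up for illustration.

```python
from statistics import mean

# Timings in ms copied from the runs above, rounded to two decimals.
pc = {
    "pillow": [151.39, 119.82, 119.87, 121.09, 119.32, 119.63, 123.94],
    "opencv": [10.81, 7.16, 7.87, 8.09, 8.75, 7.08, 7.43],
    "pytorch": [1506.06, 51.52, 66.16, 20.14, 62.21, 12.61, 57.27],
}
orin = {
    "pillow": [353.90, 304.89, 318.99, 313.87, 311.65, 323.03, 322.66],
    "opencv": [44.66, 25.53, 26.58, 21.29, 22.11, 21.87, 23.92],
    "pytorch": [969.12, 87.68, 35.25, 37.09, 27.19, 30.97, 26.01],
}


def steady_mean(samples):
    """Mean of all samples except the first (warm-up) one."""
    return mean(samples[1:])


for machine, series in (("PC", pc), ("Orin", orin)):
    for lib, samples in series.items():
        print("%-5s %-8s %.1f ms" % (machine, lib, steady_mean(samples)))
```

With these numbers, the steady-state torchvision mean is lower on the Orin than on the GTX 1060, while OpenCV on CPU stays ahead of torchvision on both machines. Note that the GPU timings were taken without synchronizing the device, so they may not reflect the full duration of the blur.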
Hi Alain, I ran your script too and got similar results - I’m not sure if that OpenCV function is just faster than the torchvision equivalent or what (PyTorch has a focus on DNN training/inferencing). I might recommend trying the OpenCV CUDA module to see if that’s faster or trying out VPI.
Hi @wanggaouyuan, I don’t think pip picks up the version of your previously-installed PyTorch wheel correctly - can you try installing torchprofile with the --no-dependencies flag or from source?
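Since skipping dependency resolution means pip no longer checks the torch>=1.4 requirement, it may be worth confirming it by hand first. The following is a rough sketch with hypothetical helper names. It compares only the leading numeric part of a version string, which is a simplification and not how pip itself evaluates version specifiers.

```python
import re


def release_tuple(version):
    """Leading numeric release of a version string, e.g. '1.4.0a0+abc' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric release in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare numeric releases only; pre-release and local suffixes are ignored."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.4' compares like '1.4.0'
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("1.4.0", "1.4"))  # True
print(meets_minimum("1.3.1", "1.4"))  # False
```

In practice the installed value would come from torch.__version__ on the device.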
It’s almost a good thing that we get similar results; I thought I had misunderstood something.
I will have to compile OpenCV with the CUDA option and then test it.
I have precompiled OpenCV + CUDA binaries for JetPack 5.x - you can find the URL here: https://github.com/dusty-nv/jetson-containers/blob/eb2307d40f0884d66310e9ac34633a4c5ef2e083/scripts/opencv_version.sh#L14
Oh, that’s great. I will try it asap.
I also tried VPI a few months ago but ran into an issue. I did not look into it further for lack of time. I will give VPI another try because this library seems really interesting.
If you install my sbts-install project it will install pytorch and torchvision and all of the dependencies for those jetpack versions in one command.
In addition, it also installs yolov7. Maybe you could replace the use of yolov5 with the better performing yolov7 ? If not, after installation you will have all of the dependencies you need to install yolov5 I think: