PyTorch for Jetson

Hi @pramodhrachuri, the official NVIDIA wheels for PyTorch on Jetson began releasing with JetPack 4.6.1 - I’m not sure if those wheels work on previous versions of JetPack-L4T. However, you can find other PyTorch wheels for JetPack 4.x at the top of this thread.

Hi, I am getting the following error when trying to import torch:

python3 -c "import torch"

OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory

I am trying to use yolov7 with torch 1.9.0 within Docker.

I grabbed this wheel:

https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl

And added these installs in my Dockerfile based on feedback from this thread:

RUN apt-get update && apt-get install -y libopenblas-base libopenmpi-dev libomp-dev openmpi-bin

I am on Python 3.6.9.

Any suggestions would be appreciated. Thank you!

Hi @gaversano, which version of JetPack are you running? Can you run this command below to show the versions of the OpenMPI libraries in your container?

find / -name 'libmpi*'
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20

That PyTorch 1.9 wheel was built against JetPack 4.x / Ubuntu 18.04, so it expects the MPI version to be libmpi.so.20.

Hey, I am running JetPack 4.5 (?) on Ubuntu 18.04.6 LTS.

cat /etc/nv_tegra_release

# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, EABI: aarch64, DATE: Fri Jul  9 16:05:07 UTC 2021

This is all I get from the find command, so I suspect there may have been an issue with the MPI installation:

/etc/alternatives/libmpi++.so
/etc/alternatives/libmpi.so

OK - is that from inside the container? What container are you using and what is its base container?

You could try purging/removing the MPI packages in the container first in case there are dangling references, and then re-install them.

Yes, inside the container. My container is based on this (with the changes described above).

OK, it appears that container is based on nvcr.io/nvidia/l4t-ml:r32.4.4-py3. However, you are on L4T R32.5, so it should be based on nvcr.io/nvidia/l4t-ml:r32.5.0-py3 instead. I’m not sure whether that’s related to your issue, though. Are you trying to use that existing container, just with PyTorch 1.9 instead of 1.7?
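For anyone who wants to check this from a script, here is a minimal sketch that parses the /etc/nv_tegra_release line quoted above and compares its release series with a container tag. The helper names parse_l4t_release and release_series are hypothetical, and comparing only the major.minor part is an assumption based on the R32.5 / r32.5.0 pairing mentioned above.

```python
import re

def parse_l4t_release(text):
    """Return (major, revision) parsed from an nv_tegra_release header line."""
    match = re.search(r"R(\d+) \(release\), REVISION: ([\d.]+)", text)
    if match is None:
        raise ValueError("unrecognized nv_tegra_release format")
    return int(match.group(1)), match.group(2)

def release_series(version):
    """Keep only major.minor, e.g. '32.5.2' -> '32.5' (assumed matching rule)."""
    return ".".join(version.split(".")[:2])

# The line posted above; on a device you would read /etc/nv_tegra_release instead.
sample = ("# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, "
          "EABI: aarch64, DATE: Fri Jul  9 16:05:07 UTC 2021")

major, revision = parse_l4t_release(sample)
l4t_version = f"{major}.{revision}"
print(l4t_version)  # 32.5.2

# Compare against the version embedded in the container tag r32.4.4-py3.
print(release_series(l4t_version) == release_series("32.4.4"))  # False
```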

I would try this:

RUN apt-get purge -y libopenmpi-dev libopenmpi* openmpi-bin && \
    apt-get install -y libopenmpi-dev openmpi-bin

Then check in the log that it actually installed the OpenMPI packages and verify with find / -name 'libmpi*' again.
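As an extra check alongside find, a small sketch like the one below asks the dynamic loader directly whether it can open the library named in the error message. The helper can_load is hypothetical; the library names are the ones from the error and from the wheel's expected MPI version above.

```python
import ctypes

def can_load(soname):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Same exception type that "import torch" surfaced in the error above.
        return False

for name in ("libmpi_cxx.so.20", "libmpi.so.20"):
    print(name, "OK" if can_load(name) else "NOT FOUND")
```

If find lists the file but this still reports NOT FOUND, the library may be installed in a directory the loader does not search.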

Hello,

I made a very simple test with PyTorch and numpy:

I created a very large numpy array and multiplied it by 2, ten times.
I did the same thing with a PyTorch tensor.

The PyTorch operations are much faster than the numpy operations, BUT:

The first time I multiply my PyTorch tensor by 2, the operation takes about 2 s. The subsequent operations are really fast (about 0.1 s each, versus more than 1 s with numpy).

Why is the first PyTorch operation so slow?

Alain

Hi Alain, the first time you use a PyTorch tensor on GPU, it takes extra time to initialize the CUDA context and load the kernels. PyTorch loads a lot of libraries that only get used when the first operation is performed. Other operations afterwards should be faster.
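A minimal sketch of how that one-time cost could be separated from steady-state timings is shown below. The tensor size and the helper timed_ms are arbitrary choices for illustration, and the script falls back to the CPU when no GPU is visible, in which case the first-call effect will be much smaller.

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def timed_ms(fn):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    out = fn()
    if device.type == "cuda":
        # CUDA kernels run asynchronously; wait for them before stopping the clock.
        torch.cuda.synchronize()
    return out, (time.perf_counter() - start) * 1000.0

x_cpu = torch.rand(2000, 2000)

# On a GPU, this first transfer includes creating the CUDA context.
x, move_ms = timed_ms(lambda: x_cpu.to(device))
print("move to %s: %.2f ms" % (device, move_ms))

timings_ms = []
for _ in range(5):
    y, elapsed = timed_ms(lambda: x * 2)
    timings_ms.append(elapsed)

print("first multiply: %.2f ms" % timings_ms[0])
print("later multiplies: " + ", ".join("%.2f" % t for t in timings_ms[1:]))
```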

Hello Dusty,

Many thanks for the explanation. It’s really interesting. PyTorch is really fast.
I will think about the routines I can convert to PyTorch. I will have to manage OpenCV, PyCUDA, numpy, Pillow and PyTorch.

Is it possible to convert a PyTorch tensor to a numpy array (and back) with low latency? Is it easy to convert a PyTorch tensor to a Pillow image (and back)?

Is there a link between PyTorch and OpenCV?

I am asking many questions, but if I have to make many conversions between tensors and classical arrays, maybe I will lose the time I gain with PyTorch?

Alain

Hi @easybob, you can use Tensor.numpy() and torch.from_numpy() to convert tensors to/from numpy arrays. IIRC the tensors need to be on the CPU, so if they are already on the GPU you may need to call .cpu() on them before converting them to numpy. That CPU<->GPU transfer is where you may encounter additional latency, so you will want to minimize it if your application is latency-sensitive.
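A small sketch of that round trip, assuming a float32 array and falling back to the CPU when no GPU is visible. On the CPU side, torch.from_numpy() and Tensor.numpy() share memory with the array rather than copying it, which is why the in-place multiply below also changes arr.

```python
import numpy as np
import torch

arr = np.ones((2, 3), dtype=np.float32)
t = torch.from_numpy(arr)   # no copy: t and arr share the same CPU memory
t.mul_(2)                   # in-place multiply, so arr now holds 2.0 as well
print(arr[0, 0])            # 2.0

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)        # host -> device copy when a GPU is present

# .cpu() is needed before .numpy() when the tensor lives on the GPU.
result = (t_dev * 2).cpu().numpy()
print(result[0, 0])         # 4.0
```

Keeping intermediate results on the device and converting only at the end is one way to limit the number of transfers.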

See here for functions from torchvision for converting tensors to/from PIL images: https://pytorch.org/vision/stable/transforms.html#conversion-transforms

I don’t believe there’s an explicit link between PyTorch and OpenCV, other than that you can easily convert tensors to/from numpy arrays, and the OpenCV cv2 Python module works with numpy arrays as well.
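One practical detail when going from OpenCV to PyTorch through numpy is the layout: cv2 images are height x width x channel in BGR order, while torchvision-style tensors are channel x height x width in RGB. Below is a numpy-only sketch of that conversion; the helper name bgr_hwc_to_rgb_chw is hypothetical, and a synthetic array stands in for the output of cv2.imread().

```python
import numpy as np

def bgr_hwc_to_rgb_chw(img_bgr):
    """Convert a uint8 (H, W, 3) BGR image to float32 (3, H, W) RGB in [0, 1]."""
    rgb = img_bgr[:, :, ::-1]            # reverse the channel axis: BGR -> RGB (a view)
    chw = np.transpose(rgb, (2, 0, 1))   # HWC -> CHW (still a view)
    # Copy into a contiguous float32 array, then scale to [0, 1].
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

# Stand-in for cv2.imread(): a 2x3 pure-blue image in BGR order.
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[:, :, 0] = 255

out = bgr_hwc_to_rgb_chw(img)
print(out.shape, out.dtype)  # (3, 2, 3) float32
```

The contiguous copy matters because the reversed view has a negative stride, which torch.from_numpy() does not accept.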

That’s great Dusty. I will start to work on a PyTorch version of JetsonSky.

I guess this will be interesting. Just have to work hard now.

Have a nice day.

Alain

Hello, I wonder if you can tell me why I ran into problems when I used pip to install torchprofile==0.0.1 on my Jetson Nano. It told me that I need to install torch>=1.4 and that pip cannot find a suitable torch version, but I had already installed PyTorch 1.4.0 and torchvision 0.5.0 following the instructions on this page, which I can verify with pip list. My JetPack version is 4.2.0. The error messages are as follows:

Collecting torch>=1.4 (from torchprofile==0.0.1)
Could not find a version that satisfies the requirement torch>=1.4 (from torchprofile==0.0.1) (from versions: )
No matching distribution found for torch>=1.4 (from torchprofile==0.0.1)

Hello,

I made a small test program to see if PyTorch / torchvision can bring interesting things to JetsonSky.

Here is the test program:

import numpy as np
import cv2
from PIL import Image, ImageFilter
from torchvision import transforms as T
from torchvision.transforms import functional as F
import time

Image_Test = '/home/alain/Work/Python/Pytorch/Images/4K_base.tif'
Save_image_Pillow = '/home/alain/Work/Python/Pytorch/Images/4K_result_PILLOW.jpg'
Save_image_OpenCV = '/home/alain/Work/Python/Pytorch/Images/4K_result_OpenCV.jpg'
Save_image_Pytorch = '/home/alain/Work/Python/Pytorch/Images/4K_result_Pytorch.jpg'


img_PIL = Image.open(Image_Test)
img_OpenCV = cv2.imread(Image_Test,cv2.IMREAD_COLOR)

for i in range(7) :
    start_time = time.perf_counter()
    img_PIL_Blur = img_PIL.filter(ImageFilter.GaussianBlur(radius=3))
    stop_time = time.perf_counter()
    print(" Exec Pillow : ",(stop_time-start_time)*1000)

img_PIL_Blur.save(Save_image_Pillow)
print("")

for i in range(7) :
    start_time = time.perf_counter()
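    # Note: the third positional argument of cv2.GaussianBlur is sigmaX,
    # so cv2.BORDER_DEFAULT (value 4) is used as the sigma here.
    img_OpenCV_Blur = cv2.GaussianBlur(img_OpenCV, (11, 11), cv2.BORDER_DEFAULT)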
    stop_time = time.perf_counter()
    print(" Exec OpenCV : ",(stop_time-start_time)*1000)

cv2.imwrite(Save_image_OpenCV, img_OpenCV_Blur, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
print("")

image_tensor = F.to_tensor(img_PIL)
image_tensor = image_tensor.to('cuda')
transform = T.GaussianBlur(kernel_size=(11, 11), sigma=(1, 2))
for i in range(7) :
    start_time = time.perf_counter()
    image_tensor_blur = transform(image_tensor)
    stop_time = time.perf_counter()
    print(" Exec Pytorch : ",(stop_time-start_time)*1000)

imag_Tensor_to_PIL = F.to_pil_image(image_tensor_blur, 'RGB')
imag_Tensor_to_PIL.save(Save_image_Pytorch)

A very simple program: I load a 4K image and apply a Gaussian blur filter (7 times) using Pillow, OpenCV and torchvision to see which library is the fastest.

I have tested this program with my laptop and the AGX Orin. Here are the results :

4K image Gaussian Blur (Windows PC, i7-8750H + GTX 1060 6GB) in ms

 Exec Pillow :  151.3914000000227
 Exec Pillow :  119.82139999997798
 Exec Pillow :  119.87319999997226
 Exec Pillow :  121.0879999999861
 Exec Pillow :  119.32099999995671
 Exec Pillow :  119.62960000005296
 Exec Pillow :  123.94199999999955

 Exec OpenCV :  10.811900000021524
 Exec OpenCV :  7.164100000011331
 Exec OpenCV :  7.870700000012221
 Exec OpenCV :  8.089600000005248
 Exec OpenCV :  8.753699999999753
 Exec OpenCV :  7.0836000000440436
 Exec OpenCV :  7.431500000052438

 Exec Pytorch :  1506.0558999999785
 Exec Pytorch :  51.52200000003404
 Exec Pytorch :  66.1642000000029
 Exec Pytorch :  20.14099999996688
 Exec Pytorch :  62.205199999993965
 Exec Pytorch :  12.609800000006999
 Exec Pytorch :  57.274299999960476

4K image Gaussian Blur (Nvidia Jetson AGX Orin 64GB) in ms

 Exec Pillow :  353.89680600019346
 Exec Pillow :  304.88726800012955
 Exec Pillow :  318.98665799963055
 Exec Pillow :  313.87024899959215
 Exec Pillow :  311.64901400006784
 Exec Pillow :  323.0330470005356
 Exec Pillow :  322.66202499977226

 Exec OpenCV :  44.65785199954553
 Exec OpenCV :  25.533990999974776
 Exec OpenCV :  26.57884699965507
 Exec OpenCV :  21.29485499972361
 Exec OpenCV :  22.111289000349643
 Exec OpenCV :  21.87230700019427
 Exec OpenCV :  23.919809999824793

 Exec Pytorch :  969.1214759996001
 Exec Pytorch :  87.68033299929812
 Exec Pytorch :  35.250306999841996
 Exec Pytorch :  37.09012400031497
 Exec Pytorch :  27.185547000044608
 Exec Pytorch :  30.973184999311343
 Exec Pytorch :  26.01031999984116

We can see that the i7-8750H CPU is faster than the Orin CPU, but the Orin GPU is better than the GTX 1060 6GB.

Anyway, I am a bit disappointed when I compare OpenCV (CPU) with torchvision (GPU).

Did I do something wrong in my test program?

Alain

Hi Alain, I ran your script too and got similar results - I’m not sure if that OpenCV function is just faster than the torchvision equivalent or what (PyTorch has a focus on DNN training/inferencing). I might recommend trying the OpenCV CUDA module to see if that’s faster or trying out VPI.
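One thing that may be worth ruling out in timings like these: CUDA kernels are launched asynchronously, so stopping the clock right after a GPU call can measure the launch rather than the finished work. Below is a minimal benchmarking sketch with discarded warm-up runs and an optional synchronization hook. The helper benchmark and its sync parameter are hypothetical, and the CPU workload is only a stand-in.

```python
import statistics
import time

def benchmark(fn, warmup=1, repeats=7, sync=None):
    """Return the median run time of fn in milliseconds.

    warmup runs are executed but not timed; sync, if given, is called
    before the clock is stopped so that pending GPU work is included.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    times_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times_ms)

# CPU stand-in workload, just to show the call shape.
median_ms = benchmark(lambda: sum(range(100_000)))
print("median: %.3f ms" % median_ms)
```

For the torchvision loop in the script above, the call would look something like benchmark(lambda: transform(image_tensor), sync=torch.cuda.synchronize).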

Hi @wanggaouyuan, I don’t think pip picks up the version of your previously-installed PyTorch wheel correctly - can you try installing torchprofile with the --no-dependencies flag or from source?
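A sketch of the first suggestion, driven from Python so that pip is guaranteed to belong to the same interpreter where torch is installed. The helper pip_install_no_deps and its dry_run parameter are hypothetical; --no-deps is pip's short spelling of --no-dependencies.

```python
import subprocess
import sys

def pip_install_no_deps(requirement, dry_run=True):
    """Build (and optionally run) a pip install that skips dependency resolution."""
    cmd = [sys.executable, "-m", "pip", "install", "--no-deps", requirement]
    if not dry_run:
        # Raises CalledProcessError if pip exits with a non-zero status.
        subprocess.check_call(cmd)
    return cmd

# Dry run: only print the command that would be executed.
print(" ".join(pip_install_no_deps("torchprofile==0.0.1")))
```

Calling it with dry_run=False performs the actual installation.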

Hello Dusty,

It is almost good news that we get similar results. I thought I had misunderstood something.

I will have to compile OpenCV with the CUDA option and I will test it.

Alain

I have precompiled OpenCV + CUDA binaries for JetPack 5.x - you can find the URL here: https://github.com/dusty-nv/jetson-containers/blob/eb2307d40f0884d66310e9ac34633a4c5ef2e083/scripts/opencv_version.sh#L14

Oh, that’s great. I will try it asap.

I also tried VPI a few months ago but I had an issue. I did not look further because I ran out of time. I will give VPI another try because this library seems really interesting.

Alain

If you install my sbts-install project, it will install PyTorch and torchvision and all of the dependencies for those JetPack versions in one command.

In addition, it also installs yolov7. Maybe you could replace the use of yolov5 with the better performing yolov7? If not, after installation you will have all of the dependencies you need to install yolov5 I think:

Cheers,
Kim Hendrikse