Hi @pramodhrachuri, the official NVIDIA wheels for PyTorch on Jetson began releasing with JetPack 4.6.1 - I’m not sure if those wheels work on previous versions of JetPack-L4T. However, you can find other PyTorch wheels for JetPack 4.x at the top of this thread.
Hi, I am getting the following error when trying to import torch:
python3 -c "import torch"
OSError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
I am trying to use yolov7 with torch 1.9.0 within Docker.
I grabbed this wheel:
https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl
And added these installs in my Dockerfile based on feedback from this thread:
RUN apt-get update && apt-get install -y libopenblas-base libopenmpi-dev libomp-dev openmpi-bin
I am on python 3.6.9.
Any suggestions would be appreciated. Thank you!
Hi @gaversano, which version of JetPack are you running? Can you run this command below to show the versions of the OpenMPI libraries in your container?
find / -name 'libmpi*'
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20
/usr/lib/aarch64-linux-gnu/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_mpifh.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_mpifh.so.20.11.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so.20.10.1
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempif08.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_java.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempi_ignore_tkr.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi_usempif08.so.20
/usr/lib/aarch64-linux-gnu/libmpi_cxx.so.20.10.0
/usr/lib/aarch64-linux-gnu/libmpi.so.20
That PyTorch 1.9 wheel was built against JetPack 4.x / Ubuntu 18.04, so it expects the MPI version to be libmpi.so.20
Hey I am running JetPack 4.5 (?) on Ubuntu 18.04.6 LTS.
cat /etc/nv_tegra_release
# R32 (release), REVISION: 5.2, GCID: 27767740, BOARD: t186ref, EABI: aarch64, DATE: Fri Jul 9 16:05:07 UTC 2021
This is all I get so I suspect there may have been an issue with the MPI installation?
/etc/alternatives/libmpi++.so
/etc/alternatives/libmpi.so
OK - is that from inside the container? What container are you using and what is its base container?
You could try purging/removing the MPI packages in the container first in case there are dangling references, and then re-install them.
Yes inside the container. My container is based on this (with the changes made described above).
OK, it appears that container is based on nvcr.io/nvidia/l4t-ml:r32.4.4-py3; however, you are on L4T R32.5, so it should instead be nvcr.io/nvidia/l4t-ml:r32.5.0-py3. However, I’m not sure if that’s related to your issue or not. Are you trying to use that existing container, just with PyTorch 1.9 instead of 1.7?
I would try this:
RUN apt-get purge -y libopenmpi-dev libopenmpi* openmpi-bin && \
    apt-get install -y libopenmpi-dev openmpi-bin
Then check in the log that it actually installed the OpenMPI packages and verify with find / -name 'libmpi*' again.
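As a complement to the find command, here is a minimal sketch that asks the dynamic loader directly whether the MPI libraries can be opened from inside the container. The helper name can_load is hypothetical; the library names are the ones from the error message and the find output above.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Same failure class as the "cannot open shared object file" error above.
        return False


if __name__ == "__main__":
    # Library names taken from the error message and find output in this thread.
    for name in ("libmpi.so.20", "libmpi_cxx.so.20"):
        print(name, "->", "found" if can_load(name) else "NOT found")
```

If both names load here but import torch still fails, the problem is probably somewhere other than the OpenMPI packages.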
Hello,
I made a very simple test with PyTorch and numpy:
I created a very large numpy array and multiplied it by 2, ten times.
I did the same thing with a PyTorch tensor.
PyTorch operations are much faster than numpy operations, BUT:
The first time I multiply my PyTorch tensor by 2, it takes about 2 s to perform the operation. The other operations are really fast (about 0.1 s, versus more than 1 s with numpy).
Why is the first PyTorch operation so slow?
Alain
Hi Alain, the first time you use a PyTorch tensor on GPU, it takes extra time to initialize the CUDA context and load the kernels. PyTorch loads a lot of libraries that only get used when the first operation is performed. Other operations afterwards should be faster.
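To see that one-time cost separately from the steady-state speed, a small sketch like the following can help. It assumes a CUDA-enabled PyTorch build is installed (it falls back to CPU otherwise); the helper name first_vs_rest and the tensor size are made up for illustration.

```python
import time


def first_vs_rest(fn, repeats=10):
    """Call fn() `repeats` times; return (first_ms, mean_of_remaining_ms)."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    rest = timings[1:]
    return timings[0], sum(rest) / len(rest)


def main():
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        print("PyTorch is not installed; skipping the demo")
        return
    device = "cuda" if torch.cuda.is_available() else "cpu"

    def multiply():
        # Allocating on the device inside the timed call means the first
        # call also pays for CUDA context creation.
        x = torch.ones(2048, 2048, device=device)
        y = x * 2
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the GPU before stopping the clock
        return y

    first_ms, rest_ms = first_vs_rest(multiply)
    print(f"first call: {first_ms:.1f} ms, later calls: {rest_ms:.1f} ms ({device})")


if __name__ == "__main__":
    main()
```

In a real application, one common approach is to run a throwaway operation at startup so the warm-up cost is paid before anything latency-sensitive happens.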
Hello Dusty,
Many thanks for the highlights. It’s really interesting. Pytorch is really fast.
I will think about the routines I can convert with Pytorch. I will have to manage opencv, pycuda, numpy, pillow and Pytorch.
Is it possible to convert a PyTorch tensor to a numpy array (and back) with low latency? Is it easy to convert a PyTorch tensor to a Pillow image (and back)?
Is there a link between PyTorch and OpenCV?
I ask many questions, but if I have to do many conversions between tensors and classical arrays, maybe I will lose the time I gain with PyTorch?
Alain
Hi @easybob, you can use Tensor.numpy() and torch.from_numpy() to convert tensors to/from numpy arrays. IIRC the tensors need to be on CPU, so if the tensors are already on GPU you may need to call .cpu() on them first before converting them to numpy. It’s in that CPU<->GPU transfer that you may encounter additional latency, so you will want to try and minimize that if your application is latency-sensitive.
See here for functions from torchvision for converting tensors to/from PIL images: https://pytorch.org/vision/stable/transforms.html#conversion-transforms
I don’t believe there’s an explicit link between PyTorch and OpenCV, other than you can easily convert tensors to/from numpy arrays, and OpenCV cv2 Python module works with numpy arrays as well.
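A minimal sketch of those conversions, assuming numpy and PyTorch are both installed. The function names numpy_roundtrip and hwc_to_chw are hypothetical, and the zero-filled array only stands in for an image returned by cv2.imread.

```python
try:
    import numpy as np
    import torch
except ImportError:  # keeps the sketch loadable on a machine without PyTorch
    np = torch = None


def numpy_roundtrip(array):
    """numpy -> tensor -> (GPU if available) -> numpy, doubling the values."""
    t = torch.from_numpy(array)      # shares memory with `array`, no copy
    if torch.cuda.is_available():
        t = t.to("cuda")             # host -> device copy (this is the costly part)
    t = t * 2
    return t.cpu().numpy()           # back to host memory, then a numpy view


def hwc_to_chw(image):
    """OpenCV-style HxWxC array -> CxHxW tensor, the layout torchvision expects."""
    return torch.from_numpy(image).permute(2, 0, 1)


if __name__ == "__main__" and torch is not None:
    a = np.arange(6, dtype=np.float32).reshape(2, 3)
    print(numpy_roundtrip(a))
    fake_image = np.zeros((4, 5, 3), dtype=np.uint8)  # stand-in for a cv2 image
    print(hwc_to_chw(fake_image).shape)
```

Note that cv2.imread returns channels in BGR order while Pillow uses RGB, so a channel swap may also be needed when mixing the two.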
That’s great Dusty. I will start to work on a Pytorch version of JetsonSky.
I guess this will be interesting. Just have to work hard now.
Have a nice day.
Alain
Hello, I wonder if you can tell me why I ran into problems when using pip to install torchprofile==0.0.1 on my Jetson Nano. It told me that I need to install torch>=1.4 and that pip cannot find a suitable torch version, but I had already installed PyTorch 1.4.0 and torchvision 0.5.0 following the instructions on this page, which I can verify with pip list. My JetPack version is 4.2.0. The error messages are as follows:
Collecting torch>=1.4 (from torchprofile==0.0.1)
Could not find a version that satisfies the requirement torch>=1.4 (from torchprofile==0.0.1) (from versions: )
No matching distribution found for torch>=1.4 (from torchprofile==0.0.1)
Hello,
I made a small test program to see if PyTorch / Torchvision can bring interesting things for JetsonSky.
Here is the test program:
import numpy as np
import cv2
from PIL import Image, ImageFilter
from torchvision import transforms as T
from torchvision.transforms import functional as F
import time

Image_Test = '/home/alain/Work/Python/Pytorch/Images/4K_base.tif'
Save_image_Pillow = '/home/alain/Work/Python/Pytorch/Images/4K_result_PILLOW.jpg'
Save_image_OpenCV = '/home/alain/Work/Python/Pytorch/Images/4K_result_OpenCV.jpg'
Save_image_Pytorch = '/home/alain/Work/Python/Pytorch/Images/4K_result_Pytorch.jpg'

img_PIL = Image.open(Image_Test)
img_OpenCV = cv2.imread(Image_Test, cv2.IMREAD_COLOR)

# Pillow (CPU): time 7 Gaussian blurs, then save the last result
for i in range(7):
    start_time = time.perf_counter()
    img_PIL_Blur = img_PIL.filter(ImageFilter.GaussianBlur(radius=3))
    stop_time = time.perf_counter()
    print(" Exec Pillow : ", (stop_time - start_time) * 1000)
img_PIL_Blur.save(Save_image_Pillow)
print("")

# OpenCV (CPU): the third positional argument is sigmaX,
# so cv2.BORDER_DEFAULT is used as the sigma value here
for i in range(7):
    start_time = time.perf_counter()
    img_OpenCV_Blur = cv2.GaussianBlur(img_OpenCV, (11, 11), cv2.BORDER_DEFAULT)
    stop_time = time.perf_counter()
    print(" Exec OpenCV : ", (stop_time - start_time) * 1000)
cv2.imwrite(Save_image_OpenCV, img_OpenCV_Blur, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
print("")

# Torchvision (GPU): move the image to CUDA once, then time 7 blurs
image_tensor = F.to_tensor(img_PIL)
image_tensor = image_tensor.to('cuda')
transform = T.GaussianBlur(kernel_size=(11, 11), sigma=(1, 2))
for i in range(7):
    start_time = time.perf_counter()
    image_tensor_blur = transform(image_tensor)
    stop_time = time.perf_counter()
    print(" Exec Pytorch : ", (stop_time - start_time) * 1000)
imag_Tensor_to_PIL = F.to_pil_image(image_tensor_blur, 'RGB')
imag_Tensor_to_PIL.save(Save_image_Pytorch)
Very simple program: I load a 4K image and apply a Gaussian blur filter (7 times) using Pillow, OpenCV and Torchvision to see which library is the fastest.
I have tested this program with my laptop and the AGX Orin. Here are the results :
4K image Gaussian Blur (PC windows i7-8750H + GTX1060 6GB) in ms
Exec Pillow : 151.3914000000227
Exec Pillow : 119.82139999997798
Exec Pillow : 119.87319999997226
Exec Pillow : 121.0879999999861
Exec Pillow : 119.32099999995671
Exec Pillow : 119.62960000005296
Exec Pillow : 123.94199999999955
Exec OpenCV : 10.811900000021524
Exec OpenCV : 7.164100000011331
Exec OpenCV : 7.870700000012221
Exec OpenCV : 8.089600000005248
Exec OpenCV : 8.753699999999753
Exec OpenCV : 7.0836000000440436
Exec OpenCV : 7.431500000052438
Exec Pytorch : 1506.0558999999785
Exec Pytorch : 51.52200000003404
Exec Pytorch : 66.1642000000029
Exec Pytorch : 20.14099999996688
Exec Pytorch : 62.205199999993965
Exec Pytorch : 12.609800000006999
Exec Pytorch : 57.274299999960476
4K image Gaussian Blur (Nvidia Jetson AGX Orin 64GB) in ms
Exec Pillow : 353.89680600019346
Exec Pillow : 304.88726800012955
Exec Pillow : 318.98665799963055
Exec Pillow : 313.87024899959215
Exec Pillow : 311.64901400006784
Exec Pillow : 323.0330470005356
Exec Pillow : 322.66202499977226
Exec OpenCV : 44.65785199954553
Exec OpenCV : 25.533990999974776
Exec OpenCV : 26.57884699965507
Exec OpenCV : 21.29485499972361
Exec OpenCV : 22.111289000349643
Exec OpenCV : 21.87230700019427
Exec OpenCV : 23.919809999824793
Exec Pytorch : 969.1214759996001
Exec Pytorch : 87.68033299929812
Exec Pytorch : 35.250306999841996
Exec Pytorch : 37.09012400031497
Exec Pytorch : 27.185547000044608
Exec Pytorch : 30.973184999311343
Exec Pytorch : 26.01031999984116
We can see the i7-8750H CPU is faster than the Orin CPU, but the Orin GPU does better than the GTX 1060 6GB.
Anyway, I am a bit disappointed when I compare OpenCV (CPU) and TorchVision (GPU).
Did I do something wrong in my test program?
Alain
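For a quick side-by-side of the numbers above, here is a small sketch that averages each OpenCV and PyTorch series after dropping the first run as warm-up. The timings are copied from the results above and rounded to two decimals; the function name steady_state_mean is made up for illustration.

```python
from statistics import mean

# Timings in ms copied from the results above, rounded to two decimals.
results = {
    ("PC", "OpenCV"): [10.81, 7.16, 7.87, 8.09, 8.75, 7.08, 7.43],
    ("PC", "PyTorch"): [1506.06, 51.52, 66.16, 20.14, 62.21, 12.61, 57.27],
    ("Orin", "OpenCV"): [44.66, 25.53, 26.58, 21.29, 22.11, 21.87, 23.92],
    ("Orin", "PyTorch"): [969.12, 87.68, 35.25, 37.09, 27.19, 30.97, 26.01],
}


def steady_state_mean(timings):
    """Mean of all runs except the first, which is treated as warm-up."""
    return mean(timings[1:])


for (machine, library), timings in results.items():
    print(f"{machine:5s} {library:8s} {steady_state_mean(timings):7.2f} ms")
```

That gives roughly 8 ms (OpenCV) versus 45 ms (PyTorch) on the PC, and roughly 24 ms versus 41 ms on the Orin, so OpenCV on CPU stays ahead on both machines in this particular test.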
Hi Alain, I ran your script too and got similar results - I’m not sure if that OpenCV function is just faster than the torchvision equivalent or what (PyTorch has a focus on DNN training/inferencing). I might recommend trying the OpenCV CUDA module to see if that’s faster or trying out VPI.
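One thing worth checking when timing GPU code: PyTorch launches CUDA kernels asynchronously, so reading the clock right after a call may not measure the actual GPU work. Below is a hedged sketch of a timing helper that synchronizes before and after the call. It assumes PyTorch and torchvision are installed; timed_ms is a hypothetical helper and the random tensor only stands in for the 4K image. Whether this changes the picture on the setups in this thread has not been verified.

```python
import time


def timed_ms(fn, sync=None):
    """Time one call of fn() in ms, calling sync() around it if given."""
    if sync is not None:
        sync()                       # drain work queued before the measurement
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()                       # wait until fn's GPU work has finished
    return (time.perf_counter() - start) * 1000.0, result


def main():
    try:
        import torch
        from torchvision import transforms as T
    except ImportError:
        print("PyTorch/torchvision not installed; skipping the demo")
        return
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    sync = torch.cuda.synchronize if use_cuda else None

    image = torch.rand(3, 2160, 3840, device=device)  # random stand-in for a 4K RGB image
    blur = T.GaussianBlur(kernel_size=(11, 11), sigma=(1, 2))

    timed_ms(lambda: blur(image), sync)                # warm-up run, not reported
    for i in range(5):
        ms, _blurred = timed_ms(lambda: blur(image), sync)
        print(f" Exec Pytorch (synchronized) : {ms:.2f}")


if __name__ == "__main__":
    main()
```

With synchronization in place, the per-iteration numbers should at least be more stable, which makes the comparison against OpenCV easier to interpret.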
Hi @wanggaouyuan, I don’t think pip picks up the version of your previously-installed PyTorch wheel correctly - can you try installing torchprofile with the --no-dependencies flag or from source?
Hello Dusty,
Almost a good thing if we get similar results. I thought I misunderstood something.
I will have to compile opencv with CUDA option and I will test it.
Alain
I have precompiled OpenCV + CUDA binaries for JetPack 5.x - you can find the URL here: https://github.com/dusty-nv/jetson-containers/blob/eb2307d40f0884d66310e9ac34633a4c5ef2e083/scripts/opencv_version.sh#L14
Oh, that’s great. I will try it asap.
I also tried VPI a few months ago but I ran into an issue. I did not look further because I had no time left. I will give VPI another try because this library seems really interesting.
Alain
If you install my sbts-install project it will install pytorch and torchvision and all of the dependencies for those jetpack versions in one command.
In addition, it also installs yolov7. Maybe you could replace the use of yolov5 with the better performing yolov7 ? If not, after installation you will have all of the dependencies you need to install yolov5 I think:
Cheers,
Kim Hendrikse