Can I execute yolov5 on the GPU of JETSON AGX XAVIER?

I just tested YOLOv5 on my Jetson AGX Xavier; just follow the tutorial mentioned on this GitHub page.

Make sure all the packages and versions match when installing from the requirements.txt file.

Hi,
Thank you for your reply.

I just tested YOLOv5 on my Jetson AGX Xavier; just follow the tutorial mentioned on this GitHub page.

But don’t you get a message like the following when you execute ‘python detect.py --source …’?
YOLOv5 🚀 v6.1-246-g2dd3db0 Python-3.7.5 torch-1.11.0 CPU
When your GPU is working, I think you will get a message like
YOLOv5 🚀 v6.1-227-ga6e99e4 Python-3.7.5 torch-1.11.0+cu102 CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11011MiB)
(this is the message from my PC).
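A quick way to tell the two banners apart programmatically is to look for the device field at the end; a minimal sketch (the helper name and the banner strings are just illustrations from this thread, not part of YOLOv5 — running `python -c "import torch; print(torch.cuda.is_available())"` gives the same information directly):

```python
# Minimal sketch: tell from YOLOv5's startup banner whether it used the GPU.
# The banner format is taken from the examples in this thread; this helper
# is hypothetical, not part of YOLOv5 itself.

def banner_device(banner: str) -> str:
    """Return 'GPU' if the banner reports a CUDA device, else 'CPU'."""
    # A CPU-only run ends with 'CPU'; a GPU run contains 'CUDA:<n> (...)'.
    return "GPU" if "CUDA:" in banner else "CPU"

cpu_banner = "YOLOv5 v6.1-246-g2dd3db0 Python-3.7.5 torch-1.11.0 CPU"
gpu_banner = ("YOLOv5 v6.1-227-ga6e99e4 Python-3.7.5 torch-1.11.0+cu102 "
              "CUDA:0 (NVIDIA GeForce RTX 2080 Ti, 11011MiB)")

print(banner_device(cpu_banner))  # -> CPU
print(banner_device(gpu_banner))  # -> GPU
```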

Hey,

I just realised YOLOv5 was running on the CPU and not on the GPU. I got the same message after executing detect.py:

YOLOv5 🚀 v6.1-242-ga80dd66 Python-3.8.10 torch-1.11.0 CPU

I will have to look online for how to enable the GPU for YOLOv5. I apologise for that.

I am also working on YOLO object detection, though currently on YOLOv4 based on Darknet. Do let me know if you figure out how to enable the GPU for YOLOv5.

Thanks

Hi,

So, I worked out a solution.

Before doing this, make sure that your Xavier is running the ‘JetPack 5.0.1 DP’ Linux system, then update and upgrade it:

~$ sudo apt-get -y update
~$ sudo apt-get -y upgrade

It is also convenient to set up ‘python’ alternatives:

~$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python2.7 1
~$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.8 2

(Note that we must use Python 3.7 or later.)
Then clone ‘yolov5’ and create a virtual environment:

~$ git clone https://github.com/ultralytics/yolov5
~$ sudo apt-get install python3.8-venv
~$ python -m venv yolov5
~$ cd yolov5/
~/yolov5$ source ./bin/activate
~/yolov5$ pip install --upgrade pip
~/yolov5$ pip install -U -r requirements.txt
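Since YOLOv5 requires Python 3.7 or later, it may be worth sanity-checking the interpreter inside the fresh venv before going further; a minimal sketch:

```python
# Minimal sketch: confirm the venv's interpreter meets YOLOv5's
# Python >= 3.7 requirement before installing anything heavier.
import sys

assert sys.version_info >= (3, 7), "YOLOv5 requires Python 3.7 or later"
print("Python", ".".join(map(str, sys.version_info[:3])), "is OK")
```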

First, check that it runs on your CPU:

~/yolov5$ python detect.py --source data/images/
/home/hamada/yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
detect: weights=yolov5s.pt, source=data/images/, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-246-g2dd3db0 Python-3.8.10 torch-1.11.0 CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt to yolov5s.pt…
100%|██████████████████████████████████████| 14.1M/14.1M [00:01<00:00, 11.7MB/s]

Fusing layers…
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
image 1/2 /home/hamada/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.852s)
image 2/2 /home/hamada/yolov5/data/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.637s)
Speed: 5.8ms pre-process, 744.8ms inference, 8.0ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp

Check the pip package list:

~/yolov5$ pip list
Package      Version
-----------  -------
torch        1.11.0
torchvision  0.12.0

and uninstall torch:

~/yolov5$ pip uninstall torch

and install the new torch version for the GPU (download it from https://developer.download.nvidia.com/compute/redist/jp/v50/pytorch/torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl):

~/yolov5$ pip install torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl
~/yolov5$ sudo apt-get install libopenblas-dev

But the execution then fails as follows:

~/yolov5$ python detect.py --source data/images/
/home/hamada/yolov5/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
detect: weights=yolov5s.pt, source=data/images/, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-246-g2dd3db0 Python-3.8.10 torch-1.12.0a0+2c916ef.nv22.3 CUDA:0 (Xavier, 31011MiB)

RuntimeError: Couldn’t load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.

So, we need to build and install an adequate ‘torchvision’ version.
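The torch and torchvision release trains move in lockstep, so the fix is to match their minor versions. As an illustration, the commonly cited pairs from the torchvision README can be encoded in a small lookup (double-check the README's compatibility matrix before relying on this, and note that NVIDIA's 1.12.0a0 wheel is a pre-release snapshot, for which this thread builds the v0.12.0 branch):

```python
# Minimal sketch: map a torch release to the matching torchvision release.
# The pairs are the commonly cited ones from the torchvision README's
# compatibility matrix; verify them there before relying on this table.
TORCH_TO_TORCHVISION = {
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
}

def matching_torchvision(torch_version: str) -> str:
    """Map e.g. '1.11.0' or '1.12.0a0+2c916ef.nv22.3' to 'major.minor'."""
    # Drop any local suffix (NVIDIA wheels append '+2c916ef.nv22.3'),
    # then keep only the major.minor part.
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    return TORCH_TO_TORCHVISION[major_minor]

print(matching_torchvision("1.11.0"))  # -> 0.12
```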

First, uninstall the unsuitable version:

~/yolov5$ pip uninstall torchvision

Then download ‘torch7’ from GitHub,

~$ git clone https://github.com/torch/torch7
~$ cp -r torch7/lib/TH/ yolov5/lib/python3.8/site-packages/torch/include/
~$ cd torch7
~/torch7$ mkdir build
~/torch7$ cmake -S . -B build
~/torch7$ cp build/lib/TH/THGeneral.h ~/yolov5/lib/python3.8/site-packages/torch/include/TH

This copies the header files of the ‘TH’ folder, generates the ‘THGeneral.h’ file, and copies that as well.
Then download ‘torchvision’ into the yolov5 virtual environment:

~/yolov5$ git clone --branch v0.12.0 https://github.com/pytorch/vision torchvision
~/yolov5$ cd torchvision/

set the version and

~/yolov5/torchvision$ export BUILD_VERSION=0.12.0

build it (it takes 30 minutes or more):

~/yolov5/torchvision$ python setup.py install
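After the build, a quick smoke test of the custom C++ ops can confirm the error is gone; `torchvision.ops.nms` is dispatched through the compiled extension, so it exercises exactly the path that failed above. A minimal sketch (hypothetical helper; it returns False rather than crashing when torch/torchvision are absent or broken):

```python
# Minimal sketch: smoke-test that torchvision's custom C++ ops now load.
# torchvision.ops.nms goes through the compiled extension, so calling it
# on dummy boxes exercises the path behind "Couldn't load custom C++ ops".

def torchvision_ops_ok() -> bool:
    try:
        import torch
        import torchvision.ops
        boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                              [1.0, 1.0, 11.0, 11.0]])
        scores = torch.tensor([0.9, 0.8])
        keep = torchvision.ops.nms(boxes, scores, iou_threshold=0.5)
        return keep.numel() >= 1
    except Exception:
        return False

print(torchvision_ops_ok())  # True once torch and torchvision are consistent
```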

Then,

~/yolov5/torchvision$ cd ../
~/yolov5$ python detect.py --source data/images/
detect: weights=yolov5s.pt, source=data/images/, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-246-g2dd3db0 Python-3.8.10 torch-1.12.0a0+2c916ef.nv22.3 CUDA:0 (Xavier, 31011MiB)

Fusing layers…
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
image 1/2 /home/hamada/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.079s)
image 2/2 /home/hamada/yolov5/data/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.076s)
Speed: 2.6ms pre-process, 77.4ms inference, 8.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp4

Phew!


Hey, nVidia software technical support staff.

If you are on this forum, would you give me some feedback on the above solution? Otherwise, I can’t decide whether to close this topic or not.

Hi,

Really sorry for the late reply.

Since Jetson is an ARM platform, you will need PyTorch and TorchVision builds for the ARM architecture.
For JetPack 5.0, you can find PyTorch v1.11.0 and v1.12.0 in the below topic:

We also have a container that has PyTorch pre-installed.

For running YOLOv5 on Jetson, it’s recommended to try our DeepStream SDK.
It has optimized the pipeline based on the Jetson hardware and is expected to give you a better performance.
Below is a sample from the community for your reference:

Thanks.

Since Jetson is an ARM platform, you will need PyTorch and TorchVision builds for the ARM architecture.

You don’t have to give me an explanation that treats me like a fool.
That’s why I installed the ready-made ‘torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl’ and recompiled ‘torchvision’ for the Xavier’s configuration.

Below is a sample from the community for your reference:

I don’t know whether you’re a member of nVidia or not, but why do you have to mess up the system with Docker, containers, etc., to test an application like YOLOv5? My Xavier has only 32GB of storage, and even though it has 8 ARM cores, its CPU power is far less than a normal PC’s. If it can be done with pip on a normal PC, I think nVidia should make it possible to do the same thing on Tegra.
And what do people expect from Tegra? Isn’t it attractive because the SoC has an nVidia GPU? So people want to run a demo of an application that can be accelerated by the GPU. But it’s not easy, not simple. Why?
I think a chip vendor’s SoC business will not go well unless they take care of the applications customers want to run on it. Intel’s Edison is a good example.

Kazu

Hi, Mr.AastaLLL, and nVidia support staff or other contributors

Please keep the following in mind when answering or giving advice on a topic:

Pasting the URL of a site found by a keyword search, without carefully reading the details of the problem, is not called ‘support’.

It is a shameful act to request an error log or a test but then not comment on the submitted result.

The delay increases as the output video over time
High MTU causes Kernel Panic

Kazu

Hi,

Sorry for the inconvenience it brings to you.
Since PyTorch is a third-party library, it doesn’t have an official package for Jetson.
You will need to install it manually (not from pip) or use our prebuilt Docker image.

Based on the log you shared, it runs at around 12fps when inferring with a YOLOv5 model.

image 1/2 /home/hamada/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, Done. (0.079s)
image 2/2 /home/hamada/yolov5/data/images/zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.076s)
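The ~12fps figure follows directly from the per-image ‘Done. (…s)’ timings quoted above; a quick sketch of the arithmetic:

```python
# Minimal sketch: turn YOLOv5's per-image inference time into frames/second.
def fps(seconds_per_image: float) -> float:
    return 1.0 / seconds_per_image

for t in (0.079, 0.076):  # the two timings from the log above
    print(f"{t:.3f}s/image -> {fps(t):.1f} fps")
```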

It’s much slower than we expect on Xavier (it should be >30fps).
Usually the slowness comes from OpenCV, since it handles image data on the CPU (e.g. decoding, conversion, …).
It’s possible to enable GStreamer in OpenCV, which uses the hardware decoder.
But it still has some CPU <-> GPU buffer transfers inside, which cause overhead and latency.

Regarding this, it’s recommended to try the DeepStream sample shared above.
We have optimized the camera - inference - display pipeline.
In testing from the community, it can reach 56fps in FP32 mode and 169fps in INT8 mode.

Also, please note that Xavier runs in dynamic frequency mode by default.
You can configure it to full performance mode with the following command:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

Hi, Mr.AastaLLL,

Please do not change the story.
You introduced the ready-made ‘torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl’ yourself, didn’t you? If you say ‘since PyTorch is a third-party library…’, then Ubuntu itself is a third-party one. My point is that there is no corresponding ‘torchvision’, so I’m just asking whether this method is a sensible workaround. I’m not trying to compete on fps; I just want to be able to use the GPU easily. Generally, if you try to use Xavier in a car, 30fps is not enough, and data latency is more important than fps. Don’t you focus only on fps?

Hi,

The torch-1.12.0a0+2c916ef.nv22.3-cp38-cp38-linux_aarch64.whl is not available on the default apt server.
But since it is located on our server, it’s possible to install it with pip3 via the download link.
You can find more instructions in our document:

However, we don’t provide a TorchVision prebuilt package, so please build it from source.
The compatible version can be found in this comment (shared before) or TorchVision’s GitHub.

Do you want to use Xavier in the car use case?
To give a further suggestion, would you mind sharing more details about your use case with us?

For car-related problems, we also have solutions specific to the DRIVE use case.
https://developer.nvidia.com/drive

Thanks

Hi, Mr.AastaLLL,

However, we don’t provide the TorchVision prebuilt package. So please build it from the source.

What are you going to achieve by repeating what I already pointed out? Is this your support?
After reading all the linked materials before you even indicated them, I considered which was the easiest way and simply chose to rebuild ‘torchvision’.

To give a further suggestion, would you mind sharing more details about your use case with us?

I’m not going to share ‘use cases’ with dishonest people. You’ll just give me nonsense replies.

Kazu

Hi,

After building TorchVision from source, is there any further issue in deploying your use case?
Thanks.

Hi, Mr. AastaLLL

Any further issue? After building TorchVision?
It’s easy, so why not test it yourself?
It would improve your skills as support staff.

Kazu

Hi,

Did you get your use case to work?

Thanks.

Hi, Mr. AastaLLL

I have no intention of cooperating with those who send rude e-mails like the following. My opinion here is just that nVidia should take into account the support-staff effort caused by SW engineers’ trivial mistakes.

Hello,
This is an automated message from NVIDIA Developer Forums to let you know that your post was hidden.
Nsight Systems not able to read the .nsys-rep file generated in latest Jetpack 5 (L4T 34.1.1) - #5
Your post was flagged as off-topic: the community feels it is not a good fit for the topic, as currently defined by the title and the first post.
This post was hidden due to flags from the community, so please consider how you might revise your post to reflect their feedback. You can edit your post after 10 minutes, and it will be automatically unhidden.
However, if the post is hidden by the community a second time, it will remain hidden until handled by staff.
For additional guidance, please refer to our community guidelines.

If you are in my position, would you be willing to cooperate?

Kazu

Hi,

Sorry that I have no idea about the e-mail.

Back to the TorchVision question.
We can submit an internal bug to see if it is possible to add TorchVision as an official prebuilt package,
and allow users to get the corresponding package automatically (or at least have it listed in the docs).

Do you think this would help you (in your next setup) and other users?
Thanks.

Hi,
On Jetson platforms, the optimal solution for running deep learning inference is to use the DeepStream SDK. For other frameworks, it would need other users to share their experience and guidance.

Hi, Mr.AastaLLL

Do you think this will help you (in the next time’s setup) and other users?

We need an incentive to answer questions on the forum. Of course, the incentive for you and other nVidia support staff is salary. On the other hand, what is the incentive for a volunteer like me? ‘I want to help people’ is a good motivation. However, can volunteers keep that lofty motivation when the response from the support staff is irrelevant or disappointing?

Kazu

Hi, Mr.DaneLLL

Thanks for your proposal. But this topic is about YOLOv5 and the torchvision version. So I think you are off-topic; does the nVidia AI system not warn you because you are nVidia support staff?

Kazu