Installing PyTorch for CUDA 10.2 on Jetson Xavier NX for YOLOv5

I have successfully gotten YOLOv5 working on my Jetson Xavier NX. However, I realized that the software is not using any of my GPU memory. After some research, I realized I had installed the CPU-only version of PyTorch rather than the GPU version. The thing is, installing PyTorch was incredibly difficult, and I only got it done by following a YouTube video step by step. So I feel like I am back to square one on getting YOLOv5 working on my Jetson Xavier NX, but this time I would like to use the onboard GPU (with CUDA 10.2). Has anyone been able to do this? I did not use Docker for any of this.

Hi @bryantsp, please see this topic for pre-built PyTorch wheels for Jetson with CUDA support:

Oh I wish I found this webpage earlier! I will definitely give it a go! Thank you

Okay, so it turns out that I have been on that page before, and it is the page I used to install PyTorch. I kinda went through hell and back installing it, but now that I have it installed, I want to make sure I don’t mess anything up (again). I am currently running Torch 1.8.1 and Torchvision 0.9.1.

Using the jetson-stats interface, I see that it says OpenCV 4.1.1 is NOT compiled with CUDA. Could this be the main issue right now? I searched Google for how to compile it but went down a few rabbit holes. Where can I find how to compile OpenCV with CUDA?

PyTorch doesn’t use OpenCV for running DNNs, so it is probably unrelated. In the YOLO code, is it calling .cuda() on the PyTorch model and tensors (or .to('cuda:0'))? If not, that YOLO code isn’t set up to run the models on the GPU.

BTW you can find how to build OpenCV with CUDA enabled here: GitHub - mdegans/nano_build_opencv: Build OpenCV on Nvidia Jetson Nano

If by “YOLO code” you mean the detect.py file, then it is not calling .cuda() or .to('cuda:0').

What does this mean? Do I have to get different YOLO code?

Which YOLO code are you using? In PyTorch, if you want it to run on the GPU, you need to call .cuda() or .to('cuda:0') on the model and tensors.

If you are talking about the Ultralytics detect.py, it does appear to do this. You just need to run detect.py with the --device=0 option:

https://github.com/ultralytics/yolov5/blob/3bef77f5cb7eda3fa3cae53f2579cd3363c99744/detect.py#L198
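
For anyone unsure what "calling .cuda() or .to('cuda:0') on the model and tensors" looks like in practice, here is a minimal sketch. The tiny Linear model and the random input are stand-ins chosen for illustration (they are not part of YOLOv5), and the CPU fallback is an assumption added so the snippet also runs on a machine without CUDA.

```python
import torch

# Pick the GPU when PyTorch can see one, otherwise fall back to the CPU
# (the fallback is an assumption of this sketch, not YOLOv5 behavior).
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; a real detector would be loaded from weights.
model = torch.nn.Linear(4, 2).to(device)   # moves the model's parameters
x = torch.randn(1, 4).to(device)           # moves the input tensor

# Inference only, so gradient tracking is switched off.
with torch.no_grad():
    y = model(x)

print(y.device)  # the output lives on the same device as the model and input
```

The model and its inputs have to be on the same device; if only one of them is moved, PyTorch raises an error at the forward pass rather than silently copying data.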

I am using YOLOv5. With the Ultralytics detect.py, I have tried running it with --device=0, but it would never run.

I am kinda new to all this software, so I don’t really understand what you mean when you say I need to call .cuda() or .to('cuda:0'). Is that something I do after I import torch, torchvision, and cv2 in the Python command window?

The Ultralytics code already does this under the covers, so I wouldn’t worry about it for now.

What’s the error you get when you run it with --device=0?

BTW here is a version that runs YOLOv5 with TensorRT (using GPU), and appears to be compatible with Ultralytics models: GitHub - SeanAvery/yolov5-tensorrt: YOLOv5 in TensorRT

This is the error that I get when running with --device=0:

~/yolov5$ python3 detect.py --source 0 --weights best4.pt --conf 0.4
detect: weights=['best4.pt'], source=0, imgsz=160, conf_thres=0.4, iou_thres=0.45, max_det=1000, device=0, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
Traceback (most recent call last):
  File "detect.py", line 271, in <module>
    main(opt)
  File "detect.py", line 266, in main
    run(**vars(opt))
  File "/home/xav/.local/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "detect.py", line 82, in run
    device = select_device(device)
  File "/home/xav/yolov5/utils/torch_utils.py", line 72, in select_device
    assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'  # check availability
AssertionError: CUDA unavailable, invalid device 0 requested
signal_shutdown [atexit]

I have seen the TensorRT version before. I cannot remember why I was told not to use it, but my team has specific instructions for this project. As you can see from my error, I am using a webcam for detection.
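
To make the AssertionError in the traceback easier to read, here is a rough sketch of the kind of check that select_device appears to perform. This is a hypothetical simplification, not the Ultralytics source: the function name select_device_sketch, the cuda_available parameter, and the fallback behavior for an empty device string are all assumptions made for illustration. Only the assertion message is taken from the traceback.

```python
def select_device_sketch(device: str, cuda_available: bool) -> str:
    """Map a --device string to a device name (hypothetical simplification).

    The real helper asks torch.cuda.is_available() itself; here the answer is
    passed in as cuda_available so the logic can be run without PyTorch.
    Only 'cpu', '' and a single GPU index such as '0' are handled.
    """
    requested = device.strip().lower()
    if requested == "cpu":
        return "cpu"
    if requested:
        # A GPU was explicitly requested, so a CPU-only PyTorch build fails
        # here with the same message as in the traceback.
        assert cuda_available, f"CUDA unavailable, invalid device {device} requested"
        return f"cuda:{requested}"
    # Assumed behavior: no device requested, so use the GPU only if present.
    return "cuda:0" if cuda_available else "cpu"


print(select_device_sketch("cpu", cuda_available=False))
print(select_device_sketch("0", cuda_available=True))
```

Under these assumptions, requesting device 0 on a PyTorch build where torch.cuda.is_available() is False fails immediately, which matches the error in this post: the check is reporting on the PyTorch install, not on the webcam or the weights.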

If you run this with python3, are you able to use CUDA in PyTorch?

import torch
print(torch.__version__)
print('CUDA available: ' + str(torch.cuda.is_available()))
print('cuDNN version: ' + str(torch.backends.cudnn.version()))
a = torch.cuda.FloatTensor(2).zero_()
print('Tensor a = ' + str(a))
b = torch.randn(2).cuda()
print('Tensor b = ' + str(b))
c = a + b
print('Tensor c = ' + str(c))

Can you run the deviceQuery sample ok?

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

If deviceQuery runs, try running the PyTorch test script again.
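
If it helps to check the deviceQuery result from a script rather than by eye, a small parser along these lines could work. It is only a sketch: the function name parse_device_query is made up, and the sample text assumes the two summary lines that deviceQuery prints at the end of its output.

```python
import re


def parse_device_query(output: str) -> dict:
    """Pull the pass/fail flag and CUDA versions out of deviceQuery output."""
    summary = re.search(
        r"CUDA Driver Version = ([\d.]+), "
        r"CUDA Runtime Version = ([\d.]+), NumDevs = (\d+)",
        output,
    )
    return {
        "passed": "Result = PASS" in output,
        "driver": summary.group(1) if summary else None,
        "runtime": summary.group(2) if summary else None,
        "num_devices": int(summary.group(3)) if summary else 0,
    }


# Sample text in the format of deviceQuery's final summary lines.
sample = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, "
    "CUDA Runtime Version = 10.2, NumDevs = 1\n"
    "Result = PASS\n"
)
print(parse_device_query(sample))
```

A PASS here only says that the CUDA driver and runtime on the board are working. If PyTorch still reports CUDA as unavailable afterwards, the problem is more likely in the PyTorch build than in the CUDA install.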

I don’t believe I published a wheel for PyTorch 1.8.1, only 1.8.0 - are you sure you didn’t install yours from somewhere else?

You can also try the l4t-pytorch container and verify the PyTorch in the container sees the GPU. Select a container for your version of JetPack.

If you run this with python3, are you able to use CUDA in PyTorch?

~/yolov5$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.8.1
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: False
>>> print('cuDNN version: ' + str(torch.backends.cudnn.version()))
cuDNN version: None
>>> a = torch.cuda.FloatTensor(2).zero_()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type torch.cuda.FloatTensor not available. Torch not compiled with CUDA enabled.
>>> print('Tensor a = ' + str(a))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b = torch.randn(2).cuda()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xav/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 164, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
>>> print('Tensor b = ' + str(b))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>> c = a + b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined

Can you run the deviceQuery sample ok?

Looks like it runs
/usr/local/cuda/samples/1_Utilities/deviceQuery$ sudo make
[sudo] password for xav:
Sorry, try again.
[sudo] password for xav:
/usr/local/cuda-10.2/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda-10.2/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/aarch64/linux/release
cp deviceQuery ../../bin/aarch64/linux/release
xav@xav2:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “Xavier”
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.2
Total amount of global memory: 7766 MBytes (8142753792 bytes)
( 6) Multiprocessors, ( 64) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 1109 MHz (1.11 GHz)
Memory Clock rate: 1109 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

I don’t believe I published a wheel for PyTorch 1.8.1, only 1.8.0 - are you sure you didn’t install yours from somewhere else?

I think I may have tried using your published wheel files, but they might not have worked. (Now I am not too sure where I got 1.8.1 from.) I did try to use your wheel file on Friday, but when it came to installing torchvision, I got errors from the setup.py file being unable to import torch.

Oops, sorry. Didn’t reply directly to you ^

It seems likely that your PyTorch isn’t detecting the GPU because you are still running the CPU-only version of PyTorch. I recommend uninstalling it and installing the PyTorch 1.8 wheel that was built with CUDA from the topic above, or using the l4t-pytorch or l4t-ml container if you continue to have problems. It appears that the l4t-ml container already has many of the dependencies of the Ultralytics repo, so that may be the way to go.
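
One way to confirm which PyTorch build is actually being imported is to print where it lives and whether it was built with CUDA. The sketch below is only an illustration: the helper name describe_torch_build and the dictionary keys are made up, while torch.__version__, torch.__file__, torch.version.cuda, and torch.cuda.is_available() are standard PyTorch attributes. It returns early if PyTorch is not installed at all.

```python
import importlib.util


def describe_torch_build() -> dict:
    """Summarize the PyTorch install that this Python interpreter would import."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # Path of the imported package, e.g. under ~/.local for a pip --user install.
        "location": torch.__file__,
        # CUDA version the wheel was built against; None for a CPU-only build.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_build().items():
    print(f"{key}: {value}")
```

If built_with_cuda comes back as None, the wheel itself is CPU-only and no runtime option will enable the GPU. The location line also helps spot a leftover install that is shadowing a newly installed wheel.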

I am not sure why, but this time I was able to successfully uninstall PyTorch and install the one from above. I’m using the CUDA version now (finally), but for some reason, when I check jetson_stats, the GPU isn’t being used at all. Do I have to manually call .cuda() or .to('cuda:0'), or run detect.py with --device=0? Or is there something else that needs to be done?

Also, thank you so much for your help thus far!

OK, cool that you got the PyTorch CUDA wheel installed. Yep, you need to run detect.py with --device=0. This will in turn trigger the Ultralytics code to call .to('cuda:0') for you.

Forgot to update you. Changing that in the detect.py code worked! There are two locations in the detect.py code where you can set the device to 0; changing both didn’t work, but changing only one did. Anyway, thanks for all your help!
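
For later readers wondering about the "two locations": a script like detect.py can have a device default both on the run() function and on the --device command-line argument, and the traceback earlier in this topic shows the parsed options being passed on via run(**vars(opt)). The sketch below is a simplified illustration using Python's argparse, not the actual detect.py. The default values "cpu" and "" are made up so the two locations can be told apart.

```python
import argparse


def run(device="cpu"):
    # Location 1: the keyword default on run(). It only applies when run()
    # is called without a device argument, e.g. from another Python script.
    return device


def parse_opt(argv=None):
    parser = argparse.ArgumentParser()
    # Location 2: the command-line default. This is the value that ends up
    # in opt.device when --device is not given on the command line.
    parser.add_argument("--device", default="", help="e.g. 0 or cpu")
    return parser.parse_args(argv)


print(repr(run()))                                    # 'cpu'
print(repr(run(**vars(parse_opt([])))))               # ''
print(repr(run(**vars(parse_opt(["--device=0"])))))   # '0'
```

In this sketch the command-line default always overrides the function default when the script is started from a shell, so passing --device=0 on the command line avoids editing either location.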
