How to run pytorch custom inference on Jetson Nano's GPU?

grzegorz.kozinski · June 17, 2022, 5:51pm

I installed PyTorch on my Nano 2GB with the script delivered in the “Jetson Inference” repository and tutorial. When I check in Python, torch is installed and torch.cuda.is_available() returns True.

I prepared a small benchmark to see if this works as it should, running a single-image inference through a pretrained AlexNet.

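import time
import torch
from torchvision import models

# Load pretrained AlexNet and switch it to inference mode
alexnet = models.alexnet(pretrained=True)
alexnet.eval()

# One random 224x224 RGB image as a batch of size 1
input_batch = torch.randn((1, 3, 224, 224))

# Time a single forward pass
start_time = time.time()
output_batch = alexnet(input_batch)
print(time.time() - start_time)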

On my laptop's CPU (10th-gen i5) it takes 0.04 s; on the Jetson Nano it takes more than 0.4 s.

When I moved both the model and the data batch with .to(torch.device("cuda")), the script took 46 seconds to run inference on this one image (?!?!?!).

I don’t think this is normal. How can I use PyTorch models properly on my Nano?

I might add that, for example, the “detectnet” demo gives me over 20 FPS, so I think the installation of PyTorch, JetPack and the CUDA libraries is correct.

dusty_nv · June 17, 2022, 6:47pm

If this was the first inferencing iteration of the program, it takes longer because it needs to load and initialize a bunch of CUDA libraries the first time a GPU operation is performed in PyTorch. Try discarding the time of the first iteration and timing a bunch of iterations after that.
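A minimal sketch of that kind of measurement, under some assumptions: a tiny stand-in network replaces AlexNet so the example runs without downloading weights, and the helper names (sync, benchmark) and the warmup/iters counts are made up for illustration. It discards the first few iterations and calls torch.cuda.synchronize() around the timed region, since CUDA work is queued asynchronously and the clock could otherwise be read before the GPU has finished.

```python
import time
import torch
import torch.nn as nn

# Fall back to the CPU so the sketch also runs on a machine without CUDA.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model (assumption); replace with models.alexnet(...) from torchvision.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device).eval()


def sync():
    # CUDA kernels launch asynchronously; wait for them before reading the clock.
    if device.type == "cuda":
        torch.cuda.synchronize()


def benchmark(model, batch, warmup=5, iters=20):
    """Return per-iteration times in seconds, excluding the warm-up iterations."""
    timings = []
    with torch.no_grad():
        for i in range(warmup + iters):
            sync()
            start = time.perf_counter()
            model(batch)
            sync()
            elapsed = time.perf_counter() - start
            if i >= warmup:
                timings.append(elapsed)
    return timings


batch = torch.randn(1, 3, 224, 224, device=device)
timings = benchmark(model, batch)
print(f"mean over {len(timings)} runs: {sum(timings) / len(timings):.4f} s")
```

The input batch is created directly on the device here, so the numbers cover the forward pass only and not the host-to-device copy.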

grzegorz.kozinski · June 19, 2022, 7:44am

Hello, thanks for the response.

I modified the script so it runs a couple of times, and the execution time decreases with each iteration, as you suggested:
- 51 s
- 1.39 s
- 0.15 s
- 0.10 s
- 0.05 s
- 0.03 s

However, the timed part of the code also includes the line that moves the data to torch.device("cuda"), because it seems to me that I would have to do this every time I run inference. Without timing the tensor transfer to the CUDA device, inference takes only 0.008 s.
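One way to see how the time splits between the copy and the forward pass is to time the two stages separately. This is only a sketch: the small linear model, the 32x32 input and the helper names (sync, timed) are assumptions for illustration, not the script used above.

```python
import time
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small stand-in model (assumption) so the example runs quickly anywhere.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device).eval()


def sync():
    # Wait for queued GPU work so the timings reflect finished operations.
    if device.type == "cuda":
        torch.cuda.synchronize()


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    sync()
    start = time.perf_counter()
    result = fn()
    sync()
    return result, time.perf_counter() - start


cpu_batch = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    model(cpu_batch.to(device))  # warm-up, not timed
    gpu_batch, t_copy = timed(lambda: cpu_batch.to(device))
    output, t_forward = timed(lambda: model(gpu_batch))

print(f"copy: {t_copy:.6f} s, forward: {t_forward:.6f} s")
```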

So I would also like to ask whether moving a tensor to CUDA with Tensor.to(torch.device("cuda")) is the recommended method on the Nano, or whether there is a faster alternative.

dusty_nv · June 21, 2022, 3:38pm

Unless you can modify or re-use a tensor in-place that has already been allocated on the GPU, it would seem to be necessary. You could also try using PyTorch’s APIs for pinned memory: https://pytorch.org/docs/stable/notes/cuda.html#use-pinned-memory-buffers
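As a rough sketch of both ideas together (re-using a tensor already allocated on the GPU, and staging the data through a pinned host buffer), assuming a fixed 1x3x224x224 input; the names host_buf, gpu_buf and upload are hypothetical, and whether this actually helps on a Nano would need to be measured:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"

shape = (1, 3, 224, 224)

# Allocate both buffers once, outside the inference loop.
host_buf = torch.empty(shape)
if use_cuda:
    # Page-locked host memory; only meaningful (and only possible) with CUDA.
    host_buf = host_buf.pin_memory()
gpu_buf = torch.empty(shape, device=device)


def upload(frame):
    """Copy a CPU tensor into the preallocated device buffer and return it."""
    host_buf.copy_(frame)
    # non_blocking=True lets the copy from pinned memory run asynchronously.
    gpu_buf.copy_(host_buf, non_blocking=True)
    return gpu_buf


frame = torch.randn(shape)
batch = upload(frame)
print(batch.device, tuple(batch.shape))
```

Because upload always returns the same gpu_buf, each call overwrites the previous contents; that is the intended in-place re-use, but the result of one call should be consumed before the next frame is uploaded.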

In reality, PyTorch isn’t the most optimized library for realtime inferencing and as such there are faster alternatives such as TensorRT (like the jetson-inference library uses). jetson-inference is also careful to use zero-copy memory to avoid needing CPU/GPU memory transfers or the overhead of allocating memory at runtime.

system · July 13, 2022, 3:24am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
