I installed PyTorch on my Nano 2GB with the script provided in the “Jetson Inference” repository and tutorial. When I check in Python, torch is installed and torch.cuda.is_available() returns True.
I prepared a small benchmark to check that everything works as it should, where I run one-image inference through a pretrained AlexNet.
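Roughly, the timed part looks like this (a simplified sketch, not my exact script; preprocessing is omitted and a random tensor stands in for the image):

```python
import time
import torch
from torchvision import models

device = torch.device("cuda")
model = models.alexnet(pretrained=True).eval().to(device)
x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image

start = time.time()
with torch.no_grad():
    y = model(x.to(device))      # move the input to the GPU and run one forward pass
torch.cuda.synchronize()         # wait for the GPU to finish before reading the clock
print(f"inference took {time.time() - start:.2f} s")
```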
If this was the first inference iteration of the program, it takes longer because PyTorch needs to load and initialize a bunch of CUDA libraries the first time a GPU operation is performed. Try discarding the time of the first iteration and timing a bunch of iterations after that.
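Something along these lines (a sketch, assuming a model and input tensor x set up as in the benchmark above; torch.cuda.synchronize() makes sure the GPU has actually finished before the clock stops):

```python
import time
import torch

with torch.no_grad():
    model(x.to(device))           # warm-up pass: triggers CUDA library loading, cuDNN init, etc.
    torch.cuda.synchronize()

    times = []
    for _ in range(10):
        start = time.time()
        model(x.to(device))
        torch.cuda.synchronize()  # ensure the GPU work is done before stopping the timer
        times.append(time.time() - start)

print(f"mean over {len(times)} runs: {sum(times) / len(times):.4f} s")
```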
I modified the script so it runs a couple of times, and the execution time decreases with each iteration as you suggested:
- 51 s
- 1.39 s
- 0.15 s
- 0.10 s
- 0.05 s
- 0.03 s
However, I also included in the timed part of the code the line that moves the data to torch.device(“cuda”), because it seems to me that I would have to do this every time I run inference. Without measuring the time of moving the tensor to the CUDA device, the inference itself takes only 0.008 s.
So I would also like to ask whether moving the tensor to CUDA with Tensor.to(torch.device(“cuda”)) is the recommended method on the Nano, or whether there is a faster alternative?
In reality, PyTorch isn’t the most optimized library for real-time inference, and there are faster alternatives such as TensorRT (which the jetson-inference library uses). jetson-inference is also careful to use zero-copy memory to avoid CPU/GPU memory transfers and the overhead of allocating memory at runtime.
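Within PyTorch itself, one thing that may shave some time off the .to(cuda) step is pinning the host tensor and doing a non-blocking copy (a sketch; the benefit depends on the platform, and on Jetson the CPU and GPU already share physical memory):

```python
x_pinned = x.pin_memory()                       # page-locked host memory (x is the CPU input tensor)
x_gpu = x_pinned.to(device, non_blocking=True)  # asynchronous host-to-device copy
```

For a bigger speed-up, the usual route on Jetson is to run the model through TensorRT. One way to try that from PyTorch is the torch2trt converter (a sketch, assuming torch2trt is installed; jetson-inference uses its own TensorRT pipeline rather than this converter):

```python
import torch
from torch2trt import torch2trt
from torchvision.models.alexnet import alexnet

# load the pretrained model and create an example input on the GPU
model = alexnet(pretrained=True).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to a TensorRT-optimized module (optimization happens once, up front)
model_trt = torch2trt(model, [x])

# run the optimized module and compare against the original PyTorch output
y = model(x)
y_trt = model_trt(x)
print(torch.max(torch.abs(y - y_trt)))
```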