I’m seeing something odd on my Jetson Nano 4GB (it’s actually a Jetson AGX Orin developer kit running the emulated Jetson Nano 4GB image).
I installed the NVIDIA PyTorch wheel (I tried 1.12 and 1.11, from here). Running with CUDA works great, but if I run on CPU with this version of PyTorch, inference is very slow (~3 s per inference on a 224x224 image) and the entire output tensor is NaN.
If I install the pip PyTorch wheel instead, I get reasonable times (0.05 s/inference) and results comparable to what I get with CUDA. However, the pip PyTorch wheel for aarch64 isn’t built with CUDA support, so I have to switch venvs to test CPU vs. CUDA, which feels odd to me.
Things I have tried:
upgrading numpy
different pytorch versions
I’m running a TorchScript MobileNetV2 model, generated like this:
from torchvision.models import mobilenet_v2
import torch

net = mobilenet_v2()
torch.save(net, "mobilenetv2.pth")  # eager-mode checkpoint

script_module = torch.jit.script(net)
torch.jit.save(script_module, "mobilenetv2.pt")  # TorchScript module used for inference
Is this expected behavior? Any ideas of things to try?