NVIDIA PyTorch + CUDA wheel produces only NaNs on CPU

Hi,

I’m seeing something odd on my Jetson Orin Nano 4GB (it’s actually a Jetson AGX Orin Developer Kit running the emulated Orin Nano 4GB image).

I installed the NVIDIA PyTorch wheel (I tried 1.12 and 1.11, from here). Running with CUDA works great, but if I try to run on the CPU with this version of PyTorch, inference is very slow (~3 s per inference on a 224x224 image) and the entire output tensor is NaNs.

If I install the PyTorch wheel from PyPI instead, I get reasonable times (~0.05 s/inference) and results comparable to what I get with CUDA. However, the PyPI wheel for aarch64 isn’t built with CUDA, so I have to switch venvs to test CPU vs. CUDA, which feels awkward.
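For reference, this is roughly how I’m timing a single CPU inference (it loads the TorchScript model I describe below; the 224x224 input size matches my test, and the warm-up run and timing are just illustrative):

import time
import torch

m = torch.jit.load("mobilenetv2.pt")
m.eval()

x = torch.rand((1, 3, 224, 224))
with torch.no_grad():
    m(x)  # warm-up run so one-time JIT work doesn't skew the measurement
    start = time.time()
    out = m(x)
print(f"inference took {time.time() - start:.3f} s")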

Things I have tried:

  • upgrading numpy
  • different pytorch versions

I’m running a TorchScript MobileNetV2 model, generated like this:

import torch
from torchvision.models import mobilenet_v2

# MobileNetV2 with randomly initialized weights
net = mobilenet_v2()
torch.save(net, "mobilenetv2.pth")

# export the same model as TorchScript
script_module = torch.jit.script(net)
torch.jit.save(script_module, "mobilenetv2.pt")

Is this expected behavior? Any ideas of things to try?

Hi,

It looks like you are using Nano rather than Orin Nano.
I’m moving your topic to the Nano board first.

Sorry, my bad.
You are using Orin Nano. Moving your topic back.

Thanks.

Hi,

Could you share the JetPack version you used?
Since we have PyTorch 1.13 for JetPack 5.0.2, we recommend giving it a try:
https://developer.download.nvidia.com/compute/redist/jp/v502/pytorch/

Thanks.

Hi, I’m using JetPack 5.0.2-b231 (installed using apt).

I just tried the PyTorch 1.13 wheel you linked (I used torch-1.13.0a0+08820cb0.nv22.07-cp38-cp38-linux_aarch64.whl), and I’m seeing the same result.
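For reference, I double-checked which build is actually loaded with something like:

import torch

# should report 1.13.0a0+08820cb0.nv22.07 for the NVIDIA wheel,
# plus the CUDA version it was built against
print(torch.__version__)
print(torch.version.cuda, torch.cuda.is_available())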

Hi,

Thanks for your testing.

Could you share a simple source and model that can reproduce the NaN output?
We want to reproduce this issue in our environment first.

Thanks.

Sure! The model is generated using:

import torch
from torchvision.models import mobilenet_v2

net = mobilenet_v2()
torch.save(net, "net.pth")

script_module = torch.jit.script(net)
torch.jit.save(script_module, "mobilenetv2.pt")

And I’m running it using:

import torch
import numpy
import random

use_cuda = False  # flip to True to run the same model on the GPU

torch.manual_seed(0)

# load the exported TorchScript model
m = torch.jit.load("mobilenetv2.pt")
if use_cuda:
    m.cuda()
m.eval()

# single inference on a random 224x224 input,
# with autograd anomaly detection enabled
with torch.autograd.set_detect_anomaly(True):
    input_t = torch.rand((1, 3, 224, 224))
    if use_cuda:
        input_t = input_t.cuda()
    out = m(input_t)
    print(out)
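
To rule out a printing artifact, I also check the tensor directly; a quick check like the following reports that every element is NaN on the CPU run:

# confirm the output really is all NaN rather than a display issue
print(torch.isnan(out).all().item())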

I haven’t used TorchScript much, so please let me know if I’m making any obvious errors.

Thanks!

Hi,

Thanks for sharing.
We will reproduce this issue in our environment first.

Hi,

Thanks for your patience.

We confirmed that we can reproduce the same issue in our environment
(JetPack 5.0.2 + l4t-pytorch:r35.1.0-pth1.12-py3 container).

We are checking this issue with our internal team.
Will share more information with you later.

Thanks.

Hi,

Is there any update? I also get this behaviour with the latest JetPack.

Hi,

The CPU inference issue is still under investigation.
Since we expect users to run the model on GPU, the priority of this issue is relatively low.

Thanks.

Hi,

Here is an update on this issue.
Due to limited resources, we will focus on GPU-mode support in the prebuilt PyTorch package.

Thanks.