Compiled PyTorch gives wrong CPU results on Jetson AGX Xavier

Hi, I’m trying to build PyTorch 1.7 for Python 3.7 from source. The build succeeded several days ago, but now I find that PyTorch does not produce the right output. E.g.:

>>> import torch
>>> torch.randn(10)
tensor([ 2.,  1.,  0.,  0.,  1., -1.,  0.,  1.,  1., -0.])
>>> 13/54
0.24074074074074073
>>> torch.tensor(13/54)
tensor(0.)

As the code above shows, the tensor values come out as whole numbers with nothing after the decimal point, and my deployment code also gets a NaN loss. Can anyone help me out? Thanks.

My Environment:

  • Jetson AGX Xavier
  • JetPack 5.1

Edit:
I find the output is correct under CUDA. E.g.:

>>> import torch
>>> torch.randn(10)
tensor([ 0., -2., -1.,  2., -0., -1.,  0., -1., -1., -1.])
>>> torch.randn(10).cuda()
tensor([ 1.4657, -0.1883, -1.0067, -2.1090,  0.3517, -0.0147, -0.8071,  0.7974,
         0.6983,  1.6613], device='cuda:0')

But I also need PyTorch to work correctly on the CPU, because some code uses short CPU tensors for convenience. Is there an easy way to solve this problem? Thanks a lot.

I solved this problem by using a whl file that someone else compiled, from this comment, and it works well. But I would also like to know why this happens. Any comment will be appreciated.

Hi,

Could you check whether you get the expected output with the following command:

torch.randn(10, dtype=torch.float)

Thanks.

Hi @bangwhe, PyTorch 1.7 had a bug on aarch64 that led to incorrect tensor results on the CPU (see PyTorch issue #47098).

This issue was fixed by this patch, which is linked from the Build Instructions in the PyTorch topic: https://gist.github.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1#file-pytorch-1-7-jetpack-4-4-1-patch

So that wheel you are using was probably built with this patch applied to fix the issue.
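
If it helps, here is a rough sketch of a sanity check for a freshly built or downloaded wheel, based only on the symptom reported above (CPU values showing up as whole numbers, and torch.tensor(13/54) printing as tensor(0.)). The helper names all_integral and cpu_build_looks_broken are made up for illustration, and I am assuming the problem is also visible in values read back with .tolist(), which has not been confirmed on an affected build.

```python
def all_integral(values):
    """Return True if the sequence is non-empty and no value has a fractional part."""
    values = list(values)
    return len(values) > 0 and all(float(v).is_integer() for v in values)


def cpu_build_looks_broken(n=1000):
    """Hypothetical helper: look for the symptom from this thread on CPU tensors.

    Assumption: on an affected build the lost fractions show up either in the
    printed form of a scalar tensor or in the values returned by .tolist().
    """
    import torch  # imported lazily so all_integral can be used without torch

    # Symptom 1: the scalar from the original post prints without a fraction.
    scalar_repr = repr(torch.tensor(13 / 54))
    # Symptom 2: a large batch of normal samples contains only whole numbers,
    # which is practically impossible on a healthy build.
    samples = torch.randn(n, device="cpu").tolist()
    return scalar_repr == "tensor(0.)" or all_integral(samples)
```

Running cpu_build_looks_broken() in the Python interpreter right after installing the wheel should return False on a build with the patch applied. A True result would suggest the same CPU problem described in this topic.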

Thanks for your reply. But I have already removed that environment, so I can’t show you the output.

Thanks for explaining the bug and the solution. But my SD card broke and I have formatted it, so I lost my torch wheel.

Also, to be clear, the wheel I posted a link for is for pytorch 1.8, not 1.7.

Thanks for mentioning that. In the end I used PyTorch 1.8 and torchvision 0.7.0.