Getting wrong outputs on TX2 with PyTorch compiled from source

Hello,

I trained a custom CNN and want to deploy it on the TX2. It works perfectly on 3 different systems (none ARM-based). However, on the TX2, with the same weight file, input, and source code, it gives output values that are larger than expected.
Initially I thought it was a version issue, so I built two versions from source (v0.3.0 and v0.3.1), but both give the unexpected output values. Toggling the GPU/CPU flag does not help either. I also built the same version from source on another system, and it works fine there.

Is there something that I am missing here?

Thanks for your time.

Hi,

Could you share more information about your model?

I am not sure whether this is related to an incorrect GPU architecture setting.
Could you check the following PyTorch installation script first?
https://gist.github.com/dusty-nv/ef2b372301c00c0a9d3203e42fd83426

Thanks.

Yes, what more would you like to know about the system?

I followed the procedure in the script. I ran

python setup.py install

i.e. “install mode” and not “develop mode”. Will that make a difference?

Thanks.

Hi,

Could you share the model you used?
We want to check whether any particular operation causes the accuracy issue.

Thanks.

Hi,

I use a custom neural network. For starters, it uses the PyTorch ResNet (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py) to load part of the ResNet. I have attached the exact version of the file.

I have tried it with PyTorch v0.3.0 and v0.3.1, and it does not work. I thought it might be an architecture issue so I actually trained on the Jetson and then evaluated it there but still got similar results.

Regards,
Ankit

Hi,

Could you run the verification sample in the dusty-nv tutorial first?
https://github.com/dusty-nv/jetson-reinforcement

If possible, please provide example code to help us reproduce this issue.
Maybe we can reproduce it by comparing the results from an x86 machine and the Jetson TX2?
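
For that kind of comparison, a minimal sketch like the one below could be run on both machines and the printed values compared side by side. It is only an illustration: `summarize` is a hypothetical helper, NumPy is assumed to be installed, and the example array is stand-in data rather than anything from the actual model.

```python
import numpy as np


def summarize(name, array):
    """Return basic statistics for one array so two machines can be compared."""
    original = np.asarray(array)
    # Convert to float64 only for the statistics; the original dtype is recorded first.
    values = original.astype(np.float64)
    return {
        "name": name,
        "dtype": str(original.dtype),
        "shape": original.shape,
        "min": float(values.min()),
        "max": float(values.max()),
        "mean": float(values.mean()),
    }


# Stand-in data (assumption). In practice, pass the network input and output,
# converted to NumPy first, e.g. tensor.cpu().numpy().
example_input = np.array([[0.0, 0.5], [0.25, 1.0]], dtype=np.float32)
print(summarize("input", example_input))
```

If the input statistics already differ between the two machines, the problem is in data loading or preprocessing rather than in the network itself.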

Thanks.

Hi,

Running the commands from the “Verify PyTorch” section, I get reasonable outputs:

nvidia@tegra-ubuntu:~$ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
0.3.0b0+af3964a
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: True
>>> a = torch.cuda.FloatTensor(2).zero_()
>>> print('Tensor a = ' + str(a))
Tensor a = 
 0
 0
[torch.cuda.FloatTensor of size 2 (GPU 0)]

>>> b = torch.randn(2).cuda()
>>> print('Tensor b = ' + str(b))
Tensor b = 
 0.8007
 2.0221
[torch.cuda.FloatTensor of size 2 (GPU 0)]

>>>  c = a + b
  File "<stdin>", line 1
    c = a + b
    ^
IndentationError: unexpected indent
>>> c = a + b
>>> print('Tensor c = ' + str(c))
Tensor c = 
 0.8007
 2.0221
[torch.cuda.FloatTensor of size 2 (GPU 0)]

I am guessing the problem occurs during inference. Where can I attach the weights file and the inference scripts? Would it be possible to send them via email, or can you suggest a better way to share the code with you?

Thanks for your time.

Hi,

You can send it via private message.
Thanks.

Hi,

I figured out what the issue was. The input passed to PyTorch is in the range 0-255 on the TX2, while on the 2 other laptops it was in the range 0-1, even though I do not apply any normalization explicitly in my code and the same code ran on all machines. After dividing the input by 255 on the TX2, it works as expected.
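
For anyone hitting the same thing, the workaround can be sketched roughly as follows. This is not my exact code: `to_unit_range` is a hypothetical helper, NumPy is assumed, and the "max greater than 1" check is only a heuristic for detecting 0-255 input.

```python
import numpy as np


def to_unit_range(image):
    """Scale an image to [0, 1] if it appears to be in the 0-255 range."""
    arr = np.asarray(image, dtype=np.float32)
    # Heuristic (assumption): any value above 1 means the data is still 0-255.
    if arr.max() > 1.0:
        arr = arr / 255.0
    return arr


# Stand-in pixel values; a real image array would be passed the same way.
pixels = np.array([0, 128, 255], dtype=np.uint8)
print(to_unit_range(pixels))
```

The heuristic leaves a very dark 0-255 image (all values 0 or 1) unscaled, so checking the dtype or scaling explicitly is safer when the input source is known.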

It seems like there is an issue with the libraries, but I am not sure what caused it in the first place.
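
One way to narrow this down would be to record the library versions on each machine and compare them. The sketch below is only a starting point: `library_versions` is a hypothetical helper, and the package list is a guess at what might be involved in image loading, not a confirmed cause.

```python
import importlib


def library_versions(names):
    """Map each module name to its __version__, or a note if it is unavailable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Not every module exposes __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Package list is an assumption; adjust it to whatever the inference script imports.
print(library_versions(["torch", "torchvision", "numpy", "PIL"]))
```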

cheers.

Hi,

Thanks for sharing this information with us!!!