Error: JetPack 4.6 + torch v1.9.0, torchvision 0.10.0a0+300a8a4 in JupyterLab

Hello, I am trying to get PyTorch v1.9.0 running in JupyterLab v3.1.13 on a fresh JetPack 4.6 SD image, after installing it with the steps provided in forum issue 72048. TensorFlow is 1.15.5+nv21.9, and the dependencies follow the description of the latest PyTorch NVIDIA container.

Unfortunately, the sample scripts from PyTorch and others freeze or hang as soon as the preprocessed image is passed to the model, i.e. model(preprocessed_image). Image tensors can be transferred to the GPU, but scripted_predictor(batch) freezes the system. The only message shown is:

/home/…/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature … (triggered internally at /media/nvidia/NVME/pytorch-v1.0.0/c10/core/TensorImpl.h:1156.) return forward_call(*input, **kwargs)
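For context, the failing step looks roughly like this; the file names, the scripted_predictor model, and the preprocessing pipeline are placeholders standing in for my script:

```python
import torch
from torchvision import transforms
from PIL import Image

# Placeholder preprocessing, similar to the standard torchvision example.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

image = Image.open("sample.jpg")                     # placeholder input image
batch = preprocess(image).unsqueeze(0).to("cuda")    # moving the tensor to the GPU works

scripted_predictor = torch.jit.load("model_scripted.pt").to("cuda").eval()
with torch.no_grad():
    output = scripted_predictor(batch)               # <-- this call hangs / freezes
```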

I would be glad to get some hints on how to improve the situation.
Thank you.

Hi,

Could you launch the Docker container with more memory resources to see if it helps?

Thanks.

Hello, sorry for not being precise before.
I installed PyTorch and TensorFlow from scratch on the JetPack 4.6 SD image for the Jetson Nano, trying to match the library versions outlined in the latest JP4.6-related PyTorch Docker image, in the hope that it works out.

By chance I noticed that, when I run the test scripts, the training (model(image) or similar) suddenly runs through the epochs a long time (about 15 min) after the error or info messages.
A second run of the same training cell gave no error message, started reasonably fast, and ran to completion.
Might it be that swapping takes a long time, or something else, and the installation is actually fine? During the wait, CPU and GPU activity is low (<15%).
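For reference, a minimal check I could run in another notebook cell during the wait (assuming psutil is installed) to see whether swap usage is growing:

```python
import psutil  # assumption: psutil is available in the JupyterLab environment

# Print RAM and swap usage; if swap climbs during the 15 min wait,
# swapping is likely the bottleneck.
vm = psutil.virtual_memory()
sw = psutil.swap_memory()
print(f"RAM used: {vm.percent}%  swap used: {sw.percent}%")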
Thank you. T

Hi,

It sounds normal.
Some initialization tasks run the first time a model is launched.
Since you are doing this within a container, the initialization will be triggered every time the container is launched.
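To verify that only the first run is slow, a minimal sketch like the following (using a placeholder torchvision model and random input) can show the difference between the first and later calls:

```python
import time
import torch
import torchvision

# Placeholder model; any model that reproduces the delay works.
model = torchvision.models.resnet18().to("cuda").eval()
batch = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for i in range(3):
        start = time.time()
        model(batch)
        torch.cuda.synchronize()   # wait for the GPU work to finish before timing
        print(f"run {i}: {time.time() - start:.1f} s")
# The first run includes CUDA/cuDNN initialization; later runs should be fast.
```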

Do you get the correct output once it runs through?
If yes, the installation should be good.

Thanks.

Hello, thank you for the response. I installed again on a larger SD card (first TensorFlow, then PyTorch), kept the Docker service alive (just in case), and also created a 12 GB swap file. The model processing now starts after 3 min the first time and within a few seconds on the second run.

Thank you.