We are trying to run a simple MNIST training experiment on an Xavier dev board with JetPack 5.1.2, CUDA 11.4, and PyTorch 2.1.0.
We use the torch .whl file from the PyTorch for Jetson page (JetPack 5, PyTorch 2.1.0).
Since no torchvision .whl is provided, we build it from source following jetson-containers/packages/pytorch/torchvision in the dusty-nv/jetson-containers repository on GitHub.
We install both the torch and torchvision .whl files into our environment and try to run a simple MNIST training experiment.
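After installing, we sanity-check the environment roughly like this (a minimal sketch; run inside the venv where the wheels were installed):

```python
# Minimal environment sanity check (sketch): confirms the installed
# wheel versions and which CUDA build the torch wheel was compiled against.
import torch
import torchvision

print("torch:", torch.__version__)              # expecting 2.1.0
print("torchvision:", torchvision.__version__)  # expecting 0.16.x
print("CUDA build:", torch.version.cuda)        # CUDA version torch was built against
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```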
But at the start of the training we get the error below:
File "/mnt/nvme/.venvs/venv3_8/lib/python3.8/site-packages/torch/autograd/__init__.py", line 204, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: Event device type CUDA does not match blocking stream's device type CPU.
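For context, the error is raised at loss.backward() in an otherwise standard training loop. Below is a minimal sketch of the kind of script we run; the model, hyperparameters, and data path are illustrative placeholders, not our exact script:

```python
# Minimal MNIST training sketch of the kind that triggers the error.
# Illustrative only: model, hyperparameters, and data path are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_set = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:
    images, labels = images.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()   # <-- RuntimeError is raised here on the CUDA 11.4 board
    optimizer.step()
    break  # a single step is enough to hit the error
```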
When we apply the same procedure on an Xavier with JetPack 5.1.2 and CUDA 12.2, using the same training scripts, the training does not fail.
I do not see any container for PyTorch 2.1.0; the newest available version is PyTorch 2.0.
Is there any known problem with building or running PyTorch 2.1.0 on JetPack 5.1.2?
We are able to build both .whl files, but we get the error above during training.
Can we use PyTorch 2.1.0 and torchvision 0.16.2 for model training on Xavier/Orin development boards with JetPack 5.1.2?