PyTorch for Jetson - version 1.11 now available

After I run the command ‘sudo pip3 install torch-1.7.0-cp36-cp36m-linux_aarch64.whl’,
it shows that torch 1.7.0 was installed successfully.
But when I ‘import torch’,
it still shows ‘No module named torch’.
I don’t know why.
Thanks!

Hi @619914127, are you running python3 to import torch? Can you run python3 -c 'import torch'?

It still doesn’t work.
I can’t even find it in my pip list.

I believe it’s because you are running python and pip which are the Python 2.7 versions - whereas you installed the wheel for Python 3.6 with pip3. Try using python3 and pip3 instead.
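For example, to confirm which interpreter each command points at and whether the wheel is visible to Python 3 (a generic check, not tied to any particular wheel version):

python --version          # likely reports Python 2.7.x
python3 --version         # should report Python 3.6.x on JetPack 4.x
pip3 --version            # should mention python 3.6
pip3 list | grep torch    # the installed wheel should show up here
python3 -c 'import torch; print(torch.__version__)'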

I have a Jetson Xavier NX with JetPack 4.4.
I am unable to install PyTorch 1.5.1 through the commands given above.
PyTorch v1.5.0

Please tell me how to install it as soon as possible.

At first it installed successfully, but then I installed the other libraries given below:
numpy
pillow
scikit-learn
tqdm
albumentations
jupyterlab
matplotlib
natsort
scikit-image>=0.16.1
tensorboardx
tensorboard
torchcontrib
tifffile
pygit2
Archiconda

and then it automatically got removed, and now it will not install again; it just throws an error.

Hi @vashisht.akshat.rn.04, I believe your issue is that you are using pip to try and install the wheel, when you should be using pip3 (pip is for Python 2.7 and pip3 is for Python 3.6 - and these wheels are for Python 3.6).

Also, please make sure the wheel you are installing is compatible with the version of JetPack you have. That wheel you linked to is only for JP 4.4 Developer Preview (L4T R32.4.2). If you are on the JP 4.4 production release (L4T R32.4.3) you would want to use a more recent wheel.
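If you aren't sure which L4T release you are on, this usually shows it (the file should exist on any standard JetPack install):

head -n 1 /etc/nv_tegra_release
# prints something like: # R32 (release), REVISION: 4.3, ...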

I could not import torchaudio.

Following the instructions for l4t-pytorch,

sudo docker pull nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3
sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3

I started the l4t-pytorch container.
But when I ran the code

import torch
import torchaudio

I got a warning message:

/usr/local/lib/python3.6/dist-packages/torchaudio-0.9.0a0+33b2469-py3.6-linux-aarch64.egg/torchaudio/backend/utils.py:67: UserWarning: No audio backend is available.
  warnings.warn('No audio backend is available.')

And then I installed sox:

pip3 install sox

But I still got the same warning message.

Does anyone know how to solve this problem? Any advice would be appreciated.
Thanks.

Hmm I tried rebuilding the container after adding pip3 install sox (I already had apt-get install sox in the dockerfile), but still got this issue. Will require further investigation…if you figure it out, let me know.
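In the meantime, one quick thing to check (a sketch, assuming the torchaudio 0.9 backend API) is whether torchaudio actually sees a backend inside the container after sox is installed:

python3 -c "import torchaudio; print(torchaudio.list_audio_backends())"
# if 'sox_io' appears in the list, it can be selected explicitly:
python3 -c "import torchaudio; torchaudio.set_audio_backend('sox_io')"

If the list comes back empty, the sox extension likely wasn't compiled in when torchaudio was built, and installing the sox Python package alone won't add it.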

Hi dusty_nv,
Thanks for your reply.

I don’t have much experience with Python, Docker, or PyTorch.
I read the file docker_build_ml.sh and found the code below:

# PyTorch v1.9.0
build_pytorch "https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl" \
      "torch-1.9.0-cp36-cp36m-linux_aarch64.whl" \
      "l4t-pytorch:r$L4T_VERSION-pth1.9-py3" \
      "v0.10.0" \
      "pillow" \
      "v0.9.0"

It looks like PyTorch isn’t built from the official source. Maybe the official source also has this problem, or the problem was introduced in the “modified” source hosted on nvidia.box.com. I have also tested the l4t-ml:r32.6.1-py3 image, and it has the same problem.

Can I install PyTorch 1.9.0 on JetPack 4.6 on a Jetson Xavier NX? If yes, please tell me how.

PyTorch is built from the official sources, I just have to build it for ARM/aarch64 with CUDA enabled. torchaudio is built from source in the Dockerfile: https://github.com/dusty-nv/jetson-containers/blob/d58ce7eb0afbb3c2706fc62d26f69e7055384484/Dockerfile.pytorch#L92
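For reference, the torchaudio build in that Dockerfile roughly boils down to something like this (a simplified sketch; the actual container commands may differ):

sudo apt-get install -y sox libsox-dev
git clone --recursive --branch v0.9.0 https://github.com/pytorch/audio torchaudio
cd torchaudio
python3 setup.py install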

Yes, JetPack 4.6 can use the same wheels as JetPack 4.4/4.5.
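Roughly, the install looks like this (a sketch; the wheel URL is taken from the build script quoted above, and the exact apt dependencies may differ slightly from the pinned instructions):

wget https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy torch-1.9.0-cp36-cp36m-linux_aarch64.whl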

Thanks for answering.

Could you please tell me how to install JetPack 4.6 for the Jetson Xavier NX on an SSD?

This is an unrelated topic to this thread about PyTorch - please refer to the JetPack/L4T documentation or start a new topic on the Xavier NX forum.

Hi, I am having some trouble with PyTorch and memory release.
My hardware is a Jetson Nano 4 GB with JetPack 4.6.
The PyTorch version is 1.6, downloaded from the top of this topic.
The Python version is 3.6.
Running this very simple code:

>>> import torch
>>> t = torch.zeros([300, 300, 300, 2], device='cuda')
>>> print(torch.cuda.memory_summary())
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

>>> del t
>>> torch.cuda.empty_cache()
>>> print(torch.cuda.memory_summary())
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

I allocate a tensor t that takes 210944 KB of memory, then I delete this tensor and release the memory, as confirmed by the summary.
However, looking at the memory state with the top command:

  • At the beginning of the code, the memory used is 1221148 KB.
KiB Mem :  4051104 total,  1690428 free,  1221148 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  3135952 avail Mem 
  • After the tensor creation, the memory used is 2265836 KB, an increase of roughly 1.1 GB, much more than the ~200 MB allocated for the tensor; I assume the rest of the allocated memory is the CUDA kernels.
KiB Mem :  4051104 total,   645740 free,  2265836 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  1633088 avail Mem 
  • After the tensor destruction, the memory used is essentially the same as before.
KiB Mem :  4051104 total,   643368 free,  2268208 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  1841660 avail Mem 

Now, I would like to know why this happens, and what I am missing about the PyTorch mechanism on Jetson Nano.

Thanks for your kindness.

Much of that RAM is unrelated to tensors and is used by the cuDNN libraries and PyTorch CUDA kernels. There have been other discussions about it in this topic; it occurs when the first GPU tensor is created (regardless of its size) and these libraries get loaded.

I think you may need to call torch.cuda.empty_cache() for it to actually release the memory back to the OS.
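A minimal sketch of the baseline overhead I mean (same idea, using any of the wheels from this thread): even a one-element CUDA tensor triggers the same library load, so the jump you see in top is mostly that fixed cost, not the tensor itself.

import torch

# Even a tiny CUDA tensor initializes the CUDA context and loads cuDNN and
# the PyTorch CUDA kernels - this is where most of the process RAM goes.
tiny = torch.zeros(1, device='cuda')

# The caching allocator only tracks tensor memory, so this stays tiny even
# though the RSS reported by `top` jumps by roughly 1 GB.
print(torch.cuda.memory_allocated())

# Freeing the tensor and emptying the cache returns cached blocks to CUDA,
# but the context/library overhead stays resident for the life of the process.
del tiny
torch.cuda.empty_cache()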

Hello @dusty_nv. So I’ve been trying to build torch v1.9.0 from source on the Jetson TX2 for Python 3.7:

  1. Jetson TX2: Jetpack 4.6

  2. Python 3.7.11 (I need torch to work on 3.7; the wheels provided for 3.6 work)

  3. Command: sudo python3.7 setup.py bdist_wheel

  4. Cloned repo: git clone --recursive --branch v1.9.0 http://github.com/pytorch/pytorch

  5. Patched: git apply pytorch-1.9-jetpack-4.5.1.patch

  6. Error during compilation:

[9/791] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o
FAILED: caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o 
/usr/bin/ccache /usr/bin/c++  -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dcaffe2_pybind11_state_gpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.7m -isystem /usr/local/lib/python3.7/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -Icaffe2/aten/src/TH -I../aten/src/TH -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -I../torch/csrc/api -I../torch/csrc/api/include -I../c10/.. -I../c10/cuda/../.. -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -fvisibility=hidden -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -DUSE_NUMPY -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o -MF caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o.d -o caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o -c ../caffe2/python/pybind_state.cc
c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[10/791] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
FAILED: caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o 
/usr/bin/ccache /usr/bin/c++  -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dcaffe2_pybind11_state_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.7m -isystem /usr/local/lib/python3.7/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -Icaffe2/aten/src/TH -I../aten/src/TH -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -I../torch/csrc/api -I../torch/csrc/api/include -I../c10/.. -I../c10/cuda/../.. -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -fvisibility=hidden -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -DUSE_NUMPY -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -MF caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o.d -o caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -c ../caffe2/python/pybind_state.cc
c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[14/791] Building CXX object caffe2/CMakeFiles/cpu_rng_test.dir/__/aten/src/ATen/test/cpu_rng_test.cpp.o
ninja: build stopped: subcommand failed.

Hello,

I’m having difficulty installing torchvision v0.10.0 for PyTorch v1.9 on a Jetson Nano 2GB running JetPack 4.6 (fresh install on a 256 GB microSD) using the same instructions (copy and paste, with modifications to reflect versions). When I run:

export BUILD_VERSION=0.10.0
python3 setup.py install --user

I get:

Illegal instruction (core dumped)

Any ideas? Can this only be done on a 4GB Nano?

Thank you.

Hi @venketramana1, sorry I haven’t built PyTorch for Python 3.7 before. You may find this thread about building PyTorch for Python 3.8 useful:

Other recommendations would be to try a different version of PyTorch, or to upgrade your toolchain/gcc version.
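If you do try a newer gcc, a rough sketch of how that could look (assuming gcc-8 is available from the Ubuntu 18.04 repos; lowering MAX_JOBS is also worth trying in case the internal compiler error is the compiler running out of memory):

# sketch only - not verified on TX2
sudo apt-get install gcc-8 g++-8
export CC=/usr/bin/gcc-8
export CXX=/usr/bin/g++-8
export MAX_JOBS=2                        # fewer parallel compile jobs
sudo -E python3.7 setup.py bdist_wheel   # -E keeps the exported variables under sudo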