PyTorch for Jetson

Hi dusty_nv,
Thanks for your reply.

I don’t know much about Python, Docker, or PyTorch.
I read the file docker_build_ml.sh and found the code below:

# PyTorch v1.9.0
build_pytorch "https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl" \
      "torch-1.9.0-cp36-cp36m-linux_aarch64.whl" \
      "l4t-pytorch:r$L4T_VERSION-pth1.9-py3" \
      "v0.10.0" \
      "pillow" \
      "v0.9.0"

It looks like PyTorch isn’t built from the official source here; the script downloads a prebuilt wheel from nvidia.box.com. Maybe the official source also has this problem, or the problem was introduced in the “modified” source behind that wheel. I have tested the l4t-ml:r32.6.1-py3 image, and it has this problem as well.

Can I download PyTorch 1.9.0 on JetPack 4.6 for the Jetson Xavier NX? If yes, please tell me how.

PyTorch is built from the official sources; I just have to build it for ARM/aarch64 with CUDA enabled. torchaudio is built from source in the Dockerfile: https://github.com/dusty-nv/jetson-containers/blob/d58ce7eb0afbb3c2706fc62d26f69e7055384484/Dockerfile.pytorch#L92

Yes, JetPack 4.6 can use the same wheels as JetPack 4.4/4.5:
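
For example, something like this (a hedged sketch: the wheel URL and filename come from the build script quoted above, and the apt/pip commands follow the usual install instructions from the first post of this thread):

# download the JetPack 4.4/4.5 wheel (the same wheel works on JetPack 4.6)
wget https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl
# install the usual dependencies (assumed; see the first post for the canonical list)
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
# install the wheel
pip3 install numpy torch-1.9.0-cp36-cp36m-linux_aarch64.whl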

Thanks for answering.

Could you please tell me how to install JetPack 4.6 for the Jetson Xavier NX on an SSD?

This is unrelated to this thread about PyTorch - please refer to the JetPack/L4T documentation or start a new topic on the Xavier NX forum.

Hi, I’m having some trouble with PyTorch and memory release.
My hardware is a Jetson Nano 4 GB with JetPack 4.6.
The PyTorch version is 1.6, downloaded from the beginning of this topic.
The Python version is 3.6.
Running this very simple code:

>>> import torch
>>> t = torch.zeros([300, 300, 300, 2], device='cuda')
>>> print(torch.cuda.memory_summary())
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from large pool |  210944 KB |  210944 KB |  210944 KB |       0 B  |
|       from small pool |       0 KB |       0 KB |       0 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       1    |       1    |       1    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

>>> del t
>>> torch.cuda.empty_cache()
>>> print(torch.cuda.memory_summary())
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from large pool |       0 B  |  210944 KB |  210944 KB |  210944 KB |
|       from small pool |       0 B  |       0 KB |       0 KB |       0 KB |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       1    |       1    |       1    |
|       from large pool |       0    |       1    |       1    |       1    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|

I allocate a tensor t that takes 210944 KB of memory (300 × 300 × 300 × 2 = 54,000,000 float32 elements × 4 bytes ≈ 206 MiB, which the caching allocator reports as 210944 KB); then I delete the tensor and release the memory, as confirmed by the summary.
However, looking at the memory state with the top command:

  • At the beginning of the code, the memory used is 1221148 KB.
KiB Mem :  4051104 total,  1690428 free,  1221148 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  3135952 avail Mem 
  • After the tensor creation, the memory used is 2265836 KB, an increase of roughly 1 GB. That is far more than the ~206 MB allocated for the tensor, but I assume the rest of the increase is the CUDA kernels.
KiB Mem :  4051104 total,   645740 free,  2265836 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  1633088 avail Mem 
  • After the tensor destruction, the memory used is essentially the same as before.
KiB Mem :  4051104 total,   643368 free,  2268208 used,  1139528 buff/cache
KiB Swap:  8191984 total,  8190804 free,     1180 used.  1841660 avail Mem 

Now I would like to know why this happens, and what I am missing about PyTorch’s memory behavior on the Jetson Nano.

Thanks for your kindness.

Much of the RAM is unrelated to tensors; it is used by the cuDNN libraries and PyTorch’s CUDA kernels. There have been other discussions about this earlier in this topic: the usage appears when the first GPU tensor is created (regardless of its size) and these libraries get loaded.

torch.cuda.empty_cache() (which you already call) is what releases the cached allocator memory back to the OS; the usage that remains in top afterwards is the CUDA context and kernel images, which stay resident until the process exits.
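
A quick way to see this from the shell (a minimal sketch, assuming the wheel from this thread is installed): the allocator counter drops to zero after empty_cache(), while the process RSS reported by top stays high because of the resident CUDA context:

python3 - <<'EOF'
import torch
t = torch.zeros([300, 300, 300, 2], device='cuda')  # first CUDA use also loads the context/kernels
print(torch.cuda.memory_allocated())                # ~216 MB tracked by the caching allocator
del t
torch.cuda.empty_cache()                            # cached blocks are handed back to the driver
print(torch.cuda.memory_allocated())                # 0, yet RSS in top barely moves
EOF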

Hello @dusty_nv. I’ve been trying to build torch v1.9.0 from source on the Jetson TX2 for Python 3.7:

  1. Jetson TX2: JetPack 4.6

  2. Python 3.7.11 (I need torch to work on 3.7; the wheels provided for 3.6 work)

  3. Command: sudo python3.7 setup.py bdist_wheel

  4. Cloned repo: git clone --recursive --branch v1.9.0 https://github.com/pytorch/pytorch

  5. Patched: git apply pytorch-1.9-jetpack-4.5.1.patch

  6. Error during compilation:

[9/791] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o
FAILED: caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o 
/usr/bin/ccache /usr/bin/c++  -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dcaffe2_pybind11_state_gpu_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.7m -isystem /usr/local/lib/python3.7/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -Icaffe2/aten/src/TH -I../aten/src/TH -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -I../torch/csrc/api -I../torch/csrc/api/include -I../c10/.. -I../c10/cuda/../.. -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -fvisibility=hidden -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -DUSE_NUMPY -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o -MF caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o.d -o caffe2/CMakeFiles/caffe2_pybind11_state_gpu.dir/python/pybind_state.cc.o -c ../caffe2/python/pybind_state.cc
c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[10/791] Building CXX object caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o
FAILED: caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o 
/usr/bin/ccache /usr/bin/c++  -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -Dcaffe2_pybind11_state_EXPORTS -Iaten/src -I../aten/src -I. -I../ -isystem third_party/gloo -isystem ../cmake/../third_party/gloo -isystem ../cmake/../third_party/googletest/googlemock/include -isystem ../cmake/../third_party/googletest/googletest/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -isystem ../third_party/XNNPACK/include -I../cmake/../third_party/benchmark/include -isystem ../cmake/../third_party/eigen -isystem /usr/include/python3.7m -isystem /usr/local/lib/python3.7/dist-packages/numpy/core/include -isystem ../cmake/../third_party/pybind11/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent -isystem /usr/lib/aarch64-linux-gnu/openmpi/include/openmpi/opal/mca/event/libevent2022/libevent/include -isystem /usr/lib/aarch64-linux-gnu/openmpi/include -isystem ../cmake/../third_party/cub -Icaffe2/contrib/aten -I../third_party/onnx -Ithird_party/onnx -I../third_party/foxi -Ithird_party/foxi -isystem /usr/local/cuda/include -Icaffe2/aten/src/TH -I../aten/src/TH -Icaffe2/aten/src -I../aten/../third_party/catch/single_include -I../aten/src/ATen/.. -Icaffe2/aten/src/ATen -I../third_party/miniz-2.0.8 -I../caffe2/core/nomnigraph/include -I../torch/csrc/api -I../torch/csrc/api/include -I../c10/.. -I../c10/cuda/../.. -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -fPIC   -fvisibility=hidden -DCAFFE2_USE_GLOO -DCUDA_HAS_FP16=1 -D__NEON__ -DUSE_GCC_GET_CPUID -DTH_HAVE_THREAD -DUSE_NUMPY -pthread -std=gnu++14 -MD -MT caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -MF caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o.d -o caffe2/CMakeFiles/caffe2_pybind11_state.dir/python/pybind_state.cc.o -c ../caffe2/python/pybind_state.cc
c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
[14/791] Building CXX object caffe2/CMakeFiles/cpu_rng_test.dir/__/aten/src/ATen/test/cpu_rng_test.cpp.o
ninja: build stopped: subcommand failed.

Hello,

I’m having difficulty installing torchvision v0.10.0 for PyTorch v1.9 on a Jetson Nano 2GB running JetPack 4.6 (a fresh install on a 256 GB microSD), using the same instructions (copied and pasted, with modifications to reflect the versions). When I run:

export BUILD_VERSION=0.10.0
python3 setup.py install --user

I get:

Illegal instruction (core dumped)

Any ideas? Can this only be done on a 4GB Nano?

Thank you.

Hi @venketramana1, sorry I haven’t built PyTorch for Python 3.7 before. You may find this thread about building PyTorch for Python 3.8 useful:

Other recommendations would be to try a different version of PyTorch, or to upgrade your toolchain/GCC version.
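
One more thing worth trying (an assumption on my part, I have not verified it on TX2): an internal compiler error like that cc1plus segfault is often the compiler running out of memory, so capping the number of parallel compile jobs that the PyTorch build launches may get you past it:

# PyTorch's setup.py respects MAX_JOBS to limit build parallelism
export MAX_JOBS=2
sudo -E python3.7 setup.py bdist_wheel   # -E keeps MAX_JOBS in the sudo environment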

Hi @davidb1, I haven’t seen this problem before - can you keep an eye on the memory usage during compilation, and mount additional swap (and disable ZRAM)? Like this: https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap
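
Roughly these steps (a condensed sketch from memory; see the linked page for the full walkthrough):

sudo systemctl disable nvzramconfig   # disable the default ZRAM-based swap
sudo fallocate -l 4G /mnt/4GB.swap    # create a 4 GB swap file
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap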

Alternatively, you can use the l4t-pytorch container which already has torchvision precompiled, so you needn’t build it yourself.
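
For example (the exact tag is an assumption here, following the r$L4T_VERSION-pth1.9-py3 pattern from the build script quoted earlier, so for JetPack 4.6 / L4T R32.6.1):

sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r32.6.1-pth1.9-py3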

Hi @dusty_nv, I have tried disabling ZRAM and increasing the swap to 6 GB, and I still get the same result. The main reason I am trying to do this is to modify the JetCard install.sh (https://github.com/NVIDIA-AI-IOT/jetcard) so that it runs with JetPack 4.6. There must be a way to do this, as the l4t-ml container has PyTorch and torchvision for JetPack 4.6.

Running jtop in another terminal doesn’t show much of an increase in main memory use before the build fails with the illegal instruction and core dump, so it doesn’t look like a memory issue. Maybe I should look at setup.py, but that looks like a lot of work when I can easily use the l4t-ml container.

Hmm. Are you able to try a different version of torchvision and see if you still get the crash?
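
You could also check whether it is actually numpy crashing on import. There is a known issue (a guess on my part that it applies here) where numpy 1.19.5 on aarch64 raises an illegal instruction through OpenBLAS unless a generic core type is forced:

# check whether importing numpy alone triggers the illegal instruction
python3 -c "import numpy"
# if so, pin OpenBLAS to a generic ARMv8 kernel and retry the torchvision build
export OPENBLAS_CORETYPE=ARMV8
python3 setup.py install --user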

Hi @dusty_nv, I previously tried PyTorch v1.8 with torchvision v0.9.0 and had the same issue, though that was without disabling ZRAM or increasing the swap size. I’m currently looking at the Dockerfile scripts at https://github.com/dusty-nv/jetson-containers to try to get an idea of what I may be doing wrong. Looking at the Dockerfile, I can’t see much difference between the install instructions above and the Dockerfile.

I might try the install on a Jetson Nano 4 GB to see if that makes a difference.

OK thanks, let us know if that makes a difference. Although I build the containers on Xavier, I know other folks have built torchvision on a Nano 2GB before, so I’m not sure what the issue is.

Hi @dusty_nv, I have just tried on the Jetson Nano 2GB, a clean JetPack 4.6 install on a 256 GB microSD card, with ZRAM disabled and 8 GB of swap, and it is compiling, using just the install commands given above.

It looks like some packages that were installed prior to PyTorch and torchvision might be the issue. I’ll investigate that later. Thanks.

Why does my torchvision only work with sudo python3?

I am working on a Jetson Nano B01 board. I followed the above instructions to install PyTorch 1.8.0. The only difference is that when I run pip3 install numpy torch-1.8.0-cp36-cp36m-linux_aarch64.whl and python3 setup.py install --user, sudo is needed; otherwise I get:

Exception:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/usr/lib/python3/dist-packages/pip/commands/install.py", line 360, in run
    prefix=options.prefix_path,
  File "/usr/lib/python3/dist-packages/pip/req/req_set.py", line 784, in install
    **kwargs
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 851, in install
    self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
  File "/usr/lib/python3/dist-packages/pip/req/req_install.py", line 1064, in move_wheel_files
    isolated=self.isolated,
  File "/usr/lib/python3/dist-packages/pip/wheel.py", line 247, in move_wheel_files
    prefix=prefix,
  File "/usr/lib/python3/dist-packages/pip/locations.py", line 153, in distutils_scheme
    i.finalize_options()
  File "/usr/share/python-wheels/setuptools-39.0.1-py2.py3-none-any.whl/setuptools/command/install.py", line 38, in finalize_options
    orig.install.finalize_options(self)
  File "/usr/lib/python3.6/distutils/command/install.py", line 351, in finalize_options
    self.create_home_path()
  File "/usr/lib/python3.6/distutils/command/install.py", line 581, in create_home_path
    os.makedirs(path, 0o700)
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/usr/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/xinqifan2/.local/lib/python3.6'

After running the above commands with sudo for the PyTorch and torchvision installation, at verification time I can import PyTorch without sudo, but torchvision does not work. See below:

xinqifan2@xinqifan2-desktop:~$ python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.8.0
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: True
>>> print('cuDNN version: ' + str(torch.backends.cudnn.version()))
cuDNN version: 8201
>>> a = torch.cuda.FloatTensor(2).zero_()
>>> print('Tensor a = ' + str(a))
Tensor a = tensor([0., 0.], device='cuda:0')
>>> b = torch.randn(2).cuda()
>>> print('Tensor b = ' + str(b))
Tensor b = tensor([-1.2091,  0.6310], device='cuda:0')
>>> c = a + b
>>> print('Tensor c = ' + str(c))
Tensor c = tensor([-1.2091,  0.6310], device='cuda:0')
>>> import torchvision
>>> print(torchvision.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torchvision' has no attribute '__version__' 

However, if I do this with sudo python3, it works:

xinqifan2@xinqifan2-desktop:~$ sudo python3
[sudo] password for xinqifan2:
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torchvision
>>> print(torchvision.__version__)
0.9.0a0+01dfa8e
>>> import torch
>>> print(torch.__version__)
1.8.0
>>> print('CUDA available: ' + str(torch.cuda.is_available()))
CUDA available: True

What happened to my Python or its packages? Why is sudo needed?

Hi @xinqi.fan, hmm, I am not sure. Typically I do not use sudo with it. You may want to file an issue against the torchvision GitHub if you want to dig into it more.
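
One thing to check (just a guess on my part, not something I have confirmed): an earlier sudo pip run may have left ~/.local owned by root, which would explain why later --user installs fail without sudo. Something like this would check and fix the ownership:

ls -ld ~/.local ~/.local/lib/python3.6   # owned by root after a sudo pip run?
sudo chown -R $USER:$USER ~/.local       # hand the directory back to your user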