Hi dusty, I face the same problem of torchvision interpoerability with pytorch. If I try using the container, then the memory on eMMC maxes out.
I am using a Xavier AGX module and I have inserted a 128 GB SD Card. Is there a setting to flash the SD card and use it as Root? How can I go about this?
Then I can clone the container and use it directly. Please let me know if there is a way
Also make sure your SD card gets mounted at boot-up with an entry in /etc/fstab, or else when the docker daemon initializes, your SD card containing the docker data-root wonât have been mounted yet.
I was facing the exact same issue yesterday. Issue was resolved after downgrading Pillow:
(rlms) nvidia@xavier:/srv/rlms/detect$ python3
Python 3.6.9 (default, Jun 29 2022, 11:45:57)
[GCC 8.4.0] on linux
Type âhelpâ, âcopyrightâ, âcreditsâ or âlicenseâ for more information.
>>> import PIL
>>> print (PIL.version)
7.1.2
Hi dusty. For some reason, the docker doesnât relocate even after following all the instructions and goes back to var/lib/docker.
I require using yolov5. Is it possible to install the py3.6 wheels for torch and torchvision, then upgrade python to 3.8 and execute yolov5. Does that work?
What I do is add "data-root": "/new_dir_structure/docker" to /etc/docker/daemon.json and then reboot after making sure my drive gets auto-mounted in /etc/fstab
If you are on JetPack 4.x, we donât have the pre-built PyTorch wheels for Python 3.8, but you could build them yourself. On JetPack 5.x, the PyTorch wheels are for Python 3.8. Upgrading Python wonât upgrade PyTorch automatically from Python 3.6 to 3.8 because you still need that wheel.
May I know if there is anyway to reduce memory usage of pytorch? It spent around 2GB memory when I try to initialize CUDA context even I am doing a very simple inferencing. I suspect it is related to the number of ops need to be loaded in to CUDA kernel, so I researched there is a SELECTED_OP_LIST option in pytorch, but doesnât seems to work outside mobile build.
Hi @richardfat7, unfortunately we have not found a way to reduce it. It appears you are correct that SELECTED_OP_LIST only applies to the mobile builds.
Hi @dusty_nv, have you ever experienced below issue? pytorch version is 1.11.0, torchaudio is 0.11.0. Torchaudiion was installed via pip3.
raceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jetson/.local/lib/python3.8/site-packages/torchaudio/__init__.py", line 1, in <module>
from torchaudio import ( # noqa: F401
File "/home/jetson/.local/lib/python3.8/site-packages/torchaudio/_extension.py", line 103, in <module>
_init_extension()
File "/home/jetson/.local/lib/python3.8/site-packages/torchaudio/_extension.py", line 88, in _init_extension
_load_lib("libtorchaudio")
File "/home/jetson/.local/lib/python3.8/site-packages/torchaudio/_extension.py", line 51, in _load_lib
torch.ops.load_library(path)
File "/home/jetson/.local/lib/python3.8/site-packages/torch/_ops.py", line 220, in load_library
ctypes.CDLL(path)
File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/jetson/.local/lib/python3.8/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv
Hi @dusty_nv,
I have noted that there is torch 1.11 for Jetson Orin but there is no corresponding torchvision for torch 1.11. Is there a compatible torchvision that I can install for torch 1.11 on Jetson Orin?
Jetson Orin only supports JetPack 5.x. You could try building an earlier PyTorch from source, but Iâm not sure if older PyTorch versions would support the minimum CUDA/cuDNN versions from JetPack 5.x.
Hi @Andrey1984, I donât believe these new official wheels are build with distributed enabled, so you would need to build it with distributed turned on if you needed that.
I just follow my normal build process (in the Build From Source section above) and make sure I have libopenmpi-dev installed first. Then unless you explicitly set USE_DISTRIBUTED=0 environment variable, it will enable distributed.
passed this step using the argument
It seems for 1.13.0 pytorch the torchvision version needs to be as follows git clone --branch v0.13.1 https://github.com/pytorch/vision torchvision
right?
building torchvision seems to result in
File "/home/nvidia/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 544, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/nvidia/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1789, in _get_cuda_arch_flags
raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
ValueError: Unknown CUDA arch (8.7) or GPU not supported