I’m revisiting my Docker process for building PyTorch from source. Are there patches for PyTorch 1.11 and beyond, or have the fixes been merged into the PyTorch code base?
Hi
Can you tell me how to install MAGMA and which version to use? And how do I compile it from the PyTorch source code?
Hi @znmeb, I haven’t built PyTorch 1.11, but suffice it to say that my 1.10 patches would be a good starting point. Also, I’m not sure if Python 3.6 is supported past PyTorch 1.10, so you may need JetPack 5.0 (or to upgrade Python if you are on JetPack 4.x) to build it.
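If it helps, a quick way to confirm what you are starting from (just two stock commands, nothing specific to my patches) is:
# check which Python and which L4T/JetPack release you are on before picking a patch set or wheel
python3 --version
cat /etc/nv_tegra_release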
Hi @robert.scheffler, is the CUDA Toolkit installed correctly? Can you check the following directory:
ls -ll /usr/local/cuda/lib64/
total 3626264
lrwxrwxrwx 1 root root 17 Nov 15 04:07 libcublasLt.so -> libcublasLt.so.11
lrwxrwxrwx 1 root root 24 Nov 15 04:07 libcublasLt.so.11 -> libcublasLt.so.11.6.5.24
-rw-r--r-- 1 root root 371525152 Nov 15 04:07 libcublasLt.so.11.6.5.24
-rw-r--r-- 1 root root 502851542 Nov 15 04:07 libcublasLt_static.a
lrwxrwxrwx 1 root root 15 Nov 15 04:07 libcublas.so -> libcublas.so.11
lrwxrwxrwx 1 root root 22 Nov 15 04:07 libcublas.so.11 -> libcublas.so.11.6.5.24
-rw-r--r-- 1 root root 168021872 Nov 15 04:07 libcublas.so.11.6.5.24
-rw-r--r-- 1 root root 212415888 Nov 15 04:07 libcublas_static.a
-rw-r--r-- 1 root root 796212 Nov 15 02:30 libcudadevrt.a
lrwxrwxrwx 1 root root 17 Nov 15 02:30 libcudart.so -> libcudart.so.11.0
lrwxrwxrwx 1 root root 21 Nov 15 02:30 libcudart.so.11.0 -> libcudart.so.11.4.167
-rw-r--r-- 1 root root 670808 Nov 15 02:30 libcudart.so.11.4.167
-rw-r--r-- 1 root root 1078022 Nov 15 02:30 libcudart_static.a
lrwxrwxrwx 1 root root 13 Nov 17 03:57 libcudla.so -> libcudla.so.1
lrwxrwxrwx 1 root root 17 Nov 17 03:57 libcudla.so.1 -> libcudla.so.1.0.0
-rw-r--r-- 1 root root 159296 Nov 17 03:57 libcudla.so.1.0.0
lrwxrwxrwx 1 root root 14 Nov 15 03:49 libcufft.so -> libcufft.so.10
lrwxrwxrwx 1 root root 21 Nov 15 03:49 libcufft.so.10 -> libcufft.so.10.6.0.71
-rw-r--r-- 1 root root 174702496 Nov 15 03:49 libcufft.so.10.6.0.71
-rw-r--r-- 1 root root 215629292 Nov 15 03:49 libcufft_static.a
-rw-r--r-- 1 root root 187336232 Nov 15 03:49 libcufft_static_nocallback.a
lrwxrwxrwx 1 root root 15 Nov 15 03:49 libcufftw.so -> libcufftw.so.10
lrwxrwxrwx 1 root root 22 Nov 15 03:49 libcufftw.so.10 -> libcufftw.so.10.6.0.71
-rw-r--r-- 1 root root 740776 Nov 15 03:49 libcufftw.so.10.6.0.71
-rw-r--r-- 1 root root 30202 Nov 15 03:49 libcufftw_static.a
-rw-r--r-- 1 root root 1436538 Nov 15 02:22 libcufilt.a
-rw-r--r-- 1 root root 33242 Nov 15 02:30 libculibos.a
lrwxrwxrwx 1 root root 16 Nov 15 03:52 libcupti.so -> libcupti.so.11.4
lrwxrwxrwx 1 root root 20 Nov 15 03:52 libcupti.so.11.4 -> libcupti.so.2021.2.2
-rw-r--r-- 1 root root 5782696 Nov 15 03:52 libcupti.so.2021.2.2
lrwxrwxrwx 1 root root 15 Nov 12 13:45 libcurand.so -> libcurand.so.10
lrwxrwxrwx 1 root root 23 Nov 12 13:45 libcurand.so.10 -> libcurand.so.10.2.5.165
-rw-r--r-- 1 root root 81480832 Nov 12 13:45 libcurand.so.10.2.5.165
-rw-r--r-- 1 root root 81438022 Nov 12 13:45 libcurand_static.a
lrwxrwxrwx 1 root root 19 Nov 12 13:56 libcusolverMg.so -> libcusolverMg.so.11
lrwxrwxrwx 1 root root 27 Nov 12 13:56 libcusolverMg.so.11 -> libcusolverMg.so.11.2.0.165
-rw-r--r-- 1 root root 258827504 Nov 12 13:56 libcusolverMg.so.11.2.0.165
lrwxrwxrwx 1 root root 17 Nov 12 13:56 libcusolver.so -> libcusolver.so.11
lrwxrwxrwx 1 root root 25 Nov 12 13:56 libcusolver.so.11 -> libcusolver.so.11.2.0.165
-rw-r--r-- 1 root root 218556608 Nov 12 13:56 libcusolver.so.11.2.0.165
-rw-r--r-- 1 root root 211452066 Nov 12 13:56 libcusolver_static.a
lrwxrwxrwx 1 root root 17 Nov 12 13:50 libcusparse.so -> libcusparse.so.11
lrwxrwxrwx 1 root root 25 Nov 12 13:50 libcusparse.so.11 -> libcusparse.so.11.6.0.165
-rw-r--r-- 1 root root 230611448 Nov 12 13:50 libcusparse.so.11.6.0.165
-rw-r--r-- 1 root root 256717656 Nov 12 13:50 libcusparse_static.a
-rw-r--r-- 1 root root 15858550 Nov 12 13:56 liblapack_static.a
-rw-r--r-- 1 root root 909274 Nov 12 13:56 libmetis_static.a
lrwxrwxrwx 1 root root 13 Nov 12 14:00 libnppc.so -> libnppc.so.11
lrwxrwxrwx 1 root root 21 Nov 12 14:00 libnppc.so.11 -> libnppc.so.11.4.0.155
-rw-r--r-- 1 root root 1564840 Nov 12 14:00 libnppc.so.11.4.0.155
-rw-r--r-- 1 root root 26846 Nov 12 14:00 libnppc_static.a
lrwxrwxrwx 1 root root 15 Nov 12 14:00 libnppial.so -> libnppial.so.11
lrwxrwxrwx 1 root root 23 Nov 12 14:00 libnppial.so.11 -> libnppial.so.11.4.0.155
-rw-r--r-- 1 root root 13533736 Nov 12 14:00 libnppial.so.11.4.0.155
-rw-r--r-- 1 root root 15378762 Nov 12 14:00 libnppial_static.a
lrwxrwxrwx 1 root root 15 Nov 12 14:00 libnppicc.so -> libnppicc.so.11
lrwxrwxrwx 1 root root 23 Nov 12 14:00 libnppicc.so.11 -> libnppicc.so.11.4.0.155
-rw-r--r-- 1 root root 6509104 Nov 12 14:00 libnppicc.so.11.4.0.155
-rw-r--r-- 1 root root 6291604 Nov 12 14:00 libnppicc_static.a
lrwxrwxrwx 1 root root 16 Nov 12 14:00 libnppidei.so -> libnppidei.so.11
lrwxrwxrwx 1 root root 24 Nov 12 14:00 libnppidei.so.11 -> libnppidei.so.11.4.0.155
-rw-r--r-- 1 root root 9937808 Nov 12 14:00 libnppidei.so.11.4.0.155
-rw-r--r-- 1 root root 11479354 Nov 12 14:00 libnppidei_static.a
lrwxrwxrwx 1 root root 14 Nov 12 14:00 libnppif.so -> libnppif.so.11
lrwxrwxrwx 1 root root 22 Nov 12 14:00 libnppif.so.11 -> libnppif.so.11.4.0.155
-rw-r--r-- 1 root root 79115976 Nov 12 14:00 libnppif.so.11.4.0.155
-rw-r--r-- 1 root root 82495146 Nov 12 14:00 libnppif_static.a
lrwxrwxrwx 1 root root 14 Nov 12 14:00 libnppig.so -> libnppig.so.11
lrwxrwxrwx 1 root root 22 Nov 12 14:00 libnppig.so.11 -> libnppig.so.11.4.0.155
-rw-r--r-- 1 root root 34841224 Nov 12 14:00 libnppig.so.11.4.0.155
-rw-r--r-- 1 root root 36462618 Nov 12 14:00 libnppig_static.a
lrwxrwxrwx 1 root root 14 Nov 12 14:00 libnppim.so -> libnppim.so.11
lrwxrwxrwx 1 root root 22 Nov 12 14:00 libnppim.so.11 -> libnppim.so.11.4.0.155
-rw-r--r-- 1 root root 8880704 Nov 12 14:00 libnppim.so.11.4.0.155
-rw-r--r-- 1 root root 8057652 Nov 12 14:00 libnppim_static.a
lrwxrwxrwx 1 root root 15 Nov 12 14:00 libnppist.so -> libnppist.so.11
lrwxrwxrwx 1 root root 23 Nov 12 14:00 libnppist.so.11 -> libnppist.so.11.4.0.155
-rw-r--r-- 1 root root 34354008 Nov 12 14:00 libnppist.so.11.4.0.155
-rw-r--r-- 1 root root 36021764 Nov 12 14:00 libnppist_static.a
lrwxrwxrwx 1 root root 15 Nov 12 14:00 libnppisu.so -> libnppisu.so.11
lrwxrwxrwx 1 root root 23 Nov 12 14:00 libnppisu.so.11 -> libnppisu.so.11.4.0.155
-rw-r--r-- 1 root root 658520 Nov 12 14:00 libnppisu.so.11.4.0.155
-rw-r--r-- 1 root root 11458 Nov 12 14:00 libnppisu_static.a
lrwxrwxrwx 1 root root 15 Nov 12 14:00 libnppitc.so -> libnppitc.so.11
lrwxrwxrwx 1 root root 23 Nov 12 14:00 libnppitc.so.11 -> libnppitc.so.11.4.0.155
-rw-r--r-- 1 root root 4551016 Nov 12 14:00 libnppitc.so.11.4.0.155
-rw-r--r-- 1 root root 3593810 Nov 12 14:00 libnppitc_static.a
lrwxrwxrwx 1 root root 13 Nov 12 14:00 libnpps.so -> libnpps.so.11
lrwxrwxrwx 1 root root 21 Nov 12 14:00 libnpps.so.11 -> libnpps.so.11.4.0.155
-rw-r--r-- 1 root root 18404344 Nov 12 14:00 libnpps.so.11.4.0.155
-rw-r--r-- 1 root root 18500000 Nov 12 14:00 libnpps_static.a
lrwxrwxrwx 1 root root 15 Nov 15 04:07 libnvblas.so -> libnvblas.so.11
lrwxrwxrwx 1 root root 22 Nov 15 04:07 libnvblas.so.11 -> libnvblas.so.11.6.5.24
-rw-r--r-- 1 root root 712192 Nov 15 04:07 libnvblas.so.11.6.5.24
-rw-r--r-- 1 root root 14228496 Nov 15 03:52 libnvperf_host.so
-rw-r--r-- 1 root root 2208728 Nov 15 03:52 libnvperf_target.so
-rw-r--r-- 1 root root 18399504 Nov 12 13:46 libnvptxcompiler_static.a
lrwxrwxrwx 1 root root 25 Nov 12 13:48 libnvrtc-builtins.so -> libnvrtc-builtins.so.11.4
lrwxrwxrwx 1 root root 29 Nov 12 13:48 libnvrtc-builtins.so.11.4 -> libnvrtc-builtins.so.11.4.166
-rw-r--r-- 1 root root 6883128 Nov 12 13:48 libnvrtc-builtins.so.11.4.166
lrwxrwxrwx 1 root root 16 Nov 12 13:48 libnvrtc.so -> libnvrtc.so.11.2
lrwxrwxrwx 1 root root 20 Nov 12 13:48 libnvrtc.so.11.2 -> libnvrtc.so.11.4.166
-rw-r--r-- 1 root root 40962912 Nov 12 13:48 libnvrtc.so.11.4.166
lrwxrwxrwx 1 root root 18 Nov 12 14:06 libnvToolsExt.so -> libnvToolsExt.so.1
lrwxrwxrwx 1 root root 22 Nov 12 14:06 libnvToolsExt.so.1 -> libnvToolsExt.so.1.0.0
-rw-r--r-- 1 root root 44088 Nov 12 14:06 libnvToolsExt.so.1.0.0
drwxr-xr-x 2 root root 4096 Mar 24 18:02 stubs
Hi! I’ve compiled torch 1.10.0 from source with Clang on Xavier NX for Python 3.8; the build took 9 hours.
Here’s the Google Drive link:
And here’s the Baidu Net Disk link:
I’m not sure whether the extraction code is necessary when you download through the Baidu link; if it is needed, the extraction code is:
vhys
I hope it helps you.
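Once downloaded, it installs like any other local wheel; the filename below is only a placeholder for whatever the downloaded file is actually called:
# install the downloaded wheel for the current user (placeholder filename)
pip3 install --user ./torch-1.10.0-cp38-cp38-linux_aarch64.whl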
I have installed JetPack 5.0 on my Jetson AGX Xavier and I am trying to create a TorchScript file from Detectron2 weights. I have used the provided torch-1.12 wheel, and torch & torchvision seem to be correctly installed when I open a Python shell and print their versions. However, when I try to create a TorchScript model I get errors about missing torchvision ops, similar to those posted here: MIssing torchvision::nms error in the C++ CUDA TorchVision API · Issue #5697 · pytorch/vision · GitHub.
Most of what I find points to an incompatible torch & torchvision combination, but according to (torchvision · PyPI) the versions in the 1.12 and 1.10 PyTorch wheels I got from here (jetson-containers/docker_build_ml.sh at master · dusty-nv/jetson-containers · GitHub) are compatible.
The full error I get is:
RuntimeError:
object has no attribute nms:
File "/home/jetsonxavier/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 35
"""
_assert_has_ops()
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
'nms' is being compiled since it was called from '_batched_nms_vanilla'
File "/home/jetsonxavier/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 102
for class_id in torch.unique(idxs):
curr_indices = torch.where(idxs == class_id)[0]
curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
keep_mask[curr_indices[curr_keep_indices]] = True
keep_indices = torch.where(keep_mask)[0]
'_batched_nms_vanilla' is being compiled since it was called from 'batched_nms'
File "/home/jetsonxavier/.local/lib/python3.8/site-packages/torchvision/ops/boxes.py", line 66
# Ideally for GPU we'd use a higher threshold
if boxes.numel() > 4_000 and not torchvision._is_tracing():
return _batched_nms_vanilla(boxes, scores, idxs, iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
else:
return _batched_nms_coordinate_trick(boxes, scores, idxs, iou_threshold)
'batched_nms' is being compiled since it was called from 'batched_nms'
File "/home/jetsonxavier/Projects/F3D/detectron2/detectron2/layers/nms.py", line 20
# just call it directly.
# Fp16 does not have enough range for batched NMS, so adding float().
return box_ops.batched_nms(boxes.float(), scores, idxs, iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
'batched_nms' is being compiled since it was called from 'find_top_rpn_proposals'
File "/home/jetsonxavier/Projects/F3D/detectron2/detectron2/modeling/proposal_generator/proposal_utils.py", line 112
boxes, scores_per_img, lvl = boxes[keep], scores_per_img[keep], lvl[keep]
keep = batched_nms(boxes.tensor, scores_per_img, lvl, nms_thresh)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
# In Detectron1, there was different behavior during training vs. testing.
# (https://github.com/facebookresearch/Detectron/issues/459)
'find_top_rpn_proposals' is being compiled since it was called from 'RPN.predict_proposals'
File "/home/jetsonxavier/Projects/F3D/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 503
with torch.no_grad():
pred_proposals = self._decode_proposals(anchors, pred_anchor_deltas)
return find_top_rpn_proposals(
~~~~~~~~~~~~~~~~~~~~~~~
pred_proposals,
~~~~~~~~~~~~~~~
pred_objectness_logits,
~~~~~~~~~~~~~~~~~~~~~~~
image_sizes,
~~~~~~~~~~~~
self.nms_thresh,
~~~~~~~~~~~~~~~~
self.pre_nms_topk[self.training],
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
self.post_nms_topk[self.training],
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
self.min_box_size,
~~~~~~~~~~~~~~~~~~
self.training,
~~~~~~~~~~~~~ <--- HERE
)
'RPN.predict_proposals' is being compiled since it was called from 'RPN.forward'
File "/home/jetsonxavier/Projects/F3D/detectron2/detectron2/modeling/proposal_generator/rpn.py", line 477
else:
losses = {}
proposals = self.predict_proposals(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
)
return proposals, losses
That is the error message I get when I try to create the TorchScript file. When I try to run a .pth weights file, I get the following error message:
RuntimeError: Couldn't load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible,
or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation
for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and
verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install
Both errors seem to be related to torch & torchvision compatibility.
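A quick way to reproduce the problem outside Detectron2 (a minimal sketch with dummy boxes; it just forces the torchvision C++ ops to load) is:
# print versions and exercise the torchvision C++ NMS op directly;
# this fails with the same "Couldn't load custom C++ ops" error if torch and torchvision are mismatched
python3 - <<'EOF'
import torch
import torchvision
from torchvision.ops import nms

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)

# dummy boxes/scores just to trigger the op
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
scores = torch.tensor([0.9, 0.8])
print("nms result:", nms(boxes, scores, 0.5))
EOF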
Has anybody encountered the same problems?
- Python 3.6 - torch-1.10.0-cp36-cp36m-linux_aarch64.whl
I cannot download this. Why?
This was the error… Thank you so much.
Hi @rmj54, this link is working for me; are you able to try again? Perhaps it’s a network issue on your end?
Hi, this link does not work. 1.10 works but all the rest are down.
Tried this too: wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch-1.8.0-cp36-cp36m-linux_aarch64.whl
Same story: the download stops right after connecting.
Can you try again, perhaps from your PC browser? I am able to download it now, so it may have been a temporary issue.
PyTorch 1.11 for JetPack 5.0 Developer Preview and Xavier/Orin has been posted:
PyTorch v1.11.0
- JetPack 5.0 Developer Preview (L4T R34.1.0)
- Python 3.8 - torch-1.11.0-cp38-cp38-linux_aarch64.whl
I tried installing with 1.6, 1.7, and 1.10. After installing, when I import torch, the system replies:
module ‘typing’ has no attribute ‘_SpecialForm’
Hi @1263032440, are you using Python 3.6? Can you try using the l4t-pytorch container to rule out an environment issue?
I am using JetPack 5.0 and my PyTorch is 1.12.0a0+2c916ef.nv22.3.
I am trying YOLOv5 and I need torchvision. I installed the main branch of torchvision, but it gives me the incompatibility error:
RuntimeError: Couldn’t load custom C++ ops. This can happen if your PyTorch and torchvision versions are incompatible, or if you had errors while compiling torchvision from source. For further information on the compatible versions, check https://github.com/pytorch/vision#installation for the compatibility matrix. Please check your PyTorch version with torch.__version__ and your torchvision version with torchvision.__version__ and verify if they are compatible, and if not please reinstall torchvision so that it matches your PyTorch install.
Where can I find a torchvision build compatible with torch 1.12.0a0+2c916ef.nv22.3?
Yes, I use 3.6. How can I get the container you described?
You can select one of the tags from here that’s compatible with your L4T version (you can check this with cat /etc/nv_tegra_release):
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch/tags
And then the command to start the container is listed on this page: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch
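For example, the run command looks like this (the tag below is only an illustration; substitute whichever tag matches your L4T release from the tags page):
# launch the l4t-pytorch container with GPU access (example tag shown; replace with one matching your L4T version)
sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r34.1.0-pth1.11-py3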
Hi @user22290, can you try this torchvision commit? https://github.com/pytorch/vision/commit/e5a5f0be
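In case it helps, building torchvision from source at that commit typically looks roughly like this (the BUILD_VERSION value is an assumption for a torch 1.12-era build; adjust it to your setup):
# build torchvision from source at the suggested commit (sketch; adjust versions and paths to your setup)
git clone https://github.com/pytorch/vision torchvision
cd torchvision
git checkout e5a5f0be
export BUILD_VERSION=0.13.0   # assumed version string, not confirmed for this torch build
python3 setup.py install --user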
I am using the main branch, which already includes all the changes in that commit. The changes in that commit relate to the unit tests on Windows, while I am using Linux.