I originally had a huge setup and just decided to wipe the Jetson TX2, reinstall JetPack, and then use Dusty's jetson-reinforcement script. It works OK, but it only compiles for Python 2.7; I can't import it into Python 3.
So, that’s not going to work.
I’ll leave a note here when/if I get it working for Python3 on my own fork.
I did run that script with the pip changes, and also tried it with both sudo python ... and sudo python3 ... . If I don't use sudo python3 ..., it tells me it can't find any of the libraries I installed under pip3, which makes sense: a plain sudo python command will look for the Python 2 instance. Then, when using sudo python3 ... with pip3, I got this error:
There were about 100 nvlink errors; I'm listing the last few below along with the final error log.
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_i328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f168ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z29ncclAllReduceLLKernel_sum_u648ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_u88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96
Makefile:83: recipe for target '/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/home/nvidia/jetson-reinforcement/build/pytorch/third_party/build/nccl/obj/collectives/device/devlink.o] Error 255
Makefile:45: recipe for target 'devicelib' failed
make[4]: *** [devicelib] Error 2
Makefile:24: recipe for target 'src.build' failed
make[3]: *** [src.build] Error 2
CMakeFiles/nccl.dir/build.make:60: recipe for target 'lib/libnccl.so' failed
make[2]: *** [lib/libnccl.so] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl.dir/all' failed
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Failed to run 'bash ../tools/build_pytorch_libs.sh --use-cuda --use-nnpack nccl caffe2 libshm gloo c10d THD'
I'm not sure why the Jetson doesn't like this setup. What are these regcount errors?
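As far as I understand, the regcount errors come from nvlink enforcing per-kernel register budgets at device-link time: each entry kernel was compiled with a cap of 80 registers per thread (e.g. via -maxrregcount=80), but it calls a device function that was compiled to use 96 registers, and nvlink refuses to link a callee that exceeds its caller's cap. Since all of these symbols are NCCL kernels, disabling NCCL (as discussed later in this thread) sidesteps the whole problem. For quick triage you can pull the two register counts out of a log line:

```shell
# Sample line copied from the log above:
line="nvlink error : entry function '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z25ncclReduceScatter_max_u64P14CollectiveArgs' with regcount of 96"

# Extract the caller's register cap (80) and the callee's usage (96):
echo "$line" | grep -o 'regcount of [0-9]*'
# prints: regcount of 80
#         regcount of 96

# To see readable kernel names, demangle with c++filt (from binutils), e.g.:
#   echo '_Z28ncclAllReduceLLKernel_sum_i88ncclColl' | c++filt
```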
Unfortunately, that doesn't get me any further in the process. Everything breaks down around the ONNX and onnx-tensorrt installs. I'm heading over to the PyTorch forums to see if they can help figure anything out.
Again, I can get this to work for Python 2, but not Python 3.
OK, so I got it installed: the latest v1.0 release. It does NOT have TensorRT support. I'm still figuring out whether this is a problem, or if I should just do the PyTorch -> ONNX -> TensorRT conversion on a desktop GPU and then transfer the result to the Jetson directly.
Here’s how I did it:
Note: Don't use Dusty's long install that checks out v3.0. It's no longer necessary.
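For reference, the general shape of the sequence is sketched below. The version numbers, flags, and exact checkout are assumptions, not the exact recipe from this post; adapt them to your JetPack/PyTorch combination:

```shell
# Sketch of a from-source PyTorch build on the TX2 with NCCL disabled
# (all details here are assumptions to adapt, not a verified recipe):
#
#   git clone --recursive https://github.com/pytorch/pytorch
#   cd pytorch
#   sudo pip3 install -U setuptools
#   sudo pip3 install -r requirements.txt
#   export USE_NCCL=0                  # NCCL does not build on the TX2
#   sudo -E python3 setup.py install   # -E preserves the exported variable
```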
Hello, I tried to install PyTorch following comment 8, but it errored with:
/home/nvidia/Tools/pytorch/torch/csrc/cuda/nccl.h:21:23: error: variable or field ‘throw_nccl_error’ declared void
void throw_nccl_error(ncclResult_t status);
I have changed NCCL to 'OFF' in CMakeLists.txt and added USE_NCCL = False in setup.py. Why does it still build NCCL? How can I solve this problem? Thanks, waiting for a reply.
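In case it helps: on PyTorch trees of that era the NCCL switch is read from the environment by the build scripts, so editing setup.py or CMakeLists.txt alone is often not enough, and sudo additionally drops exported variables. A sketch (the exact variable name should be verified against your checkout):

```shell
# Disable NCCL through the environment instead of editing files:
export USE_NCCL=0
echo "USE_NCCL=$USE_NCCL"
# prints: USE_NCCL=0

# sudo strips exported variables by default, so pass it explicitly, e.g.:
#   sudo USE_NCCL=0 python3 setup.py install
```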
Finally, I was able to work my way around the problem.
The final tutorial is to be used after a clean JetPack flash.
I strongly suggest either JetPack 3.2.1 or JetPack 3.3.
Hello, I would like to compile the libtorch C++ API with:
mkdir build
cd build && python ../tools/build_libtorch.py
You said NCCL is only available for desktop GPUs, so I changed USE_NCCL="OFF" in build_pytorch_libs.py, but it failed at 98%. It seems something else still uses nccl.h; the log follows:
In file included from /home/nvidia/pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:1:0:
/home/nvidia/pytorch/torch/lib/c10d/../c10d/ProcessGroupNCCL.hpp:17:18: fatal error: nccl.h: No such file or directory
compilation terminated.
caffe2/torch/lib/c10d/CMakeFiles/c10d.dir/build.make:206: recipe for target 'caffe2/torch/lib/c10d/CMakeFiles/c10d.dir/ProcessGroupNCCL.cpp.o' failed
make[2]: *** [caffe2/torch/lib/c10d/CMakeFiles/c10d.dir/ProcessGroupNCCL.cpp.o] Error 1
CMakeFiles/Makefile2:9267: recipe for target 'caffe2/torch/lib/c10d/CMakeFiles/c10d.dir/all' failed
make[1]: *** [caffe2/torch/lib/c10d/CMakeFiles/c10d.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "../tools/build_libtorch.py", line 22, in <module>
    build_python=False, rerun_cmake=True, build_dir='.')
  File "/home/nvidia/pytorch/tools/build_pytorch_libs.py", line 282, in build_caffe2
    check_call(['make', '-j', str(max_jobs), 'install'], cwd=build_dir, env=my_env)
  File "/usr/lib/python2.7/subprocess.py", line 541, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make', '-j', '6', 'install']' returned non-zero exit status 2
I solved the problem like this:
I went to the folder that contained nccl.h on my device.
Then I wrote a single line that copies all files with the .h extension from that folder to a place I desire.
Then I adapted that command to send all those .h files to the problematic folder /home/nvidia/pytorch/torch/lib/c10d/../c10d/.
Something like:
$ sudo -s
$ cp /path/to/files/{file1.h,nccl.h,file3.h,...} /home/nvidia/pytorch/torch/lib/c10d/../c10d/
Then I copy the command above and restart the installation.
When it reaches close to 98% (maybe 96, maybe 97), I open a second terminal, paste the command, and hit Enter. Then, until the installation in the other terminal ends, I keep hitting Up-Arrow and Enter every time a new line comes up in the installation. It gets stuck at 98%, then at 99%, and even at 100%!
Apparently, the routine that manages the installation does not recognize our .h files, so every time we re-run the installation it redefines the paths to the folder of those .h files. On top of that, the same installation refreshes the annoying folder that we are populating with the .h files, go figure.
I did this exhaustively and finished with success.
Tip: Do not attempt to import torch in Python after installation if you forget to leave PyTorch's directory; it will not work. Do a cd .. first.
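The Up-Arrow ritual above can be automated: a tiny helper that re-copies the headers can be looped from a second terminal while the build runs. A sketch (both directory paths are placeholders; point them at wherever nccl.h lives on your system and at the c10d folder named in the error):

```shell
# resync_headers SRC_DIR DST_DIR -- copy every .h from SRC_DIR into DST_DIR.
# Call it repeatedly while the build keeps wiping the destination folder.
resync_headers() {
  src_dir="$1"
  dst_dir="$2"
  cp "$src_dir"/*.h "$dst_dir"/
}

# In a second terminal, instead of hammering Up-Arrow by hand
# (Ctrl-C once the build finishes; paths are placeholders):
#   while sleep 5; do resync_headers /usr/include /home/nvidia/pytorch/torch/lib/c10d; done
```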
Hello!
I also have some problems compiling PyTorch on the TX2. My environment is Ubuntu 16.04 LTS with CUDA 9 + cuDNN 7. I compiled TensorFlow successfully, but when I install PyTorch using:
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
sudo pip install -U setuptools
sudo pip install -r requirements.txt
sudo python3 setup.py install
Then I got errors like this:
nvlink error : entry function '_Z30ncclReduceTreeLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z30ncclReduceRingLLKernel_sum_f328ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96
[... 34 more nvlink errors of the same form, covering the remaining Reduce/Broadcast/AllGather/AllReduce kernel variants ...]
nvlink error : entry function '_Z32ncclAllReduceTreeLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96
nvlink error : entry function '_Z32ncclAllReduceRingLLKernel_sum_i88ncclColl' with max regcount of 80 calls function '_Z29ncclReduceScatterRing_max_f64P14CollectiveArgs' with regcount of 96
Makefile:68: recipe for target '/home/nvidia/pytorch/build/nccl/obj/collectives/device/devlink.o' failed
make[5]: *** [/home/nvidia/pytorch/build/nccl/obj/collectives/device/devlink.o] Error 255
Makefile:44: recipe for target '/home/nvidia/pytorch/build/nccl/obj/collectives/device/colldevice.a' failed
make[4]: *** [/home/nvidia/pytorch/build/nccl/obj/collectives/device/colldevice.a] Error 2
Makefile:25: recipe for target 'src.build' failed
make[3]: *** [src.build] Error 2
CMakeFiles/nccl_external.dir/build.make:110: recipe for target 'nccl_external-prefix/src/nccl_external-stamp/nccl_external-build' failed
make[2]: *** [nccl_external-prefix/src/nccl_external-stamp/nccl_external-build] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl_external.dir/all' failed
make[1]: *** [CMakeFiles/nccl_external.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 65%] Built target caffe2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 719, in <module>
    build_deps()
  File "setup.py", line 285, in build_deps
    build_dir='build')
  File "/home/nvidia/pytorch/tools/build_pytorch_libs.py", line 281, in build_caffe2
    check_call(['make', '-j', str(max_jobs), 'install'], cwd=build_dir, env=my_env)
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make', '-j', '6', 'install']' returned non-zero exit status 2
What post/tutorial/notes are you following?
I have not seen this exact error before, but I have a few thoughts:
This line:
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl_external.dir/all' failed
suggests that your build failed while processing the NCCL dependency.
Go to your CMakeLists.txt and any other relevant configuration files and check that USE_NCCL is set to False (USE_NCCL=False).
You can easily check whether all NCCL options are set to False when you start a new build: the configuration summary is printed before the log starts showing build percentages.
It can be challenging to turn NCCL off, because sometimes you turn it off in the configuration script, but the build process still reports NCCL as on (really confusing).
Everything that concerns NCCL in your PyTorch installation should be TURNED OFF on your TX2, because NCCL is meant for multi-GPU x86 desktop and server systems (more or less; I could be wrong, but it definitely does not work on the NVIDIA Jetson TX2).
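One concrete way to confirm the switch actually took effect is to inspect the CMake cache after configuration, before letting the build run all the way to 98%. The variable name USE_NCCL matches PyTorch checkouts of that era, but verify it against your own tree:

```shell
# From the PyTorch build directory, after CMake has configured
# (CMakeCache.txt is generated there by CMake):
grep -i 'nccl' CMakeCache.txt || echo "no CMakeCache.txt in this directory"
# Every NCCL-related entry should read OFF/False, e.g. USE_NCCL:BOOL=OFF.
# If any shows ON, the build will still try to compile and link NCCL.
```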
If you feel lost, I suggest you take a look at this guide that I wrote with trial and error. It is a little outdated, but I repeated the process this month and it still is very helpful (90% or more :] ).
Hi brenozanchetta, have you tried building the newer version of PyTorch (1.0.1.post2) on JetPack 3.3 or above? I previously compiled PyTorch 0.4 successfully on CUDA 8, but building this version messed up my Jetson TX2, and I had to flash it again with JetPack 3.3.
14beemjamal, I have built pytorch 1.1.0a0+95ca667 on JetPack 3.3, but I highly recommend 3.2.1.
The key point is to turn NCCL off.
NCCL does not work on the Jetson TX2 and therefore causes installation errors.
Hope this could help.
Did you try the steps I mentioned before?