PyTorch for Jetson

Next problem, training in FP16:

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

in response to:
import torch
print(torch.__config__.show())

prints:

PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_62,code=sm_62;-gencode;arch=compute_72,code=sm_72
  - CuDNN 8.0
  - Magma 2.5.3

The error detail is:

~/envs/fastai2/lib/python3.6/site-packages/fastai2/learner.py in all_batches(self)
    151     def all_batches(self):
    152         self.n_iter = len(self.dl)
--> 153         for o in enumerate(self.dl): self.one_batch(*o)
    154 
    155     def one_batch(self, i, b):

~/envs/fastai2/lib/python3.6/site-packages/fastai2/learner.py in one_batch(self, i, b)
    161             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    162             if not self.training: return
--> 163             self.loss.backward();                            self('after_backward')
    164             self.opt.step();                                 self('after_step')
    165             self.opt.zero_grad()

~/envs/fastai2/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    182                 products. Defaults to ``False``.
    183         """
--> 184         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    185 
    186     def register_hook(self, hook):

~/envs/fastai2/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    123     Variable._execution_engine.run_backward(
    124         tensors, grad_tensors, retain_graph, create_graph,
--> 125         allow_unreachable=True)  # allow_unreachable flag
    126 

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
Exception raised from try_all at ../aten/src/ATen/native/cudnn/Conv.cpp:692 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xa0 (0x7f9817f3f0 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2c00a6c (0x7f349c7a6c in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x2bf326c (0x7f349ba26c in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x2bf4280 (0x7f349bb280 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x2bf7d04 (0x7f349bed04 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::cudnn_convolution_backward_weight(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0x70 (0x7f349bef70 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x2c58b40 (0x7f34a1fb40 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x2cc58bc (0x7f34a8c8bc in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x288 (0x7f349bf990 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #9: <unknown function> + 0x2c587a8 (0x7f34a1f7a8 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #10: <unknown function> + 0x2cc595c (0x7f34a8c95c in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #11: <unknown function> + 0x1e8a91c (0x7f6398491c in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x1e8bdfc (0x7f63985dfc in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x298 (0x7f6373fb98 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x212f988 (0x7f63c29988 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x10b0 (0x7f63c24ae0 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x424 (0x7f63c25644 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xa0 (0x7f63c1cdf0 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x60 (0x7f906ba118 in /home/bart/envs/fastai2/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xbbe94 (0x7f99a0ce94 in /usr/lib/aarch64-linux-gnu/libstdc++.so.6)
frame #20: <unknown function> + 0x7088 (0x7f9bb7c088 in /lib/aarch64-linux-gnu/libpthread.so.0)

Any thoughts as to why cuDNN 8.0 can't do the (I'm guessing) 2D convolution?

Is there a way to test if FP32 training works? I have verified that resnet-18 and ssd-mobilenet can be trained with PyTorch + cuDNN 8, but I think that was FP32.
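
For reference, this is the kind of minimal check I have in mind (a rough sketch, assuming a CUDA-enabled build, not my actual training code):

import torch
import torch.nn as nn

# Run a small convolution forward/backward in FP16 and then FP32 on the GPU.
device = torch.device("cuda")
for dtype in (torch.float16, torch.float32):
    conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device=device, dtype=dtype)
    x = torch.randn(8, 3, 64, 64, device=device, dtype=dtype, requires_grad=True)
    loss = conv(x).sum()
    loss.backward()  # the "Unable to find a valid cuDNN algorithm" error surfaces here in FP16
    print(dtype, "convolution forward/backward OK")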

Note that since cuDNN 8.0 is currently a preview release in JetPack 4.4 DP, and cuDNN 8.0 hasn’t been released for other platforms yet (i.e. x86), official cuDNN 8.0 support hasn’t yet been added by the PyTorch maintainers. I made a couple of patches myself to get it building, but there may be additional fixes that PyTorch needs to make to enable full functionality. Thank you for reporting this.

The code is using ResNet-50. I haven't tried the FP32 version yet because that will mean a smaller batch size, but I will try it. In the meantime, is there any way I can use cuDNN 7.1 on the NX for PyTorch?

The same model worked in FP32 mode with a smaller batch size.
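
For anyone following along, the switch between the two modes in fastai2 is just the learner conversion; a minimal sketch (using a tiny stand-in dataset rather than my real one):

from fastai2.vision.all import *

# Stand-in dataset so the example is self-contained; the real code uses different DataLoaders.
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path, bs=16)

learn = cnn_learner(dls, resnet50, metrics=accuracy)
learn.to_fp16()      # mixed-precision training -- this is where the cuDNN error appeared
# learn.to_fp32()    # plain FP32 training -- worked once the batch size was reduced
learn.fit_one_cycle(1)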

Hi Dusty_nv

Something seems to have changed in the PyTorch source over the past few days that is making the patch process fail. I have built PyTorch (to get MAGMA support) literally tens of times during the past few weeks, and it worked every time, but a few days ago it started failing at the patch step as shown below.

Could you check the patch to see if it needs updating so that it can again be applied without an error?

streicher@jetson:~$ git clone --recursive https://github.com/pytorch/pytorch
<lots of text goes here as the files are downloaded from github, cut for brevity>
streicher@jetson:~$ cd ~/pytorch
streicher@jetson:~/pytorch$ wget https://gist.githubusercontent.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1/raw/44dc4b13095e6eb165f268e3c163f46a4a11110d/pytorch-diff-jetpack-4.4.patch -O pytorch-diff-jetpack-4.4.patch
--2020-06-03 23:35:37-- https://gist.githubusercontent.com/dusty-nv/ce51796085178e1f38e3c6a1663a93a1/raw/44dc4b13095e6eb165f268e3c163f46a4a11110d/pytorch-diff-jetpack-4.4.patch
Resolving gist.githubusercontent.com (gist.githubusercontent.com)... 151.101.192.133, 151.101.128.133, 151.101.64.133, ...
Connecting to gist.githubusercontent.com (gist.githubusercontent.com)|151.101.192.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3009 (2.9K) [text/plain]
Saving to: 'pytorch-diff-jetpack-4.4.patch'
pytorch-diff-jetpack-4.4.patch 100%[===========================================================================>] 2.94K --.-KB/s in 0.001s
2020-06-03 23:35:37 (3.16 MB/s) - 'pytorch-diff-jetpack-4.4.patch' saved [3009/3009]
streicher@jetson:~/pytorch$ patch -p1 < pytorch-diff-jetpack-4.4.patch
patching file aten/src/ATen/cuda/CUDAContext.cpp
patching file aten/src/ATen/cuda/detail/KernelUtils.h
patching file aten/src/THCUNN/common.h
patching file caffe2/operators/rnn/recurrent_op_cudnn.cc
Reversed (or previously applied) patch detected! Assume -R? [n]

If I answer yes at this prompt, the build fails at 82% as follows:

[ 82%] Building CXX object caffe2/CMakeFiles/torch_cuda.dir/operators/rnn/recurrent_network_blob_fetcher_op_gpu.cc.o
In file included from /home/streicher/pytorch/caffe2/core/context_gpu.h:20:0,
                 from /home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.h:5,
                 from /home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc:1:
/home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc: In instantiation of ‘void caffe2::RecurrentBaseOp<T>::initialize(const caffe2::Tensor&, caffe2::Tensor*, caffe2::Tensor*, caffe2::Tensor*, caffe2::Tensor*) [with T = float]’:
/home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc:382:13:   required from ‘bool caffe2::RecurrentParamAccessOp<T, mode>::RunOnDevice() [with T = float; caffe2::RecurrentParamOpMode mode = (caffe2::RecurrentParamOpMode)1]’
/home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc:589:1:   required from here
/home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc:102:40: error: ‘cudnnSetRNNDescriptor’ was not declared in this scope
     CUDNN_ENFORCE(cudnnSetRNNDescriptor(
                   ~~~~~~~~~~~~~~~~~~~~~^
         cudnn_wrapper_.inline_cudnn_handle(),
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         rnnDesc_,
         ~~~~~~~~~                       
         hiddenSize,
         ~~~~~~~~~~~                     
         numLayers,
         ~~~~~~~~~~                      
         dropoutDesc_,
         ~~~~~~~~~~~~~                   
         rnnInput,
         ~~~~~~~~~                       
         rnnDirection,
         ~~~~~~~~~~~~~                   
         rnnMode,
         ~~~~~~~~                        
         CUDNN_RNN_ALGO_STANDARD, // TODO: verify correctness / efficiency.
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         cudnnTypeWrapper<T>::type));
         ~~~~~~~~~~~~~~~~~~~~~~~~~~      
/home/streicher/pytorch/caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
     cudnnStatus_t status = condition;                     \
                            ^~~~~~~~~
/home/streicher/pytorch/caffe2/operators/rnn/recurrent_op_cudnn.cc:102:40: note: suggested alternative: ‘cudnnSetLRNDescriptor’
     CUDNN_ENFORCE(cudnnSetRNNDescriptor(
                   ~~~~~~~~~~~~~~~~~~~~~^
         cudnn_wrapper_.inline_cudnn_handle(),
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         rnnDesc_,
         ~~~~~~~~~                       
         hiddenSize,
         ~~~~~~~~~~~                     
         numLayers,
         ~~~~~~~~~~                      
         dropoutDesc_,
         ~~~~~~~~~~~~~                   
         rnnInput,
         ~~~~~~~~~                       
         rnnDirection,
         ~~~~~~~~~~~~~                   
         rnnMode,
         ~~~~~~~~                        
         CUDNN_RNN_ALGO_STANDARD, // TODO: verify correctness / efficiency.
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         cudnnTypeWrapper<T>::type));
         ~~~~~~~~~~~~~~~~~~~~~~~~~~      
/home/streicher/pytorch/caffe2/core/common_cudnn.h:71:28: note: in definition of macro ‘CUDNN_ENFORCE’
     cudnnStatus_t status = condition;                     \
                            ^~~~~~~~~
caffe2/CMakeFiles/torch_cuda.dir/build.make:4602: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/operators/rnn/recurrent_op_cudnn.cc.o' failed
make[2]: *** [caffe2/CMakeFiles/torch_cuda.dir/operators/rnn/recurrent_op_cudnn.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
CMakeFiles/Makefile2:8200: recipe for target 'caffe2/CMakeFiles/torch_cuda.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/torch_cuda.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 732, in <module>
    build_deps()
  File "setup.py", line 316, in build_deps
    cmake=cmake)
  File "/home/streicher/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/home/streicher/pytorch/tools/setup_helpers/cmake.py", line 345, in build
    self.run(build_args, my_env)
  File "/home/streicher/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '4']' returned non-zero exit status 2.

Answering no gives the following result:

streicher@jetson:~/pytorch$ patch -p1 < pytorch-diff-jetpack-4.4.patch
patching file aten/src/ATen/cuda/CUDAContext.cpp
patching file aten/src/ATen/cuda/detail/KernelUtils.h
patching file aten/src/THCUNN/common.h
patching file caffe2/operators/rnn/recurrent_op_cudnn.cc
Reversed (or previously applied) patch detected!  Assume -R? [n] n
Apply anyway? [n] n
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file caffe2/operators/rnn/recurrent_op_cudnn.cc.rej
patching file cmake/public/cuda.cmake
Hunk #1 FAILED at 147.
1 out of 1 hunk FAILED -- saving rejects to file cmake/public/cuda.cmake.rej
streicher@jetson:~/pytorch$ cat caffe2/operators/rnn/recurrent_op_cudnn.cc.rej
--- caffe2/operators/rnn/recurrent_op_cudnn.cc
+++ caffe2/operators/rnn/recurrent_op_cudnn.cc
@@ -99,7 +99,7 @@ void RecurrentBaseOp<T>::initialize(
   // RNN setup
   {
 #if CUDNN_VERSION_MIN(7, 0, 0)
-    CUDNN_ENFORCE(cudnnSetRNNDescriptor(
+    CUDNN_ENFORCE(cudnnSetRNNDescriptor_v6(
         cudnn_wrapper_.inline_cudnn_handle(),
         rnnDesc_,
         hiddenSize,
streicher@jetson:~/pytorch$ 

Hi @streicher, it looks like you are cloning from PyTorch master, which does change. For stability, I recommend building one of the release branches, like v1.5.0 or v1.4.0. I only build these release branches, so I don’t keep the patch updated for master (until a new release comes along).

I have updated the gist with the patch for PyTorch v1.5.0 here: PyTorch patch for building on JetPack >= 4.4 · GitHub

Hi @dusty_nv. Thank you so much for the quick response and for updating the patch. I can confirm that PyTorch v1.5.0 builds successfully using the procedure and patch above. Thank you also for the guidance on explicitly cloning a specific branch; I will be sure to do this in the future :-)

I tried to build PyTorch from source and saw that CUDA was not enabled. After the build was done, I tried to get CUDA working with no success. Do I need to do something else during the build?

Hi @itay010197, do you have the CUDA toolkit installed on your Jetson? Can you check under /usr/local/cuda?

If you are using the Nano or NX devkit images that you flashed to your SD card, it should already be there. If you flashed your Jetson with SDK Manager, you should check that the CUDA toolkit installation was successful.

When you first start building PyTorch, after a minute it outputs the build options / configuration that it was able to detect on your system. That would be helpful to see if you continue having issues.
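
A quick way to confirm from Python whether the wheel you built ended up with CUDA support (a minimal sketch):

import torch

print(torch.__version__)
print(torch.cuda.is_available())          # False here means the build came out CPU-only
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the Jetson's integrated GPU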

Hi Dusty. I’ve installed torch/torchvision according to the instructions in the post. Here are the version outputs:
Python 3.6.9
torch 1.4.0
torchvision 0.5.0a0+85b8fbf

The tests you’ve suggested work fine. However, when I try to fire up an object detection model, I get the following error:

>>> import torch
>>> import torchvision
>>> torch.ops.torchvision.nms
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.local/lib/python3.6/site-packages/torch/_ops.py", line 61, in __getattr__
    op = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator torchvision::nms

Do you have any familiarity with this issue?

Thanks. – Adrian

Hi @dusty_nv. I have CUDA in that location.
I am running the l4t-base image on Xavier and tried to do two things:

  1. Install PyTorch for Python 3.6 with one of your compiled wheels, and it worked with access to CUDA.
  2. Install Python 3.8, pip, and all of the dependencies in order to build PyTorch from source. After the build was complete, I was not able to access CUDA with the wheel file I got.

I saw that in the build options CUDA was disabled, and I don't know why.

Hey dusty,

first of all, thank you very much for making precompiled torch wheels available for us; it saves some time. I have been trying to run some pretrained models and am having issues with an inverse function that apparently requires MAGMA with PyTorch.

As far as I understand, MAGMA is normally bundled with PyTorch when it is installed through pip or conda; however, PyTorch built from source does not include MAGMA, and we cannot install MAGMA for it through conda either, because MAGMA is not available in compiled form for this platform. I have the MAGMA source and am struggling to figure out what to write in the make.inc file in order to get aarch64 output.

Any idea if I am on the right path? Or is MAGMA available through the Docker container?

best regards
Ben

Hmm, is cuda in your $PATH environment variable? Try typing nvcc --version into a console and see if it finds it. If not, try adding these to your ~/.bashrc file and close/re-open your terminal window:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Hi hamon.riazy, you are correct that MAGMA doesn’t come pre-built for aarch64, so you would need to build it from source. You might want to try using the cmake method as opposed to make.inc.

If you just want to try the model, call tensor.cpu() before the inverse function gets called, and then it won't try to use MAGMA. You can call tensor.cuda() on the output if you want to put it back on the GPU afterwards.
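
Roughly like this (a quick sketch, with torch.inverse standing in for the MAGMA-backed call in your model):

import torch

x = torch.randn(4, 4, device="cuda")

# Do the MAGMA-dependent op on the CPU instead...
inv = torch.inverse(x.cpu())

# ...then move the result back to the GPU for the rest of the model.
inv = inv.cuda()
print(inv.device)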


Are there any dependencies between
  • JetPack (now 4.4),
  • CUDA version (how can I ask the system for it? It seems to be 10.x), and
  • PyTorch version (1.5 is now the most recent)?
It is said at the very beginning that JetPack 4.4 can have PyTorch 1.4 and 1.5. As I have no preference (and had problems installing PyTorch 1.4), I should go for the most recent software: JetPack 4.4 + PyTorch 1.5 + CUDA 10. Would it help to have a Linux shell install script for the most recent versions (as this should also be the preferred setup, I suppose)? If it is only me, don't spend much time on it.

Hi @klauszinser, JetPack 4.4 DP has CUDA 10.2, and it supports the PyTorch 1.4 and 1.5 wheels linked to above.

There is an installer script included with jetson-inference: jetson-inference/install-pytorch.sh at master · dusty-nv/jetson-inference · GitHub

When running on JP 4.4, it will give the option to install PyTorch 1.4 and torchvision 0.5.
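
To answer the "how can I ask the system" part: from Python, a quick way to see what the installed wheel was built against is (a small sketch):

import torch

print(torch.version.cuda)              # CUDA version the wheel was built against, e.g. "10.2"
print(torch.backends.cudnn.version())  # cuDNN version, e.g. 8000 for cuDNN 8.0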

Thank you Dusty. Very quick.
The SD card is already on its way to being rewritten.
So:
sudo apt-get update
sudo apt-get upgrade
and then run your shell script.
With some luck there is also something for torchvision.

That script also installs torchvision for you.

I also forgot to mention before, there are also the l4t-pytorch and l4t-ml containers that come with PyTorch (and torchvision) pre-installed.

Yesterday I installed it. The most recent choice in the shell script was PyTorch 1.4, which I chose with Python 3. It took a few hours, definitely more than one hour, and I saw some errors along the way; they are available if needed.
Finally it was working. (Before that I had wasted a lot of time on other attempted solutions that did not work.)
As you mentioned, torchvision was installed.
Bringing everything up to the most recent software status with
sudo apt-get update
sudo apt-get upgrade
worked.
Installing Jupyter notebook worked.
So did matplotlib (which I have not tested yet).
The Python code in the Jupyter notebook had to be switched from CPU to GPU, as this was not done automatically.
Most important afterwards was setting up a swap file.
To monitor GPU usage I found jtop from Raffaello Bonghi (jetson-stats · PyPI).
Now I am quite happy. Possibly I will investigate the script to use PyTorch 1.5.
The 4 GB of RAM seems to be a bottleneck; existing Jetsons and whatever comes next should address this.
Bad experiences with SD cards on the Raspberry Pi will push me to use an SSD drive.
I am not using containers (Docker images) right now.
In short, all fine. Thank you.
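
For reference, the CPU-to-GPU switch in the notebook was just the usual PyTorch device idiom (a generic sketch, not the exact notebook code):

import torch
import torch.nn as nn

# Pick the GPU when it is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)    # move the model to the chosen device
x = torch.randn(4, 10, device=device)  # create the input directly on that device
print(model(x).device)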

Hey dusty,

thanks for the tip. I managed to get the MAGMA libs compiled and installed in /usr/local/magma, but PyTorch still does not seem to pick them up. Is there an env variable that I need to set, or do I need to rebuild PyTorch completely?

best
Ben