PyTorch for Jetson

I believe that step of building torchvision can take quite a while - how long has it been stuck for?

If you run sudo tegrastats in the background, is it still responsive?

Can you provide a cookbook on how to install torchaudio on a Jetson TX2 with JetPack 4.4? I was able to install PyTorch 1.6 and torchvision 0.7 according to the instructions.

I was able to install torchaudio in the l4t-pytorch container; the procedure was straightforward.

Yup, in fact I found the same Docker build script and followed the steps, which worked out. I think the error in my own build was related to torchaudio 0.7.
Thanks!

Hey Dusty,

Thank you for the reply and apologies for posting it in multiple places.

I have left this step running for hours, and unfortunately the whole system freezes after a while. I had to reset the Nano a couple of times.

All other libraries are installed and working, including compiled OpenCV, torch, and torchvision 0.2.0 (the latest version I can pip install).

Hmm, is it using a lot of memory while compiling that step? You can keep an eye on that with sudo tegrastats. You may want to try mounting SWAP if it is running out.

Also, if you continue to have trouble you could use the l4t-pytorch docker container, it already has these components pre-installed.
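For anyone hitting the same freeze, a minimal sketch of mounting a swap file on the Nano (the 4 GB size and the /mnt/4GB.swap path are only example values; adjust to your SD card's free space):

```shell
# Example only: create a 4 GB swap file and enable it
sudo fallocate -l 4G /mnt/4GB.swap
sudo chmod 600 /mnt/4GB.swap
sudo mkswap /mnt/4GB.swap
sudo swapon /mnt/4GB.swap

# confirm the new swap space is active
free -m
```

Adding a matching entry to /etc/fstab makes the swap persist across reboots.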

Thank you, mounting swap helped get past that step.
Nevertheless, I've got another error that I couldn't figure out.

I'm trying to install it in a separate env; would that be an issue?

[9/13] c++ -MMD -MF /home/izertis/torchvision/build/temp.linux-aarch64-3.6/home/izertis/torchvision/torchvision/csrc/vision.o.d -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DWITH_CUDA -I/home/izertis/torchvision/torchvision/csrc -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/TH -I/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/izertis/.virtualenvs/py3torch/include -I/usr/include/python3.6m -c -c /home/izertis/torchvision/torchvision/csrc/vision.cpp -o /home/izertis/torchvision/build/temp.linux-aarch64-3.6/home/izertis/torchvision/torchvision/csrc/vision.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
In file included from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/Parallel.h:149:0,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/extension.h:4,
             from /home/izertis/torchvision/torchvision/csrc/cpu/vision_cpu.h:2,
             from /home/izertis/torchvision/torchvision/csrc/DeformConv.h:3,
             from /home/izertis/torchvision/torchvision/csrc/vision.cpp:11:
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/ParallelOpenMP.h:84:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
 #pragma omp parallel for if ((end - begin) >= grain_size)

In file included from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/KernelFunction_impl.h:2:0,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/KernelFunction.h:226,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/dispatch/DispatchTable.h:9,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/dispatch/OperatorEntry.h:3,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:3,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/runtime/operator.h:6,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/ir/ir.h:7,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/api/function_impl.h:4,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/api/method.h:5,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/api/object.h:5,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/jit/frontend/tracer.h:9,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/autograd/generated/variable_factories.h:12,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:7,
             from /home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/script.h:3,
             from /home/izertis/torchvision/torchvision/csrc/vision.cpp:2:
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h: In instantiation of ‘typename c10::guts::infer_function_traits<Functor>::type::return_type c10::impl::call_functor_with_args_from_stack_(Functor*, c10::Stack*, std::index_sequence<INDEX ...>) [with Functor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >; bool AllowDeprecatedTypes = false; long unsigned int ...ivalue_arg_indices = {}; typename c10::guts::infer_function_traits<Functor>::type::return_type = long int; c10::Stack = std::vector<c10::IValue>; std::index_sequence<INDEX ...> = std::integer_sequence<long unsigned int>]’:
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:250:77:   required from ‘typename c10::guts::infer_function_traits<Functor>::type::return_type c10::impl::call_functor_with_args_from_stack(Functor*, c10::Stack*) [with Functor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >; bool AllowDeprecatedTypes = false; typename c10::guts::infer_function_traits<Functor>::type::return_type = long int; c10::Stack = std::vector<c10::IValue>]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:292:79:   required from ‘c10::impl::make_boxed_from_unboxed_functor<KernelFunctor, AllowDeprecatedTypes>::call(c10::OperatorKernel*, const c10::OperatorHandle&, c10::Stack*)::<lambda()> [with KernelFunctor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >; bool AllowDeprecatedTypes = false]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:292:90:   required from ‘struct c10::impl::make_boxed_from_unboxed_functor<KernelFunctor, AllowDeprecatedTypes>::call(c10::OperatorKernel*, const c10::OperatorHandle&, c10::Stack*) [with KernelFunctor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >; bool AllowDeprecatedTypes = false; c10::Stack = std::vector<c10::IValue>]::<lambda()>’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:287:38:   required from ‘static void c10::impl::make_boxed_from_unboxed_functor<KernelFunctor, AllowDeprecatedTypes>::call(c10::OperatorKernel*, const c10::OperatorHandle&, c10::Stack*) [with KernelFunctor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >; bool AllowDeprecatedTypes = false; c10::Stack = std::vector<c10::IValue>]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/KernelFunction_impl.h:87:9:   required from ‘static c10::KernelFunction c10::KernelFunction::makeFromUnboxedFunctor(std::unique_ptr<c10::OperatorKernel>) [with bool AllowLegacyTypes = false; KernelFunctor = c10::impl::detail::WrapFunctionIntoRuntimeFunctor_<long int (*)(), long int, c10::guts::typelist::typelist<> >]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/KernelFunction_impl.h:137:114:   required from ‘static c10::KernelFunction c10::KernelFunction::makeFromUnboxedRuntimeFunction(FuncType*) [with bool AllowLegacyTypes = false; FuncType = long int()]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/library.h:62:64:   required from ‘torch::CppFunction::CppFunction(Func*, std::enable_if_t<c10::guts::is_function_type<FuncType_>::value, std::nullptr_t>) [with Func = long int(); std::enable_if_t<c10::guts::is_function_type<FuncType_>::value, std::nullptr_t> = std::nullptr_t]’
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/torch/library.h:304:17:   required from ‘torch::Library& torch::Library::def(NameOrSchema&&, Func&&) & [with NameOrSchema = const char (&)[14]; Func = long int (*)()]’
/home/izertis/torchvision/torchvision/csrc/vision.cpp:56:40:   required from here
/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:239:22: warning: variable ‘num_ivalue_args’ set but not used [-Wunused-but-set-variable]
 constexpr size_t num_ivalue_args = sizeof...(ivalue_arg_indices);
                  ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1510, in _run_ninja_build
env=env)
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 255, in <module>
'clean': clean,
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/__init__.py", line 163, in setup
return distutils.core.setup(**attrs)
  File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
  File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/install.py", line 67, in run
self.do_egg_install()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/install.py", line 109, in do_egg_install
self.run_command('bdist_egg')
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 175, in run
cmd = self.call_command('install_lib', warn_dir=0)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/bdist_egg.py", line 161, in call_command
self.run_command(cmdname)
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
  File "/usr/lib/python3.6/distutils/command/install_lib.py", line 109, in build
self.run_command('build_ext')
  File "/usr/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 87, in run
_build_ext.run(self)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 644, in build_extensions
build_ext.build_extensions(self)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 208, in build_extension
_build_ext.build_extension(self, ext)
  File "/usr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 473, in unix_wrap_ninja_compile
with_cuda=with_cuda)
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1228, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
  File "/home/izertis/.virtualenvs/py3torch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1524, in _run_ninja_build
raise RuntimeError(message)
RuntimeError: Error compiling objects for extension

Hi @dusty_nv.

This request is about the build instructions you give for building pytorch and torchvision from source. When following your build instructions, I end up with a torchvision installation that reports an odd build number. The installation script included in torchvision looks for an environment variable called BUILD_VERSION. If this variable is not found, torchvision appends a random hash to the source version, so you end up with a version number like “0.7.0a0+78ed10c” when listing in pip. This odd version number makes the fastai installation fail, as pip does not see “0.7.0a0+78ed10c” as validly greater than or equal to “0.7.0”. This is easily fixed by adding (in my case) “export BUILD_VERSION=0.7.0” just before calling “python3 setup.py install”. May I ask that you consider adding some text about this to your build instructions? As is, the build instructions already show how to pass the correct build version to the pytorch build; they are just missing the instruction on how to also do it for torchvision.
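For concreteness, the workaround described above can be sketched like this (0.7.0 is just the version from this example; set it to match the torchvision branch being built):

```shell
# setting BUILD_VERSION stops setup.py from appending an "a0+<hash>" suffix
export BUILD_VERSION=0.7.0

cd torchvision
python3 setup.py install   # pip will now report 0.7.0 instead of 0.7.0a0+78ed10c
```

The reason pip rejects "0.7.0a0+78ed10c" against a ">=0.7.0" requirement is that "a0" marks a pre-release, which PEP 440 orders strictly before the final 0.7.0.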

Hi @gui.steffen, I haven’t built torchvision in a virtualenv before, so it may or may not be related - are you able to test if it builds for you outside of virtualenv?

It may also be related to a mismatch in PyTorch / torchvision version, if you are using a different torchvision version than what is in the instructions above or torchvision master.

Thanks @streicher, I have just added the instructions to export BUILD_VERSION=... to the original post above.

Hi, I’m using a Jetson Nano with JetPack 4.3 and following Paul McWhorter’s series. Everything works fine up until lesson 55, transfer learning, where I get an ‘imageNet failed to load network’ error. Paul’s tutorial uses torch v1.1.0. I reloaded everything (following the NVIDIA instructions) and ended up with torch v1.4.0 and torchvision v0.5.0. I’ve uninstalled torchvision and reverted to version 0.3.0. I then uninstalled torch (successfully), but when I followed the instructions for loading torch v1.1.0 (as per this forum) I still had v1.4.0?

Hi @emersok1, can you provide the error log and command that you are trying to run?

After you uninstall it, if you run a python interpreter and try to import torch does it still work?

You might want to try uninstalling a couple times:

# use pip3 instead of pip if using python3
pip uninstall torch
sudo pip uninstall torch
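To double-check that the uninstall really took effect, something like this (the echo text is arbitrary) will show whether the interpreter can still find torch:

```shell
# succeeds and prints the version if torch is still installed,
# otherwise fails with ModuleNotFoundError
python3 -c "import torch; print(torch.__version__)" \
    || echo "torch is no longer importable"
```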

Hi Dusty, thanks for getting back to me. Since then I’ve successfully uninstalled torch and torchvision (confirmed by failed imports) and installed torch 1.1.0 and torchvision 0.3.0. I downloaded your 100-epoch cat-dog model and performed the ONNX export as per the tutorial. When I run the transfer learning tutorial commands under “Processing Images with TensorRT” I get a lot of preamble, then a warning saying that INT64 weights have been used, then a comment stating “Successfully casted down to INT32”. Under “-- End node --” I get
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims

Then, in red text:
[TRT] failed to parse ONNX model ‘cat_dog/resnet18.onnx’
[TRT] device GPU, failed to load ‘cat_dog/resnet18.onnx’
[TRT] failed to load cat_dog/resnet18.onnx
[TRT] imageNet -- failed to initialize.
imagenet: failed to initialize imageNet

I’m very much out of my depth and I’m not sure where to find an error log.
Up to this point everything had worked fine (live camera detection etc.).
I’d really like to be able to create my own trained models but have stalled at this point. I don’t know if I have given you enough information to get me past it.
Thanks again, Keith

If you are using torchvision 0.3.0 and training ResNet, you should try this fork of torchvision (v0.3.0 branch), which includes patches to make it work:

https://github.com/dusty-nv/vision/tree/v0.3.0

Note that this is what the install-pytorch.sh script from jetson-inference will install:

https://github.com/dusty-nv/jetson-inference/blob/4e6b7ff37935a9bc64271119f42cd25366ce8c79/tools/install-pytorch.sh#L329

git clone -b v0.3.0 https://github.com/dusty-nv/vision torchvision
cd torchvision
sudo python3 setup.py install

For newer versions of PyTorch/torchvision and JetPack, it should not be required (and the script installs the upstream torchvision)

Hi,

The path you gave for version 0.3.0 was the one I was using. To check, I uninstalled torchvision and re-installed from the path in your posting. Same result.

I’ve attached the python programme as per Paul McWhorter’s lesson 55 but with the path to load the net modified to point at the cat_dog model. I’ve also attached a copy of the terminal output I get when I run this.

Thanks, Keith

(Attachment lesson55_deeplearning_10ctest.py is missing)

(Attachment terminal_output.odt is missing)

terminal_output_word.docx (19.1 KB)

Outlook didn’t like the python file so I’ve copied it to word

lesson55_deeplearning_10ctest.py.docx (14.3 KB)

Can you test if you can run this cat_dog model that I trained for 100 epochs?

After you re-installed torchvision, did you train the model again and re-export the ONNX?

You can train it for just 1 epoch to test that you can actually load it in TensorRT (as opposed to waiting many epochs)

Hi Dusty,

Thank you for your extreme patience. The model I used was as per your latest reply.

As I have stumbled my way through this, I’ve created quite a few duplicate directories and files here and there as I reloaded code. I had a tidy up and (I think this may be the critical point, after having reverted to torch 1.1.0 and torchvision 0.3.0) compiled the project again. The bottom line is that it works fine now! I’ve yet to create my own model from my own data, but I’m pretty confident it will work.

Paul McWhorter has mentioned several times in his videos that he has been very impressed by the level of support provided by you and your team at NVIDIA, and I strongly agree with him.

Thanks again

Keith

Keith, glad you got it working - yes, if you are able to load your cat_dog model, you should also be able to load a model trained on your own data. They use the same resnet-18 network definition that gets exported from PyTorch to ONNX and imported into TensorRT.

I tried uninstalling PyTorch, building MAGMA from source, and then reinstalling the PyTorch wheel. Is there anything else I have to do to get MAGMA support in PyTorch?