Trying to regenerate onnx for Jetson Nano

Hi, hope all is well. For a university essay, I am trying to regenerate the onnx that NVidia provides for fcn-resnet18-deepscene-576x320. The goal is to train the model with my own images, but first I want to make sure I can run the onnx I generate before doing anything else.
(In case you are curious, the essay evaluates the Jetson Nano’s capacity to run inference (semantic segmentation) in real time on high-resolution video.)

I can train and generate the onnx, but it gives me the infamous assertion: Assertion failed: axis >= 0 && axis < nbDims. I understand that torch.view needs to be replaced by torch.flatten(). I am using the code and models provided by NVidia, and I checked that the fix is there (torch.view replaced by torch.flatten()), but I still get the error:

ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims

And I do not understand where I am going wrong.
I have already tried many, many different things and spent many hours on this… still no success.
I am so close to closing the loop on my experiment, going from training to onnx.
If someone has the slightest idea or suggestion…
Another important thing : I am very limited in my environment. It’s either the Jetson nano, or a server provided by Compute Canada. No DIGITS.
And I am far from being an expert (today) with Pytorch, DeepLearning models and ONNX. I am a student.

Below are all the steps I am following on a Compute Canada AI/ML server; I do not run them on the Jetson Nano. I tried to export a simple model to onnx in a docker container with JetPack 4.4, but the Jetson nano freezes. (; docker pull

# These commands are run on a Compute Canada (server) instance
# move to the home directory
cd ~

# retrieve the deepscene freiburg_forest_multispectral_annotated (1.2Gb); really quick from compute canada (>13Mb/sec)

# untar-zip the deepscene freiburg_forest_multispectral_annotated (1.2Gb) directly inside the home folder
tar xvf freiburg_forest_multispectral_annotated.tar.gz 

# load the required modules
module load python/2.7 cuda/10.1 cudnn 
# clean the python virtual env
rm -rf $SLURM_TMPDIR/env

# create the virtual env
virtualenv --no-download $SLURM_TMPDIR/env

# activate the python virtual env
source $SLURM_TMPDIR/env/bin/activate

# install requirements
pip install --no-index torch==1.3.0
pip install --no-index scikit-learn
pip install --no-index six
pip install --no-index pillow==6.1.0
pip install --no-index ~/Cython-0.29.17.tar.gz 
pip install --no-index ~/pycocotools-2.0.0.tar.gz

# retrieve the fork for vision-0.3.0 from dusty-nv
cd ~
git clone

# retrieve the code for training of semantic segmentation networks with PyTorch for jetson nano
cd ~
git clone

cd ~/vision-0.3.0-dusty-ng
rm -rf build/
python build install

# run torchvision test models
python test/ 

# remap the deepscene images for the model
cd ~/pytorch-segmentation-master
python ~/downloads/freiburg_forest_annotated/train/GT_color ~/downloads/freiburg_forest_annotated/train/GT_index
python ~/downloads/freiburg_forest_annotated/train/GT_color ~/downloads/freiburg_forest_annotated/train/GT_index
python ~/downloads/freiburg_forest_annotated/test/GT_color ~/downloads/freiburg_forest_annotated/test/GT_index

# train
cd ~/pytorch-segmentation-master
python -a fcn_resnet18 --dataset deepscene --model-dir ./model_output --dist-url 'tcp://' /home/vincelf/downloads/freiburg_forest_annotated

# export to onnx
python --input model_output/model_best.pth --output resnet18-vlf.onnx

# test the onnx with trtexec
trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
[04/29/2020-23:09:32] [I] === Model Options ===
[04/29/2020-23:09:32] [I] Format: ONNX
[04/29/2020-23:09:32] [I] Model: /home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx
[04/29/2020-23:09:32] [I] Output:
[04/29/2020-23:09:32] [I] === Build Options ===
[04/29/2020-23:09:32] [I] Max batch: explicit
[04/29/2020-23:09:32] [I] Workspace: 16 MB
[04/29/2020-23:09:32] [I] minTiming: 1
[04/29/2020-23:09:32] [I] avgTiming: 8
[04/29/2020-23:09:32] [I] Precision: FP32
[04/29/2020-23:09:32] [I] Calibration: 
[04/29/2020-23:09:32] [I] Safe mode: Disabled
[04/29/2020-23:09:32] [I] Save engine: 
[04/29/2020-23:09:32] [I] Load engine: 
[04/29/2020-23:09:32] [I] Inputs format: fp32:CHW
[04/29/2020-23:09:32] [I] Outputs format: fp32:CHW
[04/29/2020-23:09:32] [I] Input build shapes: model
[04/29/2020-23:09:32] [I] === System Options ===
[04/29/2020-23:09:32] [I] Device: 0
[04/29/2020-23:09:32] [I] DLACore: 
[04/29/2020-23:09:32] [I] Plugins:
[04/29/2020-23:09:32] [I] === Inference Options ===
[04/29/2020-23:09:32] [I] Batch: Explicit
[04/29/2020-23:09:32] [I] Iterations: 10 (200 ms warm up)
[04/29/2020-23:09:32] [I] Duration: 10s
[04/29/2020-23:09:32] [I] Sleep time: 0ms
[04/29/2020-23:09:32] [I] Streams: 1
[04/29/2020-23:09:32] [I] Spin-wait: Disabled
[04/29/2020-23:09:32] [I] Multithreading: Enabled
[04/29/2020-23:09:32] [I] CUDA Graph: Disabled
[04/29/2020-23:09:32] [I] Skip inference: Disabled
[04/29/2020-23:09:32] [I] === Reporting Options ===
[04/29/2020-23:09:32] [I] Verbose: Enabled
[04/29/2020-23:09:32] [I] Averages: 10 inferences
[04/29/2020-23:09:32] [I] Percentile: 99
[04/29/2020-23:09:32] [I] Dump output: Disabled
[04/29/2020-23:09:32] [I] Profile: Disabled
[04/29/2020-23:09:32] [I] Export timing to JSON file: 
[04/29/2020-23:09:32] [I] Export profile to JSON file: 
[04/29/2020-23:09:32] [I] 
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - NMS_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Reorg_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Region_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Clip_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - LReLU_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - PriorBox_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Normalize_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - RPROI_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - FlattenConcat_TRT
Input filename:   /home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Model version:    0
Doc string:       
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[04/29/2020-23:09:33] [V] [TRT] 129:Constant -> 
[04/29/2020-23:09:33] [V] [TRT] 130:Shape -> (4)
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
While parsing node number 2 [Gather -> "131"]:
--- Begin node ---
input: "130"
input: "129"
output: "131"
op_type: "Gather"
attribute {
  name: "axis"
  i: 0
  type: INT
}
--- End node ---
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[04/29/2020-23:09:33] [E] Failed to parse onnx file
[04/29/2020-23:09:33] [E] Parsing model failed
[04/29/2020-23:09:33] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
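
With --verbose the trtexec output gets long, so it can help to pull out just the node the ONNX parser stopped on. The snippet below is a minimal sketch that assumes the log lines look exactly like the ones above; find_parse_failure is a made-up helper name, not part of TensorRT or trtexec.

```python
import re

# Line printed just before the failing node is dumped, e.g.
#   While parsing node number 2 [Gather -> "131"]:
NODE_RE = re.compile(r'While parsing node number (\d+) \[(\w+) -> "([^"]*)"\]')
# Assertion line, e.g.
#   [8] Assertion failed: axis >= 0 && axis < nbDims
ASSERT_RE = re.compile(r'Assertion failed: (.+)')


def find_parse_failure(log_text):
    """Return (node_index, op_type, output_name, assertion), or None if no
    'While parsing node' line is present. Hypothetical helper."""
    node = NODE_RE.search(log_text)
    if node is None:
        return None
    # Only look for the assertion after the node header.
    failed = ASSERT_RE.search(log_text, node.end())
    assertion = failed.group(1).strip() if failed else None
    return int(node.group(1)), node.group(2), node.group(3), assertion


# Excerpt copied from the log above.
sample = '''While parsing node number 2 [Gather -> "131"]:
--- Begin node ---
op_type: "Gather"
--- End node ---
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
'''

print(find_parse_failure(sample))
```

Here the parser stops on node 2, a Gather fed by the Shape node right before it.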


The Nano has limited memory.
Not sure if this causes the freeze when using docker.

For the TensorRT converter issue, have you tried this suggestion?


Hi AstaLLL, and thanks for your attention.

I saw that topic, thanks.

I have not changed any code, and I checked that this fix exists. And it does, if I refer to

I am actually using code provided by NVidia. So I am a bit “reluctant” to change it. I really would like to take it “as it is”, and regenerate onnx from it, without having to change (fix) something on my side. I want to do like NVidia does to generate the onnx they deliver. They can do it. So I am also supposed to be able…

Hopefully I will, at one point.


Hmm, apart from the fact that I used PyTorch 1.1 to train the segmentation models, I am not sure what the difference is, since you are using the forked torchvision. But you may want to try PyTorch 1.1 in the cloud.

Also, I believe the torch.view() restriction in PyTorch’s ResNet model definition is no longer needed with TensorRT 7.1 / JetPack 4.4, because I no longer need my torchvision fork for that. So if the above doesn’t work, you may want to try PyTorch 1.4 and the upstream torchvision 0.5.0.

I do have some patches in my torchvision fork for supporting FCN-ResNet18 and exporting them to ONNX, so you would want to pick up these changes too:

Hi dusty_nv, hope all goes well. I am thankful you took some time to reply.

I tried all your recommendations tonight, but still no success.

  • pytorch-1.4 + torchvision-0.5.0 upstream requires Python 3 … but the pytorch-segmentation project does not seem to be compatible with Python 3;
(env) [vincelf@blg5408 pytorch-segmentation-master]$ python -a fcn_resnet18 --dataset deepscene --model-dir ./model_output --dist-url 'tcp://' /home/vincelf/downloads/freiburg_forest_annotated

Traceback (most recent call last):
  File "", line 24, in <module>
    from datasets.cityscapes_utils import get_cityscapes
  File "/home/vincelf/pytorch-segmentation-master/datasets/", line 18
    print self.classes
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(print self.classes)?
  • I made sure I am building & training with the patches that you pointed out; thanks for the notice though.
  • I tried with torch 1.1 (& cuda 10.0), but I get a runtime error when exporting the onnx; can this be fixed somehow, if this is the way to go ?
exporting model to ONNX...
Traceback (most recent call last):
  File "", line 73, in <module>
    torch.onnx.export(model, input, opt.output, verbose=True, input_names=input_names, output_names=output_names)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/", line 25, in export
    return utils.export(*args, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/", line 131, in export
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/", line 363, in _export
    _retain_param_name, do_constant_folding)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/", line 266, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/", line 225, in _trace_and_get_graph_from_model
    trace, torch_out = torch.jit.get_trace_graph(model, args, _force_outplace=True)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/jit/", line 231, in get_trace_graph
    return LegacyTracedModule(f, _force_outplace, return_inputs)(*args, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/nn/modules/", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/jit/", line 295, in forward
    out_vars, _ = _flatten(out)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs, but got OrderedDict

I am just wondering how NVidia produces the onnx… can you tell ?

If you have any other idea for me in my situation, please let me know. Like if by chance you have a docker I could use on the Nano to generate from A-to-Z the Onnx for deepscene (train to onnx)… without freezing the nano :)

Thanks in advance for all your help,


pytorch 1.4 was the last release to support Python 2.7, so I believe it should still work with Python 2.7. pytorch 1.5 only supports Python 3.
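
As a small sketch of that rule, the check below encodes only what is stated above (1.4 being the last release with Python 2.7 support). The function names are made up for illustration, and the version strings are parsed by hand rather than read from an installed torch.

```python
import re

# Taken from the statement above: PyTorch 1.4 is the last release
# that still supports Python 2.7.
LAST_TORCH_WITH_PY2 = (1, 4)


def major_minor(version):
    """Turn '1.3.0' (or '1.4.0+cu100') into (1, 3) / (1, 4)."""
    match = re.match(r'(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError('unrecognised version string: %r' % (version,))
    return int(match.group(1)), int(match.group(2))


def torch_supports_python2(torch_version):
    """Hypothetical helper: True if this torch release can run on Python 2.7."""
    return major_minor(torch_version) <= LAST_TORCH_WITH_PY2


for candidate in ('1.1.0', '1.3.0', '1.4.0', '1.5.0'):
    print(candidate, torch_supports_python2(candidate))
```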

Hmm, this error shouldn’t be occurring, because in the script, the export_onnx flag is set when constructing the model - I added this flag myself, to disable use of OrderedDict so it could be exported to ONNX.

Can you confirm that is in fact being used?
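
For context on the error itself: the tracer accepts tuples, lists and tensors as model outputs, and the torchvision segmentation models return an OrderedDict. The sketch below illustrates the general idea of unpacking that dict before it reaches the tracer. It is not how the export_onnx flag is implemented; TupleOutput, to_traceable and fake_model are made-up names, and plain numbers stand in for tensors so the snippet runs without torch. A real wrapper would presumably subclass torch.nn.Module and do the same thing in forward().

```python
from collections import OrderedDict


def to_traceable(output):
    """Convert a dict-like model output into a tuple of its values, in order.
    Anything that is not a dict is passed through unchanged."""
    if isinstance(output, dict):  # OrderedDict is a dict subclass
        return tuple(output.values())
    return output


class TupleOutput(object):
    """Hypothetical wrapper: calls the wrapped model and unpacks dict outputs."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        return to_traceable(self.model(*args, **kwargs))


def fake_model(x):
    # Stand-in for a segmentation model that returns an OrderedDict.
    return OrderedDict([('out', x * 2), ('aux', x + 1)])


wrapped = TupleOutput(fake_model)
print(wrapped(3))
```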

Hi dusty_nv,

Great news tonight ! I finally can run the inference on the nano with my own onnx. For now the one I am generating is the same as the one NVidia provides (resnet18 + deepscene). But I will be able to train with my own datasets and experiment, which is great.

The key point is the usage of torch-1.1.0 with cuda-10.0 to build the fork of torchvision 0.3.0 (training and onnx export are done remotely on a server providing better ML resources than the Jetson nano).

Then I tried the onnx on the Jetson nano. Luckily I did, even though trtexec on the remote server still returns an error / assert:

  • trtexec (TensorRT), run on the onnx generated with torch 1.1.0 / cuda-10.0, returns a different error / assert from the one I get with torch-1.3 / cuda-10.1. The new one is: ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape

Anyway… I am good to continue now.

Good catch about the ‘export_onnx’ flag in the DeepScene constructor. I am not sure why I removed it; I had completely forgotten about it. It was a leftover from earlier trial and error. I checked out the original code from github again.

Regarding some minor fixes that I needed to do to be able to move on (it’s related to Python 2.7 if I recall), I share the details below.

diff -bur pytorch-segmentation-master/datasets/ pytorch-segmentation-master-vlf/datasets/
--- pytorch-segmentation-master/datasets/	2019-08-30 13:37:06.000000000 -0400
+++ pytorch-segmentation-master-vlf/datasets/	2020-05-09 00:27:09.000000000 -0400
@@ -74,9 +74,9 @@
 	for n in range(len(files)):
 		worker_args.append((os.path.join(args.input, files[n]), os.path.join(args.output, files[n]), args.colorized))
-	#for n in worker_args:
-	#    remap_labels(n)
+	for n in worker_args:
+	    remap_labels(n)
-	with ProcessPool(processes=args.workers) as pool:
-, worker_args)
+	#with ProcessPool(processes=args.workers) as pool:
+	#, worker_args)
diff -bur pytorch-segmentation-master/ pytorch-segmentation-master-vlf/
--- pytorch-segmentation-master/	2019-08-30 13:37:06.000000000 -0400
+++ pytorch-segmentation-master-vlf/	2020-05-09 01:38:03.000000000 -0400
@@ -290,7 +290,7 @@
     args.dist_backend = 'nccl'
     print('| distributed init (rank {}): {}'.format(
-        args.rank, args.dist_url), flush=True)
+        args.rank, args.dist_url))
     torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                          world_size=args.world_size, rank=args.rank)
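
A note on the second diff: the flush keyword of print() is only available on Python 3, which is why it had to go on Python 2.7. If the immediate flush is still wanted, one option that runs on both is to write to the stream and flush it explicitly. This is a sketch only; log is a made-up helper and not part of pytorch-segmentation, and the rank and URL values are placeholders.

```python
import io
import sys


def log(message, stream=None):
    """Write one line and flush right away; runs on Python 2.7 and Python 3."""
    stream = sys.stdout if stream is None else stream
    stream.write(message + u'\n')
    stream.flush()


# Demonstration against an in-memory stream (placeholder rank and URL).
buf = io.StringIO()
log(u'| distributed init (rank {}): {}'.format(0, u'tcp://'), stream=buf)
print(repr(buf.getvalue()))
```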

In the meantime, I tried to train and generate the onnx using Python 3, but no success. But that’s OK for now… until it’s not OK anymore (soon ?).

My goal is to do transfer learning and domain adaptation with my own images, using a model for “real-time” semantic segmentation of bicycle paths and roads in bad conditions: half wet, dirty, a bit of snow, etc. The deepscene dataset looks like a good starting point (forest pathways).

Thanks a lot for your help, you unblocked me big time, and I am sincerely very grateful.

Wishing you a lot of fun.