Trying to regenerate ONNX for Jetson Nano

Hi, hope all goes well. For a university paper, I am trying to regenerate the ONNX model that NVIDIA provides for fcn-resnet18-deepscene-576x320. The objective is to train the model with my own images, but first I want to make sure I can run the ONNX I am generating before doing anything else.
(In case you are curious, the paper is an evaluation of the Jetson Nano's capacity to run semantic segmentation inference in real time on high-resolution video.)

I can train and generate the ONNX, but it fails with the infamous assertion Assertion failed: axis >= 0 && axis < nbDims. I understand that torch.view() needs to be replaced by torch.flatten(). But I am using the code/models provided by NVIDIA, and I checked that the fix is there (torch.view() replaced by torch.flatten()). Yet I still get the error

ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
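
For reference, this is the change I mean, as a minimal sketch of the tail end of a ResNet-style classifier (my own illustration, not the fork's actual code):

import torch
import torch.nn as nn

class Head(nn.Module):
    # Tail end of a ResNet-style classifier, showing the two variants
    def __init__(self, in_features=512, num_classes=1000):
        super(Head, self).__init__()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, x):
        x = self.avgpool(x)
        # ONNX/TensorRT-unfriendly: tracing view()/size() leaves a
        # Shape -> Gather -> Reshape pattern in the exported graph
        # x = x.view(x.size(0), -1)
        # ONNX-friendly replacement (what the dusty-nv fork uses):
        x = torch.flatten(x, 1)
        return self.fc(x)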

And I do not understand where I am going wrong.
I have already tried many, many different things and spent many hours on this… still no success.
I am so close to closing the loop of my experiment, going from training all the way to ONNX.
If anyone has even a slim idea or suggestion…
Another important point: my environment is very limited. It is either the Jetson Nano or a server provided by Compute Canada. No DIGITS.
And I am far from being an expert (today) with PyTorch, deep learning models, and ONNX. I am a student.

Here are all the steps I am following on a Compute Canada AI/ML server; I do not do this on the Jetson Nano. I did try to export a simple model to ONNX in a Docker container with JetPack 4.4, but the Jetson Nano freezes. (https://pytorch.org/docs/stable/onnx.html; docker pull nvcr.io/nvidia/l4t-ml:r32.4.2-py3)
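
(The simple export I tried on the Nano was something like the minimal example from the PyTorch docs linked above; a sketch with a stock resnet18 as a stand-in, not the segmentation model itself:)

import torch
import torchvision.models as models

# Minimal ONNX export following https://pytorch.org/docs/stable/onnx.html
# (plain resnet18 as a stand-in; the real export goes through onnx_export.py)
model = models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 320, 576)  # NCHW dummy input, 576x320 like the deepscene model
torch.onnx.export(model, dummy, 'resnet18-test.onnx', verbose=True,
                  input_names=['input_0'], output_names=['output_0'])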

# These commands are run on a Compute Canada (server) instance
# work inside ~/downloads (the dataset paths used below assume this location)
mkdir -p ~/downloads
cd ~/downloads

# retrieve the deepscene freiburg_forest_multispectral_annotated dataset (1.2 GB); really quick from Compute Canada (>13 MB/s)
wget http://deepscene.cs.uni-freiburg.de/static/datasets/freiburg_forest_multispectral_annotated.tar.gz

# extract the freiburg_forest_multispectral_annotated archive (1.2 GB) directly inside ~/downloads
tar xvf freiburg_forest_multispectral_annotated.tar.gz 

# load the required modules
module load python/2.7 cuda/10.1 cudnn

# clean the python virtual env
rm -rf $SLURM_TMPDIR/env

# create the virtual env
virtualenv --no-download $SLURM_TMPDIR/env

# activate the python virtual env
source $SLURM_TMPDIR/env/bin/activate

# install requirements
pip install --no-index torch==1.3.0
pip install --no-index scikit-learn
pip install --no-index six
pip install --no-index pillow==6.1.0
pip install --no-index ~/Cython-0.29.17.tar.gz 
pip install --no-index ~/pycocotools-2.0.0.tar.gz

# retrieve the fork of torchvision 0.3.0 from dusty-nv (the v0.3.0 branch)
cd ~
git clone -b v0.3.0 https://github.com/dusty-nv/vision.git ~/vision-0.3.0-dusty-nv

# retrieve the code for training of semantic segmentation networks with PyTorch for jetson nano
cd ~
git clone https://github.com/dusty-nv/pytorch-segmentation.git ~/pytorch-segmentation-master

cd ~/vision-0.3.0-dusty-nv
rm -rf build/
python setup.py build install

# run torchvision test models
python test/test_models.py 

# remap the deepscene images for the model
cd ~/pytorch-segmentation-master
python datasets/deepscene_remap.py ~/downloads/freiburg_forest_annotated/train/GT_color ~/downloads/freiburg_forest_annotated/train/GT_index
python datasets/deepscene_remap.py ~/downloads/freiburg_forest_annotated/test/GT_color ~/downloads/freiburg_forest_annotated/test/GT_index
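
(For context, as I understand it, the remap step converts the RGB-coded GT_color annotations into single-channel class-index images. Roughly the idea below; the palette is purely illustrative, the real mapping lives in deepscene_remap.py:)

import numpy as np
from PIL import Image

# Illustrative color -> class-index palette; not DeepScene's actual mapping
PALETTE = {
    (170, 170, 170): 0,  # e.g. trail
    (0, 255, 0):     1,  # e.g. grass
    (102, 102, 51):  2,  # e.g. vegetation
}

def remap(color_path, index_path):
    # Read the color-coded label image and build a single-channel index image
    rgb = np.array(Image.open(color_path).convert('RGB'))
    index = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, cls in PALETTE.items():
        index[np.all(rgb == color, axis=-1)] = cls
    Image.fromarray(index).save(index_path)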

# train
cd ~/pytorch-segmentation-master
python train.py -a fcn_resnet18 --dataset deepscene --model-dir ./model_output --dist-url 'tcp://127.0.0.1:5556' /home/vincelf/downloads/freiburg_forest_annotated

# export to onnx
python onnx_export.py --input model_output/model_best.pth --output resnet18-vlf.onnx

# test the onnx with trtexec
trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
&&&& RUNNING TensorRT.trtexec # trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
[04/29/2020-23:09:32] [I] === Model Options ===
[04/29/2020-23:09:32] [I] Format: ONNX
[04/29/2020-23:09:32] [I] Model: /home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx
[04/29/2020-23:09:32] [I] Output:
[04/29/2020-23:09:32] [I] === Build Options ===
[04/29/2020-23:09:32] [I] Max batch: explicit
[04/29/2020-23:09:32] [I] Workspace: 16 MB
[04/29/2020-23:09:32] [I] minTiming: 1
[04/29/2020-23:09:32] [I] avgTiming: 8
[04/29/2020-23:09:32] [I] Precision: FP32
[04/29/2020-23:09:32] [I] Calibration: 
[04/29/2020-23:09:32] [I] Safe mode: Disabled
[04/29/2020-23:09:32] [I] Save engine: 
[04/29/2020-23:09:32] [I] Load engine: 
[04/29/2020-23:09:32] [I] Inputs format: fp32:CHW
[04/29/2020-23:09:32] [I] Outputs format: fp32:CHW
[04/29/2020-23:09:32] [I] Input build shapes: model
[04/29/2020-23:09:32] [I] === System Options ===
[04/29/2020-23:09:32] [I] Device: 0
[04/29/2020-23:09:32] [I] DLACore: 
[04/29/2020-23:09:32] [I] Plugins:
[04/29/2020-23:09:32] [I] === Inference Options ===
[04/29/2020-23:09:32] [I] Batch: Explicit
[04/29/2020-23:09:32] [I] Iterations: 10 (200 ms warm up)
[04/29/2020-23:09:32] [I] Duration: 10s
[04/29/2020-23:09:32] [I] Sleep time: 0ms
[04/29/2020-23:09:32] [I] Streams: 1
[04/29/2020-23:09:32] [I] Spin-wait: Disabled
[04/29/2020-23:09:32] [I] Multithreading: Enabled
[04/29/2020-23:09:32] [I] CUDA Graph: Disabled
[04/29/2020-23:09:32] [I] Skip inference: Disabled
[04/29/2020-23:09:32] [I] === Reporting Options ===
[04/29/2020-23:09:32] [I] Verbose: Enabled
[04/29/2020-23:09:32] [I] Averages: 10 inferences
[04/29/2020-23:09:32] [I] Percentile: 99
[04/29/2020-23:09:32] [I] Dump output: Disabled
[04/29/2020-23:09:32] [I] Profile: Disabled
[04/29/2020-23:09:32] [I] Export timing to JSON file: 
[04/29/2020-23:09:32] [I] Export profile to JSON file: 
[04/29/2020-23:09:32] [I] 
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - NMS_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Reorg_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Region_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Clip_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - LReLU_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - PriorBox_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - Normalize_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - RPROI_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[04/29/2020-23:09:32] [V] [TRT] Plugin Creator registration succeeded - FlattenConcat_TRT
----------------------------------------------------------------
Input filename:   /home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[04/29/2020-23:09:33] [V] [TRT] 129:Constant -> 
[04/29/2020-23:09:33] [V] [TRT] 130:Shape -> (4)
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
While parsing node number 2 [Gather -> "131"]:
--- Begin node ---
input: "130"
input: "129"
output: "131"
op_type: "Gather"
attribute {
  name: "axis"
  i: 0
  type: INT
}

--- End node ---
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[04/29/2020-23:09:33] [E] Failed to parse onnx file
[04/29/2020-23:09:33] [E] Parsing model failed
[04/29/2020-23:09:33] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=/home/vincelf/pytorch-segmentation-master/resnet18-vlf.onnx --explicitBatch --verbose
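
(To locate the failing node, the graph can also be inspected with the onnx Python package; a small script along these lines, my own sketch, lists the Shape/Gather pattern that traced dynamic reshapes leave behind. Node number 2 from the trtexec output is a Gather fed by a Shape:)

import onnx

# Dump the reshape-related nodes of the exported graph with their attributes
model = onnx.load('resnet18-vlf.onnx')
for i, node in enumerate(model.graph.node):
    if node.op_type in ('Shape', 'Gather', 'Reshape', 'Flatten'):
        attrs = {a.name: onnx.helper.get_attribute_value(a) for a in node.attribute}
        print(i, node.op_type, list(node.input), '->', list(node.output), attrs)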

Hi,

The Nano has limited memory.
I am not sure if this is what causes the freeze when using Docker.

For the TensorRT converter issue, have you tried this suggestion?

Thanks.

Hi AastaLLL, and thanks for your attention.

I saw that topic, thanks.

I have not changed any code, and I checked that this fix is present: https://github.com/onnx/onnx-tensorrt/issues/125#issuecomment-502931336. And it is, judging by https://github.com/dusty-nv/vision/blob/v0.3.0/torchvision/models/resnet.py.

I am actually using code provided by NVIDIA, so I am a bit "reluctant" to change it. I would really like to take it "as is" and regenerate the ONNX from it, without having to change (fix) anything on my side. I want to do what NVIDIA does to generate the ONNX they deliver. They can do it, so I should be able to as well…

Hopefully I will, at one point.

Vincent

Hmm, other than the fact that I used PyTorch 1.1 to train the segmentation models, I am not sure what the difference is, since you are using the forked torchvision. You may want to try PyTorch 1.1 in the cloud.

Also, I believe the torch.view() restriction in PyTorch's ResNet model definition is no longer needed with TensorRT 7.1 / JetPack 4.4, because I no longer need my torchvision fork for that. So if the above doesn't work, you may want to try PyTorch 1.4 and upstream torchvision 0.5.0.

I do have some patches in my torchvision fork for supporting FCN-ResNet18 and exporting them to ONNX, so you would want to pick up these changes too:

Hi dusty_nv, hope all goes well. I am thankful you took some time to reply.

I tried all your recommendations tonight, but still no success.

  • pytorch-1.4 + upstream torchvision-0.5.0 requires Python 3… but the pytorch-segmentation project does not seem compatible with Python 3:
(env) [vincelf@blg5408 pytorch-segmentation-master]$ python train.py -a fcn_resnet18 --dataset deepscene --model-dir ./model_output --dist-url 'tcp://127.0.0.1:5556' /home/vincelf/downloads/freiburg_forest_annotated

pytorch-segmentation/datasets/__init__.py
Traceback (most recent call last):
  File "train.py", line 24, in <module>
    from datasets.cityscapes_utils import get_cityscapes
  File "/home/vincelf/pytorch-segmentation-master/datasets/cityscapes_utils.py", line 18
    print self.classes
             ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(self.classes)?
  • I made sure I am building and training with the patches you pointed out; thanks for the notice though.
  • I tried with torch 1.1 (and CUDA 10.0), but I get a runtime error when exporting the ONNX; can this be fixed somehow, if this is the way to go?
exporting model to ONNX...
Traceback (most recent call last):
  File "onnx_export.py", line 73, in <module>
    torch.onnx.export(model, input, opt.output, verbose=True, input_names=input_names, output_names=output_names)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/__init__.py", line 25, in export
    return utils.export(*args, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/utils.py", line 131, in export
    strip_doc_string=strip_doc_string)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/utils.py", line 363, in _export
    _retain_param_name, do_constant_folding)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/utils.py", line 266, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/onnx/utils.py", line 225, in _trace_and_get_graph_from_model
    trace, torch_out = torch.jit.get_trace_graph(model, args, _force_outplace=True)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/jit/__init__.py", line 231, in get_trace_graph
    return LegacyTracedModule(f, _force_outplace, return_inputs)(*args, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/localscratch/vincelf.7646845.0/env/lib/python2.7/site-packages/torch/jit/__init__.py", line 295, in forward
    out_vars, _ = _flatten(out)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs, but got OrderedDict

I am just wondering how NVIDIA produces the ONNX… can you tell?

If you have any other idea for my situation, please let me know. For instance, if by chance you have a Docker image I could use on the Nano to generate the ONNX for deepscene from A to Z (training to ONNX)… without freezing the Nano :)

Thanks in advance for all your help,

Vincent

PyTorch 1.4 was the last release to support Python 2.7, so I believe it should still work with Python 2.7. PyTorch 1.5 only supports Python 3.

Hmm, this error shouldn't be occurring, because in the onnx_export.py script the export_onnx flag is set when constructing the model. I added this flag myself to disable the use of OrderedDict so the model could be exported to ONNX.

Can you confirm that flag is in fact being used?
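
From memory, the construction in onnx_export.py looks roughly like the sketch below (paraphrased rather than verbatim, so check the script itself; fcn_resnet18 and the export_onnx keyword come from my torchvision fork, and the num_classes value is just illustrative):

import torch
import torchvision

# Build the segmentation model with export_onnx=True so forward() returns a
# plain tensor instead of an OrderedDict, which torch.jit tracing can handle
model = torchvision.models.segmentation.__dict__['fcn_resnet18'](
            num_classes=5, aux_loss=None, pretrained=False, export_onnx=True)

# Load the trained weights (the checkpoint stores the state dict under 'model')
checkpoint = torch.load('model_output/model_best.pth', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()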

Hi dusty_nv,

Great news tonight! I can finally run inference on the Nano with my own ONNX. For now the one I am generating is the same as the one NVIDIA provides (resnet18 + deepscene), but I will be able to train with my own datasets and experiment, which is great.

The key point is using torch-1.1.0 with cuda-10.0 to build the fork of torchvision 0.3.0 (training and ONNX export are done remotely on a server with better ML resources than the Jetson Nano).

Then I tried the ONNX on the Jetson Nano, and fortunately it runs, even though trtexec on the remote server still returns an error/assert:

  • trtexec (TensorRT 6.0.1.5), with the ONNX generated using torch-1.1.0/cuda-10.0, returns a different error/assert than the one I got with torch-1.3/cuda-10.1 (ERROR: builtin_op_importers.cpp:695 In function importBatchNormalization: [6] Assertion failed: scale_weights.shape == weights_shape).

Anyway… I am good to continue now.

You had a good catch about the 'export_onnx' flag in the DeepScene constructor. I am not sure why I removed it; I had completely forgotten about it. A residue of previous trial and error. I checked out the original code from GitHub again.

Regarding the few minor fixes I needed to make in order to move on (related to Python 2.7, if I recall correctly), I share the details below.

diff -bur pytorch-segmentation-master/datasets/deepscene_remap.py pytorch-segmentation-master-vlf/datasets/deepscene_remap.py
--- pytorch-segmentation-master/datasets/deepscene_remap.py	2019-08-30 13:37:06.000000000 -0400
+++ pytorch-segmentation-master-vlf/datasets/deepscene_remap.py	2020-05-09 00:27:09.000000000 -0400
@@ -74,9 +74,9 @@
 	for n in range(len(files)):
 		worker_args.append((os.path.join(args.input, files[n]), os.path.join(args.output, files[n]), args.colorized))
 
-	#for n in worker_args:
-	#    remap_labels(n)
+	for n in worker_args:
+	    remap_labels(n)
 
-	with ProcessPool(processes=args.workers) as pool:
-		pool.map(remap_labels, worker_args)
+	#with ProcessPool(processes=args.workers) as pool:
+	#	pool.map(remap_labels, worker_args)
diff -bur pytorch-segmentation-master/utils.py pytorch-segmentation-master-vlf/utils.py
--- pytorch-segmentation-master/utils.py	2019-08-30 13:37:06.000000000 -0400
+++ pytorch-segmentation-master-vlf/utils.py	2020-05-09 01:38:03.000000000 -0400
@@ -290,7 +290,7 @@
     torch.cuda.set_device(args.gpu)
     args.dist_backend = 'nccl'
     print('| distributed init (rank {}): {}'.format(
-        args.rank, args.dist_url), flush=True)
+        args.rank, args.dist_url))
     torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                          world_size=args.world_size, rank=args.rank)
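
(For the utils.py change: Python 2.7's print() has no flush keyword, so dropping flush=True was the quick fix. An alternative that keeps the flushing behavior would be an explicit flush, something like this sketch:)

from __future__ import print_function
import sys

def print_flushed(msg):
    # Python 2.7's print() lacks the flush kwarg; flush the stream explicitly
    print(msg)
    sys.stdout.flush()

print_flushed('| distributed init (rank {}): {}'.format(0, 'tcp://127.0.0.1:5556'))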

In the meantime, I also tried to train and generate the ONNX using Python 3, but without success. That's OK for now… until it's not OK anymore (soon?).

My goal is to do transfer learning and domain adaptation with my own images, using a model for "real-time" semantic segmentation of bicycle paths/roads in bad conditions: half wet, dirty, a bit of snow, etc. The DeepScene dataset looks like a good starting point (forest pathways).

Thanks a lot for your help; you unblocked me big time, and I am sincerely grateful.

Wishing you a lot of fun.

Vincent