Segmentation model provided with jetson.inference, performance and output layer size

jp16 · December 25, 2020, 4:48pm

Hello,

I trained my own FCN_resnet18 segmentation network on a Cityscapes-like dataset.
The model works, but I have a huge performance issue. The network itself runs at 30 FPS, which is really good, but I can only complete 4 inference loops per second.

According to the profiling information, all the time is spent post-processing the inference result in the segnet function.

My code is strongly inspired by the segmentation training example from jetson.inference. My understanding was that the models provided with tools like segnet.py were generated using this example. Obviously not.

Provided models like fcn-resnet18-cityscapes-512x256 have an output layer of 16x8xnum_class, whereas a default FCN_resnet has an output layer the same size as its input (512x256xnum_class in my case).

Could we have access to the code used to generate the structure of these models?
Does anyone know which transformation can be applied to reduce the output layer size?

Thanks for your help & merry Xmas to you

AastaLLL · December 28, 2020, 8:03am

Hi,

The grid size is decided at training time and cannot be changed afterward.
If you want an output the size of your buffer, please try the net.Mask() function.

Here is a related topic for your reference:

Thanks.

dusty_nv · December 28, 2020, 6:48pm

The training code I use for the fcn-resnet18 models is here: GitHub - dusty-nv/pytorch-segmentation: Training of semantic segmentation networks with PyTorch

I would check this more recent PR before trying to use it: Dev by Onixaz · Pull Request #4 · dusty-nv/pytorch-segmentation · GitHub

Basically, I disable the deconv layer when exporting the model to ONNX, because the deconv is slow and it is a linear operator (i.e. learning rate is 0), and the same function is accomplished by my faster CUDA kernel for bilinear interpolation. This is why the raw output grid of my models is smaller: it is pre-deconv, and I do the upsampling in post-processing instead.
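To illustrate what "doing the deconv in post-processing" amounts to, here is a minimal pure-Python sketch that bilinearly upsamples a coarse per-class score grid and then takes the per-pixel argmax. It is not the actual CUDA kernel from jetson-inference; the function names `bilinear_upsample` and `classify_upsampled` and the half-pixel sampling convention are assumptions made for the example.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Upsample a 2D list of floats to out_h x out_w with bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back onto the coarse grid (assumed
        # half-pixel convention), clamped to the valid range.
        sy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Blend the four neighbouring coarse cells.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def classify_upsampled(scores, out_h, out_w):
    """scores holds one coarse grid per class; returns an out_h x out_w class-ID map."""
    upsampled = [bilinear_upsample(s, out_h, out_w) for s in scores]
    num_classes = len(upsampled)
    return [
        [max(range(num_classes), key=lambda c: upsampled[c][y][x]) for x in range(out_w)]
        for y in range(out_h)
    ]


# Two classes on a 1x2 coarse grid, upsampled to 1x4 pixels.
coarse_scores = [[[1.0, 0.0]], [[0.0, 1.0]]]
mask = classify_upsampled(coarse_scores, 1, 4)
```

The work in this step scales with the number of coarse cells that have to be read and blended, which is why feeding it a grid that is already full resolution is wasteful.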

The post-processing may be extra slow for you because doing the bilinear interpolation on a full-size grid (i.e. 512x256 as opposed to 16x8) takes a lot of extra time (and isn’t even needed, because the deconv was already done in your model). So you should run segnet.py with the --filter-mode=point option to disable that extra post-processing (or disable the deconv when exporting your model).

jp16 · December 29, 2020, 9:05am

Thanks for your responses.

This is the way to go, as overlay and mask generation are fast and smooth.
All the time is spent in the segNet::classify function in https://github.com/dusty-nv/jetson-inference/blob/master/c/segNet.cpp. The timing is also consistent: your output grid is 32 times smaller in both dimensions, 32x32=1024, and I see roughly 1000x slower post-processing.
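The arithmetic behind that estimate can be checked in a few lines. This only compares cell counts for the two grid sizes mentioned in the thread; it does not measure anything.

```python
# Grid sizes taken from the thread: full-resolution output vs. pre-deconv output.
full_w, full_h = 512, 256
coarse_w, coarse_h = 16, 8

scale_x = full_w // coarse_w  # downscale factor along the width
scale_y = full_h // coarse_h  # downscale factor along the height

# Ratio of the number of cells post-processing has to visit.
cell_ratio = (full_w * full_h) // (coarse_w * coarse_h)

print(scale_x, scale_y, cell_ratio)
```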

I was unable to find where you disable this deconv layer in your code, or to find a setting that would stop the ONNX exporter from adding this extra stuff around the traditional layers. Do you have a code extract or a hint that you can share as a starting point for me?

It’s really unclear to me how to control the output layer grid size. Where does this ratio of 32 between your input picture size and your output grid come from? I was expecting it to be a consequence of the net structure, but I am not sure, as a 768x364 picture still produces a 16x8 grid (I would have expected 24x12 with a linear 1/32 ratio). So is the scale somehow linked to a range?

I just realized that my previous neural net experience dates from before Nvidia was created. I am eager to catch up on all the incredible improvements you guys are introducing. Jetson performance and range of applications just blow my mind.

dusty_nv · December 29, 2020, 9:03pm

This is where the deconv interpolation gets skipped in the case of ONNX export:

https://github.com/dusty-nv/pytorch-segmentation/blob/9f95e8d30a6a13a17160d98b52ed920f02d84576/models/segmentation/_utils.py#L29

This flag is then passed to the model constructor during the ONNX export script:

https://github.com/dusty-nv/pytorch-segmentation/blob/9f95e8d30a6a13a17160d98b52ed920f02d84576/scripts/onnx_export.py#L45

This is the default ratio for the FCN-ResNet models; I didn’t set that myself and am unsure how to change it. I think it depends on the setup of the convolution layers.
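As a rough check of where a factor of 32 could come from, the sketch below applies the usual convolution output-size formula to the five stride-2 stages of a standard ResNet-18 (conv1, maxpool, and the first convolution of layer2, layer3 and layer4). The kernel, stride and padding values are those of the stock torchvision ResNet-18 and are assumed, not confirmed, to match the backbone used here; `conv_out` and `backbone_grid` are names made up for the example.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the stride-2 stages, assuming a stock ResNet-18.
STRIDED_STAGES = [
    (7, 2, 3),  # conv1
    (3, 2, 1),  # maxpool
    (3, 2, 1),  # layer2, first conv
    (3, 2, 1),  # layer3, first conv
    (3, 2, 1),  # layer4, first conv
]


def backbone_grid(width, height):
    """Feature-map size after all strided stages, before any upsampling."""
    for kernel, stride, padding in STRIDED_STAGES:
        width = conv_out(width, kernel, stride, padding)
        height = conv_out(height, kernel, stride, padding)
    return width, height


print(backbone_grid(512, 256))
print(backbone_grid(768, 364))
```

Under these assumptions, five halvings give 2^5 = 32, so 512x256 maps to 16x8 and a true 768x364 network input would map to 24x12. One possible explanation for still seeing 16x8, not confirmed in this thread, is that the exported model has a fixed input size and the image is resized to it before inference.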
