Segmentation model provided with jetson.inference, performance and output layer size


I trained my own FCN_resnet18 segmentation network on a Cityscapes-like dataset.
The model works, but I have a huge performance issue. The network FPS is 30, which is really good, but I can only run 4 inference loops per second.

According to the profiling information, all of the time is spent post-processing the inference result in the segNet functions.

My code is strongly inspired by the segmentation training example from jetson.inference. My understanding was that the example models provided with the various tools were generated using this tool. Obviously not.

Provided models like fcn-resnet18-cityscapes-512x256 have an output layer of 16x8xnum_class, whereas the default FCN_resnet models have an output layer with the same size as the input (512x256xnum_class in my case).

Could we have access to the code used to generate the structure of these models?
Does anyone know which transformation can be applied to reduce the output layer size?

Thanks for your help & merry Xmas to you


The grid size is decided at training time and cannot be changed afterward.
If you want an output the size of the image buffer, please try the net.Mask() function.

Here is a related topic for your reference:


The training code I used for the fcn-resnet18 models is here:

I would check this more recent PR before trying to use it:

Basically I disable the deconv layer when exporting the model to ONNX, because the deconv is slow and it is a linear operator (i.e. learning rate is 0) and the same function is accomplished by my faster CUDA kernel for bilinear interpolation. This is why you see the raw output grid size of my models is smaller, because it is pre-deconv and then I do that in post-processing.
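To illustrate what doing the upsampling in post-processing means, here is a minimal pure-Python sketch of bilinear interpolation from a coarse score grid to a larger one. It is not the CUDA kernel from jetson-inference; the function name `bilinear_upsample` and the pixel-centre alignment convention are assumptions made for the example.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Upsample a 2D list of floats to out_h x out_w with bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back into input coordinates (assumed convention).
        sy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Blend the four neighbouring coarse cells.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Toy 2x2 score grid for one class, upsampled to 4x4.
coarse = [[0.0, 1.0],
          [2.0, 3.0]]
fine = bilinear_upsample(coarse, 4, 4)
```

The weights are fixed by geometry and nothing is learned, which is why the step can be moved out of the network without changing the result.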

The post-processing may be extra slow for you because the bilinear interpolation is being done on the full grid size (i.e. 512x256 as opposed to 16x8), which takes a lot of extra time (and isn’t even needed, because the deconv was already done in your model). So you should run with the --filter-mode=point option to disable that extra post-processing (or disable the deconv when exporting your model).
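For comparison, point filtering in the general sense is nearest-neighbour sampling: each output pixel copies the class of the coarse cell it falls in, with no blending. The sketch below is only an illustration of that idea, not the jetson-inference implementation, and `point_upsample` is a hypothetical helper name.

```python
def point_upsample(class_grid, out_h, out_w):
    """Nearest-neighbour upsampling of a 2D grid of class IDs."""
    in_h, in_w = len(class_grid), len(class_grid[0])
    return [[class_grid[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


# Toy 2x2 grid of class IDs, upsampled to 4 rows x 6 columns.
coarse_classes = [[0, 1],
                  [2, 3]]
mask = point_upsample(coarse_classes, 4, 6)
```

When the class grid already has the output resolution, each lookup is a plain copy, so there is nothing left to interpolate.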

Thanks for your responses

This is the way to go, as overlay and mask generation are now fast and smooth.
All of the time was being spent in the function segNet::classify. The timings are consistent with that, too: your output grid is 32 times smaller in each dimension, 32x32=1024, and I was seeing roughly 1000x slower post-processing.
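The arithmetic behind that estimate, written out with the grid sizes quoted in this thread (the variable names are only for illustration):

```python
full_w, full_h = 512, 256   # output grid when the deconv stays inside the model
small_w, small_h = 16, 8    # output grid of the provided models

scale_x = full_w // small_w                              # per-axis ratio, width
scale_y = full_h // small_h                              # per-axis ratio, height
cell_ratio = (full_w * full_h) // (small_w * small_h)    # ratio of grid cells

print(scale_x, scale_y, cell_ratio)  # 32 32 1024
```

Any post-processing step that visits every grid cell therefore has about 1024 times more cells to handle on the full-size output.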

I was unable to find where you disable this deconv layer in your code, or to find a criterion that would stop the ONNX exporter from adding this extra stuff around the traditional layers. Do you have a code extract or a hint that you can share as a starting point for me?

It’s really unclear to me how to control the output layer grid size. Where does this ratio of 32 between your input picture size and your output grid come from? I was expecting it to be a consequence of the network structure, but I am not sure, as a 768x364 picture still produces a 16x8 grid (I would have expected 24x12 with a linear 1/32 ratio). So is the scale somehow linked to a range?

I just realized that my previous neural network experience dates from before Nvidia was created. I am eager to catch up on all the incredible improvements you guys are introducing. The Jetson's performance and range of applications just blow my mind.

This is where the deconv interpolation gets skipped in the case of ONNX export:

This flag is then passed to the model constructor during the ONNX export script:

This is the default ratio for the FCN-ResNet models, I didn’t set that myself and am unsure how to change it. I think it is dependent on the setup of convolution layers.
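As a rough check of where a factor of 32 can come from, the sketch below walks an input size through the five stride-2 stages of a standard ResNet-18 backbone (conv1, the max-pool, and the first convolution of layer2, layer3 and layer4). It assumes the usual kernel, stride and padding values and no dilation in place of strides; whether the models in this thread use exactly that configuration is an assumption, and `conv_out` and `backbone_grid` are hypothetical helper names.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the downsampling layers, assuming standard ResNet-18.
DOWNSAMPLING_LAYERS = [
    (7, 2, 3),  # conv1
    (3, 2, 1),  # maxpool
    (3, 2, 1),  # first conv of layer2
    (3, 2, 1),  # first conv of layer3
    (3, 2, 1),  # first conv of layer4
]


def backbone_grid(width, height):
    """Spatial size of the backbone output for a given input size."""
    for kernel, stride, padding in DOWNSAMPLING_LAYERS:
        width = conv_out(width, kernel, stride, padding)
        height = conv_out(height, kernel, stride, padding)
    return width, height


print(backbone_grid(512, 256))  # (16, 8)
print(backbone_grid(768, 364))  # (24, 12)
```

Five stride-2 stages give 2^5 = 32. By this arithmetic a 768x364 input would produce 24x12, so a 16x8 grid for such a picture would be consistent with the image being resized to the model's 512x256 input before inference. That explanation is a guess and is not confirmed in this thread.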
