Need working example of deployment of Mask RCNN to Jetson

Hi,
I’m working with the NVIDIA ML tools for the first time, and I’m finding myself up to my eyeballs in documentation, none of which seems to work.
I’m running the TAO Mask-RCNN segmentation workflow. I have a working model, thanks to the very nice Jupyter notebook example.

However, I’ve hit several dead ends when trying to deploy the model on a Jetson device. There seem to be many deployment methods, and none of them work.
I have tried exporting a 32-bit (FP32) engine file from my desktop and running it on the Jetson with TensorRT in Python via runtime.deserialize_cuda_engine(f.read()).
This yields:
Reading engine from file model.step-25000.engine
[11/17/2022-15:07:26] [TRT] [E] 1: [stdArchiveReader.cpp::StdArchiveReader::40] Error Code 1: Serialization (Serialization assertion stdVersionRead == serializationVersion failed.Version tag does not match. Note: Current Version: 213, Serialized Engine Version: 205)
[11/17/2022-15:07:26] [TRT] [E] 4: [runtime.cpp::deserializeCudaEngine::49] Error Code 4: Internal Error (Engine deserialization failed.)
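For context, the check that fails here can be sketched without TensorRT at all: a serialized engine records the exact TensorRT version that built it, and deserialization refuses anything else. The helper below is illustrative only (the function name is mine, not a TensorRT API):

```python
# Sketch of the compatibility rule TensorRT enforces on deserialization:
# engines are not portable across TensorRT versions (or GPU architectures);
# major.minor.patch must match exactly. Helper name is illustrative.
def engine_version_matches(builder_version: str, runtime_version: str) -> bool:
    return builder_version == runtime_version

# Desktop TAO container (TRT 8.5.x) vs. JetPack 5.0.2 (TRT 8.4.1):
print(engine_version_matches("8.5.1", "8.4.1"))  # False -> rebuild the engine on the Jetson
```

This is why the serialized-version tags in the log (213 vs. 205) disagree: the engine has to be rebuilt on the device it will run on.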

I have tried exporting an INT8 version from TAO, then running it through tao-converter on the Orin and attempting to load the resulting file. It seems to take 32-bit float input (which makes no sense to me) and returns zero detections. (The outputs are also undocumented, so I’m not entirely sure how to interpret them. Is there any source available for how TAO runs the model?)

I have tried reading the .etlt file directly into DeepStream. There is no example file for this anywhere, but following examples in the TLT user guide, I tried running deepstream-app with a custom configuration file. That yielded:

:00:00.221617212  9295 0xaaaadbbdf130 INFO                 nvinfer gstnvinfer.cpp:646:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1923> [UID = 1]: Trying to create engine from model files
WARNING: [TRT]: The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
ERROR: [TRT]: 3: conv1/Conv2D:kernel weights has count 3528 but 0 was expected
ERROR: [TRT]: 4: conv1/Conv2D: count of 3528 weights in kernel, but kernel dimensions (7,7) with 0 input channels, 24 output channels and 1 groups were specified. Expected Weights count is 0 * 7*7 * 24 / 1 = 0

along with many other errors.

I have no doubt I’m doing something wrong, but neither exporting .engine files nor importing .etlt files seems to work.

In many of my dead-end attempts, I got fatal warnings about implicit batch size not being implemented in the model. I suspect that my attempts to export .engine files were foiled by mismatched TensorRT versions on the Jetson (which only supports up to 8.4.1) and in TAO (which is on 8.5, I think).

I am running the most recent release of TAO (downloaded yesterday).
I am trying to run on an Orin Jetson devkit, which I just upgraded to JetPack 5.0.2.

Can anyone point me at a complete, working system to run Mask RCNN on a Jetson? Eventually I’ll want it running in my own codebase, so the deepstream-app route is not ideal, but I want to start with anything that works.

Thanks,
Nathaniel Tagg

The error in the log indicates a TensorRT version mismatch. For example, if you build a TensorRT engine with TRT 8.2 but run inference in a TRT 8.4 environment, this error will happen.
So, if you are going to run inference on a Jetson device, please download the Jetson version of tao-converter and run it against the .etlt model to generate a TensorRT engine. Refer to the command in MaskRCNN — TAO Toolkit 3.22.05 documentation.
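A typical invocation on the Jetson looks roughly like the following. The key, input dimensions, precision, and file paths are all placeholders; take the exact flags and the output node names from the MaskRCNN page of the TAO documentation:

```shell
# Illustrative only -- substitute your own key, input dims, and paths.
# -k: encoding key used at export; -d: C,H,W from the training spec;
# -o: MaskRCNN output nodes; -t: engine precision; -e: engine file to write.
./tao-converter -k "$KEY" \
    -d 3,576,960 \
    -o generate_detections,mask_fcn_logits/BiasAdd \
    -t fp16 \
    -e model.step-25000.engine \
    model.step-25000.etlt
```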

To run a MaskRCNN model with DeepStream, please refer to the user guide
MaskRCNN — TAO Toolkit 3.22.05 documentation, and then follow peoplesegnet in GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream. Peoplesegnet is a purpose-built model based on the Mask_rcnn network. As mentioned in the user guide, users can configure either the .etlt model or a TensorRT engine in the config file.
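For reference, the shape of such a config is sketched below. The key names follow the peoplesegnet sample in deepstream_tao_apps, but every value here is a placeholder to be replaced with your own:

```ini
# Sketch of an nvinfer config for a TAO MaskRCNN model, modeled on the
# peoplesegnet sample in deepstream_tao_apps. All values are placeholders.
[property]
tlt-model-key=<your-export-key>
tlt-encoded-model=<path-to>/model.etlt
# or, once converted on the Jetson:
# model-engine-file=<path-to>/model.engine
infer-dims=3;576;960
network-type=3
output-instance-mask=1
num-detected-classes=<n>
output-blob-names=generate_detections;mask_fcn_logits/BiasAdd
parse-bbox-instance-mask-func-name=NvDsInferParseCustomMrcnnTLTV2
custom-lib-path=<path-to>/libnvds_infercustomparser_tao.so
```

Note that if tlt-model-key does not match the key used at export, the .etlt file cannot be decoded, which can surface as parsing errors like the conv1/Conv2D weight-count errors quoted earlier.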

You can run a command similar to the one below.

For the output layers, please refer to GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream, or MaskRCNN — TAO Toolkit 3.22.05 documentation.
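As a rough guide to those outputs: in the DeepStream TAO sample parser (NvDsInferParseCustomMrcnnTLTV2), 'generate_detections' carries one six-float record per candidate detection, and 'mask_fcn_logits/BiasAdd' carries one 28x28 logit map per detection. The decoder below assumes that layout; the field order (y1, x1, y2, x2, class_id, score) is an assumption to verify against the parser source:

```python
# Hypothetical decoder for the 'generate_detections' output buffer,
# assuming six floats per detection in the order
# (y1, x1, y2, x2, class_id, score) -- verify against the parser in
# deepstream_tao_apps. Each kept detection also indexes a 28x28 mask in
# 'mask_fcn_logits/BiasAdd', not shown here.
def decode_detections(flat, score_thresh=0.5):
    out = []
    for i in range(0, len(flat), 6):
        y1, x1, y2, x2, cls, score = flat[i:i + 6]
        if score >= score_thresh:
            out.append({"box": (x1, y1, x2, y2), "class": int(cls), "score": score})
    return out

# Two candidate detections; only the first (score 0.9) passes the threshold.
flat = [0.1, 0.2, 0.5, 0.6, 1.0, 0.9,
        0.0, 0.0, 0.1, 0.1, 1.0, 0.2]
print(decode_detections(flat))
```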

Also, besides DeepStream, users can run inference with Triton Inference Server. See Integrating TAO CV Models with Triton Inference Server — TAO Toolkit 3.22.05 documentation.

Refer to the command for Peoplesegnet. There, users can also see how the preprocessing and postprocessing work.

I have tried running tao-converter on the Jetson, using the appropriate version for the platform. However, when I attempt to run this model with code from here:
https://github.com/NVIDIA/TensorRT/raw/main/quickstart/SemanticSegmentation/tutorial-runtime.ipynb
I can actually make it load and run… and find that it returns all zeros in the detection outputs, and what looks to me like random noise in the mask outputs. It is unclear to me whether the problem is the .etlt file, the conversion, loading the model, preprocessing the data, or running the model.
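One variable worth isolating is the preprocessing: all-zero detections are a classic symptom of feeding unnormalized pixels. A sketch of the normalization the peoplesegnet sample config implies (the scale and offset values are taken from that sample and may not match this MaskRCNN export; the training spec is authoritative):

```python
import numpy as np

# Normalization sketch based on the peoplesegnet sample nvinfer config
# (net-scale-factor and offsets are from that sample; your own MaskRCNN
# export may use different values -- check the training spec):
#   y = scale * (x - offsets), RGB channel order, then HWC -> CHW.
OFFSETS = np.array([123.675, 116.28, 103.53], dtype=np.float32)
SCALE = 0.017507

def preprocess(image_rgb: np.ndarray) -> np.ndarray:
    x = (image_rgb.astype(np.float32) - OFFSETS) * SCALE
    return np.ascontiguousarray(x.transpose(2, 0, 1))  # CHW layout for TensorRT

img = np.full((576, 960, 3), 128, dtype=np.uint8)  # dummy mid-gray frame
blob = preprocess(img)
print(blob.shape)  # (3, 576, 960)
```

The flattened blob is what gets copied into the engine's input binding; if the model sees raw 0–255 values instead, garbage outputs are expected.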

Asking me to run a file “like this one” requires me to go line by line through the entire file and attempt to guess, or dig through the obscure documentation, to see which lines are relevant and need to be changed. I tried this (with the yaml file version) and got nowhere.

The other documentation is contradictory. I tried following the TAO 3.22.05 documentation, but it only discusses the deepstream-app approach. When I tried that approach, the best I could do was:

WARNING: [TRT]: The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
ERROR: [TRT]: UffParser: Could not read buffer.
parseModel: Failed to parse UFF model

if I ran against the (Jetson-generated) .engine file, or

WARNING: [TRT]: The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.
ERROR: [TRT]: 3: conv1/Conv2D:kernel weights has count 3528 but 0 was expected
ERROR: [TRT]: 4: conv1/Conv2D: count of 3528 weights in kernel, but kernel dimensions (7,7) with 0 input channels, 24 output channels and 1 groups were specified. Expected Weights count is 0 * 7*7 * 24 / 1 = 0

if I ran against the .etlt file.

If I follow the deepstream_tao_apps example, the first command yields:

$ ./apps/tao_segmentation/ds-tao-segmentation -c configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4

(process:14281): GLib-WARNING **: 10:51:52.337: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Key file does not have group 'property'

(process:14281): GLib-WARNING **: 10:51:52.337: GError set over the top of a previous GError or uninitialized memory.
This indicates a bug in someone's code. You must ensure an error is NULL before it's set.
The overwriting error message was: Key file does not have group 'property'
Request sink_0 pad from streammux
batchSize 1...
Failed to load config file: No such file or directory
** ERROR: <gst_nvinfer_parse_config_file:1303>: failed
Now playing: configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt
Opening in BLOCKING MODE 
0:00:00.294304982 14281 0xaaaac6c32200 WARN                 nvinfer gstnvinfer.cpp:800:gst_nvinfer_start:<primary-nvinference-engine> error: Configuration file parsing failed
0:00:00.294351287 14281 0xaaaac6c32200 WARN                 nvinfer gstnvinfer.cpp:800:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt
Running...
ERROR from element primary-nvinference-engine: Configuration file parsing failed
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(800): gst_nvinfer_start (): /GstPipeline:ds-custom-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: configs/peopleSemSegNet_tao/pgie_peopleSemSegNet_tao_config.txt
Returned, stopping playback
Deleting pipeline

This runs and exits quickly, indicating that it is not even successfully configuring itself.

The second command yields this:

$ apps/tao_segmentation/ds-tao-segmentation configs/apps/seg_app_unet.yml

terminate called after throwing an instance of 'YAML::BadFile'

So that’s not really a great example to start me on.

UPDATE: Oh, wait… you actually have to specify “./apps/” instead of “apps/”. Wow, I didn’t even know a script would be able to tell the difference between those two.

If I run the second command verbatim, I get a lot of warning messages, then:

ERROR: [TRT]: 3: Cannot find binding of given name: softmax_1
0:02:08.665653188 14013 0xaaaad422f430 WARN                 nvinfer gstnvinfer.cpp:643:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1876> [UID = 1]: Could not find output layer 'softmax_1' in engine
nvbufsurface: Could not get EGL display connection
nvbufsurface: Can't get EGL display
0:02:08.806336931 14013 0xaaaad422f430 WARN                 nvinfer gstnvinfer.cpp:943:gst_nvinfer_start:<primary-nvinference-engine> error: Failed to set buffer pool to active
Running...

**PERF:  FPS 0 (Avg)	
Fri Nov 18 10:39:04 2022
**PERF:  0.00(0.00)	
ERROR from element primary-nvinference-engine: Failed to set buffer pool to active
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(943): gst_nvinfer_start (): /GstPipeline:ds-custom-pipeline/GstNvInfer:primary-nvinference-engine

This takes a long time, with many warnings about fp16 values.

This is even before I attempt to change anything to use a MaskRCNN model or my own input files; this is out of the box.

Any insights?

Ah, I realize that in the first example, the pgie…txt file doesn’t even exist.
Wow, that would have been a helpful error message. I see that if I look in the subfolder vanilla/, there is a file there to run.
Well, this file actually appears to do something. It reports 25 fps, but the screen is updating at about 2 fps, so I’ll have to dig into it. I’ll try to get MaskRCNN working with it first.
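To quantify that mismatch, a minimal frame counter can be ticked from whatever callback actually delivers frames to the display, then compared against the pipeline's own **PERF numbers. This helper is mine, not part of DeepStream:

```python
import time

# Minimal FPS counter: tick() it once per displayed frame, then read fps().
# Useful for cross-checking a pipeline's self-reported rate against what
# actually reaches the screen.
class FpsCounter:
    def __init__(self):
        self.frames = 0
        self.start = None

    def tick(self, now=None):
        now = time.monotonic() if now is None else now
        if self.start is None:
            self.start = now  # clock starts at the first frame
        self.frames += 1

    def fps(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = now - self.start if self.start is not None else 0.0
        return self.frames / elapsed if elapsed > 0 else 0.0

# Simulated run: 26 frames over 2.5 seconds.
c = FpsCounter()
for i in range(26):
    c.tick(now=i * 0.1)
print(round(c.fps(now=2.5), 1))  # 10.4
```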

EDIT: Using the .yml file does NOT work, only the .txt file. Sigh.

OK.
One more thing:

The ipynb you mentioned is not from the official TAO Jupyter notebooks. Please download them from
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#computer-vision

The latest version is 1.4.1. See TAO Toolkit Computer Vision Sample Workflows | NVIDIA NGC.


Yes, I am using those notebooks for training on my desktop. The tutorial notebook I referenced was something I was trying to use for deployment.

My bad. Please refer to peoplesegnet. Peoplesegnet is based on the Mask R-CNN network.

OK, that helped a lot. By running this example, I was able to reverse-engineer enough to get a minimal working model with my own code.

Also very helpful was this: