How to find the input/output layer names of a TLT/ETLT model

rmpt · March 3, 2021, 7:40pm

Hello,

I’m trying to test NVIDIA GestureNet (https://ngc.nvidia.com/catalog/models/nvidia:tlt_gesturenet) on DeepStream.

I need to fill uff-input-blob-name and output-blob-names with the correct input and output layer names of this model, but I can’t find this information anywhere.

Is there any tool that we can use to visualize the network architecture and layer names of a TLT model?

Thank you

Morganh · March 4, 2021, 2:02am

A quick way to get the uff-input-blob-name and output-blob-names values is to look them up in the DeepStream config file for the model.
For example, for detectnet_v2:
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/detectnet_v2.html#deepstream-configuration-file

uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
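As a rough illustration of where these values live, the sketch below reads the two keys from nvinfer-style config text with Python's standard configparser. The sample text, the `[property]` section name (the usual layout of a DeepStream nvinfer config) and the helper name `read_blob_names` are assumptions made for this example, not something taken from the GestureNet documentation.

```python
import configparser

# Sample config text; the two values are copied from the detectnet_v2 example.
SAMPLE_CONFIG = """
[property]
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
"""

def read_blob_names(config_text, section="property"):
    """Return (input_blob, [output_blobs]) from nvinfer-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    props = parser[section]
    input_blob = props.get("uff-input-blob-name")
    # Multiple output blobs are separated by semicolons.
    raw_outputs = props.get("output-blob-names", fallback="")
    outputs = [name for name in raw_outputs.split(";") if name]
    return input_blob, outputs

input_blob, output_blobs = read_blob_names(SAMPLE_CONFIG)
print(input_blob)
print(output_blobs)
```

This only extracts what a config file declares; it cannot confirm that the names match the layers inside an encrypted .etlt file.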

rmpt · March 4, 2021, 12:06pm

Hi @Morganh

Thanks, in fact I tried that before opening this topic :)

The GestureNet documentation (NGC and TLT 3.0 pages) does not mention which underlying network the model uses.
Following your suggestion, I took the configurations of other TLT models as examples and tried the following config with GestureNet, which works fine with some other TLT detectors:

tlt-model-key=nvidia_tlt
input-dims=3;160;160;0
uff-input-blob-name=input_1
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd

With GestureNet, however, I get the following error when DeepStream tries to convert the .etlt model to a TensorRT engine:

mar 04 11:39:29 : Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1715> [UID = 2]: Trying to create engine from model files
mar 04 11:39:33: ERROR: [TRT]: UffParser: Could not read buffer.
mar 04 11:39:33: parseModel: Failed to parse UFF model
mar 04 11:39:33: ERROR: failed to build network since parsing model errors.
mar 04 11:39:33: ERROR: Failed to create network using custom network creation function
mar 04 11:39:33: ERROR: Failed to get cuda engine from custom library API

This kind of error generally happens when the tlt-model-key or uff-input-blob-name/output-blob-names are wrong. That’s why I asked in this topic whether there is some way to validate them.

Morganh · March 4, 2021, 12:25pm

For GestureNet, please follow the “TLT Computer Vision Inference Pipeline” section in the TLT 3.0 user guide:
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/tlt_cv_inf_pipeline/overview.html

rmpt · March 4, 2021, 12:50pm

Thanks but I’m afraid that does not answer my question at all :(

It’s still not clear why DeepStream can convert the FaceDetect TLT model to a TensorRT engine but cannot do the same with GestureNet.

Additional note: I cannot use tlt-converter as a workaround for the problem I’m trying to solve.

Morganh · March 4, 2021, 1:51pm

For GestureNet inference, there are two ways by default.

1. See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/gesture_recognition.html#running-inference-on-the-model
   You can download the notebook for reference.
2. See https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/gesture_recognition.html#deploying-to-the-tlt-cv-inference-pipeline
   You can deploy a model trained through the TLT workflow to the TLT CV Inference Pipeline.

Morganh · March 4, 2021, 2:57pm

For the TensorRT engine method you mentioned, you can download the quick start via:

ngc registry resource download-version "nvidia/tlt_cv_inference_pipeline_quick_start:v0.1-dp"

https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/tlt_cv_inf_pipeline/requirements_and_installation.html#download-the-tlt-cv-inference-pipeline-quick-start

Then refer to the following command in tlt_cv_compile.sh to generate the TensorRT engine:

      tlt-converter -k ${ENCODING_KEY} -t fp16 \
            -p input_1,1x3x160x160,1x3x160x160,2x3x160x160 \
            -e /models/triton_model_repository/hcgesture_tlt/1/model.plan \
            /models/tlt_cv_gesture_v${tlt_jarvis_ngc_version}/gesture.etlt
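The `-p` argument above packs an input tensor name and three shapes into one comma-separated string. Assuming the order is name, minimum shape, optimal shape, maximum shape (the usual tlt-converter optimization-profile layout), a small hypothetical helper such as `parse_profile` below can unpack it so the batch range is easy to check before building the engine.

```python
def parse_profile(arg):
    """Split '<name>,<min>,<opt>,<max>' into a name and three shape tuples."""
    name, *shapes = arg.split(",")
    if len(shapes) != 3:
        raise ValueError("expected <name>,<min>,<opt>,<max>")
    # Each shape is written as dimensions joined by 'x', e.g. 1x3x160x160.
    min_shape, opt_shape, max_shape = (
        tuple(int(dim) for dim in shape.split("x")) for shape in shapes
    )
    return name, min_shape, opt_shape, max_shape

name, min_shape, opt_shape, max_shape = parse_profile(
    "input_1,1x3x160x160,1x3x160x160,2x3x160x160"
)
print(name, min_shape, opt_shape, max_shape)
```

Read this way, the command declares an input named input_1 with a 3x160x160 image and a batch size between 1 and 2.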

rmpt · March 4, 2021, 2:59pm

I tried step 2) that you shared, using the TLT 3.0 export, but it does not work with the unpruned GestureNet model available on NGC.

I mounted the following folder into the TLT Docker container:

        "source": "~/models/nvidia_ngc/GestureNet/unpruned",
        "destination": "/gesturenet"

and used the following command:

~/models/nvidia_ngc/GestureNet/unpruned$ tlt gesturenet export -m /gesturenet/model.tlt -k nvidia_tlt -o /gesturenet/model.etlt -t 'tfonnx'

And it fails with: Failed to convert: inputs/outputs specified do not exist

2021-03-04 14:52:57,940 [WARNING] tf2onnx.tfonnx: Argument verbose for process_tf_graph is deprecated. Please use --verbose option instead.
2021-03-04 14:52:58,693 [ERROR] tf2onnx.tfonnx:
Failed to convert: inputs/outputs specified do not exist, make sure your passedin format: input/output_node_name:port_id. Problematical inputs/outputs are: {'None:0'}

Traceback (most recent call last):
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/classifynet/scripts/export.py", line 114, in <module>
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/classifynet/scripts/export.py", line 110, in main
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py", line 316, in save_etlt_file
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/utilities/tlt_utils.py", line 407, in pb_to_onnx
File "/usr/local/lib/python3.6/dist-packages/tf2onnx/tfonnx.py", line 407, in process_tf_graph
raise ValueError("Inputs/Outputs Not Found")
ValueError: Inputs/Outputs Not Found
Traceback (most recent call last):
File "/usr/local/bin/gesturenet", line 8, in <module>
sys.exit(main())
File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/classifynet/entrypoint/classifynet.py", line 12, in main

rmpt · March 4, 2021, 3:06pm

The corrupted .etlt file generated today is 1 MB bigger than the one offered for download on NGC as the “unpruned” model, which might indicate that the one on NGC is also corrupted:

45M mar 4 14:51 model.etlt

and the one directly downloaded from NGC:

44M fev 24 01:24 model.etlt

I can’t be sure about this until someone confirms that they were actually able to run the .etlt model available on NGC…
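A size difference alone does not show which file, if either, is damaged, and a locally exported file would not be expected to match the NGC one byte for byte. One way to at least rule out a bad download is to fetch the NGC file twice and compare digests. The sketch below uses Python's hashlib; the helper names are made up for this example, and any paths passed to them are placeholders, not paths from this thread.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True when both files have identical bytes (compared by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_content("download_1/model.etlt", "download_2/model.etlt")` (placeholder paths) returns False if the two downloads differ.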

Ah, what I wrote about tlt-converter is that I cannot use it: it is not practical for production solutions using DeepStream and multiple kinds of hardware targets with different TensorRT/CUDA versions.

Morganh · March 4, 2021, 3:26pm

Please remove -t 'tfonnx' (it is an optional argument) and run again.
I will sync with the internal team about the error you get when running with 'tfonnx'.