FPENet retraining outputs an ONNX file, but DeepStream is using TLT

The deployable_v1.0 of FPENet gives me a .tlt file and a calibration.txt file that I use in the DeepStream nvinfer element.

When I run the pipeline, it generates a .engine file for my particular Jetson.

Once I retrain FPENet (on a server machine with a Tesla GPU), the output directory shows me:

/getting_started_v5.0.0/experiments/fpenet/models/exp1$ ls -la | grep -v ckzip | grep -v hdf5
total 998412
drwxr-xr-x 3 root   root       4096 Oct  5 08:41 .
drwxr-xr-x 3 ubuntu ubuntu     4096 Oct  5 08:13 ..
drwxr-xr-x 2 root   root       4096 Oct  5 08:14 events
-rw-r--r-- 1 root   root   27602387 Oct  5 08:36 events.out.tfevents.1696493696.a1a0f1ba5a3c
-rw-r--r-- 1 root   root       2816 Oct  5 08:14 experiment_spec.yaml
-rw-r--r-- 1 root   root   20696673 Oct  5 08:15 graph.pbtxt
-rw-r--r-- 1 root   root       2238 Oct  5 08:40 int8_calibration.bin
-rw-r--r-- 1 root   root    1479923 Oct  5 08:40 int8_calibration.tensorfile
-rw-r--r-- 1 root   root    6353401 Oct  5 08:38 kpi_testing_all_data.json
-rw-r--r-- 1 root   root       3449 Oct  5 08:38 kpi_testing_error_per_point.csv
-rw-r--r-- 1 root   root        428 Oct  5 08:38 kpi_testing_error_per_region.csv
-rw-r--r-- 1 root   root    1036060 Oct  5 08:41 model.int8.engine
-rw-r--r-- 1 root   root    2350995 Oct  5 08:40 model.onnx
-rw-r--r-- 1 root   root       5986 Oct  5 08:41 result.txt
-rw-r--r-- 1 root   root      28269 Oct  5 08:41 status.json
-rw-r--r-- 1 root   root      11048 Oct  5 08:36 validation.log

I know .onnx is a generic model format and .engine is an optimised version, but I assume this one is optimised for the Tesla GPU…

I found that

trtexec --onnx=<model.onnx> --saveEngine=<model.plan>

will create a TensorRT model file… but that is still not a .tlt file.

Should I change my DeepStream pipeline to use the TensorRT model file? Or is there a way to convert ONNX to TLT, or TensorRT to TLT?

You can generate the TensorRT engine and configure it in
model-engine-file
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/configs/nvinfer/facial_tao/faciallandmark_sgie_config.txt#L51C1-L51C18.
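
For illustration, a minimal sketch of what the nvinfer [property] section could look like when pointing directly at a pre-built engine (the file name, batch size, and precision here are placeholders; the tlt-* lines are the ones that would no longer be needed):

[property]
gpu-id=0
# load the TensorRT engine built with trtexec
model-engine-file=model.engine
# not needed when loading a plain engine file
#tlt-model-key=nvidia_tlt
#tlt-encoded-model=faciallandmarks.etlt
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2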

Q1: Isn't the engine file very specific to the hardware?

I am running the retraining and export on a Tesla GPU (AWS), but the resulting model will run inference on a TX2 NX.

Q2: Is the engine file generic enough to be generated on one type of hardware and run on another?

My confusion is because when I specify an .etlt to be used by nvinfer, on first start-up it generates the engine file, so in my mind the engine file is just a runtime binary of the .tlt for that specific hardware (a bit like a just-in-time compiler).
Q3: Am I wrong in this assumption?

So on my Jetson I have:

-rw-r--r-- 2 root root 2367579 Sep 21 03:53 faciallandmarks.etlt
-rw-r--r-- 2 root root 2085781 Sep 21 03:53 faciallandmarks.etlt_b32_gpu0_fp16.engine
-rw-r--r-- 2 root root    4612 Sep 21 03:53 faciallandmarks_cal.txt

where the .etlt_b32_gpu0_fp16.engine file gets auto-generated on first start-up by nvinfer,

and the nvinfer config file looks like:

[property]
gpu-id=0
model-engine-file=faciallandmarks.etlt_b32_gpu0_fp16.engine
tlt-model-key=nvidia_tlt
tlt-encoded-model=faciallandmarks.etlt
int8-calib-file=faciallandmarks_cal.txt

Q4: Are you saying that I can take the model.int8.engine from the retraining and only specify model-engine-file in the nvinfer config? If so, I worry about the generated one having int8 in the name while the engine currently generated by nvinfer has fp16 in the name.
Q5: Can you please clarify what the correct 'type' of engine file is for running on the TX2 NX, and how this can be generated by the retraining?

  1. The engine file is very specific to the hardware. On the TX2, please use trtexec to generate the engine from the ONNX file (a sketch follows this list). If you run inference on the TX2 inside a docker container, please generate the engine inside that container.
  2. No, usually the engine is not generic.
  3. Please comment out tlt-model-key and tlt-encoded-model.
  4. trtexec can generate an fp16 or int8 engine; you can then configure it in model-engine-file. DeepStream will run it directly when it finds that the engine is already available.
  5. Refer to https://docs.nvidia.com/tao/tao-toolkit/text/trtexec_integration/trtexec_fpenet.html.
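
For reference, a sketch of what an fp16-only trtexec build might look like on the TX2 (file names follow the retraining output above; the shape values mirror the documentation example quoted below and assume the retrained model's input is named input_face_images):

/usr/src/tensorrt/bin/trtexec --onnx=./model.onnx \
        --minShapes=input_face_images:1x1x80x80 \
        --optShapes=input_face_images:1x1x80x80 \
        --maxShapes=input_face_images:1x1x80x80 \
        --fp16 \
        --saveEngine=./model.engine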

In the link supplied about trtexec there is an example:

trtexec --onnx=./model.onnx \
        --maxShapes=input_face_images:16x1x80x80 \
        --minShapes=input_face_images:1x1x80x80 \
        --optShapes=input_face_images:8x1x80x80 \
        --calib=./int8_calibration.tensorfile \
        --fp16 \
        --int8 \
        --saveEngine=./model.engine

So I run this on my TX2 NX.

Q1: For my TX2 NX would I need to use --fp16 or --int8, or both?
Q2: Is int8_calibration.tensorfile the right calibration file to use for the --calib option, or should I use int8_calibration.bin?

My (updated) nvinfer configuration now looks like:

[property]
gpu-id=0
model-engine-file=model.engine
#dynamic batch size
batch-size=32
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=1
interval=0
output-blob-names=softargmax/strided_slice:0;softargmax/strided_slice_1:0
#0=Detection 1=Classifier 2=Segmentation 100=other
network-type=100
# Enable tensor metadata output
output-tensor-meta=1
#1-Primary  2-Secondary
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
net-scale-factor=1.0
offsets=0.0
input-object-min-width=5
input-object-min-height=5
#0=RGB 1=BGR 2=GRAY
model-color-format=2

[class-attrs-all]
threshold=0.0

But when nvinfer starts (it worked before with model.etlt), it fails with the following errors:

0:00:23.501706052  5351   0x559a60bd90 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/etc/mtdata/aimodels/driversafety/1000/faciallandmarks.engine
INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80
2   OUTPUT kFLOAT softargmax      80x2
3   OUTPUT kFLOAT softargmax:1    80

0:00:23.502988010  5351   0x559a60bd90 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::checkBackendParams() <nvdsinfer_context_impl.cpp:1833> [UID = 2]: Backend has maxBatchSize 1 whereas 32 has been requested
0:00:23.503042505  5351   0x559a60bd90 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2012> [UID = 2]: deserialized backend context :/etc/mtdata/aimodels/driversafety/1000/faciallandmarks.engine failed to match config params, trying rebuild
0:00:23.509009871  5351   0x559a60bd90 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 2]: Trying to create engine from model files
ERROR: failed to build network since there is no model file matched.
ERROR: failed to build network.
0:00:23.870678388  5351   0x559a60bd90 ERROR                nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1934> [UID = 2]: build engine file failed
0:00:23.871718879  5351   0x559a60bd90 ERROR                nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2020> [UID = 2]: build backend context failed
0:00:23.871798749  5351   0x559a60bd90 ERROR                nvinfer gstnvinfer.cpp:632:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1257> [UID = 2]: generate backend failed, check config file settings
0:00:23.871852284  5351   0x559a60bd90 WARN                 nvinfer gstnvinfer.cpp:841:gst_nvinfer_start:<nvinfer-second-10> error: Failed to create NvDsInferContext instance
0:00:23.871876764  5351   0x559a60bd90 WARN                 nvinfer gstnvinfer.cpp:841:gst_nvinfer_start:<nvinfer-second-10> error: Config file path: /etc/mtdata/aimodels/driversafety/1000/faciallandmark.txt, NvDsInfer Error: NVDSINFER_CONFIG_FAILED
0:00:23.873498011  5351   0x559a60bd90 ERROR              gstdstate gstd_state.c:241:gstd_state_update:<GstdState@0x7f78a8bac0> Failed to change the state of the pipeline
Opening in BLOCKING MODE
Opening in BLOCKING MODE

Q3: What is the way forward to go from retraining to using the model in nvinfer?

Please ignore my question… I had to change batch-size=32 to batch-size=1 (the rebuilt engine only supports a max batch size of 1, as the warning above shows).

A few more questions:

The retrained model.engine seems to run a lot slower (fps) than the originally supplied model.etlt.
In other retraining (for FaceNet) I've seen that there is a truncate step (and a subsequent retraining step).

Q1: Is there a truncate/retraining step that isn't shown in the FPENet notebook?

Q2: If there isn't a truncate/retraining step, then how can I make the model.engine (created from model.onnx via trtexec) behave as responsively as the supplied (deployable) model.etlt?

I also would like to know:

In the link supplied about trtexec there is an example:

trtexec --onnx=./model.onnx \
        --maxShapes=input_face_images:16x1x80x80 \
        --minShapes=input_face_images:1x1x80x80 \
        --optShapes=input_face_images:8x1x80x80 \
        --calib=./int8_calibration.tensorfile \
        --fp16 \
        --int8 \
        --saveEngine=./model.engine

So I run this on my TX2 NX.

Q3: For my TX2 NX would I need to use --fp16 or --int8, or both?
Q4: Is int8_calibration.tensorfile the right calibration file to use for the --calib option, or should I use int8_calibration.bin?

  1. There isn't a truncate/retraining step.
  2. Please check the model size. Also, please double-check that both models are run in the same environment. You can generate the TensorRT engine with the same trtexec command (one way to compare the two engines directly is sketched after this list). You can use code to decode the official .etlt model to an .onnx file.
  3. For fp16 mode, run with --fp16 only. For int8 mode, run with --int8 only. But the TX2 does not have an int8 feature, so you can only use fp16 mode or the default fp32 mode.
  4. It should be the xxx.bin file.
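
As a side note, a sketch of how the two already-built engines could be benchmarked side by side on the TX2 with trtexec's --loadEngine option (the file names are placeholders for the engine DeepStream generated from the etlt and the engine built from the retrained ONNX); the performance summaries printed by each run can then be compared directly:

/usr/src/tensorrt/bin/trtexec --loadEngine=./faciallandmarks.etlt_b32_gpu0_fp16.engine
/usr/src/tensorrt/bin/trtexec --loadEngine=./model.engine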

Even after converting on the TX2 using the correct values for a TX2:

/usr/src/tensorrt/bin/trtexec --onnx=./model.onnx \
                              --maxShapes=input_face_images:16x1x80x80 \ 
                              --minShapes=input_face_images:1x1x80x80 \
                              --optShapes=input_face_images:8x1x80x80 \
                              --calib=./int8_calibration.bin \
                              --fp16  \
                              --saveEngine=./model.engine \
                              --workspace=2048

the performance of the inference using nvinfer is still a lot worse than with the supplied deployable_v1.0 model.
I run it in the same environment on the same device as the .etlt file, and at a guess the fps has dropped by half.

Q1: Are the --maxShapes, --minShapes, and --optShapes values correct?

Q2: When you say check the model size, do you mean the actual file size of the ONNX file, or the number of nodes in the network? I wouldn't think the network shape would change through retraining, just the weight values.

Q3: The code pointed to exports to a UFF format… is that the same as ONNX? Just wondering if those files can be compared using file size. Or do you mean to use trtexec to transform the NVIDIA etlt file into an engine file as well?

Q4: Has anything else been done to the deployable_v1.0 model to make it so responsive?

Q5: I noticed the etlt model also gets converted to a .engine file on first run by nvinfer. Does the conversion by nvinfer also use trtexec, and would the etlt-to-.engine conversion produce a different result than going from .onnx to .engine?

When loading the retrained model in nvinfer I receive the following warning:

WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

log:

0:00:23.367981731  1011   0x55c2703d90 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1900> [UID = 2]: deserialized trt engine from :/etc/aimodels/driversafety/100000/faciallandmarks.etlt_b1_gpu0_fp16.engine
INFO: [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input_face_images 1x80x80         min: 1x1x80x80       opt: 8x1x80x80       Max: 16x1x80x80
1   OUTPUT kFLOAT conv_keypoints_m80 80x80x80        min: 0               opt: 0               Max: 0
2   OUTPUT kFLOAT softargmax      80x2            min: 0               opt: 0               Max: 0
3   OUTPUT kFLOAT softargmax:1    80              min: 0               opt: 0               Max: 0

0:00:23.370253357  1011   0x55c2703d90 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<nvinfer-second-10> NvDsInferContext[UID 2]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2004> [UID = 2]: Use deserialized engine model: /etc/aimodels/driversafety/100000/faciallandmarks.etlt_b1_gpu0_fp16.engine
0:00:23.547472860  1011   0x55c2703d90 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<nvinfer-second-10> [UID 2]: Load new model:/etc/aimodels/driversafety/100000/faciallandmark.txt sucessfully
0:00:23.547740699  1011   0x55c2703d90 WARN                 nvinfer gstnvinfer.cpp:635:gst_nvinfer_logger:<nvinfer-prim-6> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1161> [UID = 1]: Warning, OpenCV has been deprecated. Using NMS for clustering instead of cv::groupRectangles with topK = 20 and NMS Threshold = 0.5
WARNING: [TRT]: Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.

Q6: I wouldn't expect this warning because I converted it on the TX2 with the trtexec command shown above. Do you know why it is still warning me?

  1. Please use the code below to decode the ngc FPENet etlt model.
    $ docker run --runtime=nvidia -it --rm -v /home/morganh:/home/morganh nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash
    then,
    # wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/fpenet/deployable_v1.0/files?redirect=true&path=model.etlt' -O fpenet_model_v1.0.etlt

# python decode_etlt.py -m fpenet_model_v1.0.etlt -o fpenet_model_v1.0.onnx -k nvidia_tlt

import argparse
import struct
# import encoding
from nvidia_tao_tf1.encoding import encoding

def parse_command_line(args):
    '''Parse command line arguments.'''
    parser = argparse.ArgumentParser(description='ETLT Decode Tool')
    parser.add_argument('-m',
                        '--model',
                        type=str,
                        required=True,
                        help='Path to the etlt file.')
    parser.add_argument('-o',
                        '--uff',
                        required=True,
                        type=str,
                        help='The path to the uff file.')
    parser.add_argument('-k',
                        '--key',
                        required=True,
                        type=str,
                        help='encryption key.')
    return parser.parse_args(args)


def decode(tmp_etlt_model, tmp_uff_model, key):
    '''Decrypt an .etlt payload into a plain model file (ONNX in the FPENet case).'''
    with open(tmp_uff_model, 'wb') as temp_file, open(tmp_etlt_model, 'rb') as encoded_file:
        # The .etlt file starts with a 4-byte little-endian length,
        # followed by the input node name of that length,
        # followed by the encrypted model payload.
        size = encoded_file.read(4)
        size = struct.unpack("<i", size)[0]
        input_node_name = encoded_file.read(size)
        encoding.decode(encoded_file, temp_file, key.encode())


def main(args=None):
    args = parse_command_line(args)
    decode(args.model, args.uff, args.key)
    print("Decode successfully.")


if __name__ == "__main__":
    main()

You will get the output, and it really is an ONNX file. There are some changes from your command: the input is input_face_images:0. Please run the command below to get the fps result at batch size 1.

# /usr/src/tensorrt/bin/trtexec --onnx=fpenet_model_v1.0.onnx --maxShapes=input_face_images:0:1x1x80x80 --minShapes=input_face_images:0:1x1x80x80 --optShapes=input_face_images:0:1x1x80x80 --fp16 --saveEngine=fpenet_fp16.engine

The fps = bs * 1000 / <GPU Compute Time>

Please run trtexec against your onnx file and check the result.
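
As an illustration of that formula, a small sketch (a hypothetical helper, not part of TAO or TensorRT) that pulls the mean GPU Compute Time out of a saved trtexec log and applies fps = bs * 1000 / <GPU Compute Time>:

import re
import sys

# Usage: python3 fps_from_trtexec_log.py trtexec.log
batch_size = 1  # assumption: the engine was profiled at batch size 1

log_text = open(sys.argv[1]).read()
# trtexec prints e.g. "GPU Compute Time: min = ..., mean = 3.72125 ms, ..."
match = re.search(r"GPU Compute Time:.*mean = ([0-9.]+) ms", log_text)
mean_ms = float(match.group(1))
print(f"mean GPU compute time = {mean_ms} ms -> {batch_size * 1000.0 / mean_ms:.1f} fps")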

  1. Yes, the file size.
  2. The ngc model is also an ONNX file; it is not a UFF file.
  3. Please use the code above to decode the .etlt file to an .onnx file.
  4. The TensorRT engine should be built and run with the same TensorRT version, so it is suggested to build the engine on the device where you are going to run it (a quick version check is sketched after this list).
  5. The same as above item 5.
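
A quick way to confirm the TensorRT versions match between the build machine (or container) and the TX2, assuming the TensorRT Python bindings and dpkg are available on both:

# run on each machine / inside each container
python3 -c "import tensorrt; print(tensorrt.__version__)"
dpkg -l | grep -i tensorrt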

I ran the following inside the docker image you sent:

# python3 decode_etlt.py -m ./faciallandmarks.etlt  -o fpenet_model_v1.0.onnx -k nvidia_tlt
Traceback (most recent call last):
  File "decode_etlt.py", line 3, in <module>
    import encoding
ModuleNotFoundError: No module named 'encoding'

So I tried

# pip install encoding
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
ERROR: Could not find a version that satisfies the requirement encoding (from versions: none)
ERROR: No matching distribution found for encoding

Is encoding.py perhaps inside your /home/morganh?

Please use /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/encoding/encoding.py.

Thank you, that let me decode the .etlt to ONNX.

When I try to convert my newly trained ONNX to an engine file using the suggested command, it fails:

# /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --maxShapes=input_face_images:0:1x1x80x80 --minShapes=input_face_images:0:1x1x80x80 --optShapes=input_face_images:0:1x1x80x80 --fp16 --saveEngine=model.onxx_b1_gpu0_fp16.engine

....
....
[10/17/2023-01:15:30] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/17/2023-01:15:30] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[10/17/2023-01:15:30] [W] [TRT] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[10/17/2023-01:15:30] [I] Finish parsing network model
[10/17/2023-01:15:30] [E] Cannot find input tensor with name "input_face_images:0" in the network inputs! Please make sure the input tensor names are correct.
[10/17/2023-01:15:30] [E] Network And Config setup failed
[10/17/2023-01:15:30] [E] Building engine failed
[10/17/2023-01:15:30] [E] Failed to create engine from model.
[10/17/2023-01:15:30] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201]

You can check the input of your newly trained ONNX model.
The input node may be different from the ngc model's.
Yours may be input_face_images
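
A quick way to check the input names and shapes of an ONNX file (a sketch, assuming the onnx Python package is available, e.g. inside the TAO container):

import onnx

model = onnx.load("model.onnx")
# graph.input can also list weight initializers in some exports, so filter those out
init_names = {init.name for init in model.graph.initializer}
for inp in model.graph.input:
    if inp.name in init_names:
        continue
    dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)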

 /usr/src/tensorrt/bin/trtexec --onnx=model.onnx --maxShapes=input_face_images:1x1x80x80 --minShapes=input_face_images:1x1x80x80 --optShapes=input_face_images:1x1x80x80 --fp16 --saveEngine=./faciallandmarks.onnx_b1_gpu0_fp16.engine --workspace=2048

seems to work.
It is still running at low speed in inference, though…

The deployable_v1.0 etlt decoded to an ONNX file turns out to be:
-rw-r--r-- 1 root root 2367559 Oct 16 09:45 fpenet_model_v1.0.onnx

and the retrained ONNX is:
-rw-r--r-- 1 root root 2350995 Oct 10 23:28 model.onnx

so it is a tiny bit smaller.

Can you share the full trtexec logs of both ONNX files?

I've attached the log files for both ONNX files, but here are the performance summaries:

trtexec from the decoded etlt ONNX file → engine:

[10/17/2023-05:12:10] [I] === Performance summary ===
[10/17/2023-05:12:10] [I] Throughput: 267.627 qps
[10/17/2023-05:12:10] [I] Latency: min = 3.60352 ms, max = 3.91394 ms, mean = 3.7281 ms, median = 3.72467 ms, percentile(99%) = 3.85126 ms
[10/17/2023-05:12:10] [I] End-to-End Host Latency: min = 3.60742 ms, max = 3.92151 ms, mean = 3.73592 ms, median = 3.73309 ms, percentile(99%) = 3.85907 ms
[10/17/2023-05:12:10] [I] Enqueue Time: min = 1.47156 ms, max = 5.27368 ms, mean = 2.06999 ms, median = 1.85895 ms, percentile(99%) = 4.88745 ms
[10/17/2023-05:12:10] [I] H2D Latency: min = 0.00268555 ms, max = 0.00439453 ms, mean = 0.00308575 ms, median = 0.00305176 ms, percentile(99%) = 0.00390625 ms
[10/17/2023-05:12:10] [I] GPU Compute Time: min = 3.59766 ms, max = 3.90759 ms, mean = 3.72125 ms, median = 3.71783 ms, percentile(99%) = 3.84412 ms
[10/17/2023-05:12:10] [I] D2H Latency: min = 0.00195312 ms, max = 0.00463867 ms, mean = 0.00376351 ms, median = 0.00390625 ms, percentile(99%) = 0.0045166 ms
[10/17/2023-05:12:10] [I] Total Host Walltime: 3.00418 s
[10/17/2023-05:12:10] [I] Total GPU Compute Time: 2.99188 s
[10/17/2023-05:12:10] [I] Explanations of the performance metrics are printed in the verbose logs.

trtexec from the retrained onnx file → engine:

[10/17/2023-04:50:15] [I] === Performance summary ===
[10/17/2023-04:50:15] [I] Throughput: 240.164 qps
[10/17/2023-04:50:15] [I] Latency: min = 4.05151 ms, max = 4.36108 ms, mean = 4.15549 ms, median = 4.15063 ms, percentile(99%) = 4.31812 ms
[10/17/2023-04:50:15] [I] End-to-End Host Latency: min = 4.05957 ms, max = 4.36804 ms, mean = 4.1631 ms, median = 4.15845 ms, percentile(99%) = 4.32611 ms
[10/17/2023-04:50:15] [I] Enqueue Time: min = 1.52905 ms, max = 5.32227 ms, mean = 1.96418 ms, median = 1.89978 ms, percentile(99%) = 3.84082 ms
[10/17/2023-04:50:15] [I] H2D Latency: min = 0.00341797 ms, max = 0.00463867 ms, mean = 0.00372779 ms, median = 0.00366211 ms, percentile(99%) = 0.0043335 ms
[10/17/2023-04:50:15] [I] GPU Compute Time: min = 3.93848 ms, max = 4.24658 ms, mean = 4.04162 ms, median = 4.03735 ms, percentile(99%) = 4.20575 ms
[10/17/2023-04:50:15] [I] D2H Latency: min = 0.107422 ms, max = 0.119507 ms, mean = 0.110135 ms, median = 0.110107 ms, percentile(99%) = 0.113892 ms
[10/17/2023-04:50:15] [I] Total Host Walltime: 3.00628 s
[10/17/2023-04:50:15] [I] Total GPU Compute Time: 2.91805 s
[10/17/2023-04:50:15] [I] Explanations of the performance metrics are printed in the verbose logs.

Not sure where to get the bs in fps = bs * 1000 / <GPU Compute Time> from?

diff-extracted-from-etlt.log (31.4 KB)
diff-retrained-model.log (24.3 KB)

I have also just tried both of these engines in the nvinfer DeepStream pipeline, and:

the engine built via trtexec from the decoded etlt ONNX file is still very responsive,

while

the engine built via trtexec from the retrained ONNX file is running at a lower fps (or frames are being dropped).

You can find a line like "[10/17/2023-04:50:15] [I] GPU Compute Time: …" in the log, then take the mean value as the GPU Compute Time.

The bs is the batch size.

From the log, the fps of your onnx file is about 90% of ngc model.

If possible, please share your retrained-model.onnx as well.

So for the retrained model:
fps = 1 x 1000 / 2.91805 = 342

and for the etlt-extracted one:
fps = 1 x 1000 / 2.99188 = 334

Both of these are far above the 15 fps that I'm feeding into the DeepStream pipeline… so that shouldn't cause any dropped frames.

Please see the two videos below for what I am seeing in a DeepStream pipeline.

Retrained engine in a DeepStream pipeline (notice the missing frames): https://www.youtube.com/watch?v=M0sG0AG7z74

etlt-extracted engine in a DeepStream pipeline: https://www.youtube.com/watch?v=rQ68ASAqcT8

What does

From the log, the fps of your onnx file is about 90% of ngc model.

mean? Is that good or bad?

From the log you shared,
[10/17/2023-05:12:10] [I] GPU Compute Time: min = 3.59766 ms, max = 3.90759 ms, mean = 3.72125 ms

=> 1*1000/3.72125 = 269 fps (ngc onnx)

[10/17/2023-04:50:15] [I] GPU Compute Time: min = 3.93848 ms, max = 4.24658 ms, mean = 4.04162 ms
=> 1*1000/4.04162 = 248 fps (your onnx)

So, the engine of your onnx file is about 8%~10% slower than the ngc one.

Please run the official FPENet with deepstream_tao_apps https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_others/deepstream-faciallandmark-app . The fps result is expected to match the result above.

I think the deepstream pipeline you shared is a custom one.