Issue with Deepstream Inference of custom 3D action recognition model

• Hardware Platform : GPU (GTX 1050 Ti)
• CUDA Version: 11.4 update 3
• CuDNN Version: 8.2.1
• DeepStream Version: 6.0.1
• TensorRT Version: 8.2.1
• Deepstream Graph Composer : 0.0.1
• NVIDIA GPU Driver Version (valid for GPU only): 470.82.01
• Issue Type: Question/issue
• How to reproduce the issue ? : Config files are provided below

Hi @AastaLLL
I am facing a problem while using a custom 3D action recognition model (3D ResNet-18).

As mentioned at the beginning of the thread "Using Custom action recognition Model in Deepstream 3D action recognition", when I converted the ONNX model to a TensorRT engine using trtexec, I got the output below:

Input: 1x3x16x112x112
Output: 1x1039
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=resnet-18-kinetcis-moments.onnx --saveEngine=3d_action_recognition_fp32.engine
[05/04/2022-11:49:43] [I] === Model Options ===
[05/04/2022-11:49:43] [I] Format: ONNX
[05/04/2022-11:49:43] [I] Model: resnet-18-kinetcis-moments.onnx
[05/04/2022-11:49:43] [I] Output:
[05/04/2022-11:49:43] [I] === Build Options ===
[05/04/2022-11:49:43] [I] Max batch: explicit batch
[05/04/2022-11:49:43] [I] Workspace: 16 MiB
[05/04/2022-11:49:43] [I] minTiming: 1
[05/04/2022-11:49:43] [I] avgTiming: 8
[05/04/2022-11:49:43] [I] Precision: FP32
[05/04/2022-11:49:43] [I] Calibration: 
[05/04/2022-11:49:43] [I] Refit: Disabled
[05/04/2022-11:49:43] [I] Sparsity: Disabled
[05/04/2022-11:49:43] [I] Safe mode: Disabled
[05/04/2022-11:49:43] [I] DirectIO mode: Disabled
[05/04/2022-11:49:43] [I] Restricted mode: Disabled
[05/04/2022-11:49:43] [I] Save engine: 3d_action_recognition_fp32.engine
[05/04/2022-11:49:43] [I] Load engine: 
[05/04/2022-11:49:43] [I] Profiling verbosity: 0
[05/04/2022-11:49:43] [I] Tactic sources: Using default tactic sources
[05/04/2022-11:49:43] [I] timingCacheMode: local
[05/04/2022-11:49:43] [I] timingCacheFile: 
[05/04/2022-11:49:43] [I] Input(s)s format: fp32:CHW
[05/04/2022-11:49:43] [I] Output(s)s format: fp32:CHW
[05/04/2022-11:49:43] [I] Input build shapes: model
[05/04/2022-11:49:43] [I] Input calibration shapes: model
[05/04/2022-11:49:43] [I] === System Options ===
[05/04/2022-11:49:43] [I] Device: 0
[05/04/2022-11:49:43] [I] DLACore: 
[05/04/2022-11:49:43] [I] Plugins:
[05/04/2022-11:49:43] [I] === Inference Options ===
[05/04/2022-11:49:43] [I] Batch: Explicit
[05/04/2022-11:49:43] [I] Input inference shapes: model
[05/04/2022-11:49:43] [I] Iterations: 10
[05/04/2022-11:49:43] [I] Duration: 3s (+ 200ms warm up)
[05/04/2022-11:49:43] [I] Sleep time: 0ms
[05/04/2022-11:49:43] [I] Idle time: 0ms
[05/04/2022-11:49:43] [I] Streams: 1
[05/04/2022-11:49:43] [I] ExposeDMA: Disabled
[05/04/2022-11:49:43] [I] Data transfers: Enabled
[05/04/2022-11:49:43] [I] Spin-wait: Disabled
[05/04/2022-11:49:43] [I] Multithreading: Disabled
[05/04/2022-11:49:43] [I] CUDA Graph: Disabled
[05/04/2022-11:49:43] [I] Separate profiling: Disabled
[05/04/2022-11:49:43] [I] Time Deserialize: Disabled
[05/04/2022-11:49:43] [I] Time Refit: Disabled
[05/04/2022-11:49:43] [I] Skip inference: Disabled
[05/04/2022-11:49:43] [I] Inputs:
[05/04/2022-11:49:43] [I] === Reporting Options ===
[05/04/2022-11:49:43] [I] Verbose: Disabled
[05/04/2022-11:49:43] [I] Averages: 10 inferences
[05/04/2022-11:49:43] [I] Percentile: 99
[05/04/2022-11:49:43] [I] Dump refittable layers:Disabled
[05/04/2022-11:49:43] [I] Dump output: Disabled
[05/04/2022-11:49:43] [I] Profile: Disabled
[05/04/2022-11:49:43] [I] Export timing to JSON file: 
[05/04/2022-11:49:43] [I] Export output to JSON file: 
[05/04/2022-11:49:43] [I] Export profile to JSON file: 
[05/04/2022-11:49:43] [I] 
[05/04/2022-11:49:43] [I] === Device Information ===
[05/04/2022-11:49:43] [I] Selected Device: NVIDIA GeForce GTX 1050 Ti
[05/04/2022-11:49:43] [I] Compute Capability: 6.1
[05/04/2022-11:49:43] [I] SMs: 6
[05/04/2022-11:49:43] [I] Compute Clock Rate: 1.62 GHz
[05/04/2022-11:49:43] [I] Device Global Memory: 4040 MiB
[05/04/2022-11:49:43] [I] Shared Memory per SM: 96 KiB
[05/04/2022-11:49:43] [I] Memory Bus Width: 128 bits (ECC disabled)
[05/04/2022-11:49:43] [I] Memory Clock Rate: 3.504 GHz
[05/04/2022-11:49:43] [I] 
[05/04/2022-11:49:43] [I] TensorRT version: 8.2.1
[05/04/2022-11:49:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +177, GPU +0, now: CPU 189, GPU 816 (MiB)
[05/04/2022-11:49:44] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 189 MiB, GPU 816 MiB
[05/04/2022-11:49:44] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 252 MiB, GPU 816 MiB
[05/04/2022-11:49:44] [I] Start parsing network model
[05/04/2022-11:49:44] [I] [TRT] ----------------------------------------------------------------
[05/04/2022-11:49:44] [I] [TRT] Input filename:   resnet-18-kinetcis-moments.onnx
[05/04/2022-11:49:44] [I] [TRT] ONNX IR version:  0.0.6
[05/04/2022-11:49:44] [I] [TRT] Opset version:    12
[05/04/2022-11:49:44] [I] [TRT] Producer name:    pytorch
[05/04/2022-11:49:44] [I] [TRT] Producer version: 1.8
[05/04/2022-11:49:44] [I] [TRT] Domain:           
[05/04/2022-11:49:44] [I] [TRT] Model version:    0
[05/04/2022-11:49:44] [I] [TRT] Doc string:       
[05/04/2022-11:49:44] [I] [TRT] ----------------------------------------------------------------
[05/04/2022-11:49:44] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/04/2022-11:49:44] [I] Finish parsing network model
[05/04/2022-11:49:44] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +249, GPU +102, now: CPU 630, GPU 918 (MiB)
[05/04/2022-11:49:45] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +219, GPU +82, now: CPU 849, GPU 1000 (MiB)
[05/04/2022-11:49:45] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/04/2022-11:49:53] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/04/2022-11:49:55] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/04/2022-11:49:55] [I] [TRT] Total Host Persistent Memory: 992
[05/04/2022-11:49:55] [I] [TRT] Total Device Persistent Memory: 0
[05/04/2022-11:49:55] [I] [TRT] Total Scratch Memory: 15221760
[05/04/2022-11:49:55] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 129 MiB
[05/04/2022-11:49:55] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.810701ms to assign 4 blocks to 40 nodes requiring 31278080 bytes.
[05/04/2022-11:49:55] [I] [TRT] Total Activation Memory: 31278080
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1035, GPU 1231 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1035, GPU 1239 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +129, now: CPU 0, GPU 129 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1163, GPU 1065 (MiB)
[05/04/2022-11:49:55] [I] [TRT] Loaded engine size: 128 MiB
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1163, GPU 1205 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1163, GPU 1213 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +128, now: CPU 0, GPU 128 (MiB)
[05/04/2022-11:49:55] [I] Engine built in 11.9132 sec.
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 843, GPU 1205 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 843, GPU 1213 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +30, now: CPU 0, GPU 158 (MiB)
[05/04/2022-11:49:55] [I] Using random values for input 0
[05/04/2022-11:49:55] [I] Created input binding for 0 with dimensions 1x3x16x112x112
[05/04/2022-11:49:55] [I] Using random values for output 198
[05/04/2022-11:49:55] [I] Created output binding for 198 with dimensions 1x1039
[05/04/2022-11:49:55] [I] Starting inference
[05/04/2022-11:49:58] [I] Warmup completed 8 queries over 200 ms
[05/04/2022-11:49:58] [I] Timing trace has 132 queries over 3.06989 s
[05/04/2022-11:49:58] [I] 
[05/04/2022-11:49:58] [I] === Trace details ===
[05/04/2022-11:49:58] [I] Trace averages of 10 runs:
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9716 ms - Host latency: 23.3739 ms (end to end 46.3644 ms, enqueue 3.48358 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 24.5993 ms - Host latency: 25.0815 ms (end to end 47.3465 ms, enqueue 3.86793 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9112 ms - Host latency: 23.317 ms (end to end 45.7564 ms, enqueue 3.77248 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.8363 ms - Host latency: 23.2428 ms (end to end 45.3281 ms, enqueue 3.84795 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 23.6491 ms - Host latency: 24.0537 ms (end to end 46.9746 ms, enqueue 3.49109 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9503 ms - Host latency: 23.3536 ms (end to end 45.4984 ms, enqueue 3.59299 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.805 ms - Host latency: 23.2089 ms (end to end 45.3626 ms, enqueue 3.51901 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6869 ms - Host latency: 23.0948 ms (end to end 45.0141 ms, enqueue 3.885 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9644 ms - Host latency: 23.3701 ms (end to end 45.6225 ms, enqueue 3.41711 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6783 ms - Host latency: 23.0807 ms (end to end 45.0477 ms, enqueue 3.46648 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6976 ms - Host latency: 23.107 ms (end to end 45.038 ms, enqueue 3.55405 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.912 ms - Host latency: 23.3164 ms (end to end 45.4568 ms, enqueue 3.59172 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.7565 ms - Host latency: 23.1657 ms (end to end 45.1867 ms, enqueue 3.67351 ms)
[05/04/2022-11:49:58] [I] 
[05/04/2022-11:49:58] [I] === Performance summary ===
[05/04/2022-11:49:58] [I] Throughput: 42.9982 qps
[05/04/2022-11:49:58] [I] Latency: min = 22.4573 ms, max = 26.7571 ms, mean = 23.4397 ms, median = 23.1427 ms, percentile(99%) = 26.7419 ms
[05/04/2022-11:49:58] [I] End-to-End Host Latency: min = 40.9007 ms, max = 52.6338 ms, mean = 45.6848 ms, median = 45.1901 ms, percentile(99%) = 50.8307 ms
[05/04/2022-11:49:58] [I] Enqueue Time: min = 1.35107 ms, max = 5.44977 ms, mean = 3.63209 ms, median = 3.55768 ms, percentile(99%) = 5.09448 ms
[05/04/2022-11:49:58] [I] H2D Latency: min = 0.374634 ms, max = 0.631592 ms, mean = 0.407851 ms, median = 0.401947 ms, percentile(99%) = 0.626831 ms
[05/04/2022-11:49:58] [I] GPU Compute Time: min = 22.0508 ms, max = 26.3568 ms, mean = 23.0283 ms, median = 22.7378 ms, percentile(99%) = 26.112 ms
[05/04/2022-11:49:58] [I] D2H Latency: min = 0.00268555 ms, max = 0.0212402 ms, mean = 0.00348317 ms, median = 0.00317383 ms, percentile(99%) = 0.00463867 ms
[05/04/2022-11:49:58] [I] Total Host Walltime: 3.06989 s
[05/04/2022-11:49:58] [I] Total GPU Compute Time: 3.03974 s
[05/04/2022-11:49:58] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/04/2022-11:49:58] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=resnet-18-kinetcis-moments.onnx --saveEngine=3d_action_recognition_fp32.engine

As you can see, I am getting input and output dimensions similar to those of NGC’s 3D action recognition model.
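
To compare these dimensions with the preprocess settings mechanically, the binding lines can be pulled out of the trtexec log. This is a minimal sketch: the helper names `parse_bindings` and `shape_from_config` are hypothetical, and the regular expression only assumes the "Created input/output binding" line format visible in the log above.

```python
import re

# Matches lines such as:
#   "Created input binding for 0 with dimensions 1x3x16x112x112"
BINDING_RE = re.compile(
    r"Created (input|output) binding for (\S+) with dimensions ([\dx]+)"
)

def parse_bindings(log_text):
    """Return {(kind, tensor_name): [dims]} for each binding line in a trtexec log."""
    bindings = {}
    for kind, name, dims in BINDING_RE.findall(log_text):
        bindings[(kind, name)] = [int(d) for d in dims.split("x")]
    return bindings

def shape_from_config(value):
    """Parse a network-input-shape value such as ' 1;3;16;112;112'."""
    return [int(d) for d in value.split(";")]

if __name__ == "__main__":
    log = (
        "[I] Created input binding for 0 with dimensions 1x3x16x112x112\n"
        "[I] Created output binding for 198 with dimensions 1x1039\n"
    )
    bindings = parse_bindings(log)
    # The engine input shape should equal network-input-shape in the preprocess config.
    print(bindings[("input", "0")] == shape_from_config(" 1;3;16;112;112"))
```

The input tensor name reported by trtexec (`0`) is also the value that `tensor-name` in the preprocess config needs to match.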

Now, in the DeepStream 6.0 action recognition app, I have modified config_preprocess_3d_action_recognition.txt, config_infer_secondary_action_recognition.txt and deepstream_custom_action_recognition_config.txt as shown below.
1. config_preprocess_3d_action_recognition.txt

[property]
enable=1
target-unique-ids=1

# network-input-shape: batch, channel, sequence, height, width
# 3D sequence of 16 images
network-input-shape= 1;3;16;112;112

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=2
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
tensor-name=0

processing-width=112
processing-height=112

# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE
# 3=NVBUF_MEM_CUDA_UNIFIED  4=NVBUF_MEM_SURFACE_ARRAY(Jetson)
scaling-pool-memory-type=0

# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU
# 2=NvBufSurfTransformCompute_VIC(Jetson)
scaling-pool-compute-hw=0

# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0

# model input tensor pool size
tensor-buf-pool-size=8

custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/libnvds_custom_sequence_preprocess.so
#custom-lib-path=./custom_sequence_preprocess/libnvds_custom_sequence_preprocess.so
custom-tensor-preparation-function=CustomSequenceTensorPreparation

# 3D conv custom params
[user-configs]
#channel-scale-factors=0.003921569;0.003921569;0.003921569
channel-scale-factors=0.007843137;0.007843137;0.007843137
channel-mean-offsets=110.79;103.3;96.26
stride=1
subsample=0

[group-0]
src-ids=0
process-on-roi=1
roi-params-src-0=0;0;1920;1080
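
The [user-configs] values above imply a per-channel normalization of each pixel. A minimal sketch follows, assuming the custom sequence preprocessing library computes (pixel - mean) * scale per channel. That formula is an assumption and is not confirmed in this thread; the helper names are hypothetical.

```python
# Values copied from the [user-configs] section above.
SCALES = (0.007843137, 0.007843137, 0.007843137)  # roughly 1 / 127.5
MEANS = (110.79, 103.3, 96.26)                    # per-channel mean offsets

def normalize_pixel(rgb):
    """Apply the ASSUMED formula (pixel - mean) * scale to one 3-channel pixel."""
    return tuple((p - m) * s for p, m, s in zip(rgb, MEANS, SCALES))

def channel_ranges():
    """Return (min, max) of the normalized value per channel for 8-bit input."""
    lo = normalize_pixel((0, 0, 0))
    hi = normalize_pixel((255, 255, 255))
    return list(zip(lo, hi))

if __name__ == "__main__":
    for index, (lo, hi) in enumerate(channel_ranges()):
        print(f"channel {index}: {lo:.3f} .. {hi:.3f}")
```

If the model was trained with a different normalization, these two settings would need to be changed to match it; otherwise the engine may run but produce poor predictions.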

2. config_infer_secondary_action_recognition.txt

[property]
gpu-id=0

model-engine-file=/home/rajat/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/3d_action_recognition_fp32.engine
#onnx-file=/home/rajat/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/resnet-18-kinetcis-moments.onnx

labelfile-path=action_labels.txt
batch-size=1
process-mode=1

# requires preprocess metadata input
input-tensor-from-meta=1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
network-type=1

gie-unique-id=5

# Let application to parse the inference tensor output
output-tensor-meta=1
tensor-meta-pool-size=8
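
Note that this config requests network-mode=2 (FP16) while the engine file name ends in fp32. A small consistency check can flag that kind of mismatch before the pipeline starts. This is a sketch only: `check_precision` is a hypothetical helper, the name-based precision guess is a heuristic that relies on the file naming used in this thread, and the sample path is a placeholder.

```python
import configparser

# Mapping taken from the config comment: 0=FP32, 1=INT8, 2=FP16
PRECISION_BY_MODE = {"0": "fp32", "1": "int8", "2": "fp16"}

def check_precision(config_text):
    """Return a warning if network-mode disagrees with the engine file name, else None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]
    wanted = PRECISION_BY_MODE.get(prop.get("network-mode", "0"), "unknown")
    engine = prop.get("model-engine-file", "").lower()
    # Heuristic: look for a precision tag anywhere in the engine path.
    named = [p for p in PRECISION_BY_MODE.values() if p in engine]
    if named and wanted not in named:
        return f"network-mode requests {wanted}, but the engine name suggests {named[0]}"
    return None

if __name__ == "__main__":
    sample = """
[property]
# placeholder path, for illustration only
model-engine-file=/path/to/3d_action_recognition_fp32.engine
network-mode=2
"""
    print(check_precision(sample))
```

How nvinfer reacts to such a mismatch is not established in this thread, so treat a warning from this check as a prompt to make the two settings agree rather than as a diagnosis.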

3. deepstream_custom_action_recognition_config.txt

[action-recognition]

# stream/file source list
uri-list=file:///home/rajat/videoplayback.mp4
# eglglessink settings
display-sync=0


# <preprocess-config> is the config file path for nvdspreprocess plugin
# <infer-config> is the config file path for nvinfer plugin

# Enable 3D preprocess and inference
preprocess-config=/home/rajat/config_preprocess_3d_action_recognition.txt
infer-config=/home/rajat/config_infer_secondary_action_recognition.txt


# nvstreammux settings
muxer-height=1080
muxer-width=1920

# nvstreammux batched push timeout in usec
muxer-batch-timeout=40000


# nvmultistreamtiler settings
tiler-height=1080
tiler-width=1920

# Log debug level. 0: disabled. 1: debug. 2: verbose.
debug=2

# Enable fps print on screen. 0: disable. 1: enable
enable-fps=1

Now, when I run the DeepStream app, the model loads successfully but produces no output tensors or labels. The pipeline runs, but the output window shows only the FPS (no prediction labels).

Below is the output when I run the DeepStream app.

num-sources = 1
Now playing: file:///home/rajat/videoplayback.mp4,
0:00:00.135077369 12648 0x564672438060 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 5]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 5]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: FP16 not supported by platform. Using FP32 mode.
0:00:02.478384792 12648 0x564672438060 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 5]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1946> [UID = 5]: serialize cuda engine to file: /home/rajat/sandlogic/internship_project/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/resnet-18-kinetcis-moments.onnx_b1_gpu0_fp32.engine successfully
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT 0               3x16x112x112    
1   OUTPUT kFLOAT 198             1039            

0:00:02.485296433 12648 0x564672438060 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 5]: Load new model:/home/rajat/sandlogic/internship_project/ad_insertion_use_case_deepstream/config_infer_secondary_action_recognition.txt sucessfully
sequence_image_process.cpp:696, [DEBUG: CUSTOM_LIB] Initializing Custom sequence preprocessing lib
sequence_image_process.cpp:456, [DEBUG: CUSTOM_LIB] considered as 3d sequence input since network_input_shape size is 5
sequence_image_process.cpp:496, [INFO: CUSTOM_LIB] 3D custom sequence network info(NCSHW), [N: 1, C: 3, S: 16, H: 112, W:112]
sequence_image_process.cpp:524, [INFO: CUSTOM_LIB] Sequence preprocess buffer manager initialized with stride: 1, subsample: 0
sequence_image_process.cpp:526, [INFO: CUSTOM_LIB] SequenceImagePreprocess initialized successfully
Using user provided processing height = 112 and processing width = 112
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: h264parse0
Decodebin child added: capsfilter0
Decodebin child added: nvv4l2decoder0
In cb_newpad
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:176, [DEBUG: CUSTOM_LIB] Create New accumulating ROI block, batch size: 1, total block size: 1
sequence_image_process.cpp:195, [DEBUG: CUSTOM_LIB] Adding new ROI of source: 0 mapped to dst blockId: 0, offset:0, total ROI numbers: 1
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 0
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 0
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 1
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 1
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 2
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 2
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 3
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 3
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 4
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 4
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 5
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 5
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 6
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 6
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 7
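
The repeated "batching is not ready" messages in this excerpt are what one would expect while the first sequence is still filling. A minimal sketch of that timing follows, assuming a sequence becomes ready once 16 frames (S in the log above) have accumulated and then advances by `stride` frames. These semantics are an assumption about the custom library, and `ready_frames` is a hypothetical helper.

```python
def ready_frames(total_frames, seq_len=16, stride=1):
    """Return the frame indices at which a full sequence would be ready.

    ASSUMPTION: the first sequence is ready when frame seq_len - 1 arrives,
    and each later sequence is ready every `stride` frames after that.
    """
    first = seq_len - 1
    return [
        frame
        for frame in range(total_frames)
        if frame >= first and (frame - first) % stride == 0
    ]

if __name__ == "__main__":
    # Frames 0..7, as in the log excerpt above: no sequence is complete yet.
    print(ready_frames(8))
    # With 20 frames, sequences are ready from frame 15 onwards.
    print(ready_frames(20))
```

Under this assumption, the excerpt above (which stops at frame 7) ends before the first inference could have happened, so a longer log would be needed to tell whether output tensors are ever produced.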

NOTE: I have also tried applying the nvdsinfer.patch mentioned in "Using Custom action recognition Model in Deepstream 3D action recognition", but I am still not getting any output.

The major difference I noticed between the sample 3d-action-recognition model and this custom model appears during inference, as shown in the debug logs below.
I am not getting output tensors with the custom model.

Left: custom model inference. Right: NGC model.

I am attaching the label file used for the model:
action_labels.txt (13.5 KB)

The ONNX model is too large to attach to this issue; please let me know if it is needed.

Please help me resolve this, and let me know if I have missed any config setting or need to make further changes to get this running correctly.

Thanks & Regards,
Rajat M R

Hi,

Thanks for filing a new topic.
Please also share the ONNX model with us.

Thanks.

Hi,
Here is the link to the 3D-ResNet-18 ONNX model: resnet-18-kinetcis-moments.onnx - Google Drive

Thanks & Regards
Rajat M R

Thanks.

We will update here once we have more information on this.

Hi,

Thanks for your patience.

We can get output from your model with the deepstream-3d-action-recognition example.
Please note that the label is not shown in the console but at the top of the display window.

Please note that you might need to update the mean/scale/RGB parameters in the config for your model.
Below are the modifications we made, for your reference:

diff --git a/config_infer_primary_3d_action.txt b/config_infer_primary_3d_action.txt
index acaf33a..d0f8163 100644
--- a/config_infer_primary_3d_action.txt
+++ b/config_infer_primary_3d_action.txt
@@ -60,12 +60,13 @@
 [property]
 gpu-id=0

-tlt-encoded-model=../files/resnet18_3d_rgb_hmdb5_32.etlt
-tlt-model-key=nvidia_tao
-model-engine-file=../files/resnet18_3d_rgb_hmdb5_32.etlt_b4_gpu0_fp16.engine
+#tlt-encoded-model=../files/resnet18_3d_rgb_hmdb5_32.etlt
+#tlt-model-key=nvidia_tao
+#model-engine-file=../files/resnet18_3d_rgb_hmdb5_32.etlt_b4_gpu0_fp16.engine
+model-engine-file=resnet-18-kinetcis-moments.onnx_b1_gpu0_fp16.engine

 labelfile-path=labels.txt
-batch-size=4
+batch-size=1
 process-mode=1

 # requries preprocess metadata input
diff --git a/config_preprocess_3d_custom.txt b/config_preprocess_3d_custom.txt
index 6f17e71..7789c3d 100644
--- a/config_preprocess_3d_custom.txt
+++ b/config_preprocess_3d_custom.txt
@@ -32,7 +32,7 @@ target-unique-ids=1
 #network-input-shape= 4;3;64;224;224

 # 3D sequence of 32 images
-network-input-shape= 4;3;32;224;224
+network-input-shape= 1;3;16;112;112

     # 0=RGB, 1=BGR, 2=GRAY
 network-color-format=0
@@ -40,10 +40,10 @@ network-color-format=0
 network-input-order=2
     # 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
 tensor-data-type=0
-tensor-name=input_rgb
+tensor-name=0

-processing-width=224
-processing-height=224
+processing-width=112
+processing-height=112

     # 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE
     # 3=NVBUF_MEM_CUDA_UNIFIED  4=NVBUF_MEM_SURFACE_ARRAY(Jetson)
@@ -69,15 +69,11 @@ custom-tensor-preparation-function=CustomSequenceTensorPreparation
 # 3D conv custom params
 [user-configs]
 channel-scale-factors=0.007843137;0.007843137;0.007843137
-channel-mean-offsets=127.5;127.5;127.5
+channel-mean-offsets=110.79;103.3;96.26
 stride=1
 subsample=0

 [group-0]
-src-ids=0;1;2;3
+src-ids=0
 process-on-roi=1
 roi-params-src-0=0;0;1280;720
-roi-params-src-1=0;0;1280;720
-roi-params-src-2=0;0;1280;720
-roi-params-src-3=0;0;1280;720
-
$ /usr/src/tensorrt/bin/trtexec --onnx=../resnet-18-kinetcis-moments.onnx --saveEngine=resnet-18-kinetcis-moments.onnx_b1_gpu0_fp16.engine --fp16
$ ./deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt

Thanks.

Hi,
Thank you for the response. I have made these exact changes in the config files, but the output label still does not show up in the display window.
I can only see the FPS in the left corner.

Here is a reference image of the output.

With the example model, we can see the labels as below.

Although the pipeline is running, I am still not getting the labels. Is there anything I am missing in the config to make it work?
Did the labels appear while testing at your end?
Please let me know.

Thanks
Rajat M R

Hi,

We can get the label with your model and the video below:

/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_ride_bike.mov

Please note that we have replaced the contents of labels.txt with the labels you shared above.
Would you mind double-checking whether the labels file in your environment has also been updated?
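
One way to double-check the labels file is to confirm that it holds as many names as the model has outputs (1039 for this model) and that the top-scoring index maps to a name. This is a minimal sketch: the helpers are hypothetical, and it assumes labels are separated by semicolons or newlines, so adjust the separator to match the actual file.

```python
import re

def load_labels(text):
    """Split label text on semicolons or newlines (ASSUMED format), dropping blanks."""
    return [part.strip() for part in re.split(r"[;\n]", text) if part.strip()]

def label_for(scores, labels):
    """Return the label of the highest score, or None if that index has no name."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best] if best < len(labels) else None

if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    labels = load_labels("class_a;class_b")
    print(len(labels))                          # compare this with the model's output size
    print(label_for([0.1, 0.7], labels))        # index 1 has a name
    print(label_for([0.1, 0.2, 0.7], labels))   # index 2 has no name
```

A labels file with fewer entries than the model has classes is one way a prediction could exist without a name to display.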

It is possible that the label doesn’t show up because the corresponding name cannot be found.
Here is our result with the sample_ride_bike.mov video for your reference:

Thanks.

Hi,
Thank you for checking the model inference.
I get the inference output you mentioned only when I run the DeepStream config file inside the /opt/nvidia/deepstream/deepstream-6.0/sources/apps/sample_apps/deepstream-3d-action-recognition folder, since the DeepStream app was compiled there.
I will post an update if any issues arise. Thanks for verifying the model.

Thanks & Regards
Rajat M R