Using Custom action recognition Model in Deepstream 3D action recognition

Hi,

Just in case you didn’t notice this.

You can update the preprocessing parameter based on the normalization you used.
For example, according to here, we set the parameters as follows:

config_preprocess_3d_custom.txt

[user-configs]
channel-scale-factors=0.0167;0.0167;0.0167
channel-mean-offsets=124.0;117.0;104.0

Thanks.

Hi @AastaLLL

We tried both the default values and values of channel-scale-factors and channel-mean-offsets that we calculated from the mean and standard deviation used for normalization in our original PyTorch pipeline, and compared the results against the ground truth.
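
For reference, a minimal sketch of how such values could be derived. It assumes the custom sequence preprocess library computes (pixel - channel-mean-offset) * channel-scale-factor on 0-255 pixel values, and that the PyTorch pipeline divides by 255 and then applies (x - mean) / std. Both are assumptions to check against your own pipeline. The mean/std values below are only example inputs (commonly used ImageNet statistics), and to_deepstream_params / format_user_configs are hypothetical helper names.

```python
def to_deepstream_params(mean, std, max_pixel=255.0):
    """Convert per-channel mean/std (defined on 0-1 pixels) to 0-255 terms.

    Assumed target formula: (pixel - offset) * scale, which equals
    (pixel / max_pixel - mean) / std when offset = mean * max_pixel
    and scale = 1 / (std * max_pixel).
    """
    offsets = [m * max_pixel for m in mean]
    scales = [1.0 / (s * max_pixel) for s in std]
    return scales, offsets


def format_user_configs(scales, offsets):
    """Render the two [user-configs] lines as semicolon-separated lists."""
    scale_text = ";".join(f"{s:.6f}" for s in scales)
    offset_text = ";".join(f"{o:.3f}" for o in offsets)
    return (
        f"channel-scale-factors={scale_text}\n"
        f"channel-mean-offsets={offset_text}"
    )


# Example inputs only, not the statistics of the model in this thread.
EXAMPLE_MEAN = (0.485, 0.456, 0.406)
EXAMPLE_STD = (0.229, 0.224, 0.225)

scales, offsets = to_deepstream_params(EXAMPLE_MEAN, EXAMPLE_STD)
print(format_user_configs(scales, offsets))
```

If the pipeline normalizes differently (for example, no division by 255, or BGR channel order), the conversion changes accordingly.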

In both cases the DS pipeline produced only incorrect output (a continuous True/1 label).

Thanks,
Hemang

Thanks.

It seems that we still need your PyTorch pipeline to debug further.
We will wait for your update.

I’ll get back to you on this after discussing internally.

Thanks,
Hemang

Hi @AastaLLL

Does TRT support the torch.logical_or operation when converting to an engine?

The PyTorch ONNX exporter supports the logical_or op. This operation is part of the AR Ensemble model, so we would like to confirm that TRT supports it as well.

Thanks,
Hemang

Hi,

Based on the documents below, the Or operation is supported in TensorRT 8.2 (JetPack 4.6.1) and 8.4 (JetPack 5.0 DP):

TensorRT v8.2: onnx-tensorrt/operators.md at 8.2-GA · onnx/onnx-tensorrt · GitHub
TensorRT v8.4: onnx-tensorrt/operators.md at 8.4-EA · onnx/onnx-tensorrt · GitHub
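
To compare a model against those operator tables, it can help to list the operator types the exported ONNX file actually contains. A rough sketch follows, assuming the onnx Python package is installed when op_types_in_model is called. The helper names and the example operator lists are placeholders, and only top-level graph nodes are inspected (operators inside If/Loop subgraphs are not visited). torch.logical_or is expected to show up as the ONNX Or op, but that should be verified on your own export.

```python
from collections import Counter


def op_types_in_model(onnx_path):
    """Return the op_type of every top-level node in an ONNX file."""
    import onnx  # imported lazily so the pure helpers work without it

    model = onnx.load(onnx_path)
    return [node.op_type for node in model.graph.node]


def count_op_types(op_types):
    """Count how often each operator type occurs."""
    return Counter(op_types)


def missing_ops(op_types, supported_ops):
    """Return the operator types that are not in the supported set."""
    return sorted(set(op_types) - set(supported_ops))


# Hand-written placeholder data; in practice use op_types_in_model(path)
# and fill the supported set from the operators.md page for your version.
example_ops = ["Conv", "Relu", "Or", "Conv"]
example_supported = {"Conv", "Relu", "Or"}

print(count_op_types(example_ops))
print(missing_ops(example_ops, example_supported))
```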

Thanks

Hi @AastaLLL
I am facing a similar problem while using a custom 3D action recognition model.

As mentioned at the beginning of this thread, when I converted the ONNX model to TRT using trtexec, I got the output below:

&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=resnet-18-kinetcis-moments.onnx --saveEngine=3d_action_recognition_fp32.engine
[05/04/2022-11:49:43] [I] === Model Options ===
[05/04/2022-11:49:43] [I] Format: ONNX
[05/04/2022-11:49:43] [I] Model: resnet-18-kinetcis-moments.onnx
[05/04/2022-11:49:43] [I] Output:
[05/04/2022-11:49:43] [I] === Build Options ===
[05/04/2022-11:49:43] [I] Max batch: explicit batch
[05/04/2022-11:49:43] [I] Workspace: 16 MiB
[05/04/2022-11:49:43] [I] minTiming: 1
[05/04/2022-11:49:43] [I] avgTiming: 8
[05/04/2022-11:49:43] [I] Precision: FP32
[05/04/2022-11:49:43] [I] Calibration: 
[05/04/2022-11:49:43] [I] Refit: Disabled
[05/04/2022-11:49:43] [I] Sparsity: Disabled
[05/04/2022-11:49:43] [I] Safe mode: Disabled
[05/04/2022-11:49:43] [I] DirectIO mode: Disabled
[05/04/2022-11:49:43] [I] Restricted mode: Disabled
[05/04/2022-11:49:43] [I] Save engine: 3d_action_recognition_fp32.engine
[05/04/2022-11:49:43] [I] Load engine: 
[05/04/2022-11:49:43] [I] Profiling verbosity: 0
[05/04/2022-11:49:43] [I] Tactic sources: Using default tactic sources
[05/04/2022-11:49:43] [I] timingCacheMode: local
[05/04/2022-11:49:43] [I] timingCacheFile: 
[05/04/2022-11:49:43] [I] Input(s)s format: fp32:CHW
[05/04/2022-11:49:43] [I] Output(s)s format: fp32:CHW
[05/04/2022-11:49:43] [I] Input build shapes: model
[05/04/2022-11:49:43] [I] Input calibration shapes: model
[05/04/2022-11:49:43] [I] === System Options ===
[05/04/2022-11:49:43] [I] Device: 0
[05/04/2022-11:49:43] [I] DLACore: 
[05/04/2022-11:49:43] [I] Plugins:
[05/04/2022-11:49:43] [I] === Inference Options ===
[05/04/2022-11:49:43] [I] Batch: Explicit
[05/04/2022-11:49:43] [I] Input inference shapes: model
[05/04/2022-11:49:43] [I] Iterations: 10
[05/04/2022-11:49:43] [I] Duration: 3s (+ 200ms warm up)
[05/04/2022-11:49:43] [I] Sleep time: 0ms
[05/04/2022-11:49:43] [I] Idle time: 0ms
[05/04/2022-11:49:43] [I] Streams: 1
[05/04/2022-11:49:43] [I] ExposeDMA: Disabled
[05/04/2022-11:49:43] [I] Data transfers: Enabled
[05/04/2022-11:49:43] [I] Spin-wait: Disabled
[05/04/2022-11:49:43] [I] Multithreading: Disabled
[05/04/2022-11:49:43] [I] CUDA Graph: Disabled
[05/04/2022-11:49:43] [I] Separate profiling: Disabled
[05/04/2022-11:49:43] [I] Time Deserialize: Disabled
[05/04/2022-11:49:43] [I] Time Refit: Disabled
[05/04/2022-11:49:43] [I] Skip inference: Disabled
[05/04/2022-11:49:43] [I] Inputs:
[05/04/2022-11:49:43] [I] === Reporting Options ===
[05/04/2022-11:49:43] [I] Verbose: Disabled
[05/04/2022-11:49:43] [I] Averages: 10 inferences
[05/04/2022-11:49:43] [I] Percentile: 99
[05/04/2022-11:49:43] [I] Dump refittable layers:Disabled
[05/04/2022-11:49:43] [I] Dump output: Disabled
[05/04/2022-11:49:43] [I] Profile: Disabled
[05/04/2022-11:49:43] [I] Export timing to JSON file: 
[05/04/2022-11:49:43] [I] Export output to JSON file: 
[05/04/2022-11:49:43] [I] Export profile to JSON file: 
[05/04/2022-11:49:43] [I] 
[05/04/2022-11:49:43] [I] === Device Information ===
[05/04/2022-11:49:43] [I] Selected Device: NVIDIA GeForce GTX 1050 Ti
[05/04/2022-11:49:43] [I] Compute Capability: 6.1
[05/04/2022-11:49:43] [I] SMs: 6
[05/04/2022-11:49:43] [I] Compute Clock Rate: 1.62 GHz
[05/04/2022-11:49:43] [I] Device Global Memory: 4040 MiB
[05/04/2022-11:49:43] [I] Shared Memory per SM: 96 KiB
[05/04/2022-11:49:43] [I] Memory Bus Width: 128 bits (ECC disabled)
[05/04/2022-11:49:43] [I] Memory Clock Rate: 3.504 GHz
[05/04/2022-11:49:43] [I] 
[05/04/2022-11:49:43] [I] TensorRT version: 8.2.1
[05/04/2022-11:49:43] [I] [TRT] [MemUsageChange] Init CUDA: CPU +177, GPU +0, now: CPU 189, GPU 816 (MiB)
[05/04/2022-11:49:44] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 189 MiB, GPU 816 MiB
[05/04/2022-11:49:44] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 252 MiB, GPU 816 MiB
[05/04/2022-11:49:44] [I] Start parsing network model
[05/04/2022-11:49:44] [I] [TRT] ----------------------------------------------------------------
[05/04/2022-11:49:44] [I] [TRT] Input filename:   resnet-18-kinetcis-moments.onnx
[05/04/2022-11:49:44] [I] [TRT] ONNX IR version:  0.0.6
[05/04/2022-11:49:44] [I] [TRT] Opset version:    12
[05/04/2022-11:49:44] [I] [TRT] Producer name:    pytorch
[05/04/2022-11:49:44] [I] [TRT] Producer version: 1.8
[05/04/2022-11:49:44] [I] [TRT] Domain:           
[05/04/2022-11:49:44] [I] [TRT] Model version:    0
[05/04/2022-11:49:44] [I] [TRT] Doc string:       
[05/04/2022-11:49:44] [I] [TRT] ----------------------------------------------------------------
[05/04/2022-11:49:44] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/04/2022-11:49:44] [I] Finish parsing network model
[05/04/2022-11:49:44] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +249, GPU +102, now: CPU 630, GPU 918 (MiB)
[05/04/2022-11:49:45] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +219, GPU +82, now: CPU 849, GPU 1000 (MiB)
[05/04/2022-11:49:45] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/04/2022-11:49:53] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[05/04/2022-11:49:55] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/04/2022-11:49:55] [I] [TRT] Total Host Persistent Memory: 992
[05/04/2022-11:49:55] [I] [TRT] Total Device Persistent Memory: 0
[05/04/2022-11:49:55] [I] [TRT] Total Scratch Memory: 15221760
[05/04/2022-11:49:55] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 2 MiB, GPU 129 MiB
[05/04/2022-11:49:55] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 0.810701ms to assign 4 blocks to 40 nodes requiring 31278080 bytes.
[05/04/2022-11:49:55] [I] [TRT] Total Activation Memory: 31278080
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1035, GPU 1231 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1035, GPU 1239 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +129, now: CPU 0, GPU 129 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1163, GPU 1065 (MiB)
[05/04/2022-11:49:55] [I] [TRT] Loaded engine size: 128 MiB
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 1163, GPU 1205 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1163, GPU 1213 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +128, now: CPU 0, GPU 128 (MiB)
[05/04/2022-11:49:55] [I] Engine built in 11.9132 sec.
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 843, GPU 1205 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 843, GPU 1213 (MiB)
[05/04/2022-11:49:55] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +30, now: CPU 0, GPU 158 (MiB)
[05/04/2022-11:49:55] [I] Using random values for input 0
[05/04/2022-11:49:55] [I] Created input binding for 0 with dimensions 1x3x16x112x112
[05/04/2022-11:49:55] [I] Using random values for output 198
[05/04/2022-11:49:55] [I] Created output binding for 198 with dimensions 1x1039
[05/04/2022-11:49:55] [I] Starting inference
[05/04/2022-11:49:58] [I] Warmup completed 8 queries over 200 ms
[05/04/2022-11:49:58] [I] Timing trace has 132 queries over 3.06989 s
[05/04/2022-11:49:58] [I] 
[05/04/2022-11:49:58] [I] === Trace details ===
[05/04/2022-11:49:58] [I] Trace averages of 10 runs:
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9716 ms - Host latency: 23.3739 ms (end to end 46.3644 ms, enqueue 3.48358 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 24.5993 ms - Host latency: 25.0815 ms (end to end 47.3465 ms, enqueue 3.86793 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9112 ms - Host latency: 23.317 ms (end to end 45.7564 ms, enqueue 3.77248 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.8363 ms - Host latency: 23.2428 ms (end to end 45.3281 ms, enqueue 3.84795 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 23.6491 ms - Host latency: 24.0537 ms (end to end 46.9746 ms, enqueue 3.49109 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9503 ms - Host latency: 23.3536 ms (end to end 45.4984 ms, enqueue 3.59299 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.805 ms - Host latency: 23.2089 ms (end to end 45.3626 ms, enqueue 3.51901 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6869 ms - Host latency: 23.0948 ms (end to end 45.0141 ms, enqueue 3.885 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.9644 ms - Host latency: 23.3701 ms (end to end 45.6225 ms, enqueue 3.41711 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6783 ms - Host latency: 23.0807 ms (end to end 45.0477 ms, enqueue 3.46648 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.6976 ms - Host latency: 23.107 ms (end to end 45.038 ms, enqueue 3.55405 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.912 ms - Host latency: 23.3164 ms (end to end 45.4568 ms, enqueue 3.59172 ms)
[05/04/2022-11:49:58] [I] Average on 10 runs - GPU latency: 22.7565 ms - Host latency: 23.1657 ms (end to end 45.1867 ms, enqueue 3.67351 ms)
[05/04/2022-11:49:58] [I] 
[05/04/2022-11:49:58] [I] === Performance summary ===
[05/04/2022-11:49:58] [I] Throughput: 42.9982 qps
[05/04/2022-11:49:58] [I] Latency: min = 22.4573 ms, max = 26.7571 ms, mean = 23.4397 ms, median = 23.1427 ms, percentile(99%) = 26.7419 ms
[05/04/2022-11:49:58] [I] End-to-End Host Latency: min = 40.9007 ms, max = 52.6338 ms, mean = 45.6848 ms, median = 45.1901 ms, percentile(99%) = 50.8307 ms
[05/04/2022-11:49:58] [I] Enqueue Time: min = 1.35107 ms, max = 5.44977 ms, mean = 3.63209 ms, median = 3.55768 ms, percentile(99%) = 5.09448 ms
[05/04/2022-11:49:58] [I] H2D Latency: min = 0.374634 ms, max = 0.631592 ms, mean = 0.407851 ms, median = 0.401947 ms, percentile(99%) = 0.626831 ms
[05/04/2022-11:49:58] [I] GPU Compute Time: min = 22.0508 ms, max = 26.3568 ms, mean = 23.0283 ms, median = 22.7378 ms, percentile(99%) = 26.112 ms
[05/04/2022-11:49:58] [I] D2H Latency: min = 0.00268555 ms, max = 0.0212402 ms, mean = 0.00348317 ms, median = 0.00317383 ms, percentile(99%) = 0.00463867 ms
[05/04/2022-11:49:58] [I] Total Host Walltime: 3.06989 s
[05/04/2022-11:49:58] [I] Total GPU Compute Time: 3.03974 s
[05/04/2022-11:49:58] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/04/2022-11:49:58] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8201] # /usr/src/tensorrt/bin/trtexec --onnx=resnet-18-kinetcis-moments.onnx --saveEngine=3d_action_recognition_fp32.engine

As you can see, I am getting input and output dimensions similar to those of the NGC model.

Now, in the DeepStream 6.0 action recognition app, I have modified config_preprocess_3d_action_recognition.txt, config_infer_secondary_action_recognition.txt, and deepstream_custom_action_recognition_config.txt as shown below.
1. config_preprocess_3d_action_recognition.txt

[property]
enable=1
target-unique-ids=1

# network-input-shape: batch, channel, sequence, height, width
# 3D sequence of 16 images
network-input-shape= 1;3;16;112;112

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=2
# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0
tensor-name=0

processing-width=112
processing-height=112

# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE
# 3=NVBUF_MEM_CUDA_UNIFIED  4=NVBUF_MEM_SURFACE_ARRAY(Jetson)
scaling-pool-memory-type=0

# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU
# 2=NvBufSurfTransformCompute_VIC(Jetson)
scaling-pool-compute-hw=0

# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0

# model input tensor pool size
tensor-buf-pool-size=8

custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/libnvds_custom_sequence_preprocess.so
#custom-lib-path=./custom_sequence_preprocess/libnvds_custom_sequence_preprocess.so
custom-tensor-preparation-function=CustomSequenceTensorPreparation

# 3D conv custom params
[user-configs]
#channel-scale-factors=0.003921569;0.003921569;0.003921569
channel-scale-factors=0.007843137;0.007843137;0.007843137
channel-mean-offsets=110.79;103.3;96.26
stride=1
subsample=0

[group-0]
src-ids=0
process-on-roi=1
roi-params-src-0=0;0;1920;1080
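
A quick consistency check between this file and the trtexec log above can be sketched as follows. It assumes the preprocess config can be read with Python's configparser (which treats lines starting with # as comments). SAMPLE_CONFIG is a trimmed copy of the file above, the binding string is copied from the "Created input binding" log line, and the helper names are hypothetical.

```python
import configparser


def parse_shape(text, sep):
    """Turn a string such as '1;3;16;112;112' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split(sep))


def preprocess_shape(config_text):
    """Read network-input-shape from the [property] group of the config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parse_shape(parser["property"]["network-input-shape"], ";")


# Trimmed copy of the preprocess config shown above.
SAMPLE_CONFIG = """
[property]
enable=1
# network-input-shape: batch, channel, sequence, height, width
network-input-shape= 1;3;16;112;112
tensor-name=0
"""

# Dimensions reported by trtexec for input binding 0.
engine_shape = parse_shape("1x3x16x112x112", "x")
config_shape = preprocess_shape(SAMPLE_CONFIG)

if config_shape == engine_shape:
    print("preprocess shape matches engine input:", config_shape)
else:
    print("mismatch:", config_shape, "vs", engine_shape)
```

Matching shapes only rule out one class of problem; they say nothing about channel order or normalization.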

2. config_infer_secondary_action_recognition.txt

[property]
gpu-id=0

model-engine-file=/home/rajat/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/3d_action_recognition_fp32.engine
#onnx-file=/home/rajat/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/resnet-18-kinetcis-moments.onnx

labelfile-path=action_labels.txt
batch-size=1
process-mode=1

# requires preprocess metadata input
input-tensor-from-meta=1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
network-type=1

gie-unique-id=5

# Let application to parse the inference tensor output
output-tensor-meta=1
tensor-meta-pool-size=8

3. deepstream_custom_action_recognition_config.txt

[action-recognition]

# stream/file source list
uri-list=file:///home/rajat/videoplayback.mp4
# eglglessink settings
display-sync=0


# <preprocess-config> is the config file path for nvdspreprocess plugin
# <infer-config> is the config file path for nvinfer plugin

# Enable 3D preprocess and inference
preprocess-config=/home/rajat/config_preprocess_3d_action_recognition.txt
infer-config=/home/rajat/config_infer_secondary_action_recognition.txt


# nvstreammux settings
muxer-height=1080
muxer-width=1920

# nvstreammux batched push timeout in usec
muxer-batch-timeout=40000


# nvmultistreamtiler settings
tiler-height=1080
tiler-width=1920

# Log debug level. 0: disabled. 1: debug. 2: verbose.
debug=2

# Enable fps print on screen. 0: disable. 1: enable
enable-fps=1

Now when I run the DeepStream app, the model loads successfully but produces no output tensors or labels. The pipeline runs, but the output window shows only the FPS (no prediction labels).

Below is the output when I run the DeepStream app.

num-sources = 1
Now playing: file:///home/rajat/videoplayback.mp4,
0:00:00.135077369 12648 0x564672438060 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 5]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1914> [UID = 5]: Trying to create engine from model files
WARNING: [TRT]: onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
WARNING: FP16 not supported by platform. Using FP32 mode.
0:00:02.478384792 12648 0x564672438060 INFO                 nvinfer gstnvinfer.cpp:638:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 5]: Info from NvDsInferContextImpl::buildModel() <nvdsinfer_context_impl.cpp:1946> [UID = 5]: serialize cuda engine to file: /home/rajat/sandlogic/internship_project/3d_action_recognition/3d_resnet_model_pytorch/Activity-Recognition-TensorRT/resnet-18-kinetcis-moments.onnx_b1_gpu0_fp32.engine successfully
INFO: [Implicit Engine Info]: layers num: 2
0   INPUT  kFLOAT 0               3x16x112x112    
1   OUTPUT kFLOAT 198             1039            

0:00:02.485296433 12648 0x564672438060 INFO                 nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary-nvinference-engine> [UID 5]: Load new model:/home/rajat/sandlogic/internship_project/ad_insertion_use_case_deepstream/config_infer_secondary_action_recognition.txt sucessfully
sequence_image_process.cpp:696, [DEBUG: CUSTOM_LIB] Initializing Custom sequence preprocessing lib
sequence_image_process.cpp:456, [DEBUG: CUSTOM_LIB] considered as 3d sequence input since network_input_shape size is 5
sequence_image_process.cpp:496, [INFO: CUSTOM_LIB] 3D custom sequence network info(NCSHW), [N: 1, C: 3, S: 16, H: 112, W:112]
sequence_image_process.cpp:524, [INFO: CUSTOM_LIB] Sequence preprocess buffer manager initialized with stride: 1, subsample: 0
sequence_image_process.cpp:526, [INFO: CUSTOM_LIB] SequenceImagePreprocess initialized successfully
Using user provided processing height = 112 and processing width = 112
Decodebin child added: source
Decodebin child added: decodebin0
Running...
Decodebin child added: qtdemux0
Decodebin child added: multiqueue0
Decodebin child added: h264parse0
Decodebin child added: capsfilter0
Decodebin child added: nvv4l2decoder0
In cb_newpad
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:176, [DEBUG: CUSTOM_LIB] Create New accumulating ROI block, batch size: 1, total block size: 1
sequence_image_process.cpp:195, [DEBUG: CUSTOM_LIB] Adding new ROI of source: 0 mapped to dst blockId: 0, offset:0, total ROI numbers: 1
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 0
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 0
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 1
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 1
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 2
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 2
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 3
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 3
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 4
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 4
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 5
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 5
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 6
sequence_image_process.cpp:674, [DEBUG: CUSTOM_LIB] preparing sequence batching is not ready on frame: 6
sequence_image_process.cpp:685, [DEBUG: CUSTOM_LIB] CustomSequenceTensorPreparation processing in progress
sequence_image_process.cpp:610, [DEBUG: CUSTOM_LIB] preparing sequence TensorData in progress...
sequence_image_process.cpp:629, [DEBUG: CUSTOM_LIB] preparing sequence tensor data received 1 rois
sequence_image_process.cpp:653, [DEBUG: CUSTOM_LIB] Trying to collect ready batches on frame: 7

NOTE: I have also tried applying nvdsinfer.patch, but I am still not getting output.

The major difference I noticed between the sample 3d-action-recognition app and this custom model is during inference, as shown in the debug logs below.
I am not getting output tensors with the custom model.

Left: custom model inference. Right: NGC model.

I will attach the label file used for the model:
action_labels.txt (13.5 KB)

Please help me resolve this, and let me know what further changes I need to make to get this running correctly.

Thanks & Regards,
Rajat M R

Hi rajatmr619,

Since this is a long topic already, would you mind filing a new one for your issue?
Thanks.

Hi @AastaLLL

I have also tested the ONNX model with real input and compared the outputs of the PyTorch pipeline and an ONNX Runtime pipeline. The ONNX model’s output matches the PyTorch ground truth. But if I give the same ONNX file to the DS pipeline, it yields only a continuous False/0 output. So we can at least conclude that the PyTorch-to-ONNX conversion is working properly.
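
A comparison of this kind could be sketched as below, in plain Python so it does not depend on any particular runtime. The numbers are made-up placeholders, not outputs from this model, the tolerances are arbitrary starting points, and the helper names are hypothetical.

```python
import math


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Check element-wise closeness of two flat score lists."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


def is_constant(labels):
    """True when every clip received the same label (the symptom seen in DS)."""
    return len(set(labels)) <= 1


# Placeholder scores for one clip from two backends.
reference_scores = [0.12, 0.91, 0.05]
candidate_scores = [0.1201, 0.9099, 0.05001]
print("scores agree:", outputs_match(reference_scores, candidate_scores))

# Placeholder per-clip labels from a pipeline that never changes its answer.
labels_per_clip = [0, 0, 0, 0]
print("output is stuck:", is_constant(labels_per_clip))
```

Running the same two checks on the tensors produced by the DS pipeline would show whether the raw scores differ from the reference or only the final label does.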

Just FYI!

Thanks,
Hemang

Thanks for updating the status.

Can this issue be reproduced with the random-weight model?
Or does it require the real model to get the True/1 output?

Thanks.

Hi @AastaLLL

With the random-weight model, it continuously gives a True/1 output in the DS pipeline.

With the original model, it continuously gives a False/0 output in the DS pipeline.

I hope this helps.

Thanks!

Thanks.

Would you share the model for us to check?
It seems that we still need the true model to debug the root cause.

Thanks.

Hi @AastaLLL ,

We cannot share the original model for debugging. However, the AR model with random weights is the same one that I shared earlier. See if it helps: its behavior is similar to that of the true model, except that it generates a continuous True/1 output whereas the true model generates a continuous False/0 output. In both cases the output never changes.

For now, we have used the true ONNX model in Triton server to run the AR use case so that we can move forward.

Thanks,
Hemang

Hi,

Does the Triton inference server unblock your project?
If yes, we can focus on the issue from the Triton server first.

Thanks.

Hi @AastaLLL

I am not sure what you mean by this.

We are using the ONNX backend to execute the model in Triton server with a Python client. We haven’t tried the TRT backend. Triton server also doesn’t support a 1D output shape (the issue we faced earlier in the DS pipeline), hence I am using the model with its output unsqueezed to a 2D shape.

Thanks.

Hi,

It seems that you can get the expected result with a modified model.
Is that correct?

Thanks.

Hi @AastaLLL

Yes, that’s correct, but with the modified ONNX model and not with the converted TRT model.

The sole intent of this ticket is to get the expected results with the TRT model, which has not been achieved yet. Hence, for the time being, we are going ahead with the ONNX backend.

Thanks

Hi,

There are two issues discussed in this topic.

  1. Squeezing axis issue. This is already fixed in the comment above.

  2. Accuracy issue. We will need the real model to debug internally,
    since this is not reproducible with a random weight model.

As a result, we will need a real model to move on.
If this is not an option for you, would using the ONNX backend instead work for you?

Thanks.

That’s correct. We have decided to use the ONNX backend for this use case.

Thank you for the support, @AastaLLL. Really appreciated!

Hemang
