Running deepstream-3d-action-recognition application

Please provide complete information as applicable to your setup.

**• Hardware Platform (Jetson / GPU)** RTX 4080
• DeepStream Version 7.0
I have an ActionRecognitionNet model trained with the TAO toolkit for two activities, Fight and Normal. I tested it with about 10 test folders for each activity and achieved 100% accuracy; those test folders were never used in training or validation.
When I run the model using the deepstream-3d-action-recognition app, the detection is the total opposite: all the normal activities are detected as Fight.
May I know what to change in my config files?
deepstream_action_recognition_config.txt (3.8 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)
What could be wrong in my running application?

  1. Which ActionRecognitionNet version did you train the model on? What tool did you use to test the 10 test folders for each activity and achieve 100% accuracy?
  2. Did you modify the preprocessing parameters? If so, please make sure the preprocessing parameters are consistent with those used for training. Please refer to this FAQ: Debug Tips for DeepStream Accuracy Issue.
  3. Could you share a DeepStream running log? Thanks!
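On point 2, one concrete thing to check is the normalization. The training spec below uses `rgb_input_mean: [0.5]` and `rgb_input_std: [0.5]`, and the DeepStream preprocess config must apply the equivalent scale and offset. A minimal sketch of the mapping, assuming TAO normalizes as `(x/255 - mean) / std` on pixels scaled to [0, 1] (the `tao_to_nvinfer` helper name is hypothetical, for illustration only):

```python
# Sketch: convert TAO-style normalization to a DeepStream-style
# scale/offset pair. Assumes TAO computes (x/255 - mean) / std,
# which equals net_scale_factor * (x - offset) with the values below.
def tao_to_nvinfer(mean, std):
    net_scale_factor = 1.0 / (255.0 * std)
    offset = 255.0 * mean
    return net_scale_factor, offset

scale, offset = tao_to_nvinfer(mean=0.5, std=0.5)
print(scale, offset)  # ≈ 0.007843, 127.5
```

With mean = std = 0.5 this gives a scale factor of 1/127.5 ≈ 0.007843 and an offset of 127.5; if the preprocess config applies different values, the tensors the engine sees will differ from what the model saw in training.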

I used the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt.
No modifications to the preprocessing parameters; all are as shown in the attached config files.
The following is the running log.

(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/s
ources/apps/sample_apps/deepstream-3d-action-recognition# ./deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt
num-sources = 12
Now playing: rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0,
0:00:03.278705967    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input_rgb       96x224x224      min: 1x96x224x224    opt: 12x96x224x224   Max: 12x96x224x224   
1   OUTPUT kFLOAT fc_pred         2               min: 0               opt: 0               Max: 0               

0:00:03.375988024    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
0:00:03.378604023    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:config_infer_primary_2d_action.txt sucessfully
sequence_image_process.cpp:489, [INFO: CUSTOM_LIB] 2D custom sequence network shape NSHW[12, 96, 224, 224], reshaped as [N: 12, C: 3, S:32, H: 224, W:224]
sequence_image_process.cpp:512, [INFO: CUSTOM_LIB] Sequence preprocess buffer manager initialized with stride: 1, subsample: 0
sequence_image_process.cpp:516, [INFO: CUSTOM_LIB] SequenceImagePreprocess initialized successfully
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Running...
Read detection info.
Decodebin child added: decodebin0
Decodebin child added: rtph265depay0
Decodebin child added: h265parse0
Decodebin child added: capsfilter0
Decodebin child added: nvv4l2decoder0
Decodebin child added: decodebin1
Decodebin child added: rtph265depay1
Decodebin child added: h265parse1
Decodebin child added: capsfilter1
Decodebin child added: nvv4l2decoder1
Decodebin child added: decodebin2
Decodebin child added: rtph265depay2
Decodebin child added: h265parse2
Decodebin child added: capsfilter2
Decodebin child added: decodebin3
Decodebin child added: nvv4l2decoder2
Decodebin child added: rtph265depay3
Decodebin child added: h265parse3
Decodebin child added: capsfilter3
Decodebin child added: nvv4l2decoder3
Decodebin child added: decodebin4
Decodebin child added: decodebin5
Decodebin child added: rtph265depay4
Decodebin child added: rtph265depay5
Decodebin child added: h265parse4
Decodebin child added: h265parse5
Decodebin child added: capsfilter4
Decodebin child added: capsfilter5
Decodebin child added: nvv4l2decoder4
Decodebin child added: decodebin6
Decodebin child added: nvv4l2decoder5
Decodebin child added: rtph265depay6
Decodebin child added: decodebin7
Decodebin child added: h265parse6
Decodebin child added: capsfilter6
Decodebin child added: rtph265depay7
Decodebin child added: h265parse7
Decodebin child added: capsfilter7
Decodebin child added: nvv4l2decoder6
Decodebin child added: decodebin8
Decodebin child added: decodebin9
Decodebin child added: rtph265depay8
Decodebin child added: h265parse8
Decodebin child added: rtph265depay9
Decodebin child added: capsfilter8
Decodebin child added: nvv4l2decoder7
Decodebin child added: decodebin10
Decodebin child added: rtph265depay10
Decodebin child added: h265parse9
Decodebin child added: decodebin11
Decodebin child added: rtph265depay11
Decodebin child added: capsfilter9
Decodebin child added: h265parse10
Decodebin child added: capsfilter10
Decodebin child added: nvv4l2decoder8
Decodebin child added: h265parse11
Decodebin child added: nvv4l2decoder9
Decodebin child added: nvv4l2decoder10
Decodebin child added: capsfilter11
Decodebin child added: nvv4l2decoder11
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
FPS(cur/avg): 11.25 (11.25) 	11.25 (11.25) 	11.25 (11.25) 	11.25 (11.25) 	10.69 (10.69) 	11.25 (11.25) 	11.03 (11.03) 	11.25 (11.25) 	11.25 (11.25) 	11.03 (11.03) 	11.25 (11.25) 	10.69 (10.69) 	
FPS(cur/avg): 24.97 (15.61) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.29) 	24.97 (15.61) 	24.97 (15.51) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.51) 	24.97 (15.61) 	24.97 (15.29) 	
FPS(cur/avg): 25.01 (17.87) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.66) 	25.01 (17.87) 	25.01 (17.81) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.81) 	25.01 (17.87) 	25.01 (17.66) 	
FPS(cur/avg): 25.00 (19.25) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.10) 	25.00 (19.25) 	25.00 (19.22) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.22) 	25.00 (19.25) 	25.00 (19.10) 	
FPS(cur/avg): 25.00 (20.19) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.06) 	25.00 (20.19) 	25.00 (20.16) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.16) 	25.00 (20.19) 	25.00 (20.06) 	
FPS(cur/avg): 25.01 (20.86) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.76) 	25.01 (20.86) 	25.01 (20.84) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.84) 	25.01 (20.86) 	25.01 (20.76) 	
Read detection info.
FPS(cur/avg): 24.97 (21.37) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.28) 	24.97 (21.37) 	24.97 (21.35) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.35) 	24.97 (21.37) 	24.97 (21.28) 	
FPS(cur/avg): 25.00 (21.76) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.69) 	25.00 (21.76) 	25.00 (21.75) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.75) 	25.00 (21.76) 	25.00 (21.69) 	
FPS(cur/avg): 25.01 (22.09) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.02) 	25.01 (22.09) 	25.01 (22.08) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.08) 	25.01 (22.09) 	25.01 (22.02) 	
FPS(cur/avg): 24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.28) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.28) 	
ERROR from element nvvideo-renderer: Output window was closed
Error details: ext/eglgles/gsteglglessink.c(900): gst_eglglessink_event_thread (): /GstPipeline:preprocess-test-pipeline/GstEglGlesSink:nvvideo-renderer
Deleting array 
Returned, stopping playback
Deleting pipeline
sequence_image_process.cpp:578, [INFO: CUSTOM_LIB] SequenceImagePreprocess is deinitializing
(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition#

TAO has an evaluate command.
I used it to test 10 test folders for each activity, none of which were used in training.
It gave 100% accuracy in testing.
The command is as follows.

action_recognition evaluate \
                    -e /workspace/mnt/sda/Activities/BMTC_V/Fights/pretrained_2D/experiment.yaml \
                    encryption_key="nvidia_tao" \
                    results_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql \
                    dataset.workers=0 \
                    evaluate.checkpoint=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql/train/lightning_logs/version_0/checkpoints/epoch19.tlt  \
                    evaluate.batch_size=1 \
                    evaluate.test_dataset_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test \
                    evaluate.video_eval_mode=center

This is training config file.

results_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/results/rgb_3d_ptm_32sql
encryption_key: nvidia_tao
model:
  model_type: rgb
  backbone: resnet_18
  rgb_seq_length: 32
  input_height: 224
  input_width: 224
  input_type: 2d
  sample_strategy: consecutive
  dropout_ratio: 0.0
  rgb_pretrained_num_classes: 5
dataset:
  train_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/train
  val_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test
  label_map:
    normal: 0
    fight: 1
  batch_size: 8
  workers: 8
  clips_per_video: 5
  augmentation_config:
    train_crop_type: random_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: true
    crop_smaller_edge: 256
train:
  optim:
    lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    lr_scheduler: MultiStep
    lr_steps: [30, 60, 80]
    lr_decay: 0.1
  num_epochs: 20
  checkpoint_interval: 1
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
export:
  checkpoint: "??"

Evaluation in TAO gives 100% accuracy.
But when testing in DeepStream, even the same training videos give the wrong classification between Fight and Normal.

I notice you trained a 3D ActionRecognitionNet model, but config_preprocess_2d_custom.txt is for the 2D model; for example, the sequence length of the 2D model is 96. Please make sure the preprocessing parameters are consistent with those used for training.
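To clarify the "96" here: the 2D model consumes a clip of 32 RGB frames stacked along the channel axis, so the tensor's channel dimension is 32 × 3 = 96 even though `rgb_seq_length` is 32. A numpy sketch of the packing suggested by the custom lib's log line `NSHW[12, 96, 224, 224] ... reshaped as [N: 12, C: 3, S:32, H: 224, W:224]` (the exact interleaving is my reading of that log, not something I have verified):

```python
import numpy as np

# One stream's clip: S=32 RGB frames of H=W=224, laid out [S, C, H, W].
frames = np.arange(32 * 3 * 224 * 224, dtype=np.float32).reshape(32, 3, 224, 224)

# The custom lib reshapes the 96-channel block as [C=3, S=32, H, W],
# i.e. channel-major packing (assumption inferred from the log line).
packed = frames.transpose(1, 0, 2, 3).reshape(96, 224, 224)
print(packed.shape)  # (96, 224, 224)
```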

No, I trained on the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt. Yes, rgb_seq_length: 32 in training, so network-input-shape is set to 12;96;224;224 for batch size 12. The DeepStream config files are here:
deepstream_action_recognition_config.txt (4.6 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)

The training spec file is the same one posted above.

I notice that network-color-format and network-input-order have been changed. Why do you need to change these values? Please make sure all the configurations are correct.

network-input-shape needs to be 12;96;224;224, right? 12 is the batch size; 96 is rgb_seq_length: 32 times 3 channels, so 96; 224 is the width and height.
As for network-color-format, I just wanted to test it.
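The shape arithmetic can be spelled out as a quick sanity check (variable names are illustrative):

```python
# Sanity check of the arithmetic behind network-input-shape=12;96;224;224.
batch_size = 12        # matches the 12 RTSP sources
rgb_seq_length = 32    # frames per clip, from the training spec
channels = 3           # RGB
height = width = 224   # input_height / input_width from the training spec

network_input_shape = [batch_size, rgb_seq_length * channels, height, width]
print(network_input_shape)  # [12, 96, 224, 224]
```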

Yes, you can change network-input-shape. Did you try the original network-color-format and network-input-order values? You can compare the files to check which configurations were changed.

I tested as follows.

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=2

But I still have errors in classification.

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!