Running deepstream-3d-action-recognition application

Please provide complete information as applicable to your setup.

**• Hardware Platform (Jetson / GPU)** RTX 4080
• DeepStream Version 7.0
I have an ActionRecognitionNet model trained with the TAO toolkit for two activities, Fight and Normal. I tested it with about 10 test folders for each activity and achieved 100% accuracy; those test folders were never used in training or validation.
When I run the model using the deepstream-3d-action-recognition app, the detection is the total opposite: all the normal activities are detected as Fight.
May I know what to change in my config files?
deepstream_action_recognition_config.txt (3.8 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)
What could be wrong in my running application?

  1. Which ActionRecognitionNet version did you train the model on? What tool did you use to test the 10 test folders for each activity and achieve 100% accuracy?
  2. Did you modify the preprocessing parameters? If so, please make sure the preprocessing parameters are consistent with those used for training. Please refer to this FAQ: Debug Tips for DeepStream Accuracy Issue.
  3. Could you share a DeepStream running log? Thanks!
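On point 2, one concrete thing to check is the normalization. The training spec below uses `rgb_input_mean: [0.5]` and `rgb_input_std: [0.5]`, and the DeepStream preprocess config must apply the equivalent scale and offset. A minimal sketch of the mapping, assuming TAO normalizes as `(x/255 - mean) / std` on pixels scaled to [0, 1] (the `tao_to_nvinfer` helper name is hypothetical, for illustration only):

```python
# Sketch: convert TAO-style normalization to a DeepStream-style
# scale/offset pair. Assumes TAO computes (x/255 - mean) / std,
# which equals net_scale_factor * (x - offset) with the values below.
def tao_to_nvinfer(mean, std):
    net_scale_factor = 1.0 / (255.0 * std)
    offset = 255.0 * mean
    return net_scale_factor, offset

scale, offset = tao_to_nvinfer(mean=0.5, std=0.5)
print(scale, offset)  # ≈ 0.007843, 127.5
```

With mean = std = 0.5 this gives a scale factor of 1/127.5 ≈ 0.007843 and an offset of 127.5; if the preprocess config applies different values, the tensors the engine sees will differ from what the model saw in training.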

I used the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt.
No modifications to the preprocessing parameters; all are as shown in the attached config files.
The following is the running log.

(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/s
ources/apps/sample_apps/deepstream-3d-action-recognition# ./deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt
num-sources = 12
Now playing: rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0,
0:00:03.278705967    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 2
0   INPUT  kFLOAT input_rgb       96x224x224      min: 1x96x224x224    opt: 12x96x224x224   Max: 12x96x224x224   
1   OUTPUT kFLOAT fc_pred         2               min: 0               opt: 0               Max: 0               

0:00:03.375988024    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
0:00:03.378604023    31 0x635a3cb169f0 INFO                 nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:config_infer_primary_2d_action.txt sucessfully
sequence_image_process.cpp:489, [INFO: CUSTOM_LIB] 2D custom sequence network shape NSHW[12, 96, 224, 224], reshaped as [N: 12, C: 3, S:32, H: 224, W:224]
sequence_image_process.cpp:512, [INFO: CUSTOM_LIB] Sequence preprocess buffer manager initialized with stride: 1, subsample: 0
sequence_image_process.cpp:516, [INFO: CUSTOM_LIB] SequenceImagePreprocess initialized successfully
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Running...
Read detection info.
Decodebin child added: decodebin0
Decodebin child added: rtph265depay0
Decodebin child added: h265parse0
Decodebin child added: capsfilter0
Decodebin child added: nvv4l2decoder0
Decodebin child added: decodebin1
Decodebin child added: rtph265depay1
Decodebin child added: h265parse1
Decodebin child added: capsfilter1
Decodebin child added: nvv4l2decoder1
Decodebin child added: decodebin2
Decodebin child added: rtph265depay2
Decodebin child added: h265parse2
Decodebin child added: capsfilter2
Decodebin child added: decodebin3
Decodebin child added: nvv4l2decoder2
Decodebin child added: rtph265depay3
Decodebin child added: h265parse3
Decodebin child added: capsfilter3
Decodebin child added: nvv4l2decoder3
Decodebin child added: decodebin4
Decodebin child added: decodebin5
Decodebin child added: rtph265depay4
Decodebin child added: rtph265depay5
Decodebin child added: h265parse4
Decodebin child added: h265parse5
Decodebin child added: capsfilter4
Decodebin child added: capsfilter5
Decodebin child added: nvv4l2decoder4
Decodebin child added: decodebin6
Decodebin child added: nvv4l2decoder5
Decodebin child added: rtph265depay6
Decodebin child added: decodebin7
Decodebin child added: h265parse6
Decodebin child added: capsfilter6
Decodebin child added: rtph265depay7
Decodebin child added: h265parse7
Decodebin child added: capsfilter7
Decodebin child added: nvv4l2decoder6
Decodebin child added: decodebin8
Decodebin child added: decodebin9
Decodebin child added: rtph265depay8
Decodebin child added: h265parse8
Decodebin child added: rtph265depay9
Decodebin child added: capsfilter8
Decodebin child added: nvv4l2decoder7
Decodebin child added: decodebin10
Decodebin child added: rtph265depay10
Decodebin child added: h265parse9
Decodebin child added: decodebin11
Decodebin child added: rtph265depay11
Decodebin child added: capsfilter9
Decodebin child added: h265parse10
Decodebin child added: capsfilter10
Decodebin child added: nvv4l2decoder8
Decodebin child added: h265parse11
Decodebin child added: nvv4l2decoder9
Decodebin child added: nvv4l2decoder10
Decodebin child added: capsfilter11
Decodebin child added: nvv4l2decoder11
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
FPS(cur/avg): 11.25 (11.25) 	11.25 (11.25) 	11.25 (11.25) 	11.25 (11.25) 	10.69 (10.69) 	11.25 (11.25) 	11.03 (11.03) 	11.25 (11.25) 	11.25 (11.25) 	11.03 (11.03) 	11.25 (11.25) 	10.69 (10.69) 	
FPS(cur/avg): 24.97 (15.61) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.29) 	24.97 (15.61) 	24.97 (15.51) 	24.97 (15.61) 	24.97 (15.61) 	24.97 (15.51) 	24.97 (15.61) 	24.97 (15.29) 	
FPS(cur/avg): 25.01 (17.87) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.66) 	25.01 (17.87) 	25.01 (17.81) 	25.01 (17.87) 	25.01 (17.87) 	25.01 (17.81) 	25.01 (17.87) 	25.01 (17.66) 	
FPS(cur/avg): 25.00 (19.25) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.10) 	25.00 (19.25) 	25.00 (19.22) 	25.00 (19.25) 	25.00 (19.25) 	25.00 (19.22) 	25.00 (19.25) 	25.00 (19.10) 	
FPS(cur/avg): 25.00 (20.19) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.06) 	25.00 (20.19) 	25.00 (20.16) 	25.00 (20.19) 	25.00 (20.19) 	25.00 (20.16) 	25.00 (20.19) 	25.00 (20.06) 	
FPS(cur/avg): 25.01 (20.86) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.76) 	25.01 (20.86) 	25.01 (20.84) 	25.01 (20.86) 	25.01 (20.86) 	25.01 (20.84) 	25.01 (20.86) 	25.01 (20.76) 	
Read detection info.
FPS(cur/avg): 24.97 (21.37) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.28) 	24.97 (21.37) 	24.97 (21.35) 	24.97 (21.37) 	24.97 (21.37) 	24.97 (21.35) 	24.97 (21.37) 	24.97 (21.28) 	
FPS(cur/avg): 25.00 (21.76) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.69) 	25.00 (21.76) 	25.00 (21.75) 	25.00 (21.76) 	25.00 (21.76) 	25.00 (21.75) 	25.00 (21.76) 	25.00 (21.69) 	
FPS(cur/avg): 25.01 (22.09) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.02) 	25.01 (22.09) 	25.01 (22.08) 	25.01 (22.09) 	25.01 (22.09) 	25.01 (22.08) 	25.01 (22.09) 	25.01 (22.02) 	
FPS(cur/avg): 24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.28) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.34) 	24.99 (22.28) 	
ERROR from element nvvideo-renderer: Output window was closed
Error details: ext/eglgles/gsteglglessink.c(900): gst_eglglessink_event_thread (): /GstPipeline:preprocess-test-pipeline/GstEglGlesSink:nvvideo-renderer
Deleting array 
Returned, stopping playback
Deleting pipeline
sequence_image_process.cpp:578, [INFO: CUSTOM_LIB] SequenceImagePreprocess is deinitializing
(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition#

TAO has an evaluate command.
I used it to test 10 test folders for each activity, none of which were used in training.
It gave 100% accuracy in testing.
The command is as follows.

action_recognition evaluate \
                    -e /workspace/mnt/sda/Activities/BMTC_V/Fights/pretrained_2D/experiment.yaml \
                    encryption_key="nvidia_tao" \
                    results_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql \
                    dataset.workers=0 \
                    evaluate.checkpoint=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql/train/lightning_logs/version_0/checkpoints/epoch19.tlt  \
                    evaluate.batch_size=1 \
                    evaluate.test_dataset_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test \
                    evaluate.video_eval_mode=center

This is training config file.

results_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/results/rgb_3d_ptm_32sql
encryption_key: nvidia_tao
model:
  model_type: rgb
  backbone: resnet_18
  rgb_seq_length: 32
  input_height: 224
  input_width: 224
  input_type: 2d
  sample_strategy: consecutive
  dropout_ratio: 0.0
  rgb_pretrained_num_classes: 5
dataset:
  train_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/train
  val_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test
  label_map:
    normal: 0
    fight: 1
  batch_size: 8
  workers: 8
  clips_per_video: 5
  augmentation_config:
    train_crop_type: random_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: true
    crop_smaller_edge: 256
train:
  optim:
    lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    lr_scheduler: MultiStep
    lr_steps: [30, 60, 80]
    lr_decay: 0.1
  num_epochs: 20
  checkpoint_interval: 1
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
export:
  checkpoint: "??"

Evaluation in TAO gives 100% accuracy.
But when testing in DeepStream, even the same training videos give the wrong classification between Fight and Normal.

I notice you trained a 3D ActionRecognitionNet model, but config_preprocess_2d_custom.txt is for the 2D model; for example, the sequence length of the 2D model is 96. Please make sure the preprocessing parameters are consistent with those used for training.
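To clarify the "96" here: the 2D model consumes a clip of 32 RGB frames stacked along the channel axis, so the tensor's channel dimension is 32 × 3 = 96 even though `rgb_seq_length` is 32. A numpy sketch of the packing suggested by the custom lib's log line `NSHW[12, 96, 224, 224] ... reshaped as [N: 12, C: 3, S:32, H: 224, W:224]` (the exact interleaving is my reading of that log, not something I have verified):

```python
import numpy as np

# One stream's clip: S=32 RGB frames of H=W=224, laid out [S, C, H, W].
frames = np.arange(32 * 3 * 224 * 224, dtype=np.float32).reshape(32, 3, 224, 224)

# The custom lib reshapes the 96-channel block as [C=3, S=32, H, W],
# i.e. channel-major packing (assumption inferred from the log line).
packed = frames.transpose(1, 0, 2, 3).reshape(96, 224, 224)
print(packed.shape)  # (96, 224, 224)
```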

No, I trained on the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt. Yes, rgb_seq_length: 32 in training, so network-input-shape is set to 12;96;224;224 for batch size 12. The DeepStream config files are here:
deepstream_action_recognition_config.txt (4.6 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)

The training spec file is the same one posted above.

I notice that network-color-format and network-input-order have been changed. Why do you need to change these values? Please make sure all the configurations are correct.

network-input-shape needs to be 12;96;224;224, right? 12 is the batch size; 96 is rgb_seq_length: 32 times 3 channels, so 96; 224 is the width and height.
As for network-color-format, I just wanted to test it.
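The shape arithmetic can be spelled out as a quick sanity check (variable names are illustrative):

```python
# Sanity check of the arithmetic behind network-input-shape=12;96;224;224.
batch_size = 12        # matches the 12 RTSP sources
rgb_seq_length = 32    # frames per clip, from the training spec
channels = 3           # RGB
height = width = 224   # input_height / input_width from the training spec

network_input_shape = [batch_size, rgb_seq_length * channels, height, width]
print(network_input_shape)  # [12, 96, 224, 224]
```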

Yes, you can change network-input-shape. Did you try the original network-color-format and network-input-order values? You can compare the files to check which configurations were changed.

I tested as follows.

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=2

But I still have errors in classification.

Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!