Please provide complete information as applicable to your setup.
**• Hardware Platform (Jetson / GPU)** RTX 4080
**• DeepStream Version** 7.0
I have an ActionRecognitionNet model trained with TAO for two activities, Fight and Normal. I tested it with about 10 test folders per activity and achieved 100% accuracy; those test folders were never used in training or validation.
But when I run the model with the deepstream-3d-action-recognition app, the detections are completely inverted: all the normal activities are detected as Fight.
May I know what to change in my config files?
deepstream_action_recognition_config.txt (3.8 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)
What could be wrong in my running application?
I used the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt. I made no modifications to the preprocessing parameters; all are as shown in the attached config files.
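For what it's worth, rgb_input_mean: [0.5] and rgb_input_std: [0.5] in the training spec (applied to [0,1]-scaled pixels) are equivalent to (x - 127.5) * (1/127.5) on raw [0,255] pixels, which is what the stock sample's preprocess config expresses. A sketch of the relevant [user-configs] keys to cross-check (values as found in the unmodified sample; worth verifying against your copy):

```
[user-configs]
# (x/255 - 0.5) / 0.5  ==  (x - 127.5) * 0.007843137
channel-scale-factors=0.007843137;0.007843137;0.007843137
channel-mean-offsets=127.5;127.5;127.5
```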
The following is the running log.
```
(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition# ./deepstream-3d-action-recognition -c deepstream_action_recognition_config.txt
num-sources = 12
Now playing: rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.244:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.245:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0, rtsp://admin:nextan6423@172.16.158.247:554/cam/realmonitor?channel=1&subtype=0,
0:00:03.278705967 31 0x635a3cb169f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:2095> [UID = 1]: deserialized trt engine from :/workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:612 [FullDims Engine Info]: layers num: 2
0 INPUT kFLOAT input_rgb 96x224x224 min: 1x96x224x224 opt: 12x96x224x224 Max: 12x96x224x224
1 OUTPUT kFLOAT fc_pred 2 min: 0 opt: 0 Max: 0
0:00:03.375988024 31 0x635a3cb169f0 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2198> [UID = 1]: Use deserialized engine model: /workspace/opt/nvidia/deepstream/deepstream-7.0/sources/apps/sample_apps/deepstream-3d-action-recognition/rgb_resnet18_2D_32sql.onnx_b12_gpu0_fp16.engine
0:00:03.378604023 31 0x635a3cb169f0 INFO nvinfer gstnvinfer_impl.cpp:343:notifyLoadModelStatus:<primary-nvinference-engine> [UID 1]: Load new model:config_infer_primary_2d_action.txt sucessfully
sequence_image_process.cpp:489, [INFO: CUSTOM_LIB] 2D custom sequence network shape NSHW[12, 96, 224, 224], reshaped as [N: 12, C: 3, S:32, H: 224, W:224]
sequence_image_process.cpp:512, [INFO: CUSTOM_LIB] Sequence preprocess buffer manager initialized with stride: 1, subsample: 0
sequence_image_process.cpp:516, [INFO: CUSTOM_LIB] SequenceImagePreprocess initialized successfully
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Decodebin child added: source
Running...
Read detection info.
Decodebin child added: decodebin0
Decodebin child added: rtph265depay0
Decodebin child added: h265parse0
Decodebin child added: capsfilter0
Decodebin child added: nvv4l2decoder0
Decodebin child added: decodebin1
Decodebin child added: rtph265depay1
Decodebin child added: h265parse1
Decodebin child added: capsfilter1
Decodebin child added: nvv4l2decoder1
Decodebin child added: decodebin2
Decodebin child added: rtph265depay2
Decodebin child added: h265parse2
Decodebin child added: capsfilter2
Decodebin child added: decodebin3
Decodebin child added: nvv4l2decoder2
Decodebin child added: rtph265depay3
Decodebin child added: h265parse3
Decodebin child added: capsfilter3
Decodebin child added: nvv4l2decoder3
Decodebin child added: decodebin4
Decodebin child added: decodebin5
Decodebin child added: rtph265depay4
Decodebin child added: rtph265depay5
Decodebin child added: h265parse4
Decodebin child added: h265parse5
Decodebin child added: capsfilter4
Decodebin child added: capsfilter5
Decodebin child added: nvv4l2decoder4
Decodebin child added: decodebin6
Decodebin child added: nvv4l2decoder5
Decodebin child added: rtph265depay6
Decodebin child added: decodebin7
Decodebin child added: h265parse6
Decodebin child added: capsfilter6
Decodebin child added: rtph265depay7
Decodebin child added: h265parse7
Decodebin child added: capsfilter7
Decodebin child added: nvv4l2decoder6
Decodebin child added: decodebin8
Decodebin child added: decodebin9
Decodebin child added: rtph265depay8
Decodebin child added: h265parse8
Decodebin child added: rtph265depay9
Decodebin child added: capsfilter8
Decodebin child added: nvv4l2decoder7
Decodebin child added: decodebin10
Decodebin child added: rtph265depay10
Decodebin child added: h265parse9
Decodebin child added: decodebin11
Decodebin child added: rtph265depay11
Decodebin child added: capsfilter9
Decodebin child added: h265parse10
Decodebin child added: capsfilter10
Decodebin child added: nvv4l2decoder8
Decodebin child added: h265parse11
Decodebin child added: nvv4l2decoder9
Decodebin child added: nvv4l2decoder10
Decodebin child added: capsfilter11
Decodebin child added: nvv4l2decoder11
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
In cb_newpad
FPS(cur/avg): 11.25 (11.25) 11.25 (11.25) 11.25 (11.25) 11.25 (11.25) 10.69 (10.69) 11.25 (11.25) 11.03 (11.03) 11.25 (11.25) 11.25 (11.25) 11.03 (11.03) 11.25 (11.25) 10.69 (10.69)
FPS(cur/avg): 24.97 (15.61) 24.97 (15.61) 24.97 (15.61) 24.97 (15.61) 24.97 (15.29) 24.97 (15.61) 24.97 (15.51) 24.97 (15.61) 24.97 (15.61) 24.97 (15.51) 24.97 (15.61) 24.97 (15.29)
FPS(cur/avg): 25.01 (17.87) 25.01 (17.87) 25.01 (17.87) 25.01 (17.87) 25.01 (17.66) 25.01 (17.87) 25.01 (17.81) 25.01 (17.87) 25.01 (17.87) 25.01 (17.81) 25.01 (17.87) 25.01 (17.66)
FPS(cur/avg): 25.00 (19.25) 25.00 (19.25) 25.00 (19.25) 25.00 (19.25) 25.00 (19.10) 25.00 (19.25) 25.00 (19.22) 25.00 (19.25) 25.00 (19.25) 25.00 (19.22) 25.00 (19.25) 25.00 (19.10)
FPS(cur/avg): 25.00 (20.19) 25.00 (20.19) 25.00 (20.19) 25.00 (20.19) 25.00 (20.06) 25.00 (20.19) 25.00 (20.16) 25.00 (20.19) 25.00 (20.19) 25.00 (20.16) 25.00 (20.19) 25.00 (20.06)
FPS(cur/avg): 25.01 (20.86) 25.01 (20.86) 25.01 (20.86) 25.01 (20.86) 25.01 (20.76) 25.01 (20.86) 25.01 (20.84) 25.01 (20.86) 25.01 (20.86) 25.01 (20.84) 25.01 (20.86) 25.01 (20.76)
Read detection info.
FPS(cur/avg): 24.97 (21.37) 24.97 (21.37) 24.97 (21.37) 24.97 (21.37) 24.97 (21.28) 24.97 (21.37) 24.97 (21.35) 24.97 (21.37) 24.97 (21.37) 24.97 (21.35) 24.97 (21.37) 24.97 (21.28)
FPS(cur/avg): 25.00 (21.76) 25.00 (21.76) 25.00 (21.76) 25.00 (21.76) 25.00 (21.69) 25.00 (21.76) 25.00 (21.75) 25.00 (21.76) 25.00 (21.76) 25.00 (21.75) 25.00 (21.76) 25.00 (21.69)
FPS(cur/avg): 25.01 (22.09) 25.01 (22.09) 25.01 (22.09) 25.01 (22.09) 25.01 (22.02) 25.01 (22.09) 25.01 (22.08) 25.01 (22.09) 25.01 (22.09) 25.01 (22.08) 25.01 (22.09) 25.01 (22.02)
FPS(cur/avg): 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.28) 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.34) 24.99 (22.28)
ERROR from element nvvideo-renderer: Output window was closed
Error details: ext/eglgles/gsteglglessink.c(900): gst_eglglessink_event_thread (): /GstPipeline:preprocess-test-pipeline/GstEglGlesSink:nvvideo-renderer
Deleting array
Returned, stopping playback
Deleting pipeline
sequence_image_process.cpp:578, [INFO: CUSTOM_LIB] SequenceImagePreprocess is deinitializing
(base) root@user-Nuvo-10000-Series:/workspace/opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition#
```
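As an aside, the CUSTOM_LIB lines in the log above describe how the custom preprocess library packs a clip: the 96-channel engine input is 3 color channels times the 32-frame sequence, laid out channel-major. A minimal NumPy sketch of my reading of that log line (illustrative, not code from the sample):

```python
import numpy as np

# One clip: 32 frames, each CHW (3x224x224).
clip = np.zeros((32, 3, 224, 224), dtype=np.float32)

# Reorder to [C:3, S:32, H, W] as the log reports, then flatten C and S
# into the single 96-channel dimension the engine input expects.
packed = clip.transpose(1, 0, 2, 3).reshape(96, 224, 224)
assert packed.shape == (96, 224, 224)
```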
TAO has an evaluate command. I used it on the same 10 test folders per activity, which were not used in training, and got 100% accuracy. The command:
```
action_recognition evaluate \
  -e /workspace/mnt/sda/Activities/BMTC_V/Fights/pretrained_2D/experiment.yaml \
  encryption_key="nvidia_tao" \
  results_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql \
  dataset.workers=0 \
  evaluate.checkpoint=/workspace/mnt/sda/Activities/BMTC_V/Fights/results_2D/rgb_3d_ptm_32sql/train/lightning_logs/version_0/checkpoints/epoch19.tlt \
  evaluate.batch_size=1 \
  evaluate.test_dataset_dir=/workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test \
  evaluate.video_eval_mode=center
```
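One difference between this evaluation path and the DeepStream pipeline is geometry: with evaluate.video_eval_mode=center and crop_smaller_edge: 256, TAO resizes the short edge to 256 and center-crops 224x224, whereas nvdspreprocess scales the whole frame to the processing size. A rough sketch of the two, assuming OpenCV (illustrative helpers, not TAO or DeepStream code):

```python
import cv2
import numpy as np

def tao_style_center_crop(frame, short_edge=256, crop=224):
    """Approximate TAO eval preprocessing: short edge to 256, center-crop 224."""
    h, w = frame.shape[:2]
    scale = short_edge / min(h, w)
    resized = cv2.resize(frame, (round(w * scale), round(h * scale)))
    rh, rw = resized.shape[:2]
    y, x = (rh - crop) // 2, (rw - crop) // 2
    return resized[y:y + crop, x:x + crop]

def deepstream_style_scale(frame, size=224):
    """Approximate nvdspreprocess: the whole frame scaled to 224x224."""
    return cv2.resize(frame, (size, size))

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # e.g. one full-HD camera frame
print(tao_style_center_crop(frame).shape, deepstream_style_scale(frame).shape)
```

The center crop sees a tighter, undistorted view; full-frame scaling squeezes the whole image, which can shift a classifier trained on center crops.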
This is the training config file.
```
results_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/results/rgb_3d_ptm_32sql
encryption_key: nvidia_tao
model:
  model_type: rgb
  backbone: resnet_18
  rgb_seq_length: 32
  input_height: 224
  input_width: 224
  input_type: 2d
  sample_strategy: consecutive
  dropout_ratio: 0.0
  rgb_pretrained_num_classes: 5
dataset:
  train_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/train
  val_dataset_dir: /workspace/mnt/sda/Activities/BMTC_V/Fights/dataset/test
  label_map:
    normal: 0
    fight: 1
  batch_size: 8
  workers: 8
  clips_per_video: 5
  augmentation_config:
    train_crop_type: random_crop
    horizontal_flip_prob: 0.5
    rgb_input_mean: [0.5]
    rgb_input_std: [0.5]
    val_center_crop: true
    crop_smaller_edge: 256
train:
  optim:
    lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    lr_scheduler: MultiStep
    lr_steps: [30, 60, 80]
    lr_decay: 0.1
  num_epochs: 20
  checkpoint_interval: 1
evaluate:
  checkpoint: "??"
  test_dataset_dir: "??"
inference:
  checkpoint: "??"
  inference_dataset_dir: "??"
export:
  checkpoint: "??"
```
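Given label_map above (normal: 0, fight: 1), the class names on the DeepStream side must follow the same index order. Depending on the app version, they live either in the label file referenced by labelfile-path in config_infer_primary_2d_action.txt or in a label array inside the sample app source; if they are listed fight-first anywhere, every prediction will look inverted, which matches the symptom. A hedged example of a single-line classifier label file in the matching order (contents assumed from the label_map, not taken from the attached configs):

```
normal;fight
```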
Evaluation in TAO is 100% accurate. But when testing in DeepStream, even the same videos used for training are misclassified between Fight and Normal.
fanzh (July 31, 2025, 10:05am, post #6):
I notice you trained a 3D ActionRecognitionNet model, but config_preprocess_2d_custom.txt is for the 2D model; for example, the sequence length of the 2D model is 96. Please make sure the preprocessing parameters are consistent with those used for training.
No, I trained from the 2D RGB pretrained model resnet18_2d_rgb_hmdb5_32.tlt. Yes, rgb_seq_length: 32 in training, so network-input-shape is set to 12;96;224;224 for batch size 12. The DeepStream config files are here:
deepstream_action_recognition_config.txt (4.6 KB)
config_preprocess_2d_custom.txt (2.9 KB)
config_infer_primary_2d_action.txt (2.6 KB)
The training spec file is the same as the one posted above.
fanzh (August 1, 2025, 3:00am, post #8):
I notice that network-color-format and network-input-order were changed. Why did you need to change these values? Please make sure all configurations are correct.
> fanzh: network-color-format
network-input-shape needs to be 12;96;224;224, right? 12 is the batch size; 96 is rgb_seq_length: 32 times 3 channels; 224 is the width and height. As for network-color-format, I changed it just to test.
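A quick sketch of that arithmetic (illustrative only):

```python
# network-input-shape = N;(S*C);H;W for the 2D model
batch, seq_len, channels, height, width = 12, 32, 3, 224, 224
assert seq_len * channels == 96
print(f"network-input-shape={batch};{seq_len * channels};{height};{width}")
# -> network-input-shape=12;96;224;224
```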
fanzh (August 1, 2025, 8:49am, post #10):
Yes, you can change network-input-shape. Did you try the original network-color-format and network-input-order values? You can compare the files to check which configurations were changed.
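For reference, a quick way to list every key that differs from the stock sample (the first path is a placeholder; point it at an unmodified copy of the sample config):

```
diff /path/to/unmodified/config_preprocess_2d_custom.txt config_preprocess_2d_custom.txt
```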
I tested as follows:
```
# 0=RGB, 1=BGR, 2=GRAY
network-color-format=0
# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=2
```
But I still get classification errors.
fanzh (August 19, 2025, 8:40am, post #13):
Sorry for the late reply. Is this still a DeepStream issue to support? Thanks!