Fight detection using action_recognition_net

• Hardware: RTX 4080
• Network type: action_recognition_net
• TAO Toolkit version (docker_tag): nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt
• Training spec file: train_rgb_2d_finetune.txt (882 Bytes)

I am training fight detection.

My fight scenario is quite simple: two people stand face to face and fight each other, as in the image.

Normal activities are things like walking or sitting.

The issue is that once the model is deployed, its detections are quite random. Even an empty scene with only background and nobody in it is detected as a fight. Sometimes the output flickers between fight and normal. The model did not learn what I trained it on.

Fight means two people facing each other and attacking each other. Normal means one or two people just walking around or sitting.

I have 160 images in each training folder.

rgb_seq_length is 16 and clips_per_video is 10, which is why each folder has 160 images. I deploy the model using DeepStream.

You mentioned using 160 images for each class. With a sequence length of 16 and 10 clips per video, this implies you may be using only one video for your “fight” class and one for your “normal” class. A model trained on such limited data will not learn to recognize the action of fighting itself. Instead, it will likely overfit and learn to associate the specific, static background of that one video with the “fight” label. When it sees that background (or a similar one) in deployment, it incorrectly triggers a “fight” detection, even with no people present.

Please gather significantly more video clips for all your classes. Aim for dozens or hundreds of short videos recorded in diverse environments with different backgrounds, lighting, camera angles, and numbers of people.

No, I have 128 + 128 = 256 folders for training and 27 + 27 = 54 folders for validation. Each folder has 160 images because I train with rgb_seq_length: 32 and clips_per_video: 5.
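The frame arithmetic in this exchange can be checked with a short sketch. It assumes, as the posts imply, that each folder holds rgb_seq_length × clips_per_video frames; the function name frames_per_folder is hypothetical and not part of TAO Toolkit.

```python
def frames_per_folder(rgb_seq_length: int, clips_per_video: int) -> int:
    """Frames needed if each clip uses rgb_seq_length frames and clips do not overlap."""
    return rgb_seq_length * clips_per_video


# Both settings mentioned in this thread give the same per-folder frame count.
first_setting = frames_per_folder(rgb_seq_length=16, clips_per_video=10)
second_setting = frames_per_folder(rgb_seq_length=32, clips_per_video=5)

# Folder counts as reported above (two classes: fight and normal).
train_folders = 128 + 128
val_folders = 27 + 27
total_train_frames = train_folders * second_setting

print(first_setting, second_setting)      # frames per folder for each setting
print(train_folders, val_folders)         # number of training / validation folders
print(total_train_frames)                 # total training frames across all folders
```

Because both settings multiply out to 160, the per-folder image count alone cannot tell you how many videos a dataset contains; the folder counts are the relevant number.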

Now it looks fine. I tested using the following command:

action_recognition inference \
  -e results_3D/experiment.yaml \
  encryption_key=nvidia_tao \
  results_dir=results_3D/rgb_3d_ptm \
  dataset.workers=0 \
  inference.checkpoint=results_3D/rgb_3d_ptm/train/model_epoch_099_step_15900.pth \
  inference.inference_dataset_dir=dataset/test/fight/02-05_normal_1/rgb \
  inference.video_inf_mode=center

I picked fight and normal folders at random and tested them. Inference gives the correct detections.

I would like to test using TensorRT inference.

I used the following command to convert the exported model to an engine:

tao-converter fight.onnx -k nvidia_tao -p input_rgb,1x3x32x224x224,1x3x32x224x224,1x3x32x224x224 -e trt3d.engine -t fp16

fight.onnx was produced using the command

action_recognition export \
  -e results_3D/rgb_3d_ptm/train/export_rgb.yaml \
  encryption_key=nvidia_tao \
  results_dir=results_3D/rgb_3d_ptm \
  export.checkpoint=results_3D/rgb_3d_ptm/train/model_epoch_099_step_15900.pth \
  export.onnx_file=results_3D/rgb_3d_ptm/train/fight.onnx

fight.onnx is here.

I ran the tao-converter conversion in nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5.

Model training was done using nvcr.io/nvidia/tao/tao-toolkit:6.25.9-pyt

The error is

root@user-Nuvo-10000-Series:/workspace/mnt/sda/Activities/BMTC_V/Fights/results_3D/rgb_3d_ptm/train# tao-converter fight.onnx -k nvidia_tao -p input_rgb,1x96x224x224,1x96x224x224,1x96x224x224 -e fight.engine -t fp16
Error: no input dimensions given

Could you refer to the “TRTEXEC with ActionRecognitionNet” and “ActionRecognitionNet” pages of the TAO Toolkit documentation?
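As a rough illustration of the trtexec route suggested here, the sketch below only assembles a command line and does not run anything. The input name input_rgb and the shape 1x3x32x224x224 are taken from the tao-converter command earlier in this thread and are assumptions about the exported ONNX file, so check them against the model before use. The helper names shape_arg and build_trtexec_command are hypothetical; the flags themselves (--onnx, --minShapes, --optShapes, --maxShapes, --saveEngine, --fp16) are standard trtexec options.

```python
import shlex


def shape_arg(tensor_name, dims):
    """Format a shape as trtexec expects it, e.g. input_rgb:1x3x32x224x224."""
    return f"{tensor_name}:" + "x".join(str(d) for d in dims)


def build_trtexec_command(onnx_path, engine_path, tensor_name,
                          min_dims, opt_dims, max_dims, fp16=True):
    """Return the trtexec argument list for building an engine from an ONNX file."""
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--minShapes={shape_arg(tensor_name, min_dims)}",
        f"--optShapes={shape_arg(tensor_name, opt_dims)}",
        f"--maxShapes={shape_arg(tensor_name, max_dims)}",
        f"--saveEngine={engine_path}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


# Assumed 3D RGB input: batch 1, 3 channels, 32 frames, 224x224 pixels.
dims = (1, 3, 32, 224, 224)
cmd = build_trtexec_command("fight.onnx", "trt3d.engine", "input_rgb",
                            dims, dims, dims)

# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

Using the same dimensions for min, opt, and max fixes the batch size at 1, matching the original tao-converter attempt. Whether this resolves the "no input dimensions given" error is not confirmed anywhere in this thread.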

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.