ActionRecognitionNet for customized data

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc): RTX4080
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc): ActionRecognitionNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here): nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt-base
• Training spec file (if you have one, please share it here):
I trained ActionRecognitionNet with clips_per_video: 5 and rgb_seq_length: 32 (I also tried rgb_seq_length: 16), but the model doesn’t work well: normal activities are detected as Fight.
The model is trained on only two activities, Fight and Normal. I will share sample clips by private message once I get a reply.
The two activities are quite different. The Fight clips show two people fighting, and the fight activity is quite consistent. The Normal clips are mostly people walking along corridors, and I also included empty corridors with no people in the Normal class.
Training works well, and accuracy is high, nearly 1. I also applied brightness and blur augmentation.
But when the model is deployed, it always predicts Fight. Even an empty corridor with no people and no activity is detected as Fight, although every Fight clip in the training dataset contains two people fighting.
How can I improve the training and the accuracy?

Could you please run inference on some of the training data to check whether the results are as expected?

Also, how many samples are in the training and validation datasets?
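
One way to answer the dataset-size question is to count clips per class for each split. The following is a minimal sketch, assuming a hypothetical layout with one directory per class and one subdirectory per clip; the function name and the example paths are illustrative and are not part of TAO.

```python
from collections import Counter
from pathlib import Path


def count_clips_per_class(split_dir):
    """Count clip subdirectories under each class directory of one split.

    Assumed (hypothetical) layout: split_dir/<class_name>/<clip_name>/...
    Plain files at either level are ignored.
    """
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for clip in class_dir.iterdir() if clip.is_dir()
            )
    return dict(counts)


# Example usage with hypothetical paths:
#   print(count_clips_per_class("data/train"))
#   print(count_clips_per_class("data/val"))
```

A large imbalance between Fight and Normal, or a very small validation split, would be worth reporting alongside the accuracy figure.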

Yes, I did. I tested with some test data that is not in the training set, and I get 100% detection accuracy for both Fight and Normal activities. But when the model is deployed, the detection results are poor. The training dataset and the deployment environment are closely related, because all training images were taken from the CCTV cameras at the deployment site. I also applied some augmentation to the training data.

So you get 100% detection on test data that is not in the training set, but different results during deployment. That means the gap is in the deployment stage. How did you deploy the model?

I used the deepstream-3d-action-recognition app.

Could you also try running the stand-alone inference sample described at https://docs.nvidia.com/tao/tao-toolkit/text/cv_finetuning/pytorch/action_recognition_net.html#running-actionrecognitionnet-inference-on-the-stand-alone-sample to narrow down the problem?
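
Once both pipelines have been run on the same clips, the narrowing-down step amounts to comparing their per-clip labels. The following is a minimal sketch of that comparison; the clip IDs, the dictionaries, and the helper names are hypothetical, and collecting the labels from each pipeline is left to whatever output each tool produces.

```python
from collections import Counter


def label_share(labels):
    """Return the fraction of predictions assigned to each label."""
    labels = list(labels)
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def disagreements(run_a, run_b):
    """Return sorted clip IDs present in both runs whose labels differ."""
    shared = run_a.keys() & run_b.keys()
    return sorted(cid for cid in shared if run_a[cid] != run_b[cid])


if __name__ == "__main__":
    # Hypothetical per-clip labels from two pipelines on the same clips.
    standalone = {"clip_001": "Fight", "clip_002": "Normal", "clip_003": "Normal"}
    deployed = {"clip_001": "Fight", "clip_002": "Fight", "clip_003": "Fight"}

    print(label_share(deployed.values()))
    print(disagreements(standalone, deployed))
```

If one pipeline assigns every clip to a single label while the other matches the ground truth, the difference most likely lies in how that pipeline prepares its input rather than in the trained weights.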

Where can I get the ar_trt_inference.py application?
I am using the nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt Docker image for training.

Hi, please refer to:

A stand-alone TensorRT inference sample is also provided. It consumes the TensorRT engine and supports running with 2D/3D input on images. The sample can be found on Github.

The file is in tao_action_recognition/tensorrt_inference on the main branch of the NVIDIA-AI-IOT/tao_toolkit_recipes repository on GitHub.