Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc) RTX4080
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) ActionRecognitionNet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt-base
• Training spec file(If have, please share here)
I trained ActionRecognitionNet with clips_per_video: 5 and rgb_seq_length = 32 (I also tried rgb_seq_length = 16), but the model doesn’t work well: Normal activities are detected as Fight.
The model is trained on only two activities, Fight and Normal. I will share sample clips by private message once I get a reply.
How can I improve the accuracy?
The Fight scenario is two people fighting, and the fight clips are quite consistent. Normal activities are ordinary ones, mostly people walking along a corridor.
The two activities are therefore quite different. Clips of corridors with no people were also included in the Normal class for training.
Training works well and the accuracy is high, close to 1.
I also apply brightness and blur augmentation.
But when the model is deployed, it always predicts Fight. Even scenes with no people and no activity are detected as Fight.
In the training dataset, the Fight clips show two people fighting, yet now even a corridor with no people is detected as Fight.
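
To put numbers on the behaviour described above, the deployed model's predictions on a set of labelled clips can be tallied per class. The following is only a minimal sketch in plain Python: the function names and the sample label lists are hypothetical and are not taken from TAO or from my actual results.

```python
from collections import Counter

LABELS = ("Fight", "Normal")  # the two classes used in this post


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs, one pair per clip."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def per_class_recall(counts, labels=LABELS):
    """Fraction of clips of each true class that were predicted correctly."""
    recall = {}
    for label in labels:
        total = sum(counts[(label, pred)] for pred in labels)
        # None marks a class that has no clips in the evaluated set.
        recall[label] = counts[(label, label)] / total if total else None
    return recall


# Hypothetical labels illustrating an "everything is Fight" outcome.
y_true = ["Fight", "Fight", "Normal", "Normal", "Normal"]
y_pred = ["Fight", "Fight", "Fight", "Fight", "Fight"]

counts = confusion_counts(y_true, y_pred)
recall = per_class_recall(counts)
print(recall)
```

A Normal recall near 0 together with a Fight recall near 1 on clips that were not used for training would confirm that the model collapses to a single class outside the training data.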
How can I improve the training?