Need suggestions to improve my visual behavioral cloning model

I need help improving the learnability of my behavioral cloning model.

The loss barely changes during training.
Can you please suggest changes that could improve it?
I am thinking of adding attention, but I would rather hold it back until the model has learned certain movements, and then use it for fine-tuning.
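
To make the attention idea concrete, here is a minimal sketch of what "hold it back, then switch it on for fine-tuning" could look like. It is not my actual code. It assumes the per-frame features are flattened to shape [batch, 96, 1280] (96 frames, 1280 channels, as in the summary), and `GatedTemporalAttention`, `enabled`, and `gate` are hypothetical names made up for this illustration. The gate starts at zero, so switching the block on does not change the model's outputs until fine-tuning moves it.

```python
import torch
import torch.nn as nn


class GatedTemporalAttention(nn.Module):
    """Hypothetical block: self-attention over the frame axis behind a zero-initialised gate."""

    def __init__(self, feat_dim: int = 1280, num_heads: int = 8):
        super().__init__()
        # feat_dim must be divisible by num_heads (1280 / 8 = 160).
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        # Gate starts at 0, so the enabled block is initially an identity mapping.
        self.gate = nn.Parameter(torch.zeros(1))
        # Assumption: attention stays off during the first training stage.
        self.enabled = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, frames, feat_dim]
        if not self.enabled:
            return x
        normed = self.norm(x)
        attended, _ = self.attn(normed, normed, normed, need_weights=False)
        # Residual connection scaled by the learnable gate.
        return x + self.gate * attended


block = GatedTemporalAttention()
frames = torch.randn(1, 96, 1280)   # stand-in for the per-frame CNN features

stage_one = block(frames)           # attention disabled: features pass through unchanged
block.enabled = True                # switch on when fine-tuning starts
stage_two = block(frames)           # same shape; equal to the input while gate == 0
```

The idea would be to train without the block first, then set `enabled = True` and fine-tune, possibly with a lower learning rate for the rest of the network.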

Here is my model architecture:

===================================================================================================================
Layer (type:depth-idx)                                            Output Shape              Param #
===================================================================================================================
DataParallel                                                      [1, 16]                   --
├─CustomModel: 1-1                                                [1, 16]                   124,313,708
├─CustomModel: 1-2                                                --                        --
│    └─Sequential: 2-1                                            [96, 1280, 1, 1]          4,007,548
│    └─Sequential: 2-2                                            --                        --
│    │    └─Sequential: 3-1                                       [96, 1280, 5, 9]          4,007,548
│    │    └─Sequential: 3-2                                       --                        4,007,548
│    │    └─AdaptiveAvgPool2d: 3-3                                [96, 1280, 1, 1]          --
│    └─ConvLSTM2D: 2-3                                            [1, 96, 1280, 1, 1]       117,964,800
│    └─ConvLSTM2D: 2-4                                            --                        --
│    │    └─ModuleList: 3-4                                       --                        117,964,800
│    └─Linear: 2-5                                                [1, 512]                  655,360
│    └─Dropout: 2-6                                               [1, 512]                  --
│    └─Conv2d: 2-7                                                [1, 256, 1, 1]            1,179,648
│    └─GroupNorm: 2-8                                             [1, 256, 1, 1]            512
│    └─Conv2d: 2-9                                                [1, 124, 1, 1]            285,696
│    └─GroupNorm: 2-10                                            [1, 124, 1, 1]            248
│    └─Conv2d: 2-11                                               [1, 64, 1, 1]             71,424
│    └─GroupNorm: 2-12                                            [1, 64, 1, 1]             128
│    └─Conv2d: 2-13                                               [1, 124, 1, 1]            71,424
│    └─GroupNorm: 2-14                                            [1, 124, 1, 1]            248
│    └─Conv2d: 2-15                                               [1, 64, 1, 1]             71,424
│    └─GroupNorm: 2-16                                            [1, 64, 1, 1]             128
│    └─Dropout: 2-17                                              [1, 64]                   --
│    └─Linear: 2-18                                               [1, 64]                   4,096
│    └─Linear: 2-19                                               [1, 11]                   704
│    └─Linear: 2-20                                               [1, 2]                    128
│    └─Linear: 2-21                                               [1, 1]                    64
│    └─Linear: 2-22                                               [1, 1]                    64
│    └─Linear: 2-23                                               [1, 1]                    64
===================================================================================================================
Total params: 507,363,040
Trainable params: 507,363,040
Non-trainable params: 0
Total mult-adds (Units.GIGABYTES): 44.38
===================================================================================================================
Input size (MB): 48.38
Forward/backward pass size (MB): 8965.26
Params size (MB): 497.25
Estimated Total Size (MB): 9510.90
===================================================================================================================