A10 GPU uses more GPU RAM than T4 GPU for inference with a PyTorch TensorRT (TRTorch) model

[Screenshots attached: 2021-12-02 at 3.27.32 PM and 3.27.43 PM]
A10
NVIDIA driver → 470 (A10 requires 470)
CUDA → 11.0
cuDNN → 8.1
TensorRT → 7.2.3.4
Torch → 1.7.1+cu110
TRTorch → 0.2.0
Python → 3.7

T4
NVIDIA driver → 450
CUDA → 11.0
cuDNN → 8.1
TensorRT → 7.2.3.4
Torch → 1.7.1+cu110
TRTorch → 0.2.0
Python → 3.7
Can you help explain why GPU RAM usage is higher on the A10 than on the T4, and how we can reduce it so that we can run multiple streams?
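In case it helps narrow things down, below is a minimal sketch of the compile-and-measure flow we run. The ResNet-50 model, input shape, and the `workspace_size` value are placeholders rather than our actual workload, and the compile-spec keys are taken from the TRTorch 0.2.0 dict-based API.

```python
import torch
import trtorch
import torchvision.models as models

# Placeholder model; our real model differs.
model = models.resnet50(pretrained=True).eval().cuda()
example_input = torch.randn(1, 3, 224, 224).cuda()
scripted = torch.jit.trace(model, example_input)

# TRTorch 0.2.0 dict-style compile spec.
# workspace_size caps TensorRT's scratch memory per engine; lowering it
# reduces memory at the possible cost of slower tactic selection.
compile_settings = {
    "input_shapes": [(1, 3, 224, 224)],
    "op_precision": torch.float,   # FP16 (torch.half) is another option to cut memory
    "workspace_size": 1 << 28,     # 256 MiB, placeholder value
}
trt_model = trtorch.compile(scripted, compile_settings)

# Measure device memory around one inference pass.
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    out = trt_model(example_input)
torch.cuda.synchronize()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 2**20:.1f} MiB")
```

Note that these PyTorch allocator counters will not match nvidia-smi exactly, since (as far as we understand) TensorRT allocates its engine and workspace memory outside PyTorch's caching allocator, so we compare nvidia-smi readings across the two GPUs as well.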