Higher Resource Consumption on Ampere architecture vs Turing architecture

I have a machine with an NVIDIA RTX 8000 that I used to run certain AI models (ResNet50 with PyTorch, Azure Custom Vision exported as ONNX, and others). I recently got a new machine with an RTX A4000 GPU, and I have noticed a big difference in resource consumption, in both CPU RAM and GPU RAM.

As an example, one ONNX model on the RTX 8000 machine consumes 1.8 GB of CPU RAM and 711 MB of GPU RAM just to load a single image and run inference on it continuously. If I run the exact same code with the same model on the RTX A4000 machine, it consumes 3.2 GB of CPU RAM and 1.49 GB of GPU RAM, which is roughly 1.8x the CPU RAM and about 2x the GPU RAM.
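For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two numbers could be sampled the same way on both machines. It is not the exact script behind the figures above. It assumes a Unix-like OS (the `resource` module) and that `nvidia-smi` is on the PATH. The helper names `parse_nvidia_smi_memory`, `gpu_memory_used_mib` and `peak_cpu_rss_mib` are made up for illustration.

```python
import resource
import shutil
import subprocess
import sys


def parse_nvidia_smi_memory(text):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list with one integer (MiB) per GPU. Non-numeric lines are skipped."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def gpu_memory_used_mib():
    """Return used memory per GPU in MiB, or None if nvidia-smi is unavailable.

    Note: this is the total used on each GPU, not only by this process."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_nvidia_smi_memory(result.stdout)


def peak_cpu_rss_mib():
    """Return the peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Call these after the model is loaded and a few inferences have run.
    print("peak CPU RSS (MiB):", peak_cpu_rss_mib())
    print("GPU memory used (MiB):", gpu_memory_used_mib())
```

Sampling once before the model is loaded and once after a few inferences gives a baseline and a steady-state value, so the increase caused by the model can be compared between the two machines.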

I understand that these two cards use different architectures (Turing vs Ampere), but I am surprised to see such a large difference. At first I thought it could be the CUDA version, but I have tried different combinations of CUDA and cuDNN and the result is the same.
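To make the CUDA/cuDNN combinations comparable between the two machines, a small sketch like the following could be run on each one and the output posted side by side. It assumes the models run through the `torch` and `onnxruntime` Python packages (the ONNX runtime is an assumption here, since only the model format is known). `collect_versions` is a hypothetical helper name.

```python
import importlib
import platform


def collect_versions():
    """Collect library, CUDA and cuDNN versions. Missing packages map to None."""
    info = {"python": platform.python_version()}

    try:
        torch = importlib.import_module("torch")
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda"] = torch.version.cuda
        # cuDNN version as an integer, or None if cuDNN is unavailable.
        info["torch_cudnn"] = torch.backends.cudnn.version()
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)
    except ImportError:
        info["torch"] = None

    try:
        ort = importlib.import_module("onnxruntime")
        info["onnxruntime"] = ort.__version__
        # Execution providers available in this onnxruntime build.
        info["ort_providers"] = ort.get_available_providers()
    except ImportError:
        info["onnxruntime"] = None

    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```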

Does anyone have an idea what could be the cause, and whether there is a way to solve it?