A huge difference in memory usage on different GPUs

I have two workstations: one with an RTX 3090 and another with an RTX A6000. The model takes 8897 MB in total on the RTX 3090, but 36945 MB when it runs on the RTX A6000. Both machines have identical CUDA, cuDNN, and PyTorch versions.

That’s an interesting observation.
Can you share some more details about the environment and the model?
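To gather comparable numbers on both machines, it can help to look at what PyTorch's caching allocator itself reports, not just nvidia-smi. The tool output includes the CUDA context and the allocator's reserved-but-unused pool, so it can be much larger than the tensors' actual footprint, and both of those vary across GPU architectures. A minimal sketch (the helper name is my own, not from the thread):

```python
import torch

def gpu_memory_mb():
    """Return (allocated_mb, reserved_mb) for the current CUDA device,
    or (0.0, 0.0) when no GPU is present.

    allocated = memory actually occupied by live tensors
    reserved  = memory the caching allocator holds from the driver,
                which is closer to (but still below) the nvidia-smi figure
    """
    if not torch.cuda.is_available():
        return 0.0, 0.0
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    return allocated, reserved

if __name__ == "__main__":
    alloc, res = gpu_memory_mb()
    print(f"allocated: {alloc:.1f} MB, reserved: {res:.1f} MB")
```

Running this right after a forward/backward pass on each workstation would show whether the gap comes from the model's tensors or from context/allocator overhead outside PyTorch's control.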