RTX 4070 Super is 2x slower than RTX 2080 at inference

Hello! I have a very strange situation with two GPUs: an RTX 2080 (desktop) and an RTX 4070 Super. The 2080 is 2x faster at fp16 object-detector inference than the 4070 Super. I have tried different models and different versions of pytorch-cuda. The 4070 Super is mounted in a PCIe Gen4 slot, and synthetic benchmarks (Geekbench) show it outperforming the 2080 by about 70%, yet its inference speed is 2x slower. I am absolutely at a loss as to what the reason might be. I also have a second RTX 4070 Super, and the situation is the same with it. Any ideas? Thanks!
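
For reference, the kind of measurement I mean looks roughly like the sketch below. This is not my exact code (the real model is a custom detector); I'm using torchvision's Faster R-CNN as a reproducible stand-in, with warm-up iterations and torch.cuda.synchronize() around the timed loop so the numbers aren't skewed by timing artifacts:

```python
import time
import torch
import torchvision

device = torch.device("cuda")

# Stand-in model: my real detector is custom; torchvision's Faster R-CNN is
# used here only so the timing loop is reproducible.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval().to(device)

# One dummy 3x800x800 image on the GPU (the model takes a list of tensors).
img = torch.rand(3, 800, 800, device=device)

n_warmup, n_iters = 10, 100
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
    # Warm-up so cuDNN autotuning / lazy initialization is not timed.
    for _ in range(n_warmup):
        model([img])

    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        model([img])
    torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    elapsed = time.perf_counter() - start

print(f"{elapsed / n_iters * 1000:.2f} ms per image")
```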