4090 lower performance than 3090 CUDA

Hello mates!
I have a problem. I have a 4090 paired with an Epyc 7532 on an MZ32-AR0 V1.0 motherboard, with 64 GB of Samsung ECC DDR4 RAM at 2666 MHz, an NVMe drive, an SSD, and a TT Water3.0 AIO cooler. It runs Linux (Ubuntu 22.04) with the latest NVIDIA driver, CUDA, and cuDNN. It also has two PSUs: an RM750 for the CPU and motherboard and an RM850 for the GPU.

The issue is this: the PC is used for research (deep learning), but it seems slower than my previous 3090. Maybe I sound paranoid, but the 4090 takes 29 s per epoch of ResNet-18, while the 3090 takes 17 s per epoch.

If you have any thoughts on what the issue could be, they would be appreciated.

In addition, I use the TensorFlow, PyTorch, and PyCUDA frameworks. I have attached a lot of info from both Windows and Linux: pictures.zip contains pictures of the logs and the workstation config, and logs.zip contains the results of several benchmarking tests on Linux and Windows.

nvidia-bug-report.log.gz (381.3 KB)
3090_nvidia-bug-report.log.gz (390.8 KB)
pictures.zip (12.5 MB)
logs.zip (180.3 KB)

You’ll need to check whether the software you’re using has CUDA kernels for compute capability (CC) 8.9 (4090) or only for CC 8.6 (3090).
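For PyTorch, one way to check this is to compare the card's compute capability against the architectures the installed build was compiled for. This is only a rough sketch: `has_native_kernels` and `report` are made-up helper names, and it assumes a PyTorch build with CUDA support. A missing `sm_89` entry does not mean the card won't work, since kernels built for `sm_86` can still run on an 8.9 device; it only means there are no kernels compiled specifically for it.

```python
def has_native_kernels(capability, arch_list):
    """Return True if arch_list has an sm_XY entry for capability (major, minor).

    Hypothetical helper for illustration; not part of PyTorch.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report():
    """Print, for each visible GPU, whether the PyTorch build targets it natively."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available to PyTorch")
        return
    archs = torch.cuda.get_arch_list()  # e.g. a list of 'sm_XY' strings
    print("Build targets:", archs)
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)  # (major, minor)
        name = torch.cuda.get_device_name(i)
        native = has_native_kernels(cap, archs)
        print(f"{name}: CC {cap[0]}.{cap[1]}, native kernels in build: {native}")


if __name__ == "__main__":
    report()
```

TensorFlow and PyCUDA would need their own checks; this only covers the PyTorch side.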

@generix The system is running CC 8.9. Here are pics of the mentioned issue, now with the 4090 paired with the 3090. Both cards are on new PCIe 4.0 risers because of the dimensions of the cards and the motherboard.

nvidiasmi.log (17.3 KB)

From your screenshots, the 4090 is only using PCIe 4.0 x8 while the 3090 is using PCIe 4.0 x16.

Which slots do you have the cards in on the motherboard? Slot5 is listed as limited to x8. Slot3, Slot4, and Slot6 are x16.
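
If you'd rather confirm the link width from the command line than from screenshots, something like the sketch below should work. It assumes `nvidia-smi` is on the PATH and supports the `pcie.link.*` query fields; `parse_link_rows` and `narrowed_links` are made-up helper names. Note that the reported link generation can drop at idle because of power saving, so it's best to check while the GPU is under load.

```python
import shutil
import subprocess

QUERY = ("name,pcie.link.gen.current,pcie.link.gen.max,"
         "pcie.link.width.current,pcie.link.width.max")


def parse_link_rows(csv_text):
    """Parse 'name, gen_cur, gen_max, width_cur, width_max' CSV lines into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            # Split from the right so a comma inside the GPU name is harmless.
            name, gen_cur, gen_max, w_cur, w_max = [
                field.strip() for field in line.rsplit(",", 4)
            ]
            rows.append({
                "name": name,
                "gen_cur": int(gen_cur), "gen_max": int(gen_max),
                "width_cur": int(w_cur), "width_max": int(w_max),
            })
        except ValueError:
            continue  # skip lines with missing or non-numeric fields
    return rows


def narrowed_links(rows):
    """Return the GPUs whose current link width is below their maximum."""
    return [r for r in rows if r["width_cur"] < r["width_max"]]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print("nvidia-smi failed:", result.stderr.strip())
        else:
            for r in narrowed_links(parse_link_rows(result.stdout)):
                print(f"{r['name']}: running x{r['width_cur']} "
                      f"(card supports x{r['width_max']})")
```

A card listed here is negotiating fewer lanes than it supports, which would point to the slot or the riser rather than the software.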