RTX 3050: Low clock speed

Hi everyone,

I’m seeing strange behaviour when testing my application, compiled with CUDA 11.7, on a new GPU (RTX 3050). The application previously ran on a GTX 1660.

The display is connected to an Intel iGPU, so the RTX 3050 is dedicated to CUDA kernels. Even so, the execution time of some kernels has increased by nearly a factor of 10 compared to the GTX 1660.

Using NVIDIA Nsight Systems, I can see that the GPU isn’t running at full speed (only around 400 MHz).
Other GPUs running the same application and driver version (GTX 1060, GTX 1660) do run at full speed.
The CPU is not fully saturated.
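As a rough back-of-envelope check, here is the slowdown the clock drop alone would cause. This assumes kernel time scales inversely with the SM clock, which is a simplification that memory-bound kernels will not follow. It compares the observed ~400 MHz against the 1792 MHz core clock listed in the setup information. The ~10x figure above is measured against a different GPU (GTX 1660), so the two numbers are not directly comparable.

```python
# Back-of-envelope estimate, ASSUMING kernel runtime is proportional to
# 1 / SM clock (memory-bound kernels will not follow this simplification).
RATED_CORE_CLOCK_MHZ = 1792  # "Core clock" from the setup information
OBSERVED_CLOCK_MHZ = 400     # approximate value seen in Nsight Systems


def clock_slowdown(rated_mhz: float, observed_mhz: float) -> float:
    """Expected slowdown factor if runtime scales with 1 / clock."""
    return rated_mhz / observed_mhz


print(f"{clock_slowdown(RATED_CORE_CLOCK_MHZ, OBSERVED_CLOCK_MHZ):.2f}x")  # 4.48x
```

Under that assumption, the clock drop accounts for roughly a 4.5x slowdown relative to the same card running at its listed core clock.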

Here is the behaviour using the “Normal” power management mode:

For now, going to the NVIDIA Control Panel and setting the power management mode to Maximum Performance fixes the behaviour. However, I don’t want to do this manually on every computer, especially since I observed that the GPU then runs at full speed even when it isn’t loaded. That means more electrical power drawn for nothing; could it also cause premature damage to the GPU if the application runs 24/7?

Here are my questions:
Could this be a bug, a regression, or an intended feature of the power management?
From code, how can I help the driver run my GPU at full speed?
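As a first step from code, the low-clock state can at least be detected programmatically. The sketch below uses NVML through its Python bindings (the `nvidia-ml-py` package, imported as `pynvml`). That package is an assumption and not part of the original setup; the same queries exist in the NVML C API as `nvmlDeviceGetClockInfo` and `nvmlDeviceGetMaxClockInfo`. It only reads clocks and does not change the power management behaviour. The 0.5 threshold is a hypothetical choice for illustration. Sample while kernels are actually running, because low clocks on an idle GPU are expected.

```python
# Minimal sketch: compare the current SM clock with its maximum via NVML.
# ASSUMPTION: the nvidia-ml-py package (module name "pynvml") is installed.
# Reading clocks does not change them.


def clock_fraction(current_mhz: int, max_mhz: int) -> float:
    """Return current/max clock clamped to [0, 1]; 0.0 if max is unknown."""
    if max_mhz <= 0:
        return 0.0
    return min(current_mhz / max_mhz, 1.0)


def is_low_clock(current_mhz: int, max_mhz: int, threshold: float = 0.5) -> bool:
    """Flag the GPU when it runs below `threshold` (hypothetical) of max."""
    return clock_fraction(current_mhz, max_mhz) < threshold


def read_sm_clocks(index: int = 0) -> tuple[int, int]:
    """Read (current, max) SM clock in MHz for GPU `index` through NVML."""
    import pynvml  # imported lazily so the helpers above work without NVML

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        current = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        maximum = pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_SM)
        return current, maximum
    finally:
        pynvml.nvmlShutdown()
```

For example, calling `read_sm_clocks(0)` from a monitoring thread while kernels run and passing the result to `is_low_clock` would log when the RTX 3050 stays near 400 MHz under load. Splitting the NVML call from the threshold logic keeps the decision testable without a GPU.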

Here is some information about my setup:
Operating System: Windows 10 IoT Enterprise LTSC 2021, 64-bit
DirectX version: 12.0
GPU processor: NVIDIA GeForce RTX 3050
Driver version: 526.98
Driver Type: DCH
Direct3D feature level: 12_1
CUDA Cores: 2560
Resizable BAR: No
Core clock: 1792 MHz
Memory data rate: 14.00 Gbps
Memory interface: 128-bit
Memory bandwidth: 224.03 GB/s
Total available graphics memory: 24498 MB
Dedicated video memory: 8192 MB GDDR6
System video memory: 0 MB
Shared system memory: 16306 MB
Video BIOS version:
IRQ: Not used
Bus: PCI Express x8 Gen3
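The memory figures in the list above are internally consistent. Theoretical peak bandwidth is the per-pin data rate times the bus width, divided by 8 bits per byte. The small gap between the 224.0 GB/s computed below and the reported 224.03 GB/s presumably comes from the data rate being displayed rounded to 14.00 Gbps. The total available graphics memory also appears to be dedicated plus shared system memory (system video memory is 0 MB).

```python
# Cross-check of the reported memory figures (values copied from the list).
DATA_RATE_GBPS = 14.00   # per-pin memory data rate
BUS_WIDTH_BITS = 128     # memory interface width
DEDICATED_MB = 8192      # dedicated video memory
SHARED_MB = 16306        # shared system memory


def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth: per-pin rate x pins / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


print(peak_bandwidth_gb_s(DATA_RATE_GBPS, BUS_WIDTH_BITS))  # 224.0
print(DEDICATED_MB + SHARED_MB)                              # 24498
```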