CUDA kernels and cuDNN show degraded performance since CUDA v11.3

Hi, I am an engineer who develops an inference system using cuDNN and CUDA kernels.

We have been updating our deep learning inference program since CUDA 10.2. However, since CUDA 11.3, we have confirmed that both the cuDNN-dependent kernels and our own simple, separate CUDA kernel run much more slowly.
Not only is performance lower, it is also very unstable, sometimes resulting in a 5x or greater slowdown.

We are using an RTX 3090, and we have confirmed that CUDA versions 11.3 through 11.7 all show the same symptoms. (We use the appropriate cuDNN version for each CUDA version.)
We build with Visual Studio 2019.

Were there any changes to compilation or programming requirements starting with version 11.3 that could explain this?

Thank you.