Slow inference speed on RTX 3080

Hello everyone.

I have run into an issue with TensorFlow session initialization and inference.

I’ve trained a neural network that processes images, and I’m loading it in a C++ application using TensorFlow’s C API. The network performs as expected on previous-generation cards (RTX 2000 series and GTX 1000 series), but there are issues on the RTX 3080. In short, loading the graph into a session and running inference for the first time takes an unusually long time (several minutes, sometimes ten or more). This only happens for graph loading and the first inference call; each subsequent call to TF_SessionRun takes approximately 100 ms, which is in line with expectations.

The model is saved as a .pb file, the TensorFlow version is tensorflow-gpu 1.14, the CUDA version is 10.0, the cuDNN version is 7.6.5, and the driver version is 457.30.

Any help is appreciated.