Higher inference performance when the display is connected to the dGPU instead of the iGPU

I have a PC on which I run AI inference with CUDA 11.6. There is usually a delay of about one second between two input images. I noticed that inference often takes far too long (180 ms instead of 30 ms). After some investigation, I found two workarounds:

  1. Artificially decrease the delay between two input images. However, outside of testing, I cannot control the delay.
  2. Disable the integrated graphics (Intel HD Graphics 530 of the i7-6700K) in Windows Device Manager. If the iGPU is disabled, or if the DisplayPort cable is plugged into the dedicated GPU (Quadro RTX 4000) instead of the mainboard, inference is fast.
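To reproduce the effect outside of the real pipeline, a small timing harness can compare per-call latency with and without an idle gap before each call. This is a minimal sketch, not the setup used above: `measure_latency` and `summarize` are hypothetical helpers, and the dummy workload only stands in for the real inference call (with ONNX Runtime this would be something like `lambda: session.run(None, feeds)`, where `session` and `feeds` are assumed to already exist).

```python
import statistics
import time
from typing import Callable, List, Tuple


def measure_latency(infer: Callable[[], object], idle_s: float,
                    runs: int = 10, warmup: int = 2) -> List[float]:
    """Return per-call latencies in ms, sleeping idle_s seconds before each timed call."""
    # Untimed warm-up calls so one-off initialization does not skew the numbers.
    for _ in range(warmup):
        infer()
    latencies = []
    for _ in range(runs):
        time.sleep(idle_s)  # simulate the gap between two input images
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies: List[float]) -> Tuple[float, float]:
    """Return (median, max) latency in ms."""
    return statistics.median(latencies), max(latencies)


if __name__ == "__main__":
    dummy = lambda: sum(range(100_000))  # stand-in for the real inference call
    for idle in (0.0, 1.0):
        med, worst = summarize(measure_latency(dummy, idle, runs=3))
        print(f"idle={idle:.1f}s median={med:.2f}ms max={worst:.2f}ms")
```

If the median latency with `idle=1.0` is much higher than with `idle=0.0` for the real model, the slowdown is tied to the idle gap rather than to the model itself.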

I suspect a misconfiguration, because intuitively, performance should be better when the Quadro is used only for compute and not for driving a display. I have already tried setting “Power management mode” to maximum performance, but that did not help. Does anyone have an idea what causes this?

The system runs Windows 10. For inference, I use ONNX Runtime 1.9 with the CUDA execution provider (CUDA 11.6, cuDNN 8.4.1).
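A minimal sketch of creating such a session with the CUDA execution provider and a CPU fallback, assuming the Python binding (the post does not say which language API is used). `cuda_providers` and `make_session` are hypothetical helper names; `InferenceSession(..., providers=...)` and `get_providers()` are part of ONNX Runtime's Python API. The check confirms that the CUDA provider was actually registered instead of ONNX Runtime falling back to the CPU.

```python
from typing import Dict, List, Tuple, Union

ProviderSpec = Union[str, Tuple[str, Dict[str, int]]]


def cuda_providers(device_id: int = 0) -> List[ProviderSpec]:
    """Provider list: CUDA on the given GPU first, CPU as fallback."""
    return [("CUDAExecutionProvider", {"device_id": device_id}), "CPUExecutionProvider"]


def make_session(model_path: str, device_id: int = 0):
    """Create an ONNX Runtime session and fail loudly if CUDA is not active."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency

    session = ort.InferenceSession(model_path, providers=cuda_providers(device_id))
    # get_providers() lists the execution providers registered for this session.
    if "CUDAExecutionProvider" not in session.get_providers():
        raise RuntimeError("CUDAExecutionProvider is not active; running on CPU")
    return session
```

Creating the session once and reusing it for every image keeps session setup out of the per-image latency.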