CUDA performance at fp16 precision on Linux is 35-40% lower than on Windows with an RTX 2080 Ti


I saw a similar topic just below mine about lower performance on Kepler hardware, but decided to open a new thread.
In my case it’s an RTX 2080 Ti running on Ubuntu 18.04; the same problem occurs on Ubuntu 16.04.

I get about 35-40% less performance on Linux than on Windows when running a chess neural network engine with the cuDNN backend at fp16 precision, using driver version 410.72. At fp32 precision the performance is as expected (the same as on Windows).
With fp16 I also observe lower GPU utilization of 40-50% (where it should be 95-100%).

The Nvidia bug report is attached; it was generated while the GPU was under load.

cuDNN version: v7.4.1 (Nov 8, 2018)
CUDA version: 10.0

The same problem occurs with the previous cuDNN version, 7.4.0.

It looks to me as if the GPU simply isn’t being used at full power in this mode.

Solved by correcting the compile options. Performance on Linux is fine now.
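
For anyone hitting the same symptom: a common cause of slow fp16 on Turing cards is compiling without the sm_75 target, so the binary falls back to kernels built for an older architecture that cannot use the tensor cores. The exact options that fixed it here weren’t posted, so the following is only a sketch of the kind of nvcc flags involved (the file and binary names are placeholders):

```shell
# Hypothetical build line: generate code for Turing (compute capability 7.5)
# so fp16 tensor-core kernel paths are available on an RTX 2080 Ti.
nvcc -O3 \
     -gencode arch=compute_75,code=sm_75 \
     -o engine engine.cu -lcudnn
```

If the build must also run on older GPUs, additional -gencode pairs can be listed alongside the compute_75 one.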