CUDA fp16 performance on Linux 35-40% lower than on Windows with RTX 2080 Ti

Hello,

I saw a similar topic just below mine with lower performance on Kepler hardware, but decided to open a new thread.
In my case it’s an RTX 2080 Ti running on Ubuntu 18.04; the same problem occurs on Ubuntu 16.04.

When running a chess neural network engine with the cuDNN backend at fp16 precision on driver version 410.72, I get about 35-40% less performance on Linux than on Windows. With fp32 precision the performance is as expected (and the same as on Windows).
With fp16 I’m also seeing GPU utilization of only 40-50% (it should be 95-100%).

An nvidia-bug-report log is attached; it was generated while the GPU was under load.

cuDNN version: v7.4.1 (Nov 8, 2018)
CUDA version: 10.0

The same problem occurs with the previous cuDNN version, 7.4.0.

It looks to me as if the GPU simply isn’t being used at full power in this mode.

Edit:
Solved by correcting the compile options. Performance on Linux is fine now.
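
In case it helps anyone hitting the same symptoms: the exact options depend on the engine’s build system, but one common cause of slow fp16 on Linux is a binary that isn’t built for the card’s actual compute capability (sm_75 for Turing / the 2080 Ti), so the fast fp16 paths aren’t used. Below is a minimal sketch of my own (not part of the original report, file name check_fp16.cu is just an example) that queries the device so you know which architecture to pass to nvcc:

// check_fp16.cu — build with:  nvcc check_fp16.cu -o check_fp16
// Prints the device name and compute capability, e.g. 7.5 for an RTX 2080 Ti.
// The engine should then be compiled for that architecture, e.g.
//   nvcc -arch=sm_75 ...   (or -gencode arch=compute_75,code=sm_75)
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed\n");
        return 1;
    }
    std::printf("Device: %s, compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);
    return 0;
}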