2080 Ti slower than 1080 Ti in multi-GPU training

I have two rack servers, one with 10 1080 Ti cards and one with 10 2080 Ti cards. Both have the same hardware spec and run the same OS: Ubuntu 18.04.2 LTS, kernel 4.15.0-46-generic.

I am running the waveglow repo (GitHub - NVIDIA/waveglow: A Flow-based Generative Network for Speech Synthesis) with multi-GPU training (8 GPUs, synchronous SGD) through Slurm. The 2080 Ti is always slower than the 1080 Ti on every training step. I tried several NVIDIA driver versions from 418 to 430, and none of them helped.

After digging into this further, it looks like optimizer.step(), the default PyTorch method, is quite slow on the 2080 Ti. I also noticed that during training the 2080 Ti cards consistently run 10-20 °C hotter than the 1080 Ti cards: according to nvidia-smi, the 1080 Ti usually sits at about 48 °C while the 2080 Ti is around 72 °C. Sometimes I also see the power cap reported as active in nvidia-smi -q -d clock.
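One caveat on the optimizer.step() measurement: CUDA kernels are launched asynchronously, so a plain wall-clock timer around optimizer.step() can also absorb time from backward-pass work that is still in flight. A minimal sketch of a timer that drains the GPU queue before and after the call is below. The `timed` helper and its `sync` parameter are hypothetical names used only for illustration; the only PyTorch call assumed is `torch.cuda.synchronize`.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable that blocks until all
    queued GPU work has finished (for PyTorch: torch.cuda.synchronize).
    """
    if sync is not None:
        sync()  # finish work queued by earlier calls, e.g. loss.backward()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for the kernels launched by fn itself
    return result, time.perf_counter() - start
```

In the training loop this would be used as `_, dt = timed(optimizer.step, sync=torch.cuda.synchronize)`. If the gap between the two servers shrinks or moves to another step once the timer is synchronized, the slowdown is probably not specific to the optimizer.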

Is there any known solution to this issue?
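In case it helps with diagnosis, the temperature and power-cap readings can be collected per GPU while training runs. This is only a rough sketch: the query field names are the ones I believe `nvidia-smi --help-query-gpu` lists and may differ between driver versions, and the helper names (`parse_gpu_csv`, `query_gpus`, `throttled`) are made up for this example.

```python
import csv
import io
import subprocess

# Query fields; assumed to match `nvidia-smi --help-query-gpu` on this driver.
FIELDS = [
    "index",
    "name",
    "temperature.gpu",
    "power.draw",
    "power.limit",
    "clocks.sm",
    "clocks_throttle_reasons.sw_power_cap",
    "clocks_throttle_reasons.hw_slowdown",
]


def parse_gpu_csv(text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (Python 3.6 compatible)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_gpu_csv(result.stdout)


def throttled(rows):
    """Return the indices of GPUs reporting a power cap or hardware slowdown."""
    return [
        row["index"]
        for row in rows
        if row["clocks_throttle_reasons.sw_power_cap"] == "Active"
        or row["clocks_throttle_reasons.hw_slowdown"] == "Active"
    ]
```

Calling `throttled(query_gpus())` every few seconds on both servers during a run would show whether the 2080 Ti cards are being power-capped or slowed down more often than the 1080 Ti cards.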

My venv uses Python 3.6 with the following packages:
absl-py==0.7.1
apex==0.1
astor==0.8.0
audioread==2.1.7
cycler==0.10.0
decorator==4.4.0
future==0.17.1
gast==0.2.2
grpcio==1.20.1
h5py==2.9.0
inflect==0.2.5
joblib==0.13.2
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
librosa==0.6.0
llvmlite==0.28.0
Markdown==3.1
matplotlib==2.1.0
mock==3.0.5
numba==0.43.1
numpy==1.16.4
Pillow==6.0.0
protobuf==3.7.1
pyparsing==2.4.0
python-dateutil==2.8.0
pytz==2019.1
resampy==0.2.1
scikit-learn==0.21.1
scipy==1.0.0
six==1.12.0
tb-nightly==1.14.0a20190604
tensorboard==1.13.1
tensorboardX==1.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
termcolor==1.1.0
torch==1.1.0
torchvision==0.3.0
Unidecode==1.0.22
Werkzeug==0.15.4