Upgraded from CUDA 5.5 (CentOS 6.5) to CUDA 9.1 (CentOS 7.4), now GPU application running slower aft...

I am running a CUDA application on two Kepler K20c GPUs and recently upgraded both the CUDA toolkit and the Linux OS. I was hoping to see similar throughput after upgrading the OS (CentOS 5.5 → 7.4) and migrating from CUDA 5.5 to 9.1. Surprisingly, however, the GPU application runs 1.5x slower under CUDA 9.1. It is not obvious to me what could be causing the slowdown. Has anyone run into a similar issue after upgrading the CUDA toolkit?