Measuring time

My workstation has 16 logical cores (two quad-core sockets with hyperthreading enabled) and 4 GPUs. When I run a single instance of a GPU application, one step of a Monte Carlo code takes 1.2 seconds. However, when I run two instances of the same application on different GPUs, the instance launched second takes 2.4 seconds per step. Should I disable hyperthreading or set processor affinities? I'm not sure. Can someone help?
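
To make the affinity question concrete, here is a minimal sketch of the experiment I have in mind, assuming Linux and a Python driver. The helper names (cpus_for_instance, pin_current_process, time_step) are hypothetical and not part of my application. The sketch also assumes that contiguous logical CPU ids belong together, which may not match the real socket and hyperthread layout on a given machine.

```python
import os
import time


def cpus_for_instance(instance, n_instances, cpus):
    """Split the sorted CPU ids into n_instances contiguous, disjoint chunks
    and return the chunk for the given instance (0-based).

    Assumption: contiguous ids are a sensible grouping. The actual mapping of
    logical ids to sockets and hyperthreads should be checked on the machine.
    """
    cpus = sorted(cpus)
    chunk = len(cpus) // n_instances
    if chunk == 0:
        raise ValueError("more instances than available CPUs")
    return set(cpus[instance * chunk:(instance + 1) * chunk])


def pin_current_process(cpu_ids):
    """Restrict the calling process to cpu_ids.

    os.sched_setaffinity is only available on some platforms (e.g. Linux),
    so this returns False instead of failing where it is missing.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpu_ids)  # 0 means the calling process
        return True
    return False


def time_step(step):
    """Return the wall-clock seconds taken by one call to step().

    Assumption: step() blocks until the GPU work has finished. If it only
    launches asynchronous work, this measures launch time, not the step.
    """
    start = time.perf_counter()
    step()
    return time.perf_counter() - start
```

With two instances on my 16 logical cores, instance 0 would call pin_current_process(cpus_for_instance(0, 2, range(16))) and instance 1 the same with index 1, and each would then report time_step for one Monte Carlo step. If the second instance still takes twice as long when pinned, CPU placement is probably not the cause.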