Tesla M40 runs extremely slowly

My Keras code with the TensorFlow backend runs extremely slowly on my Tesla M40 GPUs. At first I suspected a bug in my code, but the same code runs very fast on a 1080 Ti. I tested it on a group of 3 Tesla M40s and on a single 1080 Ti, and the single 1080 Ti is 8 times faster than the 3 M40s together. For the multi-GPU run I am using the official Keras multi-GPU API: https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py. Sometimes Volatile GPU-Util shows 100% even though no process is running. Furthermore, the power usage never exceeds 100 W. Is this a hardware problem?
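
To make the symptoms above easier to compare between the M40s and the 1080 Ti, here is a minimal sketch of how temperature, utilization, power draw and SM clocks could be logged while training runs. It assumes `nvidia-smi` is on the PATH and that the driver reports these query fields. The helper names (`parse_gpu_csv`, `looks_throttled`, `query_gpus`, `report`) and the 90% / 0.6 thresholds are hypothetical choices for illustration, not part of any library, and a flagged GPU is only a hint, not a diagnosis.

```python
import csv
import io
import subprocess

# Query fields passed to nvidia-smi (see `nvidia-smi --help-query-gpu`).
FIELDS = [
    "index",
    "name",
    "temperature.gpu",
    "utilization.gpu",
    "power.draw",
    "clocks.sm",
    "clocks.max.sm",
]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def looks_throttled(row, min_util=90.0, ratio=0.6):
    """Hypothetical heuristic: a busy GPU whose SM clock is far below its maximum."""
    try:
        util = float(row["utilization.gpu"])
        sm_clock = float(row["clocks.sm"])
        sm_clock_max = float(row["clocks.max.sm"])
    except ValueError:
        return False  # the driver reported a non-numeric value such as "[N/A]"
    return util >= min_util and sm_clock < ratio * sm_clock_max


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return parse_gpu_csv(subprocess.check_output(cmd, universal_newlines=True))


def report():
    """Print one summary line per GPU."""
    for gpu in query_gpus():
        status = "possible throttling" if looks_throttled(gpu) else "ok"
        print(
            gpu["index"],
            gpu["name"],
            gpu["temperature.gpu"] + " C",
            gpu["power.draw"] + " W",
            gpu["clocks.sm"] + " MHz",
            status,
        )
```

Calling `report()` every few seconds during training, on both machines, would show whether the M40s stay at low clocks and low power while reporting high utilization.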

Hey man, have you solved this problem?