We have a commercial scientific application developed to run on the Tesla family of GPU cards.
The application has been running fine for about a year on the Tesla C2050 and Tesla C2070 cards.
We recently switched to the Tesla C2075, which replaces both the C2050 and the C2070.
With the Tesla C2075, the application occasionally hangs. This didn’t happen with the C2050 or the C2070.
While the application is hung, nvidia-smi reports 99% GPU utilization.
We have changed neither the code nor the computers used by our customers. The GPU card is the only change.
We have reproduced the issue with two different C2075 cards from different suppliers, on two different computers. As far as we know, the Tesla C2075 is similar to the Tesla C2070 but with added power management.
The application was developed with CUDA 4.0 and runs with TCC enabled and ECC disabled.
We have tested the following Tesla WHQL drivers and the issue persists: 276.52, 295.73, 296.35, and 296.70.
Can anyone give us a clue as to what might be going on?
Here is a brief description of one of the systems:
Motherboard: Tyan S7025
CPU: Intel Xeon X5650 @ 2.67 GHz
RAM: 12 GB
OS: Windows 7 Professional 64-bit