Hello,
As I already posted in another thread, we are having problems with this server: http://www.supermicro.com/products/system/2u/2026/sys-2026gt-trf.cfm equipped with two Tesla M2090 cards. Timing the bandwidthTest sample gives:
Running on...
Device 0: Tesla M2090
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2476.2
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3141.6
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 140591.6
[bandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
real 0m8.582s
user 0m0.182s
sys 0m2.401s
If you look at the sys time, you will immediately see that something is wrong. Just for comparison, on a fairly old laptop the results are:
real 0m3.986s
user 0m0.640s
sys 0m0.336s
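To put numbers on the difference, here is a minimal sketch that converts the timings quoted above into seconds and compares them. The helper name parse_time is made up for illustration, and it assumes the usual bash `time` format (for example 0m8.582s). With these figures the server spends roughly seven times as much sys time as the laptop.

```python
import re


def parse_time(value):
    """Convert a bash `time` value such as '0m8.582s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the two runs quoted above.
server = {"real": "0m8.582s", "user": "0m0.182s", "sys": "0m2.401s"}
laptop = {"real": "0m3.986s", "user": "0m0.640s", "sys": "0m0.336s"}

server_s = {name: parse_time(value) for name, value in server.items()}
laptop_s = {name: parse_time(value) for name, value in laptop.items()}

# How much more kernel (sys) time the server needs than the laptop.
sys_ratio = server_s["sys"] / laptop_s["sys"]

# Share of the wall-clock time that each machine spends in the kernel.
server_sys_share = server_s["sys"] / server_s["real"]
laptop_sys_share = laptop_s["sys"] / laptop_s["real"]

print(f"sys time ratio (server/laptop): {sys_ratio:.1f}x")
print(f"sys share of real time, server: {server_sys_share:.1%}")
print(f"sys share of real time, laptop: {laptop_sys_share:.1%}")
```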
Anything related to CUDA exhibits stalls or slowdowns, which are explained by the abnormally high sys times.
We were advised to try an older version of CentOS (5.8), and the problem went away, but now we are stuck with a really old system. We have also tried CentOS 6.2 with its standard kernel, CentOS 6.2 with a 3.4 kernel, and Fedora 17, and we always get the same bad results.
Does anybody have a clue how we could track down the problem? I really thought a modern kernel (3.4) would solve it, but that was not the case.
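In case it helps anyone reproduce the comparison across kernels, here is a rough sketch of how the user and sys CPU time of a command could be collected over several runs. It is not a diagnosis, only a way to get repeatable numbers. The function names cpu_times and report are made up for illustration, and the path to the test binary in the usage note is an assumption. It relies on Python's standard resource module, so it only works on Unix-like systems.

```python
import resource
import subprocess


def cpu_times(command):
    """Run `command` and return (user, sys) CPU seconds used by that child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Output is discarded; only the CPU accounting is of interest here.
    subprocess.run(
        command,
        check=False,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (
        after.ru_utime - before.ru_utime,
        after.ru_stime - before.ru_stime,
    )


def report(command, runs=5):
    """Print user/sys CPU time for several runs and return the samples."""
    samples = []
    for index in range(runs):
        user, system = cpu_times(command)
        samples.append((user, system))
        print(f"run {index + 1}: user {user:.3f}s  sys {system:.3f}s")
    mean_sys = sum(system for _, system in samples) / len(samples)
    print(f"mean sys over {len(samples)} runs: {mean_sys:.3f}s")
    return samples
```

For example, report(["./bandwidthTest"]) could be run once under each kernel (the path is hypothetical and depends on where the sample was built), and the mean sys values compared between CentOS 5.8 and the newer systems.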
Thank you very much in advance.