I have 2 machines connected to an S1070. Each machine obviously sees 2 C1060s (half of the S1070), and if
I run deviceQuery I see all 4 C1060s. However, if I run a sample test from the SDK (reduction, for example, but any other application does the same),
one of the machines hangs on device 0 and succeeds on device 1, while the second machine succeeds on device 0 and hangs on device 1.
Any ideas why? Is it cable related?
I’m using CUDA 2.3 and my Linux system is: Linux qa-slave5 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Also, from time to time (usually after a crash or a reboot) the following files just disappear, and then deviceQuery returns only enumeration mode instead
of the 4 cards:
/dev/nvidiactl /dev/nvidia1 /dev/nvidia2
If I manually create those files like this:
mknod -m 0666 /dev/nvidiactl c 195 255
mknod -m 0666 /dev/nvidia0 c 195 0
mknod -m 0666 /dev/nvidia1 c 195 1
the system sees the cards again and it's working as described above.
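To avoid retyping the mknod commands after every reboot, they could be wrapped in a small script. This is only a minimal sketch: the major number 195 and the control minor 255 come from the commands above, while the function names make_node and make_nvidia_nodes and the dry/run switch are made up for illustration. By default it only prints what it would do.

```sh
#!/bin/sh
# Sketch only: wraps the mknod commands from the post in a reusable function.
# Major 195 and control minor 255 are taken from those commands; the function
# names and the dry/run switch are hypothetical.

# $1 = "run" to really create the node, anything else to just print the command
# $2 = device path, $3 = minor number
make_node() {
    if [ "$1" != "run" ]; then
        echo "mknod -m 0666 $2 c 195 $3"
    elif [ ! -e "$2" ]; then
        # Only create the node when it is actually missing.
        mknod -m 0666 "$2" c 195 "$3"
    fi
}

# $1 = number of GPUs visible to this host, $2 = optional "run"
make_nvidia_nodes() {
    gpu_count=$1
    node_mode=${2:-dry}
    gpu=0
    while [ "$gpu" -lt "$gpu_count" ]; do
        make_node "$node_mode" "/dev/nvidia$gpu" "$gpu"
        gpu=$((gpu + 1))
    done
    make_node "$node_mode" /dev/nvidiactl 255
}

# Dry run for the 2 GPUs each machine sees: prints three mknod commands.
make_nvidia_nodes 2
```

Running make_nvidia_nodes 2 run as root would create any nodes that are missing instead of printing the commands.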
Any assistance would be very appreciated.
Furthermore, if I run the bandwidth test on the “faulty” device I get this:
-bash-3.2$ ./bandwidthTest --device=0
Running on......
      device 0:Tesla C1060
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2150.8

Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               1268.2

Quick Mode
Device to Device Bandwidth
The Device to Device bandwidth test just hangs…
The same test on the “valid/working” device works just fine…
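To check each device without the shell getting stuck on the one that hangs, the test can be run with a time limit. This is a rough sketch in plain sh: run_with_deadline and probe_devices are hypothetical helper names, the 124 status for a timeout is just a convention chosen here, and the only flag assumed is the --device=N option already used above. A process that is stuck inside the driver may not die even on kill -9, so treat the result as a hint only.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the CUDA SDK.

# Run a command in the background and give up after $1 seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_deadline() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -9 "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}

# $1 = number of devices, $2 = time limit in seconds, rest = test command.
# Appends --device=N to the command for each device and reports the outcome.
probe_devices() {
    dev_count=$1
    limit_s=$2
    shift 2
    dev=0
    while [ "$dev" -lt "$dev_count" ]; do
        rc=0
        run_with_deadline "$limit_s" "$@" "--device=$dev" >/dev/null 2>&1 || rc=$?
        case $rc in
            0)   echo "device $dev: finished" ;;
            124) echo "device $dev: HUNG (killed after ${limit_s}s)" ;;
            *)   echo "device $dev: failed with status $rc" ;;
        esac
        dev=$((dev + 1))
    done
}
```

For the setup described here, something like probe_devices 2 60 ./bandwidthTest would try both devices on one machine with a 60 second limit each.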