Hi!
I recently got my hands on a monster machine:
two hexa-core Xeons, eight GTX 460s, and lots of RAM in one rack-mountable box.
There are 8 regular PCI slots, with no riser cards or anything like that.
The system is really cool but sounds like a starfighter.
OS: OpenSuse 11.2, 64 bit.
Cuda-SDK 3.2 (final) has been installed on it.
deviceQuery runs OK and lists all 8 GPUs.
MonteCarloMultiGPU fails on 8 gpus (L1 norm: NAN)
cuda-memcheck MonteCarloMultiGPU
reports that MonteCarloReduce() execution failed in MonteCarlo_kernel.cuh(265).
simpleMultiGPU fails on 8 gpus (GPU sum: inf)
cuda-memcheck simpleMultiGPU
says:
Invalid global write of size 4 at 0x000000f0 in reduceKernel
by thread (115,0,0) in block (31,0)
Address 0xf801017dcc is out of bounds
dmesg shows:
simpleMultiGPU[2629]: segfault at 40 ip 00007f65946f000 error 4 in libcudart.so.3.2.16[7f65751df000+4b000]
That does not look good to me.
However, eigenvalues seems to work on all 8 GPUs (tested with each of the 8 -device= options).
cuda-memcheck BlackScholes -device=2 complains about an invalid global write of size 4 (different threadIds and blockIds).
cuda-memcheck BlackScholes -device=5 complains about an invalid global write of size 4 (different threadIds and blockIds).
cuda-memcheck BlackScholes -device=3 yields an unspecified launch failure in BlackScholes.cu(171)
cuda-memcheck BlackScholes -device=4 yields an unspecified launch failure in BlackScholes.cu(171)
cuda-memcheck BlackScholes -device=0 runs ok.
cuda-memcheck BlackScholes -device=1 runs ok.
cuda-memcheck BlackScholes -device=6 runs ok.
cuda-memcheck BlackScholes -device=7 runs ok.
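To repeat the per-device runs above in one go, a small shell loop like the following could be used. This is only a sketch: check_devices is a made-up helper name, and the grep patterns are assumptions based on the error messages quoted above, not a complete list of what cuda-memcheck can print.

```shell
# Hypothetical helper: run a command once per GPU, appending -device=N,
# and flag any run whose output contains a memcheck-style error string.
# Usage: check_devices <num_devices> <command> [args...]
check_devices() {
    count=$1
    shift
    dev=0
    while [ "$dev" -lt "$count" ]; do
        # Capture stdout and stderr of this run; ignore the exit status.
        out=$("$@" "-device=$dev" 2>&1) || true
        # Assumed error patterns, taken from the messages quoted in this post.
        hit=$(printf '%s\n' "$out" \
            | grep -iE 'invalid|launch failure|out of bounds' \
            | head -n 1) || true
        if [ -n "$hit" ]; then
            echo "device $dev: FAILED: $hit"
        else
            echo "device $dev: ok"
        fi
        dev=$((dev + 1))
    done
}
```

For example, check_devices 8 cuda-memcheck BlackScholes would run the sample once per GPU and print one verdict line per device. A run that crashes without printing any of the assumed patterns would still be reported as ok, so the raw output is worth a look as well.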
nvidia-smi -d | fgrep Temp shows that all 8 GPUs are below 30 C (they are basically idle).
dmesg tells me that the 8 cards share 4 IRQs (each IRQ is shared by two cards).
Any ideas where the problem could be?
Is it hardware, OS, driver, software, or something else?
Thanks in advance for any hints.
Martin