I have a 4-socket SGI system with two Tesla M2090 cards which used to work fine (with Ubuntu 12.04) until a while ago. Then all of a sudden the system started hanging whenever the GPUs were used. I could not find out exactly when this started happening or what change to the system could have caused it. I have now upgraded the system to Ubuntu 14.04 and installed CUDA 7.0 (I tried both the .deb package and the runfile install), but still no success. I can execute one of the samples (say, matrixMul) once, but on the second execution the sample hangs. The process shows as R in top but actually uses 0% of the GPU, and there’s no way I can kill it. I can still do things in another console, but little by little the whole system becomes unusable and the only option is to reboot it.
Has anybody come across the same problem or does anybody have an idea how to solve it?