I have an old S1070 attached to a Dell R815 server running 64-bit RHEL5 Linux. Coming back into the office after Christmas vacation, I upgraded to CUDA 5.0 and started running bandwidthTest to make sure the basic pieces were working. It worked okay with --memory=pageable, but when I switched to --memory=pinned, it failed repeatedly. The host-to-device test runs, but the device-to-host test fails with runtime error 4 (unspecified launch failure) in cudaDeviceSynchronize().
I then installed CUDA 4.0 in place of 5.0, to go back to what I had previously. bandwidthTest still runs with --memory=pageable, but with --memory=pinned it runs once and then fails on the second run: the host-to-device test completes, and the failure comes during the device-to-host test. It fails hard enough:
Message from syslogd@ at Mon Jan 7 17:18:36 2013 …
merlin1 kernel: Dazed and confused, but trying to continue
Message from syslogd@ at Mon Jan 7 17:18:36 2013 …
merlin1 kernel: Uhhuh. NMI received for unknown reason 20.
Message from syslogd@ at Mon Jan 7 17:18:36 2013 …
merlin1 kernel: Do you have a strange power saving mode enabled?
that I have to reboot the computer to recover.
Has anyone seen this sort of failure before? This same hardware worked fine with CUDA 4.0 for my ongoing GPU development work in 2012, but I don’t recall running bandwidthTest last year.
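
For anyone who wants to try the failing path outside of bandwidthTest, here is a minimal sketch of a pinned device-to-host copy followed by cudaDeviceSynchronize(). It is not the bandwidthTest source: the 32 MB buffer size, the 0xA5 fill pattern, and the CHECK macro are assumptions chosen for illustration, and it only uses runtime calls that exist in both CUDA 4.0 and 5.0.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report the runtime error code and exit.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s (error %d)\n", __FILE__,       \
                    __LINE__, cudaGetErrorString(err), (int)err);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main(void)
{
    const size_t bytes = 32 << 20;   // assumed transfer size: 32 MB
    unsigned char *h_pinned = NULL;  // page-locked (pinned) host buffer
    unsigned char *d_buf = NULL;     // device buffer

    CHECK(cudaMallocHost((void **)&h_pinned, bytes));
    CHECK(cudaMalloc((void **)&d_buf, bytes));

    // Fill the device buffer with a known pattern so the copy can be verified.
    CHECK(cudaMemset(d_buf, 0xA5, bytes));

    // Device-to-host copy into pinned memory, then synchronize.
    // This is the step where the failure described above would be expected.
    CHECK(cudaMemcpyAsync(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaDeviceSynchronize());

    // Verify that every byte arrived intact.
    size_t bad = 0;
    for (size_t i = 0; i < bytes; ++i) {
        if (h_pinned[i] != 0xA5) {
            ++bad;
        }
    }
    printf("pinned device-to-host copy: %lu mismatched bytes\n",
           (unsigned long)bad);

    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_pinned));
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Built with nvcc and run a couple of times in a row, this should show whether the problem follows the pinned device-to-host path in general or is specific to bandwidthTest. Given the NMI messages above, it may take the machine down the same way.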