I want to make sure Nvidia is aware of this problem, in the hope that it will be fixed in the Kepler GTX 680.
The problem is being discussed on the TeraChem forum, but it is not limited to TeraChem. It occurs consistently on GF110, and there are two reports of it on GF100. There would seem to be grounds for concern that it will persist into Kepler.
Nvidia should be able to reproduce this problem using TeraChem 1.45; I believe they have a copy of TeraChem for testing purposes.
Surely the developers of these codes are registered developers and/or have contacts at NVIDIA. Bugs reported via the registered developer bug submission tool are taken very seriously by NVIDIA, as are ones reported on the forums. But without a minimal repro case or a specific set of instructions (i.e., "execute these shell commands"), no one can confirm or deny that a problem exists.
If the problem is truly an overtaxing of the double-precision units, as has been hypothesized, then shouldn't a kernel that does nothing but double-precision a*b+c a million times per thread trigger it?
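Something along these lines, say. This is only a minimal sketch of the test I have in mind, not anything from TeraChem; the kernel name, launch dimensions, inputs, and iteration count are all arbitrary choices of mine. The idea is to hammer the DP units with fused multiply-adds and check that repeated runs produce bit-identical results:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stress kernel: each thread performs `iters`
// double-precision fused multiply-adds. The result is written out
// so the compiler cannot optimize the loop away.
__global__ void dp_fma_stress(double *out, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    double a = 1.0 + tid * 1e-9;   // arbitrary per-thread input
    double b = 1.0000001;
    double c = 0.0;
    for (int i = 0; i < iters; ++i)
        c = fma(a, b, c);          // double-precision a*b+c
    out[tid] = c;
}

int main()
{
    const int threads = 256, blocks = 1024;
    const int n = threads * blocks;
    double *d_out;
    cudaMalloc(&d_out, n * sizeof(double));

    double *ref = new double[n], *cur = new double[n];

    // Reference pass: one million FMAs per thread.
    dp_fma_stress<<<blocks, threads>>>(d_out, 1000000);
    cudaMemcpy(ref, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);

    // Repeat and compare. On healthy hardware, every pass should be
    // bit-identical to the reference; any mismatch would point to
    // the kind of fault being discussed here.
    for (int pass = 0; pass < 100; ++pass) {
        dp_fma_stress<<<blocks, threads>>>(d_out, 1000000);
        cudaMemcpy(cur, d_out, n * sizeof(double), cudaMemcpyDeviceToHost);
        for (int i = 0; i < n; ++i)
            if (cur[i] != ref[i])
                printf("pass %d: mismatch at thread %d\n", pass, i);
    }

    cudaFree(d_out);
    delete[] ref;
    delete[] cur;
    return 0;
}
```

If the fault is thermal or load-related rather than logical, the loop count or grid size may need to be scaled up until the card is running flat out for minutes at a time.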
The developer's position is that the chips are already out there, and so there is nothing that can be done. My interest is in preventing this fault from being perpetuated into the Kepler GTX 680, which puts me in an awkward position. Would Nvidia listen if someone who doesn't have access to the source code were to report first-hand experience of the fault through this forum? I am told the fault is robustly reproducible, and Nvidia has the program and could test it themselves.
Sounds like a reasonable first test. If I had a GTX 580, I would try this.
I am not sure they care much about bug reports. I reported a couple of bugs… They just kept saying the next release would fix them, etc., etc… Nothing happened… And after some time, I could not even find my bug reports in their portal… Sigh…