I have to justify the use of the Tesla GPU line for a client, and for my use case the main compelling argument is ECC protection against memory errors.
Have read through the Google results such as this;
and seen the counterpoint argument (for the simulation case) made by Amber;
Assuming that a GTX Titan X will not be running 24/7, but rather in short batches 5 days a week, how likely are uncorrected memory errors to cause a serious issue?
Even if this statement is true;
about 5 single bit errors in 8 Gigabytes of RAM per hour using the top-end error rate
the probability that the specific memory affected by the errors is used for a result is quite small.
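To make that argument concrete, here is a rough sketch of the arithmetic I have in mind. It assumes errors arrive as a Poisson process and land uniformly across memory, which may not hold in practice. The 5 errors per hour per 8 GB figure is the quoted top-end rate; the 12 GB card size, 2-hour batch length, and 10% "live" memory fraction are placeholder assumptions of mine, and the function names are just for illustration.

```python
import math


def expected_bit_errors(errors_per_hour_per_8gb, memory_gb, hours):
    """Expected single-bit error count, scaled linearly by memory size and run time."""
    return errors_per_hour_per_8gb * (memory_gb / 8.0) * hours


def prob_at_least_one_hit(expected_errors, live_fraction):
    """P(at least one error lands in memory that feeds a result).

    Assumes Poisson arrivals and a uniform spread over memory, so the
    rate that matters is expected_errors * live_fraction.
    """
    return 1.0 - math.exp(-expected_errors * live_fraction)


# Quoted top-end rate: about 5 single-bit errors per hour per 8 GB.
# Assumed (not measured): 12 GB card, 2-hour batch, 10% of memory live.
lam = expected_bit_errors(5.0, 12.0, 2.0)
p = prob_at_least_one_hit(lam, 0.10)
print(f"expected errors per batch: {lam:.2f}")
print(f"P(at least one error in live data): {p:.3f}")
```

The live-memory fraction and the real per-hour rate dominate the answer, and those are exactly the numbers I do not have good recent data for.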
I am not minimizing the risk associated with memory that lacks ECC, but rather looking for recent resources from which I can generate a reasonable prediction of how often memory errors would occur in an image processing pipeline.
I know njuffa has seen ECC errors occur with some frequency.
Anyone out there run into ECC errors with the GTX Titan X or any Maxwell GPU?
Is there a diagnostic method via software which can be used to either detect an ECC error or to determine the probability for a given set of memory that an ECC error may occur in a given timeframe?
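The only software approach I can think of for a card without ECC is a pattern test: fill a buffer with a known pattern, leave it for a while, read it back, and count the flipped bits. Below is a minimal sketch of just the comparison step, done on the host with plain Python. The copy to and from GPU memory is left out, the helper name `count_bit_flips` is my own, and the flipped bit is simulated rather than observed. (On Tesla boards with ECC enabled, `nvidia-smi -q -d ECC` reports the error counters directly.)

```python
def count_bit_flips(expected: bytes, observed: bytes) -> int:
    """Count the bits that differ between the written pattern and the read-back data."""
    if len(expected) != len(observed):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every bit position that changed; count those bits.
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))


# Alternating 0xAA / 0x55 bytes, a common choice for memory test patterns.
pattern = bytes([0xAA, 0x55]) * 512

# Stand-in for the buffer read back from the device after some time.
readback = bytearray(pattern)
readback[10] ^= 0x04  # simulate a single-bit upset (not a real observation)

flips = count_bit_flips(pattern, bytes(readback))
print(f"bit flips detected: {flips}")
```

A test like this would only catch flips in memory that sits idle under the test pattern, so at best it gives a rough error rate, not protection for real workloads.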
Is the GDDR5 memory on the GPU any more or less likely to be affected than DDR4 or DDR3 CPU memory?