Question on M5000

Is anyone in possession of a Quadro M5000 card? The question is, what is the deviceProp.ECCEnabled boolean set to for this card? It has ECC for the main memory, but not for registers and cache, so I am wondering if it is categorized as having ECC or not within the cudaGetDeviceProperties call.
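For anyone who does have the card, a minimal sketch along these lines would report the value in question. It only assumes the CUDA runtime API (`cudaGetDeviceCount`, `cudaGetDeviceProperties`) and prints the flag for every visible device. As far as I understand the runtime documentation, `ECCEnabled` reflects whether ECC is currently turned on, not merely whether the hardware is capable of it, so a card with ECC switched off in the driver settings would presumably report 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp deviceProp;
        err = cudaGetDeviceProperties(&deviceProp, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            continue;
        }
        // ECCEnabled is an int in cudaDeviceProp: nonzero means ECC is on.
        std::printf("Device %d (%s): ECCEnabled = %d\n",
                    dev, deviceProp.name, deviceProp.ECCEnabled);
    }
    return 0;
}
```

Built with something like `nvcc ecc_query.cu -o ecc_query` (the file name is arbitrary), this should print one line per GPU.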

Thanks!

Do any GPUs have ECC protection for registers and caches? Not to my recollection; as far as I recall they are only protected by parity bits (corrections welcome!). In other words, ECC is normally considered a property of the GPU's on-board memory. That said, I do not know whether the Quadro M5000 supports ECC or not.

Purely out of interest: What is the significance of the value of ECCEnabled returned by cudaGetDeviceProperties to your software?

Tesla cards specifically state that the ECC protection covers registers and caches. See e.g. http://www.nvidia.com/content/PDF/kepler/Tesla-K40-Active-Board-Spec-BD-06949-001_v03.pdf second paragraph of the overview.

On the other hand, Quadro cards do not use the same language. See e.g. http://h20195.www2.hp.com/v2/getpdf.aspx/c05040364.pdf?ver=1 (I couldn't find the equivalent on NVIDIA's site), where they mention “8GB GDDR5 ultra fast ECC memory supporting a wide memory path to minimize memory access performance penalties”.

As for the little story behind checking for that flag… we develop medical software and we only flag ECC-enabled graphics cards as being “valid” within a system, and we want our CI/CD testing to be done on hardware that doesn't require a code change (disabling that check). We're looking at non-Tesla options for our testing hardware (for cost reasons), but if the Quadro doesn't answer “yes” to the ECC flag then we might as well go with GeForces and live with disabling that check within the software during development.
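To make the kind of check described above concrete, here is a rough sketch of such a gate, not our actual code. It reads the flag through `cudaDeviceGetAttribute` with `cudaDevAttrEccEnabled`, which is an alternative to going through the full `cudaDeviceProp` struct. The `isDeviceAcceptable` helper and the `APP_ALLOW_NON_ECC` environment variable are hypothetical names made up for illustration; the override is just one conceivable way to relax the check on development hardware without touching the code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical policy helper: a device is accepted if ECC is on,
// or if the (development-only) override has been requested.
bool isDeviceAcceptable(bool eccEnabled, bool allowNonEcc) {
    return eccEnabled || allowNonEcc;
}

// Hypothetical override: APP_ALLOW_NON_ECC=1 relaxes the check.
bool allowNonEccFromEnvironment() {
    const char* value = std::getenv("APP_ALLOW_NON_ECC");
    return value != nullptr && std::strcmp(value, "1") == 0;
}

int main() {
    const int dev = 0;   // assumption: only device 0 is checked here
    int eccEnabled = 0;
    cudaError_t err =
        cudaDeviceGetAttribute(&eccEnabled, cudaDevAttrEccEnabled, dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    const bool ok =
        isDeviceAcceptable(eccEnabled != 0, allowNonEccFromEnvironment());
    std::printf("Device %d: ECC %s, device %s\n", dev,
                eccEnabled ? "enabled" : "disabled",
                ok ? "accepted" : "rejected");
    return ok ? 0 : 2;
}
```

With something like this, a GeForce test box would be rejected by default and accepted only when the override is set explicitly, so the production path stays unchanged.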

Thanks for the correction regarding ECC protection in modern GPUs. Since documentation does not have a unit test, it can easily be incomplete or inconsistent, at times even plain incorrect. I note that the quoted text about the Quadro does not preclude ECC protection on registers and caches; it is simply silent on the issue, possibly because this is not an important detail for typical Quadro customers.

If you must ensure that full ECC protection is available in identical fashion on both Tesla and Quadro, I would suggest getting in touch directly with NVIDIA through dedicated Tesla support channels.

I certainly appreciate why such assurances would be important in the context of GPUs used in medical devices. With regard to medical imaging I have stated before that ECC is desired because one would never want to have to worry whether a discolored spot in an image is an artifact caused by a flipped bit in GPU memory, or a cancerous lesion.