memory protection

The Kepler architecture white paper says:

Does this apply to all GPUs, or only to Tesla-branded ones? (I realize that only the latter support ECC in DRAM.)

In fact there are two “Kepler” white papers. One was written with GK104 in view:

[url]http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf[/url]

and the other which was written with GK110 in view:

[GK110 white paper; the original link is no longer available]

Your quotation appears to be taken from the GK110 whitepaper.

ECC protection applies only to Tesla and some Quadro GPUs. The specifics of “internal” ECC protection (i.e. on the register file, etc.) vs. “external” ECC protection (i.e. on the off-chip DRAM array) also vary from GPU to GPU.

Tesla boards built from the GK110 GPU and its variants, such as K20, K20X, and K40, can provide both “internal” and “external” ECC. Tesla boards built from the GK104 GPU (e.g. K10) offer what is primarily “external” ECC, and in fact the GK104 whitepaper makes no mention of ECC. Note the sentence in the K10 board specification:

“On the Tesla K10 the ECC protection is for DRAM only”

On any GPU where ECC protection is turned off, disabled, or unavailable, the internal ECC protection (if present) is turned off as well.

Yes, I was asking about GK110. Thanks for the clarification.

Does NVIDIA say that in any of its documents? (We are talking about “internal” memory, i.e. registers, cache and shared memory.) The quotation from the white paper does not seem to be restricted to Teslas.

(Side note: I’m surprised that any system that supports external ECC would not protect its internal memory. I believe most, if not all, consumer CPUs, including those that do not support ECC RAM, do protect their own caches.)

[url]http://www.nvidia.com/object/why-choose-tesla.html[/url]

“Tesla exclusive features include: ECC protection for uncompromised data reliability For memories inside the GPU and the external GDDR5 memory”

The above statement is a marketing statement. Not all Tesla products have exactly the same feature content, and this is true for ECC, but all do include ECC protection (of the external memories, at least).

[original link no longer available]

“Quadro is the only professional graphics solution with ECC memory”

Again, it’s a marketing statement. Not all Quadro products include ECC protection.

For more specific product-by-product tabulation, you can review the specific board specifications or product specifications/data sheets.

You won’t find mention of ECC on any GeForce product.

All Tesla (in some fashion). Some Quadro. No GeForce.

You can also run nvidia-smi -a on any product to get a quick view of its ECC configuration. Internal and external error categories are tabulated separately. Products like K10, even with ECC enabled, will say “N/A” in the internal category tabulations. GeForce products will say “N/A” in all ECC categories, and will not support the option to enable ECC.
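If you want to check this from a script rather than reading the full report, here is a minimal Python sketch. It assumes nvidia-smi is on the PATH and that your driver supports the `--query-gpu` field `ecc.mode.current`; the helper names `parse_ecc_rows` and `query_ecc_modes` are made up for illustration, and the exact placeholder text printed for unsupported fields is an assumption that may differ between driver versions. For the human-readable report, `nvidia-smi -q -d ECC` should limit the output to the ECC section.

```python
import csv
import io
import subprocess

# Placeholder strings nvidia-smi may print when a field does not apply.
# Assumption: the exact text varies between driver versions.
NOT_AVAILABLE = {"N/A", "[N/A]", "[Not Supported]"}


def parse_ecc_rows(csv_text):
    """Parse 'name, ecc.mode.current' CSV rows into (name, status) tuples.

    status is True (ECC enabled), False (ECC disabled),
    or None (ECC not available on that GPU).
    """
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, mode = fields[0].strip(), fields[1].strip()
        if mode in NOT_AVAILABLE:
            status = None
        else:
            status = mode.lower() == "enabled"
        rows.append((name, status))
    return rows


def query_ecc_modes():
    """Run nvidia-smi and return the parsed ECC mode of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,ecc.mode.current",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ecc_rows(out)
```

Calling `query_ecc_modes()` on a machine with the NVIDIA driver installed returns one tuple per GPU; a GeForce board would be expected to come back with `None`, matching the “N/A” described above. This only reports whether ECC is on or off; the split between internal and external error counts still has to be read from the full report.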

I see. Thanks!