Hi,
I am running CUDA on a Tesla V100, whose HBM2 memory supports Single-Error Correcting, Double-Error Detecting (SECDED) ECC.
My questions are:
Does ECC have to be enabled explicitly in order to detect errors, or is it enabled by default?
Additionally, how will the system inform me about an error? Will it stop execution and just report the error, or will it correct the error automatically and continue execution?
Thanks in advance!