I get an unknown error (code 30) after executing thousands of kernels.
I put traces in the code and saw that the error appears when I call 'cudaMalloc', but I don't understand why, because I checked with 'cudaMemGetInfo' and all the memory was free.
I am also running my simulations under 'cuda-memcheck' and it reports no errors. Finally, this error crashes the device, and I have to reset the workstation.
An "internal error" means that something unexpected happened inside the program (here: cuda-memcheck) and that no public information about it is available. Such messages are mostly debugging hints for the developers.
Internal errors should not occur, and any instance you encounter is usually worth reporting through the normal bug-reporting channels. You will need to submit a repro case with the bug report, though, which may be non-trivial in a case like this.
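One way to work toward a repro case is to check for errors after every single kernel launch. Kernel launches are asynchronous, so a failure inside a kernel is often reported only by a later, unrelated API call such as 'cudaMalloc'. The following is a minimal sketch, not your actual code: 'step_kernel', 'run_checked', 'check', the iteration count, and the buffer size are all hypothetical placeholders that you would replace with your own simulation step.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real simulation kernel.
__global__ void step_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Report which call failed and in which iteration; returns true on success.
static bool check(cudaError_t err, const char *what, int iter) {
    if (err == cudaSuccess) return true;
    fprintf(stderr, "iteration %d: %s failed: %s (code %d)\n",
            iter, what, cudaGetErrorString(err), (int)err);
    return false;
}

// Runs the kernel repeatedly and returns the number of iterations
// that completed without any reported error.
int run_checked(int iterations, int n) {
    float *d = nullptr;
    if (!check(cudaMalloc(&d, n * sizeof(float)), "cudaMalloc", -1)) return 0;
    if (!check(cudaMemset(d, 0, n * sizeof(float)), "cudaMemset", -1)) {
        cudaFree(d);
        return 0;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int done = 0;
    for (int it = 0; it < iterations; ++it) {
        step_kernel<<<blocks, threads>>>(d, n);
        // Catches launch-time problems such as an invalid configuration.
        if (!check(cudaGetLastError(), "kernel launch", it)) break;
        // Blocks until the kernel finishes, so an execution error is
        // reported here instead of at some later API call.
        if (!check(cudaDeviceSynchronize(), "kernel execution", it)) break;
        ++done;
    }
    cudaFree(d);
    return done;
}

int main() {
    const int iterations = 10000;  // assumed value, adjust to your run
    int done = run_checked(iterations, 1 << 20);
    printf("%d of %d iterations completed cleanly\n", done, iterations);
    return done == iterations ? 0 : 1;
}
```

Synchronizing after every launch slows the run down considerably, so this is only meant for debugging. If the failure then shows up at a specific iteration's "kernel execution" check rather than at 'cudaMalloc', that iteration is a good starting point for cutting the code down to a small repro case.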