I am using Compute Sanitizer (CUDA 12.9) to monitor a process running a TensorRT inference loop that crashes after several days of runtime and causes a driver reset. While this might be a problem with the driver itself, I am trying my luck with the sanitizer. If I intentionally inject GPU-related errors (out-of-bounds memory writes and the like), the sanitizer reports them correctly, so I know it works and catches such problems. However, when I run it with my original process, after hours or days of runtime it eventually outputs:
“Error in printing record”
Huh, what kind of error can cause the sanitizer to output this particular message instead of a regular error record?
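For reference, the kind of intentionally injected error mentioned above can be sketched roughly as follows. This is only an illustrative example, not the actual inference code: the kernel name `oob_write` and the buffer size are hypothetical, and the program simply writes one element past the end of a device allocation so that the sanitizer's default memcheck tool has something to report.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 writes one element past the end of the buffer.
__global__ void oob_write(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i == 0) {
        buf[n] = 42;  // deliberate out-of-bounds write (valid indices are 0..n-1)
    }
}

int main()
{
    const int n = 256;  // illustrative size only
    int *d_buf = nullptr;

    cudaError_t err = cudaMalloc(&d_buf, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    oob_write<<<1, 32>>>(d_buf, n);

    // Synchronize so that any error raised by the kernel is surfaced here.
    err = cudaDeviceSynchronize();
    printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

    cudaFree(d_buf);
    return 0;
}
```

Assuming the file is saved as `oob.cu`, it can be built with `nvcc -lineinfo -o oob oob.cu` and run as `compute-sanitizer ./oob`. With a test like this the sanitizer prints a normal invalid-write record, which is what I would expect from the real process as well.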
Thanks.