Clarify thread safety of __assertfail race conditions

A description of __assertfail is given here: PTX Interoperability :: CUDA Toolkit Documentation

However, the thread-safety guarantees of __assertfail are not documented, as far as I can tell.

My understanding is that __assertfail aborts the kernel in which it is called (it is reached via the assert macro from cassert). The driver API then flushes the error message to stderr via the host thread that submitted the kernel. Is this correct?

My question is what happens if multiple host threads (that share a device and have the same CUcontext set) submit kernels in which __assertfail is called at nearly the same time, in the following sense:
host thread 1 receives an assert message due to a call to __assertfail in kernel 1, and then host thread 2 receives an assert message due to __assertfail in kernel 2 before host thread 1 has output its error message and terminated the program.

Is it guaranteed that the message for at least one of the asserts will be successfully output?

Thanks

The printing is handled by the CUDA driver, not by any threads you spin up. I think that should be self-evident.

I wouldn’t expect any trouble with the printout as long as you use proper synchronization (for example cudaDeviceSynchronize() or cudaStreamSynchronize()) after CUDA device activity and before program termination.
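For reference, here is a minimal sketch of the scenario in the question, with the synchronization mentioned above. The names failing_kernel and launch_and_sync are made up for illustration, and the code uses the runtime API, where host threads share the device's primary context, rather than an explicit CUcontext. It makes no claim about which assert message(s) end up on stderr; it only shows each host thread synchronizing and reporting its status before the program exits.

```cuda
#include <cassert>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel: the assertion fails on purpose for any id >= 0.
__global__ void failing_kernel(int id)
{
    assert(id < 0 && "deliberate device-side assert");
}

// Each host thread launches the kernel on its own stream and then
// synchronizes, so any pending assert output can be flushed before exit.
void launch_and_sync(int id)
{
    cudaStream_t stream;
    cudaError_t err = cudaStreamCreate(&stream);
    if (err != cudaSuccess) {
        // This can happen if the other thread's assert has already
        // put the shared context into an error state.
        std::fprintf(stderr, "host thread %d: stream creation failed: %s\n",
                     id, cudaGetErrorString(err));
        return;
    }

    failing_kernel<<<1, 1, 0, stream>>>(id);

    // Synchronize before the thread returns; a failed device-side assert
    // is reported here (typically as cudaErrorAssert).
    err = cudaStreamSynchronize(stream);
    std::fprintf(stderr, "host thread %d: status after sync: %s\n",
                 id, cudaGetErrorString(err));
}

int main()
{
    // Select the device up front so both threads use the same primary context.
    cudaSetDevice(0);

    std::thread t1(launch_and_sync, 1);
    std::thread t2(launch_and_sync, 2);

    // Join both threads so the process does not terminate before
    // either of them has synchronized with the device.
    t1.join();
    t2.join();
    return 0;
}
```

Build with nvcc and without NDEBUG defined, otherwise the device-side assert is compiled out. After a failed assert the context is left in an error state, so later CUDA calls from either thread are expected to return an error as well; that is why stream creation is checked above.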