__assert_fail - can it be documented/publicly exposed?


We need a device-side assert. CUDA documents “assert” as a function, but that is not accurate: “assert” is a macro that is only active when NDEBUG is not defined.

We want full control, with our own macro deciding whether the assert is enabled, so we cannot use the public “assert” macro.

Looking at your source code, we find that your assert macro calls the “__assert_fail” device function, which is the one responsible for actually asserting. We would like to use this function in our code instead of “assert”.

  • Can this function be made publicly available and documented?
  • If not, can you give some hint on how to implement such a function?

We have tried using printf and __trap(). However, adding printf makes our kernels fail to launch with “too many resources for launch”. This doesn’t happen with your __assert_fail function, which also prints a lot more context (threadIdx, blockIdx), which is awesome. So it must be doing something special internally. Could you share what that is, so that we can reproduce it?



I’ll share my solution here in case someone else comes across the same issue. It would be good to get feedback from NVIDIA.

The key is to mark the function as __noinline__. This keeps the printf call out of the calling kernel’s body, so the kernel’s register pressure does not increase.

Complete example:

__device__ __noinline__ void assertFail(......)
{
    printf("......", ....);
}
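
For anyone who wants something to start from, here is a fuller sketch of the idea above. The elided parts of my snippet are filled in with guesses, so treat everything beyond "__noinline__ function that calls printf" as an assumption: the names MY_ENABLE_DEVICE_ASSERT, MY_DEVICE_ASSERT and checkPositive are hypothetical, the message format is made up, and this is not how NVIDIA's __assert_fail is implemented. The call to __trap() comes from our earlier attempt. Whether the printf output is reliably flushed after a trap is something to verify on your own setup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical switch: controls the checks independently of NDEBUG.
#define MY_ENABLE_DEVICE_ASSERT 1

// Kept out of line so that the printf call and its argument setup are not
// inlined into every kernel that uses the macro.
__device__ __noinline__ void assertFail(const char* expr, const char* file,
                                        int line, const char* func)
{
    printf("%s:%d: %s: block (%u,%u,%u), thread (%u,%u,%u): "
           "assertion `%s` failed\n",
           file, line, func,
           blockIdx.x, blockIdx.y, blockIdx.z,
           threadIdx.x, threadIdx.y, threadIdx.z, expr);
    __trap();  // abort the kernel; the host sees an error at the next sync
}

#if MY_ENABLE_DEVICE_ASSERT
#define MY_DEVICE_ASSERT(cond)                                      \
    do {                                                            \
        if (!(cond)) assertFail(#cond, __FILE__, __LINE__, __func__); \
    } while (0)
#else
#define MY_DEVICE_ASSERT(cond) ((void)0)
#endif

// Hypothetical kernel: every element is expected to be positive.
__global__ void checkPositive(const int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) MY_DEVICE_ASSERT(data[i] > 0);
}

int main()
{
    const int n = 4;
    const int host[n] = {1, 2, 0, 4};  // the third element violates the check

    int* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    checkPositive<<<1, n>>>(dev, n);
    cudaError_t err = cudaDeviceSynchronize();  // the trap surfaces here
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

Setting MY_ENABLE_DEVICE_ASSERT to 0 compiles the checks away entirely, which is the kind of control we could not get from the NDEBUG-driven “assert” macro.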