Dynamic Parallelism and cudaGetLastError()

I seem to be having problems with dynamic parallelism. Specifically, the dynamically launched (child) kernels occasionally fail when the parent kernels are launched from too many host threads.

I’ve been able to confirm that cudaGetLastError() does return an error, but it doesn’t seem to be an error type I’m aware of: the error string I get back is just “”.

Are there errors specific to dynamic parallelism? Or does anyone have any theories on why some of my dynamic kernels would be failing?

Okay, answering my own question: the error is cudaErrorLaunchPendingCountExceeded, which means that I am exceeding the upper bound of outstanding launches that can be issued to the device runtime.
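
For anyone hitting the same thing, here is a minimal sketch of how the limit might be inspected and raised. As far as I can tell from the CUDA documentation, the bound in question is cudaLimitDevRuntimePendingLaunchCount (documented default: 2048), and it can be changed with cudaDeviceSetLimit() before the first kernel that uses the device runtime is launched. The kernel names, grid sizes, and the new limit of 32768 are made up for illustration and are not from my actual code; the build line is also an assumption about a typical setup.

```cuda
// Assumed build line: nvcc -rdc=true -arch=sm_XX example.cu -lcudadevrt  (XX >= 35)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: counts how many children actually ran.
__global__ void child(unsigned int *ran) { atomicAdd(ran, 1u); }

// Hypothetical parent kernel: every thread launches one child and records
// launches that were rejected because too many were already pending.
__global__ void parent(unsigned int *ran, unsigned int *rejected) {
    child<<<1, 1>>>(ran);
    cudaError_t err = cudaGetLastError();  // device-side check of the launch
    if (err == cudaErrorLaunchPendingCountExceeded) atomicAdd(rejected, 1u);
}

int main() {
    size_t limit = 0;
    cudaDeviceGetLimit(&limit, cudaLimitDevRuntimePendingLaunchCount);
    printf("pending launch limit before: %zu\n", limit);

    // Raise the limit (illustrative value). This reserves extra device memory,
    // so the call itself can fail and should be checked.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimePendingLaunchCount, 32768);
    if (err != cudaSuccess) {
        printf("cudaDeviceSetLimit failed: %d (%s)\n", (int)err, cudaGetErrorName(err));
        return 1;
    }

    unsigned int *counters = nullptr;  // [0] = children run, [1] = launches rejected
    cudaMalloc(&counters, 2 * sizeof(unsigned int));
    cudaMemset(counters, 0, 2 * sizeof(unsigned int));

    parent<<<64, 256>>>(counters, counters + 1);  // 64 * 256 = 16384 child launches
    err = cudaDeviceSynchronize();
    // Print the numeric code and enum name as well, in case the string is empty.
    printf("sync result: %d (%s): %s\n", (int)err, cudaGetErrorName(err),
           cudaGetErrorString(err));

    unsigned int host[2] = {0, 0};
    cudaMemcpy(host, counters, sizeof(host), cudaMemcpyDeviceToHost);
    printf("children run: %u, launches rejected: %u\n", host[0], host[1]);

    cudaFree(counters);
    return 0;
}
```

Printing the numeric code and cudaGetErrorName() alongside cudaGetErrorString() is what I would try when the message comes back blank. Note that the limit applies to the whole device, so with many host threads feeding parent kernels at once, the combined number of outstanding child launches is what has to fit under it.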