Does nvcc support tail call optimization in dynamic parallelism?

The CUDA Programming Guide, under Appendix C, “Nesting and Synchronization Depth”, says:

“An optimization is permitted where the system detects that it need not reserve space for the parent’s state in cases where the parent kernel never calls cudaDeviceSynchronize(). In this case, because explicit parent/child synchronization never occurs, the memory footprint required for a program will be much less than the conservative maximum. Such a program could specify a shallower maximum synchronization depth to avoid over-allocation of backing store”
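(For context, the “maximum synchronization depth” mentioned in that passage is configured from the host with cudaDeviceSetLimit. A minimal sketch, where the depth value of 4 is just an example, not a recommendation:)

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    // Reserve backing store for at most 4 levels of explicit
    // parent/child synchronization (4 is an arbitrary example value).
    cudaError_t err = cudaDeviceSetLimit(cudaLimitDevRuntimeSyncDepth, 4);
    if (err != cudaSuccess)
        printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));

    // Read the limit back to confirm what the runtime actually reserved.
    size_t depth = 0;
    cudaDeviceGetLimit(&depth, cudaLimitDevRuntimeSyncDepth);
    printf("device runtime sync depth limit: %zu\n", depth);
    return 0;
}
```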

Does this mean that the compiler supports tail recursion under dynamic parallelism? For example, if I have a kernel that recursively calls itself:

__global__ void kernel(int layer){
    // ... work for this layer ...
    if (layer < max_layer)           // some stopping condition
        kernel<<<1, 1>>>(layer + 1); // tail launch; no cudaDeviceSynchronize() afterwards
}

Launched from the host:

    kernel<<<1, 1>>>(0);

If the compiler can optimize this tail recursion, is it still restricted by the maximum nesting depth of 24, given that “parent/child synchronization never occurs”? And if it is not restricted, how can this optimization be enabled in the compiler?