Will launching a kernel from within another kernel improve performance?

CUDA 12 introduced new CUDA Dynamic Parallelism (CDP) functionality, which may also be of interest.
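For reference, a kernel-from-kernel launch looks roughly like the sketch below. The kernel names (`scale_segment`, `parent`) and the problem size are made up for illustration, and it assumes a device that supports dynamic parallelism and a build with relocatable device code, e.g. `nvcc -rdc=true example.cu -lcudadevrt`. It avoids calling `cudaDeviceSynchronize()` in device code, which as far as I know is no longer available there in CUDA 12, and relies on the parent grid not completing until its child grid has finished.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Child kernel (hypothetical): doubles every element of the array.
__global__ void scale_segment(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Parent kernel (hypothetical): a single thread sizes and launches the child
// grid on the device, with no round trip to the host.
__global__ void parent(float *data, int n) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        // Device-side launch. The parent grid is not considered complete
        // until this child grid has finished.
        scale_segment<<<blocks, threads>>>(data, n);
    }
}

int main() {
    const int n = 1024;
    std::vector<float> host(n, 1.0f);

    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    parent<<<1, 1>>>(dev, n);

    // Host-side synchronization waits for the parent and its child grid.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("first = %f, last = %f\n", host[0], host[n - 1]);

    cudaFree(dev);
    return 0;
}
```

Whether this is faster than launching both kernels from the host depends on the workload. Device-side launches have their own overhead, so they tend to pay off mainly when the amount of follow-up work is only known on the GPU. Profiling both variants is the safest way to tell.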