Dynamic parallelism vs flat kernels

Generally speaking, when can dynamic parallelism outperform a (well-written) flat kernel?