Recursive device function runs on fewer blocks than the launching kernel


If I call a recursive device function from a global function launched with, for example, 4 or 8 blocks of 256 threads, I find that only 2 blocks run the recursive function, from the start of the recursion to the end.
How can I solve this so that every level of the recursion runs with the same number of blocks as the launching global function?
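
To make the setup concrete, here is a minimal sketch of one way to check which blocks reach each recursion level. It is not my actual code: the names `recurse`, `kernel`, `seen`, and the constants `kBlocks`, `kThreads`, and `kMaxDepth` are all hypothetical, and the recursion body is a placeholder. It assumes the recursion is an ordinary `__device__` function call made by every thread, not a nested kernel launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical launch configuration and recursion depth (assumptions).
constexpr int kBlocks = 4;
constexpr int kThreads = 256;
constexpr int kMaxDepth = 3;

// Placeholder recursive device function: thread 0 of each block records
// that its block reached the given depth, then recurses one level deeper.
__device__ void recurse(int depth, int *seen) {
    if (depth == kMaxDepth) return;
    if (threadIdx.x == 0) {
        seen[depth * gridDim.x + blockIdx.x] = 1;
    }
    recurse(depth + 1, seen);
}

// Every thread of every block in the grid enters the recursion.
__global__ void kernel(int *seen) {
    recurse(0, seen);
}

int main() {
    int h_seen[kMaxDepth * kBlocks];
    int *d_seen = nullptr;

    cudaMalloc(&d_seen, sizeof(h_seen));
    cudaMemset(d_seen, 0, sizeof(h_seen));

    kernel<<<kBlocks, kThreads>>>(d_seen);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_seen);
        return 1;
    }

    cudaMemcpy(h_seen, d_seen, sizeof(h_seen), cudaMemcpyDeviceToHost);

    // Count how many distinct blocks reported in at each recursion depth.
    for (int depth = 0; depth < kMaxDepth; ++depth) {
        int count = 0;
        for (int b = 0; b < kBlocks; ++b) {
            count += h_seen[depth * kBlocks + b];
        }
        std::printf("depth %d: %d of %d blocks\n", depth, count, kBlocks);
    }

    cudaFree(d_seen);
    return 0;
}
```

If the count at some depth is lower than `kBlocks`, that would point to something in the real recursion (for example a condition on `blockIdx.x`) that stops the other blocks from going deeper, since a device function call by itself does not change the grid size.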