Launching a kernel with millions of threads

Hello,

I have a kernel that is sometimes launched with millions of threads. Would the performance (execution time) be better if I split the work into several launches, each with a smaller number of threads?
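
To make the comparison concrete, here is a minimal sketch of the two options, assuming CUDA. The kernel `work`, the helper `time_launches`, the block size of 256, and the element counts are all hypothetical placeholders rather than my real code; the sketch only shows how one large launch and several smaller launches could be timed against each other with CUDA events.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: one element per thread.
// Threads handle indices in [offset, end); extra threads in the last block exit.
__global__ void work(float *data, size_t offset, size_t end)
{
    size_t i = offset + (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < end) data[i] = data[i] * 2.0f + 1.0f;
}

// Process n elements using launches of at most `chunk` threads each and
// return the elapsed GPU time in milliseconds. chunk == n gives one launch.
float time_launches(float *d_data, size_t n, size_t chunk)
{
    const unsigned int block = 256;  // assumed block size
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (size_t off = 0; off < n; off += chunk) {
        size_t count = (n - off < chunk) ? (n - off) : chunk;
        unsigned int grid = (unsigned int)((count + block - 1) / block);
        work<<<grid, block>>>(d_data, off, off + count);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const size_t n = (size_t)1 << 22;  // about 4 million threads in total
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    time_launches(d_data, n, n);  // warm-up run, result ignored
    printf("single launch:   %f ms\n", time_launches(d_data, n, n));
    printf("chunks of 65536: %f ms\n", time_launches(d_data, n, (size_t)1 << 16));

    cudaFree(d_data);
    return 0;
}
```

The timings will depend on the GPU and on what the real kernel does, so this is only a way to measure the difference, not a statement about which variant is faster.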

Thanks