cuFFT batched vs. single FFT performance

Is there any performance difference between one batched call of 6 FFTs of size 2^18 and a loop of 6 single 2^18 FFTs?

You'll typically see much better performance with batched mode, since a single batched call can cover all six transforms with fewer kernel launches and can potentially balance the work across threads better.
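If you want to check this on your own hardware, here is a minimal sketch that times both approaches. It assumes single-precision complex-to-complex, in-place transforms stored contiguously on the device (the question doesn't say which type or layout is used), and the names `exec_batched`, `exec_looped` and `time_ms` are just made up for this example. Plan creation is kept out of the timed region.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

constexpr int kN = 1 << 18;  // points per transform (2^18, from the question)
constexpr int kBatch = 6;    // number of transforms

// All kBatch transforms in one cuFFT call (plan must be created with batch = kBatch).
cufftResult exec_batched(cufftHandle plan, cufftComplex* d) {
    return cufftExecC2C(plan, d, d, CUFFT_FORWARD);
}

// One cuFFT call per transform (plan must be created with batch = 1).
cufftResult exec_looped(cufftHandle plan, cufftComplex* d) {
    cufftResult r = CUFFT_SUCCESS;
    for (int b = 0; b < kBatch && r == CUFFT_SUCCESS; ++b) {
        cufftComplex* p = d + static_cast<size_t>(b) * kN;
        r = cufftExecC2C(plan, p, p, CUFFT_FORWARD);
    }
    return r;
}

// Average time per call in milliseconds over `reps` runs, or -1 on a cuFFT error.
float time_ms(cufftResult (*fn)(cufftHandle, cufftComplex*),
              cufftHandle plan, cufftComplex* d, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    bool ok = fn(plan, d) == CUFFT_SUCCESS;  // warm-up run, not timed
    cudaEventRecord(start);
    for (int i = 0; i < reps && ok; ++i) ok = fn(plan, d) == CUFFT_SUCCESS;
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ok ? ms / reps : -1.0f;
}

int main() {
    const size_t bytes = sizeof(cufftComplex) * kN * kBatch;
    cufftComplex* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    cudaMemset(d, 0, bytes);  // zero input: only the timing matters here

    cufftHandle batched, single;
    int n[1] = {kN};
    // nullptr for inembed/onembed selects the default contiguous layout.
    if (cufftPlanMany(&batched, 1, n, nullptr, 1, kN, nullptr, 1, kN,
                      CUFFT_C2C, kBatch) != CUFFT_SUCCESS) return 1;
    if (cufftPlan1d(&single, kN, CUFFT_C2C, 1) != CUFFT_SUCCESS) return 1;

    printf("batched: %.3f ms per set of %d FFTs\n",
           time_ms(exec_batched, batched, d, 100), kBatch);
    printf("looped:  %.3f ms per set of %d FFTs\n",
           time_ms(exec_looped, single, d, 100), kBatch);

    cufftDestroy(batched);
    cufftDestroy(single);
    cudaFree(d);
    return 0;
}
```

Build with something like `nvcc fft_compare.cu -lcufft` (the file name is arbitrary). The size of the gap will depend on your GPU and cuFFT version, so treat the numbers as a measurement for your setup rather than a general result.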