Writing to global memory in a kernel: can each thread write a different amount of data into an array?

I have a question about writing results:

There is an array allocated in global memory. Each thread does some calculation and then writes a few numbers into that array, but the number of values each thread generates is not the same. So I thought the writes to memory should be serialized, with a global index indicating the next free position to write to. How can I do that? Is it even possible to do that inside the kernel?
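To make the idea concrete, here is a rough sketch of what I have in mind. It assumes `atomicAdd` on a counter in global memory can be used to reserve a range of slots per thread. The names `appendResults`, `runAppend`, and `counter`, and the `tid % 4` stand-in for the real calculation, are made up for illustration only.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread `tid` produces (tid % 4) values and appends them to `out`.
__global__ void appendResults(int *out, unsigned int *counter, int nThreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nThreads) return;

    // Stand-in for the real calculation: a variable number of results per thread.
    unsigned int nOut = tid % 4;
    if (nOut == 0) return;

    // Reserve nOut consecutive slots. atomicAdd returns the old counter value,
    // which becomes the first slot owned by this thread.
    unsigned int base = atomicAdd(counter, nOut);

    for (unsigned int k = 0; k < nOut; ++k)
        out[base + k] = tid;  // placeholder payload
}

// Hypothetical host helper: launches the kernel, copies the packed results back,
// and returns the total number of values written. Assumes nThreads > 0.
unsigned int runAppend(int nThreads, std::vector<int> &hostOut)
{
    int *dOut = nullptr;
    unsigned int *dCounter = nullptr;
    size_t capacity = 3 * (size_t)nThreads;  // upper bound: at most 3 values per thread
    cudaMalloc(&dOut, capacity * sizeof(int));
    cudaMalloc(&dCounter, sizeof(unsigned int));
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    int block = 128;
    int grid = (nThreads + block - 1) / block;
    appendResults<<<grid, block>>>(dOut, dCounter, nThreads);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    unsigned int total = 0;
    cudaMemcpy(&total, dCounter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    hostOut.resize(total);
    cudaMemcpy(hostOut.data(), dOut, total * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dOut);
    cudaFree(dCounter);
    return total;
}
```

As far as I can tell, only the counter update is serialized here; the writes themselves go to disjoint ranges. The order in which threads land in `out` would not be deterministic, and the array has to be large enough for the worst case.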

Or should each thread keep its results in a separate global memory buffer, and then, after the kernel exits, a CUDA memory copy is used to store them in the destination array?
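A variation on this second idea that I can imagine is a two-pass scheme: first count how many values each thread will produce, then compute each thread's starting offset with a prefix sum, then write. The sketch below is only my guess at how that might look. It uses Thrust's `exclusive_scan` for the prefix sum, which is my assumption and not a requirement. `countResults`, `scatterResults`, and `runTwoPass` are hypothetical names, and `tid % 4` again stands in for the real calculation.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <thrust/copy.h>

// Pass 1 (hypothetical): each thread records how many values it will produce.
__global__ void countResults(int *counts, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) counts[tid] = tid % 4;  // stand-in for the real calculation
}

// Pass 2 (hypothetical): each thread writes its values starting at its own offset.
__global__ void scatterResults(const int *offsets, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int base = offsets[tid];
    int nOut = tid % 4;
    for (int k = 0; k < nOut; ++k)
        out[base + k] = tid;  // placeholder payload
}

// Hypothetical host helper: returns the packed results, ordered by thread index.
// Assumes n > 0.
std::vector<int> runTwoPass(int n)
{
    int block = 128;
    int grid = (n + block - 1) / block;

    thrust::device_vector<int> counts(n), offsets(n);
    countResults<<<grid, block>>>(thrust::raw_pointer_cast(counts.data()), n);

    // offsets[i] = counts[0] + ... + counts[i-1]
    thrust::exclusive_scan(counts.begin(), counts.end(), offsets.begin());

    // Total output size = last offset + last count (each read copies one int to the host).
    int lastOffset = offsets[n - 1];
    int lastCount = counts[n - 1];
    int total = lastOffset + lastCount;

    thrust::device_vector<int> out(total);
    scatterResults<<<grid, block>>>(thrust::raw_pointer_cast(offsets.data()),
                                    thrust::raw_pointer_cast(out.data()), n);

    std::vector<int> hostOut(total);
    thrust::copy(out.begin(), out.end(), hostOut.begin());
    return hostOut;
}
```

Compared with a single global index, this would need no atomics and would give a deterministic output order, at the cost of a second kernel launch and either repeating the calculation or keeping the intermediate results somewhere.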

Thank you for your help.