Need advice about memory coalescing

Hello there,

I am quite new to CUDA.
I have been trying to implement an algorithm on CUDA.

My input is a float2 matrix.

I want to reorder the elements by their index along the x axis: elements with an even x index are moved to the upper part of the matrix, and elements with an odd x index are moved to the lower part.

I have a few problems with my implementation.

Which way should I go?
-> I use two kernels, one for the even indexes and one for the odd ones. I would try to keep the accesses coalesced by letting only one thread out of two participate.

-> I implement it in a single kernel, with an if (threadIdx.x % 2 == 0) branch for the even indexes and the else branch for the odd ones. Would the accesses still be coalesced this way?
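
To make the single-kernel option concrete, here is a minimal sketch. It assumes the matrix is stored as one flat array of n float2 values (n positive and even), and that "upper part" means the first n/2 elements and "lower part" the last n/2. The names splitEvenOdd and checkSplitEvenOdd are hypothetical, not from any library. Instead of an if/else on the parity, the destination index is computed arithmetically, so the threads of a warp do not take different branches.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one thread per element, no branch on parity.
// Assumes `in` and `out` are flat arrays of n float2 values, n even.
__global__ void splitEvenOdd(const float2* in, float2* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int half = n / 2;
    // Even i -> out[i / 2] (first half); odd i -> out[half + i / 2] (second half).
    int dst = (i >> 1) + (i & 1) * half;

    // Consecutive threads read consecutive elements of `in`.
    // The writes land in two contiguous runs per warp (even lanes, odd lanes).
    out[dst] = in[i];
}

// Hypothetical host helper: runs the kernel on n test values (n > 0, even)
// and checks that even/odd elements ended up in the first/second half.
bool checkSplitEvenOdd(int n)
{
    std::vector<float2> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = make_float2((float)i, -(float)i);

    float2 *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float2)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float2)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float2), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    splitEvenOdd<<<blocks, threads>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int k = 0; k < n / 2; ++k) {
        if (h_out[k].x != h_in[2 * k].x) return false;             // even element
        if (h_out[n / 2 + k].x != h_in[2 * k + 1].x) return false; // odd element
    }
    return true;
}
```

In this sketch the reads are contiguous across a warp, while the writes are split into one run for the even lanes and one for the odd lanes. Whether those two runs count as fully coalesced depends on the GPU generation, so it is worth checking with a profiler rather than assuming.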

One last thing: the output is written in reading order, so that the writes to global memory would be coalesced.
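
One possible reading of this, sketched under the same assumptions as before (a flat array of n float2 values, n positive and even, first half = "upper part"): each thread owns one output element and looks up where it comes from, so consecutive threads write consecutive addresses. The names gatherEvenOdd and checkGatherEvenOdd are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: thread t writes out[t], so the writes are contiguous.
// Assumes `in` and `out` are flat arrays of n float2 values, n even.
__global__ void gatherEvenOdd(const float2* in, float2* out, int n)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= n) return;

    int half = n / 2;
    // First half of the output takes the even inputs, second half the odd ones.
    int src = (t < half) ? 2 * t : 2 * (t - half) + 1;

    // The read is strided (every other element); the write is contiguous.
    out[t] = in[src];
}

// Hypothetical host helper: runs the kernel on n test values (n > 0, even)
// and checks the resulting layout.
bool checkGatherEvenOdd(int n)
{
    std::vector<float2> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = make_float2((float)i, -(float)i);

    float2 *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float2)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float2)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float2), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    gatherEvenOdd<<<blocks, threads>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int k = 0; k < n / 2; ++k) {
        if (h_out[k].x != h_in[2 * k].x) return false;             // even element
        if (h_out[n / 2 + k].x != h_in[2 * k + 1].x) return false; // odd element
    }
    return true;
}
```

Compared with the scatter version, this moves the non-contiguous access from the write side to the read side. Which of the two is faster on a given card is something I would measure rather than guess.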

Sorry for my English and my imprecise explanations…

Thanks for your help. :)