Hello there,
I am quite new to CUDA.
I have been trying to implement an algorithm in CUDA.
My input is a float2 matrix.
I want to reorder the elements by index: elements with an even index (along the x axis) are moved to the upper part of the matrix, and elements with an odd index are moved to the lower part.
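A minimal CPU reference for the reordering, useful for checking a kernel's output. It assumes details the post does not spell out: the matrix is row-major, `width` (the x size) is even, and the output has the same size. Every even-x element is packed, in row-major order, into the first half of the buffer (the upper rows), and every odd-x element into the second half (the lower rows). The name `deinterleave_reference` is hypothetical.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>  // float2, make_float2

// Hypothetical CPU reference. Assumes a row-major width x height matrix with
// an even width, and an output of the same size: even-x elements fill the
// first half (upper rows), odd-x elements fill the second half (lower rows).
std::vector<float2> deinterleave_reference(const std::vector<float2>& in,
                                           int width, int height)
{
    assert(width % 2 == 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);
    std::vector<float2> out;
    out.reserve(in.size());
    for (int parity = 0; parity < 2; ++parity)      // even columns first, then odd
        for (int y = 0; y < height; ++y)
            for (int x = parity; x < width; x += 2)
                out.push_back(in[y * width + x]);
    return out;
}
```

For a 4 × 2 matrix with rows a0 a1 a2 a3 and b0 b1 b2 b3, this gives the output rows a0 a2 b0 b2 (even indexes, upper part) and a1 a3 b1 b3 (odd indexes, lower part).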
I have a few problems with my implementation.
Which way should I go?
→ I launch two kernels, one for the even indexes and one for the odd ones. I keep it coalesced by letting only one thread out of two participate.
→ I implement it in a single kernel, with an if (threadIdx.x % 2 == 0) branch for the even indexes and the else branch for the odd ones. Would it be coalesced this way?
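A minimal sketch of the two options, under the same layout assumptions (row-major, even width, even-x elements packed into the first half of the output). All kernel and helper names are hypothetical. Option 1 uses a variation of the two-kernel idea: each launch starts only as many threads as there are elements of that parity, instead of idling every other thread. Option 2 folds the even/odd choice into index arithmetic instead of an if/else.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Option 1 (hypothetical sketch): two launches, one per parity. Thread t of a
// row handles column x = 2 * t + Parity, so no thread sits idle.
template <int Parity>
__global__ void move_parity(const float2* in, float2* out, int width, int height)
{
    const int t = blockIdx.x * blockDim.x + threadIdx.x;  // rank within the row
    const int y = blockIdx.y;                              // one grid row per matrix row
    const int half_w = width / 2;
    if (t >= half_w || y >= height) return;
    // Reads are strided by 2 float2; writes are contiguous.
    out[Parity * half_w * height + y * half_w + t] = in[y * width + 2 * t + Parity];
}

// Option 2 (hypothetical sketch): one launch, one thread per input element.
// x & 1 equals threadIdx.x % 2 here because blockDim.x is even.
__global__ void deinterleave_single(const float2* in, float2* out, int width, int height)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y;
    if (x >= width || y >= height) return;
    const int half_w = width / 2;
    const int dst = (x & 1) * half_w * height  // odd columns go to the lower half
                  + y * half_w + (x >> 1);      // rank among same-parity elements
    // Reads: consecutive lanes read consecutive float2.
    // Writes: even lanes store to one contiguous run, odd lanes to another.
    out[dst] = in[y * width + x];
}

constexpr int kThreads = 128;  // must stay even for option 2

// Launch helpers; both assume an even width and height <= 65535 (gridDim.y limit).
void run_two_kernels(const float2* d_in, float2* d_out, int width, int height)
{
    assert(width % 2 == 0);
    const dim3 grid((width / 2 + kThreads - 1) / kThreads, height);
    move_parity<0><<<grid, kThreads>>>(d_in, d_out, width, height);
    move_parity<1><<<grid, kThreads>>>(d_in, d_out, width, height);
}

void run_single_kernel(const float2* d_in, float2* d_out, int width, int height)
{
    assert(width % 2 == 0);
    const dim3 grid((width + kThreads - 1) / kThreads, height);
    deinterleave_single<<<grid, kThreads>>>(d_in, d_out, width, height);
}
```

As far as coalescing goes, what matters is the set of addresses the lanes of a warp touch in one load or store, not whether an if is present. An if (threadIdx.x % 2 == 0)/else version computes the same addresses, and since the two branches differ only in index arithmetic, the divergence should be cheap. In option 2, a full warp's reads are contiguous and its writes land in two contiguous runs of 16 float2 (128 bytes each) instead of one.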
One last thing: the output is written in reading order so that the writes to global memory are coalesced.
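One way to read "written in reading order" is that each thread is indexed by its output position and fetches its source element (a gather). A hypothetical sketch under the same layout assumptions (row-major, even width, even-x elements in the first half of the output) follows. Names are illustrative only.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical gather sketch: thread i produces out[i] and fetches its source
// element, so stores are fully coalesced. Assumes a row-major width x height
// float2 matrix with even width; the first half of `out` receives the even
// columns, the second half the odd columns.
__global__ void deinterleave_gather(const float2* in, float2* out, int width, int height)
{
    const int half_w = width / 2;
    const int half_n = half_w * height;                    // elements per parity
    const int i = blockIdx.x * blockDim.x + threadIdx.x;  // output index
    if (i >= 2 * half_n) return;

    const int parity = i / half_n;           // 0: upper half (even x), 1: lower half (odd x)
    const int k      = i - parity * half_n;  // rank among same-parity elements
    const int y      = k / half_w;
    const int x      = 2 * (k % half_w) + parity;
    // Stores: consecutive lanes write consecutive float2.
    // Loads: stride of 2 float2, so each warp's loads span twice the bytes they use.
    out[i] = in[y * width + x];
}

void run_gather(const float2* d_in, float2* d_out, int width, int height)
{
    assert(width % 2 == 0);
    const int n = width * height;
    const int threads = 256;
    deinterleave_gather<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, width, height);
}
```

Compared with the single-kernel scatter, this trades split writes for strided reads; which one is faster on a given GPU is best settled with a profiler.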
Sorry for my English and my imprecise explanations…
Thanks for your help. :)