Use of bitonic sort in CUDA

Sorry for asking such a general question, but I will try to keep it simple.
I am using the bitonic sort function in a CUDA program, but instead of passing integers to the swap function I am passing a struct; otherwise I kept the code unmodified. The problem is that for some inputs the last few elements are not sorted properly. Is the swap function the cause, because it takes a struct instead of an integer?
I would appreciate any answer.

It is impossible to say what goes wrong without more information. Are you talking about the bitonic sort from the CUDA SDK? Are your input sizes powers of 2?

Thanks a lot for your reply. I know I provided little information, but the kernel code also does other things.
Anyway, when the data length is not a power of 2 it does not sort correctly, and when the data length is a power of 2 it sometimes sorts correctly and sometimes doesn’t.
With data lengths of 4 and 8 the sorting was fine, but when I made it 16 the output looked like this:
2 3 4 5 3 4 7 9 5 6 6 8 14 13 7 11
But can I say that the problem is not caused by passing the structs by reference to the swap function, like this:
__device__ inline void swap(TermNodeInfo &a, TermNodeInfo &b)
where TermNodeInfo is a struct of two integers.
Another question :)
Isn’t there another parallel sorting algorithm that doesn’t have the restriction that the data length must be a power of 2?
Thanks in advance.

And yes, I am using the CUDA SDK bitonic sort.

Could it be that you don’t allocate enough memory? You can modify bitonic sort so that it sorts arbitrary sizes. The simplest (but not the most efficient) way is to pad the data you want to sort: choose the next larger power of 2 as the size and fill the spare entries at the end with the maximum key.
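
To make the padding idea concrete, here is a rough host-side sketch in plain C++ rather than a kernel, so it can be checked without a GPU. It is not the SDK code. The field names `key` and `value`, the helpers `swapNodes` and `nextPow2`, and the function `bitonicSortPadded` are all made up for illustration; the thread only says that TermNodeInfo is a struct of two integers. The sketch also assumes that no real key equals INT_MAX, since that value is used for the padding entries. The loop over `i` plays the role that the thread index plays in a kernel, where the helpers would additionally carry the `__device__` qualifier and the threads would need to synchronize between steps.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Assumed layout: the field names are hypothetical.
struct TermNodeInfo {
    int key;    // value the sort orders by (assumption)
    int value;  // payload that travels with the key (assumption)
};

// Swap two whole structs by reference.
inline void swapNodes(TermNodeInfo &a, TermNodeInfo &b) {
    TermNodeInfo tmp = a;  // the temporary must be the struct type, not int
    a = b;
    b = tmp;
}

// Smallest power of two that is >= n (returns 1 for n == 0).
inline std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Sorts data ascending by key using a bitonic network on a padded copy of
// the input. Assumes every real key is smaller than INT_MAX.
void bitonicSortPadded(std::vector<TermNodeInfo> &data) {
    const std::size_t n = data.size();
    const std::size_t padded = nextPow2(n);

    // Fill the spare entries at the end with the maximum key.
    data.resize(padded, TermNodeInfo{INT_MAX, 0});

    for (std::size_t k = 2; k <= padded; k <<= 1) {         // size of the bitonic sequences
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {      // compare distance
            for (std::size_t i = 0; i < padded; ++i) {      // one "thread" per element
                const std::size_t partner = i ^ j;
                if (partner > i) {                          // handle each pair once
                    const bool ascending = (i & k) == 0;
                    // Compare the key fields only, never the structs themselves.
                    const bool outOfOrder = ascending
                        ? (data[i].key > data[partner].key)
                        : (data[i].key < data[partner].key);
                    if (outOfOrder) swapNodes(data[i], data[partner]);
                }
            }
        }
    }

    // The padding entries carry the largest key, so they end up at the tail.
    data.resize(n);
}
```

Calling `bitonicSortPadded(v)` on a vector of 13 elements, for example, pads it to 16 internally and returns it with its original 13 elements in ascending key order. Note that the buffer really has to hold the padded size, which is where the memory allocation question above comes in.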
