Intersecting two sorted lists

The paper “Parallel Search on Video cards” claims that P-ary search can outperform binary-search-based algorithms. I followed the paper's idea and re-implemented P-ary search, but I do not get results that good. Could anyone help me improve my code or give me some advice?
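For readers unfamiliar with the idea, here is a minimal sketch of how I understand P-ary search applied to list intersection. It is a sequential Python emulation, not the paper's GPU code and not my actual implementation: the inner loop over `t` stands in for the P threads that would each test one segment in parallel. The names `p_ary_search` and `intersect` and the default `p=32` are just illustrative assumptions.

```python
def p_ary_search(data, key, p=32):
    """Return an index of key in the sorted list data, or -1 if absent."""
    if p < 2:
        raise ValueError("p must be at least 2")
    lo, hi = 0, len(data)  # current half-open search range [lo, hi)
    while hi - lo > p:
        step = (hi - lo + p - 1) // p  # segment length, rounded up
        found = None
        # On a GPU, each of the p threads would test one segment at once.
        for t in range(p):
            seg_lo = lo + t * step
            if seg_lo >= hi:
                break
            seg_hi = min(seg_lo + step, hi)
            # The key can only be in the segment whose first element is
            # <= key and whose successor segment starts above key.
            if data[seg_lo] <= key and (seg_hi == hi or key < data[seg_hi]):
                found = (seg_lo, seg_hi)
                break
        if found is None:
            return -1  # key is smaller than every element in the range
        lo, hi = found
    # The remaining range has at most p elements: scan it directly.
    for i in range(lo, hi):
        if data[i] == key:
            return i
    return -1


def intersect(short_list, long_list, p=32):
    """Intersect two sorted lists by searching the longer one for each key."""
    return [x for x in short_list if p_ary_search(long_list, x, p) != -1]
```

For example, `intersect([3, 8, 15, 42, 99, 1000], list(range(0, 500, 3)))` keeps only the multiples of 3 below 500. In this emulation each step narrows the range by a factor of about P, so roughly log_P(n) rounds are needed instead of log_2(n), at the cost of P comparisons per round.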

By the way: I am preparing a submission for the CIKM10 conference. If anyone is interested, we could co-write a paper on this topic.
I also have a paper, "Efficient Lists Intersection by CPU-GPU Cooperative Computing", accepted by the LSPP workshop of the IPDPS10 conference.

