Problem with the bitonic sort sample project

I am new to CUDA and am trying to learn it from the sample projects in the SDK. While studying the bitonic sort sample, I found that the code does not always work: the sorted result is sometimes wrong when I change the number of elements from 256 to a smaller number. Has anybody had the same experience? If there are bugs in the code, does anybody know how to fix them? Thanks!