Bitonic sort, NVIDIA example

I know there are a lot of sorting algorithms developed for GPUs and CUDA, and I am trying to understand bitonic sort. I cannot figure out why the algorithm does not work for arrays larger than 16777216 (2^24). There is enough memory, and I have debugged it, but I can't find a solution.

Any help would be appreciated.