CUDA program crashes in release build only.

I’m a newbie to CUDA programming.
I have modified the Thrust sort code to behave like std::nth_element() from the STL.
It works well in debug mode, but it crashes in release mode.
cuda-memcheck reports an invalid memory read in the release build (and reports no error in the debug build). I located the code it points to, but I couldn’t find any problem in that code.
So I want to know why this occurs and how to fix it.
I compiled the code using Visual Studio 2010 and CUDA 5.0.
I have attached the modified Thrust source code, the test code, and the cuda-memcheck dump file.
(The thrust::sort in my attached files performs a selection algorithm, not a sort algorithm, on the float data type.)
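To make the intended behavior concrete, here is a minimal host-side sketch of the result I expect, using std::nth_element from the STL. The helper name kth_smallest is hypothetical and is not in my attached files; it only illustrates the selection semantics on float data, not the GPU implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Thrust or the attached code):
// returns the k-th smallest value (0-based) of `data`.
float kth_smallest(std::vector<float> data, std::size_t k)
{
    assert(k < data.size());  // k must index a valid element

    // After this call data[k] holds the value a full sort would place there;
    // elements before it are <= data[k], elements after it are >= data[k].
    std::nth_element(data.begin(), data.begin() + k, data.end());
    return data[k];
}
```

For example, for the input {5, 1, 4, 2, 3} and k = 2, this should give the same value that a full sort would place at index 2.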
Thanks in advance.

Modified Thrust Library Code
Test Code
Cuda-Memcheck dump file