Code not working in non-emulation mode: Motion Estimation in H264 accelerated by CUDA


I am trying to port some motion estimation code for an H264 encoder to the GPU using CUDA.
I have made a simple example that estimates the motion vectors of macroblocks from 2 fake frames.
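
To make the computation concrete, here is a minimal CPU-side sketch of this kind of block matching. It assumes a full search with a sum-of-absolute-differences (SAD) criterion over 16x16 macroblocks on 8-bit luma frames. The names (MotionVector, sad16x16, estimateBlock) and the search range parameter are hypothetical and are not taken from the attached project, which may use a different criterion or search strategy.

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical types and names, for illustration only.
struct MotionVector { int dx; int dy; };

constexpr int kBlock = 16;  // assumed macroblock size: 16x16 luma pixels

// Sum of absolute differences between the macroblock at (bx, by) in `cur`
// and the candidate block at (bx + dx, by + dy) in `ref`.
static int sad16x16(const std::vector<std::uint8_t>& cur,
                    const std::vector<std::uint8_t>& ref,
                    int width, int bx, int by, int dx, int dy) {
    int sum = 0;
    for (int y = 0; y < kBlock; ++y) {
        for (int x = 0; x < kBlock; ++x) {
            int c = cur[(by + y) * width + (bx + x)];
            int r = ref[(by + dy + y) * width + (bx + dx + x)];
            sum += std::abs(c - r);
        }
    }
    return sum;
}

// Full search in a [-range, +range] window around the macroblock position.
// Candidates that would fall outside the reference frame are skipped.
MotionVector estimateBlock(const std::vector<std::uint8_t>& cur,
                           const std::vector<std::uint8_t>& ref,
                           int width, int height, int bx, int by, int range) {
    MotionVector best{0, 0};
    int bestSad = INT_MAX;
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + kBlock > width || ry + kBlock > height)
                continue;
            int s = sad16x16(cur, ref, width, bx, by, dx, dy);
            if (s < bestSad) { bestSad = s; best = {dx, dy}; }
        }
    }
    return best;
}
```

Running a loop like this on the host over the same two frames gives reference vectors that the vectors produced on the device can be compared against, one macroblock at a time.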

My code works fine in emulation mode, but not in “full speed” mode: the computation is very fast and gives me wrong results.
My Visual Studio project is attached.

Do you have any idea what is wrong?

Thanks in advance.