Differences in results between Windows 32-bit and 64-bit executables

I build 32-bit and 64-bit executables from the same code with CUDA 5.5 and VS 2010 and run both on the same machine. When I compare the output of the two executables, the results are different.

Same hardware, same OS, yet the two exe versions give different numerical results from the CUDA card?

(1) Make sure the exact same source data is being passed from the host to the device
(2) Make sure the host code does not access uninitialized data or perform out-of-bounds accesses
(3) Check the return status of every CUDA API call and every kernel launch
(4) Run the app under cuda-memcheck, using both the bounds checker and the race checker
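
For step (1), one way to confirm that both builds send identical data to the device is to fingerprint the host buffer just before it is copied and compare the printed value between the 32-bit and 64-bit runs. The following is only a sketch in plain C++ host code: the names fnv1a64 and fingerprint are made up for illustration, and it assumes the input is a float buffer. It uses fixed-width integer types so the hash itself does not depend on the pointer size of the build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a hash over raw bytes. Only fixed-width types take part in
// the arithmetic, so a 32-bit and a 64-bit build produce the same value
// for the same bytes.
std::uint64_t fnv1a64(const void* data, std::size_t size_bytes) {
    assert(data != nullptr || size_bytes == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < size_bytes; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical helper: fingerprint a float input buffer on the host,
// intended to be called right before the buffer is copied to the device.
std::uint64_t fingerprint(const std::vector<float>& host_input) {
    return fnv1a64(host_input.data(), host_input.size() * sizeof(float));
}
```

If the fingerprints of the input differ between the two executables, the problem is already on the host side, before any kernel runs. If they match, the same function can be applied to the results copied back from the device to narrow down where the two builds start to diverge.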