Visiting every combination of three elements in CUDA?


Here is some pseudocode that does a calculation on every combination of three elements from an array. I am wondering how to do this in CUDA?

int N = myArray.size();
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        for (int k = j + 1; k < N; k++)
            doCalc(myArray[i], myArray[j], myArray[k]);

Assuming N (myArray size) varies from 21-60 and I launch the same kernel for each new myArray I throw at it. Note that doCalc is not a heavy calculation. I am using a Titan (Kepler) card. What is the CUDA code to do this?

Thanks for any help,

P.S. I couldn’t find an example code for 3 nested loops. I’m not sure whether to use 2D grids, 3D grids, 3D blocks, etc.

That depends on how complicated doCalc is (how long it takes to run). If it’s quick, then I would have each thread contain the loop on k.

The total number of threads you need is then approximately N*N/2 (i.e. the i and j loops).
Pick a block size that is a multiple of 32, say 128 (128 threads per block).
The number of blocks in the grid is then (N*N/2)/128, rounded up.

Each thread will need to calculate its i and j from its block number and thread number within its block.


Because of the nature of your loops you will need to launch N^3 threads. This is not a big issue, since the threads with j<i and k<j do nothing. In order to launch N^3 threads you need number of blocks x number of threads per block >= N^3. For N < 1024 I use a very simple grid: <<< dim3(N,N), N >>>. This launches N*N blocks with N threads per block.
The indices are obtained in the kernel as i=threadIdx.x, j=blockIdx.x, k=blockIdx.y, and you can have an if statement. You must have N <= 1024.
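A sketch of that launch (doCalc stands for the user's calculation from the question; the kernel name and signature are mine):

```cuda
__global__ void tripleKernel(const float *arr, int N)
{
    int i = threadIdx.x;   // 0 .. N-1
    int j = blockIdx.x;    // 0 .. N-1
    int k = blockIdx.y;    // 0 .. N-1
    if (i < j && j < k)    // keep only unique combinations i < j < k
        doCalc(arr[i], arr[j], arr[k]);
}

// host side, requires N <= 1024:
// tripleKernel<<<dim3(N, N), N>>>(d_arr, N);
```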

If N > 1024 then you can use <<< dim3((N+tbp-1)/tbp, N, N), tbp >>> with the only change being i=threadIdx.x+blockIdx.x*blockDim.x.

This is a good case because the blocks which do nothing are discarded very quickly.
There are other possible combinations. The most general one would be <<< dim3(b,b,b), dim3(t,t,t) >>> with b*t >= N. In this case you can have: i=threadIdx.x+blockIdx.x*blockDim.x; and similar for j and k.

These examples are not optimal because of the large amount of memory transfers. For the 2D case there is an N-body problem example in which shared memory is used to reduce the memory transfers. In that case they only have pairs (i,j), but maybe you can extend it to your case. Here is the link

Thanks everyone.

Pasoleatis, your explanation made it all click in my mind, and thanks for the link. I’m visualising a whole queue of blocks ready for processing where each block has a multiple of 32 threads. Actually what I am doing is more like…

doCalc(i, j, k), where for each kernel launch there will be multiple myArrays, all of size N (each array holds a different type of data, but they belong together); so I’m using a structure of arrays (as is recommended).

This is what I’m thinking, correct me if I’m wrong.

  1. Load the myArray(s) into constant memory.
  2. In the kernel, load all the myArrays into shared memory and call __syncthreads().
  3. Do the calculations, reading the values from the arrays in shared memory.
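A minimal sketch of steps 2 and 3, assuming a single float array, N <= 1024, and one block handling the whole array (doCalc is the user's function from the question; the kernel name is mine):

```cuda
__global__ void tripleShared(const float *g_arr, int N)
{
    extern __shared__ float s_arr[];   // sized at launch: N * sizeof(float)
    int i = threadIdx.x;
    if (i < N)
        s_arr[i] = g_arr[i];           // step 2: stage the array in shared memory
    __syncthreads();                   // wait until the whole array is loaded
    // step 3: thread i loops over j and k, reading only shared memory
    for (int j = i + 1; j < N; ++j)
        for (int k = j + 1; k < N; ++k)
            doCalc(s_arr[i], s_arr[j], s_arr[k]);
}

// launch: tripleShared<<<1, N, N * sizeof(float)>>>(d_arr, N);
```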

I won’t have N > 1024. But if I do, I can discard data beyond that size anyway, so the <<< dim3(N,N), N >>> launch is fine.

The other factor is that the number of kernel launches per second varies from 10,000+ to in extreme cases 2,000,000. I’m hoping kernel launches can be interleaved (or at least queued) as each run operates on separate independent data? Titan card doesn’t have HyperQ so not sure if that will be a problem? I’m also assuming the overhead for launching kernels is relatively small?

So if I do multiple kernel launches in a short space of time, then each multiprocessor can have blocks queued up from different kernel launches, is that correct?

Thanks for any help,


Titan cards do have HyperQ. It supports up to 8 queues (or MPI jobs). There is also the possibility of using streams.
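An illustrative sketch of queuing independent batches on a few streams so copies and kernels from different launches can overlap (all names here are mine, and the host buffers would need to be pinned with cudaMallocHost for the async copies to actually overlap):

```cuda
const int NSTREAMS = 8;
cudaStream_t streams[NSTREAMS];
for (int s = 0; s < NSTREAMS; ++s)
    cudaStreamCreate(&streams[s]);

for (int b = 0; b < numBatches; ++b) {
    int s = b % NSTREAMS;
    // d_arr[s] is a per-stream device buffer so batches don't overwrite each other
    cudaMemcpyAsync(d_arr[s], h_arr[b], N * sizeof(float),
                    cudaMemcpyHostToDevice, streams[s]);
    myKernel<<<dim3(N, N), N, 0, streams[s]>>>(d_arr[s], N);
}
cudaDeviceSynchronize();  // wait for all queued work to finish
```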

Constant memory increases performance in some cases. In other cases it might be faster to use global memory and let the compiler optimize it using the L1 and L2 caches. If all the data fits in shared memory, you could load all of it into shared memory and do all the operations within one block.

In fact, the data arrays come in handy batches of 8-20, so I can launch a number of blocks at once, and in each block copy its corresponding arrays to shared memory and run the j and k loops within that block. That might be optimal. I’ll try it and benchmark to see.


I wrote a very fast way of visiting every combination of three array elements exactly once (without any repeats, etc.):

In my example it visits all unique combinations of three array elements, then evaluates each combination, scans, reduces, and returns the optimal result along with a combination which gave that result (if there is more than one, it will return a valid combination, but not necessarily the first it encountered).

In this case the running time for the entire algorithm is c*(N choose 3)*N, where N is the number of elements. The last factor of N is due to the evaluation step for each combination: in my example it takes each three-point combination (which forms a triangle) and counts how many points that triangle contains. The constant c is the time it takes to derive the three-element combination from the offset number. The CPU version does not have to do this step because it has the nested loops, so in fact the CUDA version is doing more work, and it still absolutely kills the serial version. The CPU implementation is in the code as well, though be prepared to wait a while when it runs with a larger N.

This method is much faster than the pure brute-force method, and is at least 1000 times faster than a serial implementation on a 3.9 GHz CPU which does the same thing. Another advantage is that it can handle arrays with N greater than the max number of threads per block (1024).

Keep in mind, though, that the number of combinations gets large for N > 500, so I used the 64-bit long long type. GPUs with poor 64-bit performance may not see the same results.

If anybody with a 780, 780 Ti, Titan, or Quadro K6000 has a free moment, I would love to benchmark the above code on those GPUs.

If on Linux, you only need to change the timing function. You can comment out the CPU part to save time.

Gave it a try; changed the timing code to use gettimeofday() on Linux. The time result is in seconds.
Running a GTX Titan on Linux Mint 16.

Expected iterations CPU= 10354250000

Expected iterations GPU= 113896750000

Running CPU implementation..
CPU solution timing: 126.01
CPU best value= 254 , point indexes ( 369 , 301 , 185 ).
CUDA timing: 0.630161
GPU best value= 254 , point indexes ( 369 , 301 , 185 ).

Success. GPU results match CPU results! GPU was 199.966x faster than the 3.9 GHz CPU.

And with Double Precision mode turned on:

Expected iterations CPU= 10354250000

Expected iterations GPU= 113896750000

Running CPU implementation..
CPU solution timing: 125.935
CPU best value= 240 , point indexes ( 300 , 255 , 92 ).
CUDA timing: 0.671099
GPU best value= 240 , point indexes ( 300 , 255 , 92 ).

Success. GPU results match CPU results! GPU was 187.655x faster than the 3.9 GHz CPU.

Wow, changing some optimization flags changed things quite a bit.
Compiled with

nvcc -O3 --use_fast_math -m64 -v -Xcompiler -march=native -Xcompiler -ffast-math -gencode arch=compute_35,code=sm_35 -odir "src" -M -o "EXP3.d" ""

I get (single precision mode):

Expected iterations CPU= 10354250000
Expected iterations GPU= 113896750000

Running CPU implementation..
CPU solution timing: 83.5187
CPU best value= 254 , point indexes ( 343 , 229 , 194 ).
CUDA timing: 0.313176
GPU best value= 254 , point indexes ( 343 , 229 , 194 ).

Success. GPU results match CPU results! GPU was 266.683x faster than the 3.9 GHz CPU.

Thanks for the test. I’m kind of surprised that my Tesla K20c times in Windows for this were better than the Titan’s. I wonder why this is the case; this was for 500 points, right?

I got 240 ms round trip for 500 points.

Yes, I didn’t change N from 500.

One other thing that caused compilation problems was:

adj=(long long(ii)<<8LL);

changed those occurrences to

adj=((long long)ii<<8LL);

Not sure if that’s a possible cause for slowdown.


BTW what CPU setup do you have? It is really really fast, and I am curious about that since mine runs at 3.9 GHz.

Not sure why that code runs slower on the Titan than on the K20, but it’s probably not from that long long cast.


Inspired by Gogar, I also made some optimizations, improved the running time of this algorithm, and updated my repository.

By using constant memory for the float2 array, I reduced the CUDA implementation running time by about 10-15%.

The CPU times are also more in line with Gogar’s now, so the overall difference between the implementations is about 450x, which seems more realistic.