I compiled the provided matrixMul example into a 64-bit exe using VS2010 C++ Express with Windows SDK 7.1, but my compiled version is around 11.5 times slower than the included exe. I ran both through the profiler, and the difference appears to be in DRAM utilization. Does anyone know what would cause this? I’m running Windows 7 64-bit.
Included exe:
run time = 4.653 msec
DRAM Utilization = 6.3% (1.73 GB/s)
My compiled exe:
run time = 53.543 msec
DRAM Utilization = 0.5% (152.12 MB/s)
Note: I also compiled a 32-bit version, and there is a similarly large performance gap between my 32-bit binary and the included 32-bit binary.