I’m trying to get benchmarking numbers out of a test port of a large Monte Carlo simulation our group has developed. These numbers will directly influence a purchasing decision, so you can imagine my surprise when, running the profiler, I noticed that my kernel-call timings showed an extra factor-of-2 speedup over what the program achieves when run on its own. I need to know whether this is real and why it is happening; unfortunately I don’t have much time to investigate myself, nor can I check whether the program’s results are still correct. Can anyone from NVIDIA confirm this as a known possibility and tell me whether it represents real potential performance?
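For what it’s worth, here is roughly the kind of event-based timing I’d compare against the profiler’s numbers (a minimal sketch, not our actual code; `myKernel`, the data size, and the launch configuration are placeholders). One thing worth double-checking in my own setup: kernel launches are asynchronous, so a host-side timer with no synchronization point only measures launch overhead, which could account for a discrepancy between a standalone run and the profiler:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel -- stands in for the real simulation kernel.
__global__ void myKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;            // placeholder problem size
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until the kernel actually finishes

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```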
I report my numbers on Monday morning, and given where our benchmarking stands now, the result could determine whether we invest in a Tesla-based cluster or a traditional cluster (!)
Thank you for any help you can provide!