Hi,
I am running and timing some code samples, and I noticed a major difference in the per-GPU times reported by the Monte Carlo Option Pricing multi-GPU sample (MonteCarloMultiGPU) depending on whether it is run against CUDA 3.2 or CUDA 4.0. Here is the program output with CUDA 3.2:
[MonteCarloMultiGPU] starting...
main(): generating input data...
main(): starting 4 host threads...
main(): waiting for GPU results...
Resetting device 3
Resetting device 1
Resetting device 0
Resetting device 2
main(): GPU statistics, threaded
GPU #0
Options : 64
Simulation paths: 262144
Total time (ms.): 1115.595947
Options per sec.: 229.473763
GPU #1
Options : 64
Simulation paths: 262144
Total time (ms.): 14.766000
Options per sec.: 17337.126071
GPU #2
Options : 64
Simulation paths: 262144
Total time (ms.): 568.174988
Options per sec.: 450.565416
GPU #3
Options : 64
Simulation paths: 262144
Total time (ms.): 1677.020996
Options per sec.: 152.651637
main(): comparing Monte Carlo and Black-Scholes results...
Shutting down...
Test Summary...
L1 norm : 2.979117E-06
Average reserve: 384.457409
[MonteCarloMultiGPU] test results...
PASSED
With CUDA 4.0, I get this:
[MonteCarloMultiGPU] starting...
main(): generating input data...
main(): starting 4 host threads...
main(): waiting for GPU results...
Resetting device 0
Resetting device 3
Resetting device 1
Resetting device 2
main(): GPU statistics, threaded
GPU #0
Options : 64
Simulation paths: 262144
Total time (ms.): 5.523000
Options per sec.: 46351.622481
GPU #1
Options : 64
Simulation paths: 262144
Total time (ms.): 5.845000
Options per sec.: 43798.119622
GPU #2
Options : 64
Simulation paths: 262144
Total time (ms.): 9.681000
Options per sec.: 26443.549887
GPU #3
Options : 64
Simulation paths: 262144
Total time (ms.): 4.173000
Options per sec.: 61346.755010
main(): comparing Monte Carlo and Black-Scholes results...
Shutting down...
Test Summary...
L1 norm : 2.979117E-06
Average reserve: 384.457409
[MonteCarloMultiGPU] test results...
PASSED
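The "Total time" values are very different: with CUDA 3.2 they range from about 15 ms to about 1677 ms per GPU, while with CUDA 4.0 they are all under 10 ms. I would like to know the reason for this divergence, given that the total wall-clock time of the run stays the same. Any help is appreciated.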
Thanks,
Sayan