4.7 GOpt/s was given for the GeForce 8800 GTX, counting each input index as two options, since both the call and the put are calculated. Sorry for the confusion.
The BlackScholes CUDA SDK sample currently reports an effective 4.45 GOpt/s, which is slightly lower than 4.7 GOpt/s, but the primary goal of the sample is to be a simple demonstration of a streaming application.
As for optimization, here are some general recommendations:
- Warp occupancy can be increased with the help of -po maxrregcount=<…>. Depending on how many threads per block you are shooting for, the optimal register count varies, so by default the compiler doesn't try to minimize the per-thread register count as much as it can; with this option it is forced to stay within the budget. Be warned, however, that too low a maxrregcount can force spills to local memory, which in many cases slows things down in spite of the increased warp occupancy.
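For illustration, a hypothetical build line capping registers at 16 per thread might look like the following (the exact flag spelling varies between toolkit versions, so treat this as a sketch and check nvcc --help for your release):

```
nvcc -o BlackScholes BlackScholes.cu --ptxas-options=-maxrregcount=16
```

A good workflow is to compile once without the cap, note the register count the compiler reports, and then lower the budget step by step while watching both occupancy and actual kernel time.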
- Synchronization overhead and timing precision, depending on the OS/CPU/chipset, can influence observed performance. Running multiple iterations, measuring the total time, and dividing by the number of iterations gives more precise results. This is especially important on Linux, where, depending on the kernel build, the resolution of the gettimeofday() system call can be as coarse as 10 ms (a 100 Hz timer tick).
Concerning timing: the profiler (enabled with the environment variable CUDA_PROFILE=1) reports kernel and memcpy times using high-precision internal GPU timers (in addition to CPU ones), without any need for cudaThreadSynchronize() calls in user programs. There are plans to expose these timers in the CUDA API.
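A typical invocation looks like the following sketch (the log file name is from memory and may differ between toolkit versions):

```
CUDA_PROFILE=1 ./BlackScholes
cat cuda_profile.log
```

The log lists each kernel launch and memcpy with its GPU-side time, which is handy for cross-checking your own CPU-side measurements.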