I’m playing with CUDA to find out whether it suits my needs. Most of the SDK projects work just fine on my hardware (8500 GT), but the BlackScholes options pricing sample consistently produces the following output:
Initializing data…
…allocating CPU memory for options.
…allocating GPU memory for options.
…generating input data in CPU mem.
…copying input data to GPU mem.
Data init done.
Executing GPU kernel…
Options count : 40000000
BlackScholesGPU() time: 1017.946106 msec
Options per second : 3.929481E+007
Reading back GPU results…
Checking the results…
…running CPU calculations.
Comparing the results…
L1 norm: 1.000000E+000
Max absolute error: 9.581588E+001
TEST FAILED
Shutting down…
…releasing GPU memory.
…releasing CPU memory.
Shutdown done.
It looks like the accuracy of the computations is bad for this sample. Why can this happen? Replacing the fast routines (__expf, __logf) with the normal ones (expf, logf) does not help.
Reduce the number of options you are pricing.
Right now you are pricing 40M options, and the sample allocates 5 arrays with one element per option.
How much memory do you have on your card?
I’ve reduced OPT_N to 10000000, and the test passes now.
The test code allocates 200000000 bytes of data: 10000000 (OPT_N) * 4 (sizeof(float)) * 5 (number of cudaMalloc calls). This is obviously less than 256MB, and everything works fine.
But why doesn’t the same code with OPT_N = 20000000 (400000000 bytes, more than 256MB) inform the programmer that there is not enough device memory for the task? The cudaMalloc calls still return an OK status …