Hi Everyone,
I have a Monte Carlo code for option pricing that is similar to the ‘MonteCarloMultiGpu’ example in CUDA SDK 5.5. The difference is that my code uses 250 time steps per path (total computations = paths * options * 250). As a result, performance is poor even on a K40.
Could you please suggest how I can parallelize the time-step loop? The code is below for reference:
Setup_Kernel()          // sets up the states (= paths) for random number generation
Random_Number_Kernel()  // generates 250*Paths random numbers and stores them in global memory
Compute_kernel()        // does the computations for N paths and M options
{
    // each thread starts at its own index and strides by the block size
    // (a stride of blockIdx.x would be 0 in block 0 and never terminate)
    for (int numSample = threadIdx.x; numSample < NUM_SAMPLES; numSample += blockDim.x)
    {
        getPath(path, numSample, random_Numbers, optionStructs[optionIndex]);
        price[GLobal_ID] = path[250-1];  // only the final time step is kept
    }
}
__device__ void getPath(dataType* path, int numSample, dataType* random_Numbers, MCStruct optionStruct)
{
    path[0] = process(optionStruct);  // initial value of the path
    for (size_t i=1; i<250; i++)      // 250 time steps; step i depends on step i-1
    {
        dataType t = i*dt;
        int index = (i-1)*250 + numSample;
        dataType randVal = random_Numbers[index];
        path[i] = process(t, path[i-1], randVal, optionStruct);
    }
}
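
To make the dependency in the time-step loop concrete, here is a minimal sketch under stated assumptions, not the actual code. The fields of `MCStruct`, the body of `process`, and the names `terminalValue`, `terminalValueFromSum` and `computeKernel` are all hypothetical: `process` is assumed to be one geometric Brownian motion step (and does not take `t`), and the random numbers are assumed to be stored step-major, i.e. at `(i-1)*numSamples + numSample`. Because only `path[250-1]` is used, the sketch keeps a running value instead of a `path` array. Under the GBM assumption only, the loop collapses to a sum of the random numbers, which has no step-to-step dependency and could be computed with a parallel reduction. Whether anything similar applies depends on what the real `process` does.

```cpp
#include <assert.h>
#include <math.h>
#include <stddef.h>

#ifdef __CUDACC__
#define HD __host__ __device__   // callable from kernels when built with nvcc
#else
#define HD                       // plain C++ build: no CUDA qualifiers
#endif

typedef double dataType;

// Hypothetical stand-in for MCStruct; the real fields are not shown in the post.
struct MCStruct {
    dataType s0;     // initial value of the path
    dataType mu;     // drift
    dataType sigma;  // volatility
    dataType dt;     // size of one time step
};

// Hypothetical process(): one geometric Brownian motion step (an assumption).
HD inline dataType process(dataType prev, dataType randVal, const MCStruct& o) {
    return prev * exp((o.mu - 0.5 * o.sigma * o.sigma) * o.dt
                      + o.sigma * sqrt(o.dt) * randVal);
}

// Sequential time stepping, as in getPath, but with a running value
// instead of a path array, since only the last step is needed.
HD inline dataType terminalValue(int numSample, int numSamples, int numSteps,
                                 const dataType* randomNumbers, const MCStruct& opt) {
    assert(numSteps >= 1);
    dataType value = opt.s0;  // plays the role of path[0]
    for (int i = 1; i < numSteps; ++i) {
        // step-major layout: neighbouring samples are adjacent in memory
        size_t index = (size_t)(i - 1) * numSamples + numSample;
        value = process(value, randomNumbers[index], opt);
    }
    return value;  // plays the role of path[numSteps-1]
}

// GBM only: the product of exponentials equals the exponential of a sum,
// so the time loop reduces to summing the random numbers. The sum is done
// serially here for clarity; it is the part a reduction could parallelize.
HD inline dataType terminalValueFromSum(int numSample, int numSamples, int numSteps,
                                        const dataType* randomNumbers, const MCStruct& opt) {
    assert(numSteps >= 1);
    dataType sum = 0;
    for (int i = 1; i < numSteps; ++i)
        sum += randomNumbers[(size_t)(i - 1) * numSamples + numSample];
    int n = numSteps - 1;  // number of steps actually taken
    return opt.s0 * exp(n * (opt.mu - 0.5 * opt.sigma * opt.sigma) * opt.dt
                        + opt.sigma * sqrt(opt.dt) * sum);
}

#ifdef __CUDACC__
// One thread per path with a grid-stride loop; only compiled by nvcc.
__global__ void computeKernel(dataType* price, const dataType* randomNumbers,
                              MCStruct opt, int numSamples, int numSteps) {
    for (int s = blockIdx.x * blockDim.x + threadIdx.x; s < numSamples;
         s += gridDim.x * blockDim.x)
        price[s] = terminalValue(s, numSamples, numSteps, randomNumbers, opt);
}
#endif
```

Both functions should agree up to floating-point rounding for the same inputs. For a process whose step is not multiplicative in this way, the recurrence stays sequential per path, and the parallelism has to come from paths and options instead.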
Thanks in advance.