Hi,
I am using an NVIDIA Quadro FX 5800. I am trying to measure the GPU execution time of the CUDA SDK's MonteCarlo program on one core versus on all 240 cores. The problem is that in MonteCarlo the grid size = the number of options, but I want to run the program with grid size = 1 and block size = 1, where one block handles multiple options.
I am calling the kernel like this:
[codebox]MonteCarloOneBlockPerOption<<<optionCount, THREAD_N>>>(
    plan->d_Samples,
    plan->pathN
);[/codebox]
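For the timing part, I wrap the launch in CUDA events, roughly like this (just a sketch around the launch above, with error checking omitted):
[codebox]cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
MonteCarloOneBlockPerOption<<<optionCount, THREAD_N>>>(
    plan->d_Samples,
    plan->pathN
);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);   //wait for the kernel to finish

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);   //elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);[/codebox]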
The kernel function is:
[codebox]static __global__ void MonteCarloOneBlockPerOption(
    float *d_Samples,
    int pathN
){
    const int SUM_N = THREAD_N;
    __shared__ real s_SumCall[SUM_N];
    __shared__ real s_Sum2Call[SUM_N];

    const int optionIndex = blockIdx.x;
    const real S        = d_OptionData[optionIndex].S;
    const real X        = d_OptionData[optionIndex].X;
    const real MuByT    = d_OptionData[optionIndex].MuByT;
    const real VBySqrtT = d_OptionData[optionIndex].VBySqrtT;

    //Cycle through the entire samples array:
    //derive end stock price for each path
    //accumulate partial integrals into intermediate shared memory buffer
    for(int iSum = threadIdx.x; iSum < SUM_N; iSum += blockDim.x){
        __TOptionValue sumCall = {0, 0};
        for(int i = iSum; i < pathN; i += SUM_N){
            real r = d_Samples[i];
            real callValue = endCallValue(S, X, r, MuByT, VBySqrtT);
            sumCall.Expected   += callValue;
            sumCall.Confidence += callValue * callValue;
        }
        s_SumCall[iSum]  = sumCall.Expected;
        s_Sum2Call[iSum] = sumCall.Confidence;
    }

    //Reduce shared memory accumulators
    //and write final result to global memory
    sumReduce<real, SUM_N, THREAD_N>(s_SumCall, s_Sum2Call);
    if(threadIdx.x == 0){
        __TOptionValue t = {s_SumCall[0], s_Sum2Call[0]};
        d_CallValue[optionIndex] = t;
    }
}
[/codebox]
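As I understand it, the one-block-per-option mapping breaks with grid size = 1, because optionIndex = blockIdx.x only ever covers option 0. What I think I need is something like the following (just a sketch, not working code: optionCount would have to be passed in as an extra kernel argument, and the per-option body is the same as in the kernel above):
[codebox]static __global__ void MonteCarloMultiOptionPerBlock(
    float *d_Samples,
    int pathN,
    int optionCount   //extra argument so the block knows how many options to cover
){
    //Grid-stride loop over options: with <<<1, THREAD_N>>> (or <<<1, 1>>>)
    //the single block walks through every option in turn
    for(int optionIndex = blockIdx.x; optionIndex < optionCount; optionIndex += gridDim.x){
        //... same per-option accumulation and reduction as above,
        //using this loop's optionIndex instead of blockIdx.x ...
        __syncthreads();   //keep the shared-memory buffers consistent between options
    }
}[/codebox]
Then I would launch it as MonteCarloMultiOptionPerBlock<<<1, THREAD_N>>>(…) or even <<<1, 1>>>(…) for the one-core measurement. Is this the right approach?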
Can anyone help me?
Thank you in advance.