Hi,
Can anybody help me track down a segmentation fault?
I am using an NVIDIA Quadro FX5800.
I am trying to make some changes to the SDK's Binomial Options sample.
I need to increase the number of options. Here the number of options equals the number of thread blocks,
so I tried launching my kernel this way:
[codebox]int rootOptN = sqrt(optN);
dim3 theGrid(rootOptN, (optN+rootOptN-1)/rootOptN);
binomialOptionsKernel<<<theGrid, CACHE_SIZE>>>(optN);[/codebox]
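The grid computation above can be sanity-checked on the host before launching. The following is only a minimal sketch: `GridDims`, `makeGrid2D`, and `kMaxGridDim` are hypothetical names that are not part of the SDK sample, and the limit of 65535 blocks per grid dimension is assumed from the compute capability 1.x specifications. It is plain C++, so the two values would still have to be passed to `dim3` in the real code.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical stand-in for dim3, so this sketch compiles without CUDA headers.
struct GridDims {
    unsigned int x;
    unsigned int y;
};

// Assumption: per-dimension grid limit on compute capability 1.x devices.
const unsigned int kMaxGridDim = 65535;

// Picks a near-square 2D grid that contains at least optN blocks.
GridDims makeGrid2D(unsigned int optN) {
    if (optN == 0) {
        throw std::invalid_argument("optN must be positive");
    }
    unsigned int x = static_cast<unsigned int>(std::sqrt(static_cast<double>(optN)));
    if (x == 0) {
        x = 1;
    }
    // Round up so that x * y >= optN; surplus blocks must exit early in the kernel.
    unsigned int y = (optN + x - 1) / x;
    if (x > kMaxGridDim || y > kMaxGridDim) {
        throw std::range_error("grid dimension exceeds the assumed device limit");
    }
    GridDims g = {x, y};
    return g;
}
```

For `optN = 161000` this gives a 401 x 402 grid, far below the assumed per-dimension limit, which suggests the grid shape alone may not be what fails here.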
With the above code I can use values of "theGrid" up to 161000. If I try increasing the value of "theGrid" beyond that, I get a "Segmentation fault".
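Since the kernel indexes `d_CallBuffer` with `BlockIndex * (NUM_STEPS + 16)`, one thing that may be worth checking is how large that buffer becomes as the option count grows. The sketch below only does the arithmetic. `callBufferBytes` and `exceedsInt32` are hypothetical helpers, and the values used in the note after it (2048 steps, 4-byte or 8-byte `real`) are assumptions, not values taken from this post. Whether the buffer size is related to the crash is not something the post establishes.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for a buffer holding
// optN * (numSteps + 16) elements, computed in 64-bit arithmetic.
std::uint64_t callBufferBytes(std::uint64_t optN,
                              std::uint64_t numSteps,
                              std::uint64_t elemSize) {
    return optN * (numSteps + 16) * elemSize;
}

// Hypothetical helper: true when a byte count no longer fits in a signed 32-bit int.
bool exceedsInt32(std::uint64_t bytes) {
    return bytes > static_cast<std::uint64_t>(INT32_MAX);
}
```

Under those assumed values, `callBufferBytes(161000, 2048, 4)` comes to 1,329,216,000 bytes, and the same call with an 8-byte element is twice that, which no longer fits in a signed 32-bit size.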
My kernel function is defined like this:
[codebox]static __global__ void binomialOptionsKernel(const unsigned int optN){
    __shared__ real callA[CACHE_SIZE];
    __shared__ real callB[CACHE_SIZE];
    const int BlockIndex = blockIdx.y * gridDim.x + blockIdx.x;
    //const int ThreadIndex = BlockIndex * blockDim.x + threadIdx.x;
    //Global memory frame for current option (thread block)
    if (BlockIndex >= optN) {
        return;
    }
    real *const d_Call = &d_CallBuffer[BlockIndex * (NUM_STEPS + 16)];
    const int tid = threadIdx.x;
    //const int thid = blockIdx.x*blockDim.x+threadIdx.x;
    //const int bid = gridDim.x*blockDim.x;
    //const int bd = blockDim.x*Grid;
    //const int bd = blockDim.x*gridDim.x*gridDim.y;
    const real S = d_OptionData[BlockIndex].S;
    const real X = d_OptionData[BlockIndex].X;
    const real vDt = d_OptionData[BlockIndex].vDt;
    const real puByDf = d_OptionData[BlockIndex].puByDf;
    const real pdByDf = d_OptionData[BlockIndex].pdByDf;
    //for(int thid=blockIdx.x*blockDim.x+threadIdx.x; thid <=bd ; thid += bid){
    //Compute values at expiry date
    for(int i = tid; i <= NUM_STEPS; i += CACHE_SIZE)
        d_Call[i] = expiryCallValue(S, X, vDt, i);
    //Walk down binomial tree
    //So double-buffer and synchronize to avoid read-after-write hazards.
    for(int i = NUM_STEPS; i > 0; i -= CACHE_DELTA)
        for(int c_base = 0; c_base < i; c_base += CACHE_STEP){
            //Start and end positions within shared memory cache
            int c_start = min(CACHE_SIZE - 1, i - c_base);
            int c_end = c_start - CACHE_DELTA;
            //Read data(with apron) to shared memory
            __syncthreads();
            if(tid <= c_start)
                callA[tid] = d_Call[c_base + tid];
            //Calculations within shared memory
            for(int k = c_start - 1; k >= c_end;){
                //Compute discounted expected value
                __syncthreads();
                if(tid <= k)
                    callB[tid] = puByDf * callA[tid + 1] + pdByDf * callA[tid];
                k--;
                //Compute discounted expected value
                __syncthreads();
                if(tid <= k)
                    callA[tid] = puByDf * callB[tid + 1] + pdByDf * callB[tid];
                k--;
            }
            //Flush shared memory cache
            __syncthreads();
            if(tid <= c_end)
                d_Call[c_base + tid] = callA[tid];
        }
    //Write the value at the top of the tree to destination buffer
    if(threadIdx.x == 0) d_CallValue[BlockIndex] = (float)callA[0];
}[/codebox]
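The block-index flattening and the `BlockIndex >= optN` guard in the kernel can be checked on the CPU without a GPU. This is a minimal sketch that just replays the index arithmetic for every block of a 2D grid. `countOptionVisits` is a hypothetical helper, not part of the SDK sample.

```cpp
#include <vector>

// Replays BlockIndex = blockIdx.y * gridDim.x + blockIdx.x for each block of a
// gridX x gridY grid and counts how often each option index would be processed.
std::vector<int> countOptionVisits(unsigned int gridX,
                                   unsigned int gridY,
                                   unsigned int optN) {
    std::vector<int> visits(optN, 0);
    for (unsigned int by = 0; by < gridY; ++by) {
        for (unsigned int bx = 0; bx < gridX; ++bx) {
            unsigned int blockIndex = by * gridX + bx;
            if (blockIndex >= optN) {
                continue;  // same early exit as the guard in the kernel
            }
            ++visits[blockIndex];
        }
    }
    return visits;
}
```

If every entry of the returned vector is 1, each option is handled by exactly one block and the surplus blocks of the rounded-up grid are skipped, so the indexing itself never reaches past `optN` options.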
Can anybody help me find a solution for this?
Thanks in advance.