Hi,

Can anybody help me solve a segmentation fault?

I am using NVIDIA’s Quadro FX5800.

I am trying to make some changes to the SDK’s Binomial Options sample.

I need to increase the number of options. Here, the number of options equals the number of thread blocks.

So I tried launching my kernel this way:

[codebox]int rootOptN = sqrt(optN);
dim3 theGrid(rootOptN, (optN+rootOptN-1)/rootOptN);
binomialOptionsKernel<<<theGrid, CACHE_SIZE>>>(optN);[/codebox]

With the above code, “theGrid” works for up to 161000 blocks. If I try increasing the size of “theGrid”, I get a “Segmentation fault”.
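The grid arithmetic in the launch code above can be checked on the host without a GPU. Below is a minimal sketch in plain C++, assuming the per-dimension grid limit of 65535 that applies to compute capability 1.x devices such as the Quadro FX 5800. The names `GridDims`, `makeGrid`, `flatBlockIndex`, `gridIsValid` and `MAX_GRID_DIM` are hypothetical helpers for illustration only and are not part of the SDK sample.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type; not part of the SDK sample.
struct GridDims {
    unsigned int x;
    unsigned int y;
};

// Assumed limit per grid dimension (x and y) on compute capability 1.x devices.
const unsigned int MAX_GRID_DIM = 65535;

// Split optN blocks over a roughly square 2D grid, mirroring the launch code above.
GridDims makeGrid(unsigned int optN) {
    assert(optN > 0);
    unsigned int root =
        static_cast<unsigned int>(std::sqrt(static_cast<double>(optN)));
    if (root == 0) root = 1;
    GridDims g;
    g.x = root;
    g.y = (optN + root - 1) / root;  // ceiling division
    return g;
}

// Same flattening the kernel uses: blockIdx.y * gridDim.x + blockIdx.x.
unsigned int flatBlockIndex(const GridDims &g, unsigned int bx, unsigned int by) {
    return by * g.x + bx;
}

// True when both dimensions fit the assumed limit and the grid covers every option.
bool gridIsValid(const GridDims &g, unsigned int optN) {
    return g.x <= MAX_GRID_DIM && g.y <= MAX_GRID_DIM &&
           static_cast<unsigned long long>(g.x) * g.y >= optN;
}
```

For 161000 options, `makeGrid` gives a 401 x 402 grid, which is far below the assumed limit in either dimension, so under these assumptions the grid size by itself would not be expected to fail at that count.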

My kernel function is defined like this:

[codebox]static __global__ void binomialOptionsKernel(const unsigned int optN){
    __shared__ real callA[CACHE_SIZE];
    __shared__ real callB[CACHE_SIZE];

    const int BlockIndex = blockIdx.y * gridDim.x + blockIdx.x;
    //const int ThreadIndex = BlockIndex * blockDim.x + threadIdx.x;

    //Global memory frame for current option (thread block)
    if (BlockIndex >= optN) {
        return;
    }
    real *const d_Call = &d_CallBuffer[BlockIndex * (NUM_STEPS + 16)];

    const int tid = threadIdx.x;
    //const int thid= blockIdx.x*blockDim.x+threadIdx.x;
    // const int bid = gridDim.x*blockDim.x;
    //const int bd = blockDim.x*Grid;
    //const int bd = blockDim.x*gridDim.x*gridDim.y;

    const real S = d_OptionData[BlockIndex].S;
    const real X = d_OptionData[BlockIndex].X;
    const real vDt = d_OptionData[BlockIndex].vDt;
    const real puByDf = d_OptionData[BlockIndex].puByDf;
    const real pdByDf = d_OptionData[BlockIndex].pdByDf;

    // for(int thid=blockIdx.x*blockDim.x+threadIdx.x; thid <=bd ; thid += bid){

    //Compute values at expiry date
    for(int i = tid; i <= NUM_STEPS; i += CACHE_SIZE)
        d_Call[i] = expiryCallValue(S, X, vDt, i);

    //Walk down binomial tree
    //So double-buffer and synchronize to avoid read-after-write hazards.
    for(int i = NUM_STEPS; i > 0; i -= CACHE_DELTA)
        for(int c_base = 0; c_base < i; c_base += CACHE_STEP){
            //Start and end positions within shared memory cache
            int c_start = min(CACHE_SIZE - 1, i - c_base);
            int c_end = c_start - CACHE_DELTA;

            //Read data(with apron) to shared memory
            __syncthreads();
            if(tid <= c_start)
                callA[tid] = d_Call[c_base + tid];

            //Calculations within shared memory
            for(int k = c_start - 1; k >= c_end;){
                //Compute discounted expected value
                __syncthreads();
                if(tid <= k)
                    callB[tid] = puByDf * callA[tid + 1] + pdByDf * callA[tid];
                k--;

                //Compute discounted expected value
                __syncthreads();
                if(tid <= k)
                    callA[tid] = puByDf * callB[tid + 1] + pdByDf * callB[tid];
                k--;
            }

            //Flush shared memory cache
            __syncthreads();
            if(tid <= c_end)
                d_Call[c_base + tid] = callA[tid];
        }

    //Write the value at the top of the tree to destination buffer
    if(threadIdx.x == 0) d_CallValue[BlockIndex] = (float)callA[0];
}[/codebox]

Can anybody help me find a solution for this?

Thanks in advance.