SDK BlackScholes thread indexing question about kernel launch config


In the SDK code example “BlackScholes” (2.1 release)

the kernel calls look like this:

BlackScholesGPU<<<480, 128>>>( parameters… )

and the indexing calculations within the kernel look like this:

//Thread index
const int tid = blockDim.x * blockIdx.x + threadIdx.x;

//Total number of threads in execution grid
const int THREAD_N = blockDim.x * gridDim.x;

//No matter how small is execution grid or how large OptN is,
//exactly OptN indices will be processed with perfect memory coalescing

for(int opt = tid; opt < optN; opt += THREAD_N)
    BlackScholesBodyGPU( ... )   <-- this is a function called from inside the kernel

so on each loop iteration it gets called with arg =
(blockDim.x * blockIdx.x + threadIdx.x) PLUS some multiple of (blockDim.x * gridDim.x)

“480” doesn’t appear anywhere else in the code, in any of the files, I think.

It’s suggestive to me personally because I have two Teslas, meaning I have 480
processor cores; I could imagine arranging the processing so that each core
gets 128 threads.

Is it just a convenient number less than the 512 threads-per-block limit? Or is it twice 240 cores,
so that each core goes through 2 blocks of 128?

Many thanks!


The 480 is just a programmer-selected block count (and the 128 a thread count safely under the 512 threads-per-block limit). You would usually optimize both values based on the kernel's register and shared memory usage, and on benchmarking.

Having two Teslas has nothing to do with it. In fact, CUDA won't automagically use both Tesla cards unless you use an explicitly programmed multi-GPU scheme in your own host code; the API provides no automatic facilities for this. I haven't looked at the SDK for a long time, but I am 99% sure that the BlackScholes example isn't multi-GPU capable.