How to decide dimGrid and dimBlock?

Hi everyone,

I'm studying evolutionary computation using CUDA. My hardware is a 9800GT (14 MPs, 512 MB) with an Intel E6300. I have implemented an evolutionary computation algorithm in CUDA, and it was successful: it ran faster than the CPU version. But while studying, some questions came up.

First, about dimGrid and dimBlock.

I have 14 MPs → 112 SPs, so I chose dimGrid as a multiple of 14 where possible, because I understood that a block maps to an MP, and its 8 SPs run 8 threads. For example, with 100 data elements:
dimGrid = 14, dimBlock = 8

Later I learned something new: an efficient dimBlock is a multiple of 32, and I gathered that 32 is the warp size.

Hmm... then dimGrid = 4, dimBlock = 32??

If that is not an efficient grid/block configuration, please help me. For example, with 100, 200, or 500 data elements:
dimGrid = ?, dimBlock = ?

Second, when the GPU processes 112 data elements in one pass, which index covers them:
threadIdx = 0..111?
or blockIdx * blockDim + threadIdx = 0..111?

Please help me.

Thank you for reading this post.

I don’t know if you have seen this occupancy calculator spreadsheet, but it will let you play around with different grid and block configurations and see how the kernel will be run. The key number is a minimum of 192 active threads per MP (not per block). That fully hides instruction fetching and pipelining overheads.
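As a rule of thumb, derive the grid size from the data size by rounding up, and guard the tail inside the kernel. A minimal sketch of that pattern (the kernel name process and the array d_data are placeholders, not from your code):

int numData = 100;                         // same pattern for 200, 500, ...
int threadsPerBlock = 192;                 // a multiple of 32, and matches the 192-threads/MP target
int blocksPerGrid = (numData + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
process<<<blocksPerGrid, threadsPerBlock>>>(d_data, numData);

With inputs this small (100-500 elements) most of your 14 MPs will be idle no matter what you choose; the launch configuration only really starts to matter for much larger populations.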

The second one: blockIdx * blockDim + threadIdx gives the global index 0..111. threadIdx alone only runs from 0 to blockDim - 1 within each block.
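Inside the kernel it looks like this (a sketch; the body is just example per-element work):

__global__ void process(float *data, int numData)
{
    // With <<<14, 8>>> this index runs 0..111 across the whole grid;
    // threadIdx.x alone only covers 0..7 within each block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numData)        // guard: the grid is usually rounded up past numData
        data[i] *= 2.0f;    // placeholder work
}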

Thank you. This is my first CUDA algorithm, so I used another program, EASEA, as a reference.

Do you know it?

Hmm...

Anyway, thanks a lot.

In addition, I have a question.

Compute Capability : 1.1
Threads / Warp : 32
Warps / Multiprocessor : 24
Threads / Multiprocessor : 768
Thread Blocks / Multiprocessor : 8
Shared Memory / Multiprocessor (bytes) : 16384
Register File Size : 8192
Register Allocation Unit Size : 256

From the Occupancy Calculator sheet, I learned this new information.

I really, really want an answer to the following:

Warp == Kernel ?

I want to write a function that sums each student's grades: sumgrade(). So is the kernel sumgrade() the same thing as a warp?

If the number of students = 1000:

sumgrade<<<dimGrid, dimBlock>>>(parameters)

Is the efficient solution <<<2, 512>>>? Is that correct?

Sorry, but I am really having some difficulty following your post. About the only thing I understand is this:

Warp == Kernel ?

And the answer to that is no. A warp is the elementary scheduling unit of a CUDA MP. Threads are run in warps of 32, and the number of threads in a block should be a multiple of 32. All of this is discussed in the first few pages of chapter 4 of the programming guide.
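If I have followed your sumgrade() question at all: <<<2, 512>>> is legal on compute capability 1.1 but poor. It creates only 2 blocks for your 14 MPs, so 12 of them sit idle, and a 512-thread block caps an MP at 512 of its 768 resident threads. A sketch of what I think you mean (sumgrade, the row-major grades layout, and numSubjects are my guesses):

// One thread per student; each thread sums that student's grades.
__global__ void sumgrade(const float *grades, float *total,
                         int numStudents, int numSubjects)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;   // global student index
    if (s < numStudents) {                           // guard the rounded-up grid
        float sum = 0.0f;
        for (int j = 0; j < numSubjects; ++j)
            sum += grades[s * numSubjects + j];      // assumes grades[student][subject]
        total[s] = sum;
    }
}

// 128 threads/block is a multiple of the warp size and spreads
// 1000 students over (1000 + 127) / 128 = 8 blocks instead of 2:
int threads = 128;
int blocks = (1000 + threads - 1) / threads;
sumgrade<<<blocks, threads>>>(d_grades, d_total, 1000, numSubjects);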

Sorry I cannot be of more help.