How many blocks can I use for efficient parallel programming?

Hello.

I searched the forums for how to write a sum function.

Many people said to look at the reduction example.

So I'm reading it.

But I don't understand page 4 of the reduction document.

Page 4 says:

Problem: Global Synchronization

CUDA has no global synchronization. Why?
Expensive to build in hardware for GPUs with high processor count
Would force programmer to run fewer blocks (no more than # multiprocessors * # resident blocks / multiprocessor) to avoid deadlock, which may reduce overall efficiency

I don't understand this.
My program needs many blocks.
I use M blocks with up to 512 threads per block.
Until yesterday I believed that launching the maximum number of blocks at once was the most efficient approach.
But given the statement above ("Would force programmer to run fewer blocks ..."), that seems to be my mistake.
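The usual way around the missing global barrier is to split the work into several kernel launches: in the default stream, all blocks of one launch finish before the next launch starts, so the launch boundary acts as the global synchronization point. A minimal sketch of a multi-pass sum built on that idea follows. It assumes float data, a power-of-two block size of 256 and a 1-D grid of at most 65535 blocks; the names `blockSum`, `blocksFor` and `gpuSum` are hypothetical, not taken from the reduction document.

```cuda
#include <cuda_runtime.h>

#define THREADS 256   // assumed block size; must be a power of two for the tree below

// One pass: each block adds up to 2*THREADS inputs and writes ONE partial sum.
// __syncthreads() only synchronizes the threads of this block, never the grid.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float s[THREADS];
    int tid = threadIdx.x;
    int i = blockIdx.x * (2 * THREADS) + tid;

    // Each thread loads and pre-adds two elements; out-of-range slots count as 0.
    float v = 0.0f;
    if (i < n) v += in[i];
    if (i + THREADS < n) v += in[i + THREADS];
    s[tid] = v;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int stride = THREADS / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = s[0];
}

// Blocks needed for one pass over n elements.
int blocksFor(int n)
{
    return (n + 2 * THREADS - 1) / (2 * THREADS);
}

// Repeats blockSum until one value is left. Each launch in the default stream
// starts only after the previous one has finished: that is the "global sync".
// d_bufA and d_bufB are device buffers of at least blocksFor(n) floats each;
// two are used so that no pass reads and writes the same buffer.
float gpuSum(const float *d_in, int n, float *d_bufA, float *d_bufB)
{
    const float *src = d_in;
    float *dst = d_bufA;
    while (n > 1) {
        int blocks = blocksFor(n);
        blockSum<<<blocks, THREADS>>>(src, dst, n);
        src = dst;
        dst = (dst == d_bufA) ? d_bufB : d_bufA;
        n = blocks;
    }
    float result = 0.0f;
    if (n == 1)  // cudaMemcpy waits for the preceding kernels to finish
        cudaMemcpy(&result, src, sizeof(float), cudaMemcpyDeviceToHost);
    return result;
}
```

With 262144 input values this is two launches: the first produces 512 partial sums, the second reduces them in a single block.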

GeForce 9800 GT:
14 MPs (multiprocessors)
a block has up to 512 threads
a grid can be up to 3-D: 512 * 512 * 64

I need a 2-D grid (512 * 512),
which has 512 * 512 = 262144 blocks.

How many of those blocks can run at the same time (in one cycle)?

And how do I code that?

Are the blocks divided up automatically, or do I have to do it manually?
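The distribution itself does not have to be coded: you launch the whole grid, and the hardware assigns blocks to multiprocessors as earlier blocks finish. How many are resident at once can be estimated with the formula from the slide (# multiprocessors × # resident blocks per multiprocessor). The sketch below assumes the compute capability 1.1 limits of the 9800 GT (at most 8 resident blocks and 768 resident threads per multiprocessor) and ignores register and shared-memory usage, which can only lower the result; the function names are hypothetical.

```cuda
// Assumed per-multiprocessor limits for compute capability 1.0/1.1 GPUs
// (GeForce 9800 GT). Register and shared-memory limits are NOT modelled.
const int kMaxBlocksPerMP  = 8;
const int kMaxThreadsPerMP = 768;

// Blocks of the given size (1..512 threads) that fit on one MP at the same time.
int residentBlocksPerMP(int threadsPerBlock)
{
    int byThreads = kMaxThreadsPerMP / threadsPerBlock;
    return byThreads < kMaxBlocksPerMP ? byThreads : kMaxBlocksPerMP;
}

// Blocks the whole GPU can hold at the same time.
int residentBlocks(int numMPs, int threadsPerBlock)
{
    return numMPs * residentBlocksPerMP(threadsPerBlock);
}

// Rough number of "rounds" needed to get through totalBlocks, assuming all
// blocks take about the same time (the real scheduler refills MPs dynamically).
int rounds(int totalBlocks, int numMPs, int threadsPerBlock)
{
    int r = residentBlocks(numMPs, threadsPerBlock);
    return (totalBlocks + r - 1) / r;
}
```

For 512-thread blocks on 14 MPs this gives 14 resident blocks, so the 262144 blocks of a 512 × 512 grid go through in roughly 18725 rounds; with 256-thread blocks it is 42 resident blocks. On a real device the MP count can be read from `cudaGetDeviceProperties` (`multiProcessorCount`).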

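My simple code:
int idx = threadIdx.x;
int j = blockIdx.x + gridDim.x * blockIdx.y;   // linear block index in the 2-D grid
int i = j * blockDim.x + idx;                  // global element index (j + idx overlapped between blocks)

result[i] = arr[i] * ALPHA;
sums[j] += result[i];   // race: every thread of block j updates sums[j] at once; needs a reduction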
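Indexing with `j + idx` makes neighbouring blocks overlap, and a `sums[j] +=` executed by every thread of block j at the same time loses additions. A sketch of a per-block sum done as a shared-memory reduction, in the style of the reduction example, follows. It assumes a 1-D block of exactly `THREADS` = 512 threads, a hypothetical `ALPHA` of 2.0f, and that `sums` has one entry per block of the grid; the CPU function is only a reference for checking results.

```cuda
#define ALPHA   2.0f   // hypothetical value; the post does not say what ALPHA is
#define THREADS 512    // assumed launch block size (power of two, <= 512)

// Scales arr into result and writes the sum of each block's results to sums[j].
// Must be launched with blockDim.x == THREADS (blockDim.y == blockDim.z == 1).
__global__ void scaleAndBlockSum(const float *arr, float *result, float *sums, int n)
{
    __shared__ float s[THREADS];
    int idx = threadIdx.x;
    int j = blockIdx.x + gridDim.x * blockIdx.y;   // linear block index in a 2-D grid
    int i = j * blockDim.x + idx;                  // global element index

    float v = 0.0f;
    if (i < n) {
        v = arr[i] * ALPHA;
        result[i] = v;
    }
    s[idx] = v;
    __syncthreads();

    // Shared-memory tree reduction instead of many threads doing sums[j] += ...
    for (int stride = THREADS / 2; stride > 0; stride >>= 1) {
        if (idx < stride) s[idx] += s[idx + stride];
        __syncthreads();
    }
    if (idx == 0) sums[j] = s[0];   // exactly one write per block, so no race
}

// CPU reference with the same block decomposition, for checking the kernel.
void scaleAndBlockSumCPU(const float *arr, float *result, float *sums, int n, int numBlocks)
{
    for (int j = 0; j < numBlocks; ++j) {
        float acc = 0.0f;
        for (int idx = 0; idx < THREADS; ++idx) {
            int i = j * THREADS + idx;
            if (i < n) {
                result[i] = arr[i] * ALPHA;
                acc += result[i];
            }
        }
        sums[j] = acc;
    }
}
```

GPU and CPU sums can differ in the last bits because the additions happen in a different order. A grand total over all blocks then needs a second pass over `sums`.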

Help me…

Thanks for reading this...

That is incorrect.

A block can have up to 512 threads in a 3-D arrangement, with maximum dimensions in the x, y, and z directions of 512, 512, and 64.

A grid can have up to 4294836225 blocks in a 2-D arrangement, with maximum dimensions 65535 x 65535.
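Since each grid dimension is limited to 65535, a large block count has to be spread over x and y. A small helper for that, as a sketch (the names `makeGrid` and `addOne` are hypothetical): it picks the fewest rows that keep x within the limit, then spreads the blocks evenly over those rows, so the grid overshoots the requested count by fewer blocks than it has rows. Kernels must therefore skip the surplus blocks.

```cuda
#include <cuda_runtime.h>   // dim3

const unsigned int kMaxGridDim = 65535;   // per-dimension grid limit (x and y)

// Fold numBlocks (1 .. 65535 * 65535) into a legal 2-D grid.
dim3 makeGrid(unsigned int numBlocks)
{
    if (numBlocks == 0) numBlocks = 1;                             // a launch needs >= 1 block
    unsigned int y = (numBlocks + kMaxGridDim - 1) / kMaxGridDim;  // fewest rows possible
    unsigned int x = (numBlocks + y - 1) / y;                      // spread blocks evenly
    return dim3(x, y);
}

// Kernels launched this way recover a linear block index and skip extra blocks.
__global__ void addOne(float *data, unsigned int numBlocks)
{
    unsigned int j = blockIdx.x + gridDim.x * blockIdx.y;
    if (j >= numBlocks) return;
    data[j * blockDim.x + threadIdx.x] += 1.0f;
}
```

For example, 262144 blocks in a single dimension would exceed 65535; `makeGrid(262144)` returns a 52429 × 5 grid (262145 blocks, one surplus). The 512 × 512 grid from the first post is also legal as it stands.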

I think that is incorrect...

I quoted the numbers from deviceQuery.

When the size of a block was more than 500 (600 in the limit test), the test failed.

Output: incorrect values.

Before that test, I tested a 2-D block; because that test failed, I ran the test above.

What I really want to know is not why the test failed, but how many threads, blocks, and grids can be used in one cycle.

That is what I found from my tests...

In addition, deviceQuery reports:

[b] Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1[/b]

About the second line...

At first I thought it meant:

blockDim.x <= 512, blockDim.y <= 512, blockDim.z <= 64

gridDim.x <= 65535, gridDim.y <= 65535

But now I'm not sure...

Now I think it might mean:

blockIdx.x < 512, blockIdx.y < 512, blockIdx.z < 64

gridIdx.x < 65535, gridIdx.y < 65535

Which interpretation is correct?

What I wrote is 100% correct and exactly mirrored in the device query output you posted above. I don’t understand where you think the ambiguity is.

blockDim and gridDim are the selected block and grid sizes. threadIdx and blockIdx are the indices of a given thread and block within the block/grid hierarchy, so that

threadIdx.{x,y,z} < blockDim.{x,y,z} <= {512,512,64} and blockIdx.{x,y} < gridDim.{x,y} <= {65535,65535}
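In a kernel, those inequalities are what make the usual global-index formula work: each thread combines its block index and thread index into one coordinate, and no two threads get the same one. A minimal 2-D sketch, assuming a row-major `width × height` float image and a 16 × 16 block (256 threads); `scale2D`, `gridFor` and `launchScale2D` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Each thread handles one pixel. Because threadIdx.{x,y} < blockDim.{x,y} and
// blockIdx.{x,y} < gridDim.{x,y}, (x, y) below covers every pixel exactly once.
__global__ void scale2D(float *img, int width, int height, float alpha)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;   // column
    int y = blockIdx.y * blockDim.y + threadIdx.y;   // row
    if (x < width && y < height)                     // edge blocks may overhang the image
        img[y * width + x] *= alpha;
}

// Grid just large enough to cover width x height pixels with the given block shape.
dim3 gridFor(int width, int height, dim3 block)
{
    return dim3((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
}

void launchScale2D(float *d_img, int width, int height, float alpha)
{
    dim3 block(16, 16);                       // 16 * 16 = 256 threads, within the 512 limit
    scale2D<<<gridFor(width, height, block), block>>>(d_img, width, height, alpha);
}
```

A 1000 × 600 image needs a 63 × 38 grid of 16 × 16 blocks; the bounds check discards the threads that fall past the right and bottom edges.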

Just to make it clear to anyone not familiar with the matter, there are more conditions:

0 < blockDim.x * blockDim.y * blockDim.z <= 512, where 512 is a maximum that may need to be smaller depending on the kernel's register usage.

I had missed that...

As I said in my earlier posts, I had already tested modifying blockDim, and it failed: dim3 dblock(500,500,60);

I wrote the code following the CUDA Programming Guide.

After modifying dimx, dimy, and dimz, my conclusion is that the total number of threads in a 3-D block is the same as in a 1-D block.

That is what I have been saying...

Of course, I'm not an expert; as I said in my first post, I'm a beginner.

If you have example code that uses {512,512,64}, please send it to me by e-mail.

Eight days... this problem is driving me crazy.

Thanks for your comments.
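The conclusion above can be seen directly from how a 3-D thread index is usually flattened: an 8 × 8 × 8 block yields the same 512 distinct IDs, 0 to 511, as a 1-D block of 512 threads, so reshaping a block never gives it more threads. A sketch, with hypothetical names `flatThreadId` and `fillIds`, assuming x varies fastest and a 1-D grid:

```cuda
#include <cuda_runtime.h>   // uint3, dim3, make_uint3

// Flatten a 3-D thread index (x fastest) into 0 .. blockDim.x*blockDim.y*blockDim.z - 1.
__host__ __device__ inline unsigned int flatThreadId(uint3 t, dim3 b)
{
    return t.x + b.x * (t.y + b.y * t.z);
}

// Every thread of every block (1-D grid assumed) writes its flat id into its own slot.
// With dim3(8, 8, 8) or dim3(512) blocks the output is the same: 0..511 per block.
__global__ void fillIds(unsigned int *out)
{
    unsigned int perBlock = blockDim.x * blockDim.y * blockDim.z;
    unsigned int tid = flatThreadId(threadIdx, blockDim);
    out[blockIdx.x * perBlock + tid] = tid;
}
```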

Again, I think you are misunderstanding something. The maximum number of threads in a block is always 512. The maximum dimensions of a block in the x, y, and z directions are 512, 512, and 64. There is a difference between those two statements. A block of (500,500,60) is illegal: it has more than 512 total threads, even though its individual x, y, and z dimensions are permitted.
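The two rules can be checked separately before a launch. A sketch using the limits from the deviceQuery output in this thread (assumed for this card; on other GPUs `cudaDeviceProp` reports them as `maxThreadsPerBlock`, `maxThreadsDim` and `maxGridSize`). `blockShapeOk` and `gridShapeOk` are hypothetical helpers, and they do not account for register usage; for that, `cudaFuncGetAttributes` reports a per-kernel `maxThreadsPerBlock`.

```cuda
#include <cuda_runtime.h>   // dim3

// Limits from this card's deviceQuery output (assumed; query them on other GPUs).
const unsigned int kMaxThreadsPerBlock = 512;
const unsigned int kMaxBlockDim[3]     = {512, 512, 64};
const unsigned int kMaxGridDim[2]      = {65535, 65535};

// Legal only if EACH dimension is within its own limit AND the product of the
// dimensions (the real thread count) is within the per-block limit.
bool blockShapeOk(dim3 b)
{
    if (b.x == 0 || b.y == 0 || b.z == 0) return false;
    if (b.x > kMaxBlockDim[0] || b.y > kMaxBlockDim[1] || b.z > kMaxBlockDim[2])
        return false;
    return b.x * b.y * b.z <= kMaxThreadsPerBlock;   // at most 512*512*64, no overflow
}

// 2-D grids only on this card: z must be 1.
bool gridShapeOk(dim3 g)
{
    return g.x >= 1 && g.y >= 1 && g.z == 1 &&
           g.x <= kMaxGridDim[0] && g.y <= kMaxGridDim[1];
}
```

dim3(500, 500, 60) fails the product test (15,000,000 threads) even though 500, 500 and 60 each pass their per-dimension limit; dim3(8, 8, 8), dim3(16, 16, 2) and dim3(512) are all legal 512-thread shapes.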

mrkyunby:Maximum number of threads per block: 512
E.D. Riedijk:0 < blockDim.x * blockDim.y * blockDim.z <=512
mrhyunby:Conclusion is Total Number of threads of 3D - block equal to that of 1D-block;
avidday:the maximum number of threads in a block is always 512

I don’t understand… you keep saying the same thing over and over again and stating that the other side is wrong… :)

Um... I just wanted to be sure about the statements above.

As I said before, I'm a beginner, so I made that mistake at first.

I want to get back to the subject of this thread.

I was mistaken about the dimensions of a block;

so what I still don't know is HOW MANY blocks, out of the MAXIMUM number of BLOCKS, can actually be used at once.

In addition, [b]for efficient parallel programming:

what is the relation between THE NUMBER of MPs and SPs and the grid and block sizes?[/b]
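One piece of that relation: each multiprocessor of the 9800 GT has 8 SPs (14 × 8 = 112 in total), and it schedules threads in warps of 32. A block whose size is not a multiple of 32 still occupies whole warps, so the leftover lanes of its last warp sit idle. A small sketch of that arithmetic; the helper names are hypothetical, and in device code the built-in `warpSize` plays the role of the constant used here.

```cuda
const int kWarpSize = 32;   // threads per warp on this GPU

// Warps a block of the given size occupies (a partial warp counts as a full one).
int warpsPerBlock(int threadsPerBlock)
{
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Lanes allocated as part of the last warp that have no thread to run.
int idleLanes(int threadsPerBlock)
{
    return warpsPerBlock(threadsPerBlock) * kWarpSize - threadsPerBlock;
}

// Largest block size not above the request that is a whole number of warps.
int roundDownToWarp(int threadsPerBlock)
{
    return (threadsPerBlock / kWarpSize) * kWarpSize;
}
```

A 500-thread block uses 16 warps with 12 idle lanes; 480 or 512 threads fill their warps exactly.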

If: a block can have at most 512 threads,

Then: the resources I need for my test are

Maximum number of threads per block: 500

Maximum number of blocks: 10,000

Maybe the compiler automatically divides the blocks in a way that is suitable and efficient for my environment?

Ha... is that impossible? ;;

Thanks... and sorry...
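The compiler does not split the work: the block size and grid size are whatever you pass at launch, and the hardware scheduler then hands the blocks to the multiprocessors as they become free. A common way to make the launch independent of the data size is a grid-stride loop, where each thread processes several elements spaced one grid apart. A sketch, assuming a 1-D float array; `scaleAll`, `launchScaleAll` and the host-side check `elementsHandled` are hypothetical, and the 500 × 10,000 configuration is taken from the post above.

```cuda
#include <cuda_runtime.h>

// Each thread starts at its global index and steps by the total number of
// threads in the grid, so every element 0..n-1 is visited exactly once,
// whatever grid size was launched (assumes n + stride fits in an int).
__global__ void scaleAll(float *data, int n, float alpha)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= alpha;
}

// Host-side emulation of the loop: how many elements thread t of block b handles.
int elementsHandled(int n, int blocks, int threads, int b, int t)
{
    int count = 0;
    for (int i = b * threads + t; i < n; i += blocks * threads)
        ++count;
    return count;
}

void launchScaleAll(float *d_data, int n, float alpha)
{
    if (n <= 0) return;            // nothing to do; a 0-block launch is invalid
    const int threads = 500;       // block size from the post (480 or 512 would fill whole warps)
    const int blocks  = 10000;     // block count from the post (below the 65535 limit)
    scaleAll<<<blocks, threads>>>(d_data, n, alpha);
}
```

With n = 5,000,000 every thread handles exactly one element; with n = 12,000,000 the first threads handle three elements and the last ones two, without any change to the launch.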