Window function for FFT

navier-stokes · April 17, 2009, 7:43am

Then search and find :wacko:

Pimbolie1979 · April 17, 2009, 9:03am

Phew, cycling 7 km to work is pretty exhausting, especially with a headwind the whole way ^^

I stared at the code for hours last night, and I think the penny has dropped.

Let me try to summarize it in my own words:

You use one block containing 512 threads. With this block you multiply Waveform[0] by coeff[0], Waveform[1] by coeff[1] … Waveform[511] by coeff[511] (all multiplications are executed at the same time). So the first block is fully utilized. But there are 1024 coefficients, not 512. So a second block is needed, which handles everything from Waveform[512] to Waveform[1023].

As far as I know, a grid can hold 512 blocks. So if I need exactly 2 blocks to process one waveform, I need exactly 2000 blocks for 1000 waveforms. → 2000/512 ≈ 3.9 grids. So I need 4 grids for the complete computation. As far as I know, my graphics card (9800GTX+) has 64 grids.

I hope you could more or less follow that. I find the NVIDIA Programming Guide rather vague on this point.

I hope I have understood it now. That would be an important prerequisite for working with a GPU more independently.

navier-stokes · April 17, 2009, 9:27am

Bingo

Möööööp! The programming guide says (A.1.1): The max. size of EACH DIMENSION of a grid of thread blocks is 65535. Since a grid can be defined in a 2D manner, it can contain up to 65535^2 thread blocks. This is approx. 4 billion. You should know this, 'cause you’ve read the programming guide!

No, the GPU processes ONE grid at a time. I don’t believe that you’ve read the 3 pages of the programming guide about thread hierarchy. And I don’t believe that you’ve read my last post carefully.

Your graphics adapter has 16 multiprocessors == 128 scalar processors (Appendix A.1 in the programming guide). But this has little to do with launching your kernel.
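To make the numbers above concrete, here is a minimal host-side C++ sketch, not actual CUDA code. It assumes the 65535-per-dimension limit quoted from the programming guide; the helper names `blocksNeeded`, `fitsInOneGrid1D` and `maxBlocks2D` are made up for illustration. It suggests that the 2000 blocks from the question fit comfortably into a single one-dimensional grid.

```cpp
#include <cassert>
#include <cstdint>

// Per-dimension grid limit as quoted above from the programming guide (A.1.1).
const std::uint32_t kMaxGridDim = 65535;

// Hypothetical helper: blocks needed for `waveforms` waveforms of `samples`
// samples each when every block holds `threadsPerBlock` threads.
std::uint32_t blocksNeeded(std::uint32_t waveforms, std::uint32_t samples,
                           std::uint32_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    const std::uint32_t blocksPerWaveform =
        (samples + threadsPerBlock - 1) / threadsPerBlock;  // round up
    return waveforms * blocksPerWaveform;
}

// True if a one-dimensional grid can hold that many blocks.
bool fitsInOneGrid1D(std::uint32_t blocks) {
    return blocks >= 1 && blocks <= kMaxGridDim;
}

// Upper bound on the number of blocks in a two-dimensional grid.
std::uint64_t maxBlocks2D() {
    return static_cast<std::uint64_t>(kMaxGridDim) * kMaxGridDim;
}
```

With 1000 waveforms of 1024 samples and 512 threads per block, `blocksNeeded` gives 2000, well below 65535, so one grid is enough.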

Pimbolie1979 · April 17, 2009, 10:12am

Ah, so I first have to define a grid, which can be one-, two- or three-dimensional. My function is implemented there. Then I execute this grid. I can only ever execute one grid at a time, although it may have several dimensions.

Are there any tutorials besides the one from NVIDIA?

Pimbolie1979 · April 17, 2009, 10:26am

I will now go through the tutorial again sentence by sentence and translate it.

navier-stokes · April 17, 2009, 11:16am

Look here

navier-stokes · April 17, 2009, 11:17am

Always a good idea

navier-stokes · April 17, 2009, 11:27am

… and especially here

Pimbolie1979 · April 17, 2009, 11:43am

For convenience, threadIdx is a 3-component vector, so that threads can be
identified using a one-dimensional, two-dimensional, or three-dimensional index,
forming a one-dimensional, two-dimensional, or three-dimensional thread block.

So a block consisting of 512 threads can be laid out in one, two or three dimensions.

These multiple blocks are organized into a one-dimensional or
two-dimensional grid of thread blocks as illustrated by Figure 2-1. The dimension of
the grid is specified by the first parameter of the <<<…>>> syntax. Each block
within the grid can be identified by a one-dimensional or two-dimensional index
accessible within the kernel through the built-in blockIdx variable. The dimension
of the thread block is accessible within the kernel through the built-in blockDim
variable. The previous sample code becomes:

According to this, the blocks in a grid can only be arranged in one or two dimensions. With the first parameter in kernelfunction <<<Grid_Dim,BlockDim>>>(…) I can specify the dimension of the blocks in the grid. BlockDim specifies the dimension of the blocks themselves.

Since the grid can be one- or two-dimensional, I can only pass 1 or 2 as the parameter, right? And since the block can only be one-, two- or three-dimensional, I can only pass 1, 2 or 3 as the parameter.

Am I seeing this correctly, or is there a flaw in my reasoning?

How the blocks are arranged

navier-stokes · April 17, 2009, 12:09pm

In

<<<Grid_dim, Block_dim>>>

Grid_dim and Block_dim are either int or dim3. Use integers for 1D and dim3 for 1D, 2D or 3D.

dim3 is a struct of 3 uints (x, y, z) that specify the EXTENTS of the 3 dimensions. Please have a look at the CUDA header files, my previous posts or any of my thousands of hints.
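As a rough illustration of "extents, not number of dimensions", the following host-side C++ sketch uses a stand-in struct `Dim3` (an assumption, so it compiles without the CUDA headers) that mirrors how CUDA's dim3 defaults unspecified components to 1. The helpers `volume` and `fitsBlockLimit` are hypothetical names, and the 512-thread limit is the one mentioned in this thread for this hardware.

```cpp
#include <cassert>

// Stand-in for CUDA's dim3: three unsigned extents, each defaulting to 1.
// The non-explicit constructor lets a plain integer convert to a 1D extent.
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Number of elements described by the extents
// (threads in a block, or blocks in a grid).
unsigned long long volume(const Dim3& d) {
    return 1ULL * d.x * d.y * d.z;
}

// Checks a block shape against a maximum thread count per block
// (512 is the limit discussed in this thread).
bool fitsBlockLimit(const Dim3& block, unsigned maxThreads = 512) {
    assert(maxThreads > 0);
    return volume(block) <= maxThreads;
}
```

For example, `Dim3 block(16, 16)` describes 16 x 16 x 1 = 256 threads, and `Dim3 grid = 64` describes a one-dimensional grid of 64 blocks. Passing 1, 2 or 3 would therefore mean 1, 2 or 3 elements along x, not a number of dimensions.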

Pimbolie1979 · April 17, 2009, 12:36pm

What I have learned now is that my 9800GTX+ has 128 scalar processors, and I can run one block on each of them. Each block has 512 threads, so I get 128*512 = 65536 threads.

dim3 dimBlock(16,16); → does this define a block that is laid out in two dimensions? And does it mean I only use 256 of the 512 threads in the block?

navier-stokes · April 17, 2009, 12:51pm

512 is the MAXIMUM number of threads per block. Using only 256 threads will not result in a waste of hardware. READ THE PROGRAMMING GUIDE FROM THE BEGINNING TO THE END.

Pimbolie1979 · April 17, 2009, 12:54pm

dim3 dimGrid ((N + dimBlock.x -1) /dimBlock.x, (N+dimBlock.y - 1)/ dimBlock.y) ;

→ creates a grid in which the blocks are arranged in two dimensions. The number of blocks has to be calculated.

If N=1024 and dim3 dimBlock(16,16); then dimGrid is (64.9, 64.9). So I need 65x65 blocks and 16x16 threads to process the kernel.

navier-stokes · April 17, 2009, 1:06pm

Yeeeeeaaaahhhh

BTW: Integer arithmetic works a bit differently, because the division truncates: (1024 + 16 -1)/16 == 1039/16 == 64.

64 is enough since 64 * 16 == 1024.

This integer formula

(N + dimBlock.x -1) /dimBlock.x

guarantees that there are always enough threads. Try N = 1025. ==> dimGrid(65, 65).
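The rounding-up behaviour of that formula can be checked on the host with a small C++ sketch. The function name `blocksFor` is just an illustrative choice, not something from the CUDA toolkit.

```cpp
#include <cassert>

// Integer ceiling division: the smallest number of blocks of size `block`
// that covers `n` elements. Same formula as (N + dimBlock.x - 1) / dimBlock.x.
unsigned blocksFor(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;  // integer division truncates
}
```

With a block size of 16, N = 1024 yields 64 blocks (exactly 1024 threads), while N = 1025 yields 65 blocks (1040 threads, of which 15 have nothing to do and need a bounds check in the kernel).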

For your waveform use N=1024 and M=1000 and then

dim3 dimGrid  ((N + dimBlock.x -1) /dimBlock.x, (M+dimBlock.y - 1)/ dimBlock.y);
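How such a 2D launch could map threads to waveform samples can be emulated serially on the CPU. The sketch below is plain C++, not the kernel from this thread. It assumes that the M waveforms are stored back to back in one array of M*N floats and that the window coefficients are in `coeff`. The names `Dim3`, `gridFor`, `windowThread` and `windowOnCpu` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's dim3 (extents default to 1).
struct Dim3 {
    unsigned x, y, z;
    Dim3(unsigned x_ = 1, unsigned y_ = 1, unsigned z_ = 1)
        : x(x_), y(y_), z(z_) {}
};

// Grid extents for N samples (x) and M waveforms (y), rounded up.
Dim3 gridFor(unsigned N, unsigned M, const Dim3& block) {
    assert(block.x > 0 && block.y > 0);
    return Dim3((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
}

// What one thread of a hypothetical windowing kernel would do.
void windowThread(const Dim3& blockIdx, const Dim3& blockDim,
                  const Dim3& threadIdx, std::vector<float>& waveforms,
                  const std::vector<float>& coeff, unsigned N, unsigned M) {
    const unsigned sample = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned waveform = blockIdx.y * blockDim.y + threadIdx.y;
    // Surplus threads in the rounded-up grid must do nothing.
    if (sample < N && waveform < M)
        waveforms[static_cast<std::size_t>(waveform) * N + sample] *= coeff[sample];
}

// Serial emulation of kernel<<<dimGrid, dimBlock>>>(...): visit every
// block of the grid and every thread of each block once.
void windowOnCpu(std::vector<float>& waveforms,
                 const std::vector<float>& coeff, unsigned N, unsigned M) {
    const Dim3 dimBlock(16, 16);
    const Dim3 dimGrid = gridFor(N, M, dimBlock);
    for (unsigned by = 0; by < dimGrid.y; ++by)
        for (unsigned bx = 0; bx < dimGrid.x; ++bx)
            for (unsigned ty = 0; ty < dimBlock.y; ++ty)
                for (unsigned tx = 0; tx < dimBlock.x; ++tx)
                    windowThread(Dim3(bx, by), dimBlock, Dim3(tx, ty),
                                 waveforms, coeff, N, M);
}
```

For N = 1024 and M = 1000 with 16x16 blocks, `gridFor` gives a 64 x 63 grid. Since 63 * 16 = 1008 exceeds 1000, the bounds check matters for the last row of blocks. On a GPU the four loops disappear, because the hardware supplies the block and thread indices.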

Pimbolie1979 · April 17, 2009, 1:24pm

I am slowly starting to see light at the end of the tunnel. I will read through the whole tutorial again today.

Thanks again for your help. I owe you more than one beer. You don't happen to live near Hamburg?

navier-stokes · April 17, 2009, 1:31pm

Nope, in Bonn. And the Rhinelanders here know nothing at aaaall about brewing beer

Pimbolie1979 · April 17, 2009, 1:33pm

Ah, I passed through Bonn not too long ago. I had to go to Mainz.

Pimbolie1979 · April 17, 2009, 1:36pm

Nonsense, to Köln (how embarrassing) ^^