I’m working on finite element methods. Previously, I assigned one “Element” to each “Thread” on the GPU, so the computations required for each element had to be serialized within that thread. Now I want to map each element to a “Block of Threads” in order to parallelize the element computations. I did this in the following format:
thread_per_block = number of degrees of freedom per element
block_per_grid = number of elements
__global__ void kernel ( element* e )
{
    // one block per element, one thread per degree of freedom
    e[blockIdx.x].R[threadIdx.x] = …
}
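To make the mapping concrete, here is a minimal sketch of the idea. It is not my actual code: the `Element` layout, the constant `NDOF`, the names `elementKernel` and `runSketch`, and the placeholder computation R = K * u are all assumptions made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int NDOF = 8;  // assumed number of degrees of freedom per element

// Hypothetical element layout, for illustration only.
struct Element {
    float K[NDOF][NDOF];  // element matrix
    float u[NDOF];        // element input vector
    float R[NDOF];        // element result, one entry per degree of freedom
};

// One block per element, one thread per degree of freedom.
// Must be launched with exactly NDOF threads per block.
__global__ void elementKernel(Element* e)
{
    __shared__ float u[NDOF];       // input vector shared by the block
    Element& el = e[blockIdx.x];    // this block's element
    const int i = threadIdx.x;      // this thread's degree of freedom

    u[i] = el.u[i];                 // each thread loads one entry
    __syncthreads();                // wait until the whole vector is loaded

    // Placeholder computation (assumed): row i of R = K * u.
    float r = 0.0f;
    for (int j = 0; j < NDOF; ++j)
        r += el.K[i][j] * u[j];
    el.R[i] = r;
}

// Host helper (hypothetical): fills K and u with ones, runs the kernel,
// and returns R[0] of the last element, or -1 on a CUDA error.
// Assumes numElements >= 1.
float runSketch(int numElements)
{
    std::vector<Element> h(numElements);
    for (auto& el : h)
        for (int i = 0; i < NDOF; ++i) {
            el.u[i] = 1.0f;
            el.R[i] = 0.0f;
            for (int j = 0; j < NDOF; ++j)
                el.K[i][j] = 1.0f;
        }

    Element* d = nullptr;
    const size_t bytes = h.size() * sizeof(Element);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return -1.0f;
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    // block_per_grid = number of elements, thread_per_block = NDOF
    elementKernel<<<numElements, NDOF>>>(d);

    // This copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? h[numElements - 1].R[0] : -1.0f;
}
```

Calling `runSketch(4)` launches 4 blocks of 8 threads each. Note that with only `NDOF` threads per block, most of each 32-thread warp has nothing to do in this sketch, which may or may not be related to the timing I describe next.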
This gives the same results as the previous version, but it takes more computation time, which is not what I expected.
How can I correctly map each element to a “ThreadBlock”?
Thanks a lot,