After compiling with -Minfo, I get this output: #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
For this output, what does 256 mean? How do I know how many blocks the compiler allocates and how many threads are allocated to each block?
What does 256 mean? How do I know how many blocks the compiler allocates and how many threads are allocated to each block?
It's the vector width, which translates into CUDA as the thread block size. "parallel" translates into CUDA as the blocks of the grid. So in this case you have a 1-D grid of 1-D thread blocks, each containing 256 threads. Since "parallel" does not have a width, the number of blocks launched is determined dynamically at run time from the size of the loop.
/* blockIdx.x threadIdx.x */
This indicates the exact CUDA dimensions being used: the "parallel" loop is indexed by blockIdx.x and the "vector" loop by threadIdx.x.
Hope this helps,
Mat
Thanks a lot.