Block size question

Which is better:

  1. blockDim.x == 16 && blockDim.y == 16 (for a total of 256 threads)
  2. blockDim.x == 256 && blockDim.y == 1 (also a total of 256 threads)

...or are they identical in speed?


Unless someone else corrects me, I believe those are identical. I assume you still launch the same number of blocks in both cases?

Yes, they will be launched using the same number of blocks.

I think the 1D version has some advantages for me, because I access some data through a 1D linear array. With 2D blocks I need to convert the thread IDs to a linear offset, so I bet it will be slower than the 1D version.
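
To make the conversion concrete, here is a minimal sketch of what I mean. The kernel `scale`, the helper `linearThreadId`, and the host function `runBothShapes` are hypothetical names made up for illustration, and I am assuming a 1D grid over a plain float array. The same kernel works for both block shapes; with a 256x1 block the `ty * bdx` term is simply zero.

```cuda
#include <cuda_runtime.h>

// Flatten a (possibly 2D) thread index into a block-local linear index.
// This matches how CUDA itself orders threads inside a block (x varies fastest).
__host__ __device__ inline unsigned linearThreadId(unsigned tx, unsigned ty, unsigned bdx)
{
    return ty * bdx + tx;
}

// Hypothetical kernel: each thread scales one element of a 1D array.
__global__ void scale(float *data, unsigned n, float factor)
{
    unsigned tid = linearThreadId(threadIdx.x, threadIdx.y, blockDim.x);
    unsigned threadsPerBlock = blockDim.x * blockDim.y;   // 256 for both shapes
    unsigned i = blockIdx.x * threadsPerBlock + tid;      // global linear offset
    if (i < n) {
        data[i] *= factor;
    }
}

// Launch the same kernel with both block shapes and the same number of blocks.
// d_data is assumed to be a device pointer to at least n floats.
void runBothShapes(float *d_data, unsigned n)
{
    const unsigned threadsPerBlock = 256;
    const unsigned blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    scale<<<blocks, dim3(16, 16)>>>(d_data, n, 2.0f);   // 2D block: 16 x 16
    scale<<<blocks, dim3(256, 1)>>>(d_data, n, 0.5f);   // 1D block: 256 x 1
    cudaDeviceSynchronize();
}
```

So the cost of the 2D version is one extra multiply-add per thread. Since threads in a block are grouped into warps in this same x-fastest order, consecutive threads of a warp still touch consecutive elements in both cases, so the memory access pattern should be the same.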

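That only matters when the kernel is not memory-bandwidth bound; if it is, the extra index arithmetic is typically hidden behind the memory accesses. But then again, less code = less room for errors, so I would use a 1D block.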