Block of 16x15 (!= 16)

Hi All,

I have a huge data structure (~800 MB) that I load once onto the card (GTX 280). The structure is a vector of tables, and the vector length is 10000. So I thought about trying a block size of 16.

My problem is the following: if I want to run several blocks, each with a size that is a multiple of 16, then I have the following combinations:

  1. 1 block of 625x16: not good, too many threads in a block?
  2. 5 blocks of 125x16: not sure
  3. 25 blocks of 25x16: not sure

So my question is: what would be the difference in performance between using one 16x16 block and using two 16x8 blocks?



Your first two configurations won’t run, as the maximum number of threads in a block is 512. Moreover, a GTX 280 has 30 multiprocessors, so you need at least 30 blocks to get full utilization of the device.
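
To make those two limits concrete, here is a rough host-side sketch in plain C++ (no GPU needed) that checks a configuration against them. The struct and function names are made up for illustration, the constants 512 and 30 are simply the figures quoted above for the GTX 280, and the one-thread-per-element mapping is an assumption about how the 10000 entries would be processed.

```cpp
// Limits quoted in the reply above for a GTX 280 (assumed, not queried from the device).
static const int kMaxThreadsPerBlock = 512;
static const int kMultiprocessors = 30;

// Hypothetical description of a launch: number of blocks and 2D block shape.
struct LaunchConfig {
    int blocks;   // number of blocks in the grid
    int block_x;  // block width in threads
    int block_y;  // block height in threads
};

// Total threads in one block.
inline int threadsPerBlock(const LaunchConfig& c) {
    return c.block_x * c.block_y;
}

// A block can only be launched if it stays within the per-block thread limit.
inline bool fitsThreadLimit(const LaunchConfig& c) {
    return threadsPerBlock(c) <= kMaxThreadsPerBlock;
}

// With fewer blocks than multiprocessors, some multiprocessors receive no work.
inline bool coversAllMultiprocessors(const LaunchConfig& c) {
    return c.blocks >= kMultiprocessors;
}

// Blocks needed to cover n elements with one thread per element (rounded up).
inline int blocksFor(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}
```

Under these assumptions, 625x16 and 125x16 fail the thread-limit check (10000 and 2000 threads per block), while 25 blocks of 25x16 pass it (400 threads) but fall short of 30 blocks. For the 16x16 versus 16x8 question, `blocksFor(10000, 256)` gives 40 blocks and `blocksFor(10000, 128)` gives 79, so both shapes satisfy both checks. The last block is only partly filled in either case, so the kernel would need a bounds check.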