Hi All,
I have a huge data structure (~800 MB) that I load once onto the card (GTX 280). The structure is a vector of tables, and the vector length is 10000, so I thought about using a block size of 16.
My problem is the following: if I want to run several blocks, with each block size being a multiple of 16, then I have the following combinations:
- 1 block of 625x16 -> not good, too many threads in a block?
- 5 blocks of 125x16 -> not sure
- 25 blocks of 25x16 -> not sure
So my question is: what would be the difference in performance between using one 16x16 block and using a 16x8 block twice?
Thanks,
Jonathan