But according to these lines:
int BlockOff = bIDx << 8; // 256 threads per block, each handles one element
int address0 = tIDx + BlockOff;
when the block ID is 0, I access elements 0-255, and when the block ID is 1, I access elements 256-511. Isn’t that right?
trudger
If blockID is 1, BlockOff = 0 and tIDx is still 0~255,
so you are still accessing 0~255, because tIDx + BlockOff stays in the range 0~255.
Why???
int BlockOff = bIDx << 8;
Doesn’t that mean each block jumps ahead by 256? Shifting bIDx left by 8 bits is the same as bIDx*256…
Why would I be accessing elements 0~255 for the first block and then 1~256 for the second block?
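To make the arithmetic concrete, here is a minimal host-side sketch in plain C of the same index computation. It assumes bIDx and tIDx stand for blockIdx.x and threadIdx.x with 256 threads per block; the function names global_index and check_block_range are made up for illustration and are not from the original kernel.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's index computation.
   block_id plays the role of bIDx, thread_id the role of tIDx (0..255). */
int global_index(int block_id, int thread_id)
{
    int block_off = block_id << 8;   /* shift left by 8 bits == block_id * 256 */
    return thread_id + block_off;    /* same as address0 in the kernel snippet */
}

/* Checks that a block covers exactly [256 * block_id, 256 * block_id + 255]. */
void check_block_range(int block_id)
{
    assert(global_index(block_id, 0)   == block_id * 256);
    assert(global_index(block_id, 255) == block_id * 256 + 255);
}
```

With these assumptions, block 0 maps to elements 0~255 and block 1 maps to elements 256~511, so consecutive blocks do not overlap.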
trudger
Sorry, I misunderstood it…
Now I can do the intersection of 16M and 1M in 4.5 ms, thanks guys :)