I accidentally ran two different launch configurations: grid(43,43) with block(3,3,33), and grid(129,129) with block(1,1,33).
There was a speed difference (the second configuration was 40% faster), and I don't understand why.
That's the only change I made between the two runs of the example.
I assume that a load of 3×3×33 = 297 threads working in parallel still takes more time than just 33 threads running simultaneously?
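
To make the arithmetic concrete, here is a rough Python sketch that counts threads and warps for the two configurations. The helper name `launch_stats` is hypothetical, and the warp size of 32 is an assumption (it holds for current NVIDIA GPUs); it only does bookkeeping and says nothing about what the kernel itself does.

```python
WARP_SIZE = 32  # assumption: warp size on current NVIDIA GPUs


def launch_stats(grid, block):
    """Return simple thread/warp bookkeeping for a 2D grid and 3D block."""
    threads_per_block = block[0] * block[1] * block[2]
    blocks = grid[0] * grid[1]
    # Threads are scheduled in whole warps, so round up.
    warps_per_block = (threads_per_block + WARP_SIZE - 1) // WARP_SIZE
    # Lanes in the last warp of each block that have no thread assigned.
    idle_lanes_per_block = warps_per_block * WARP_SIZE - threads_per_block
    return {
        "threads_per_block": threads_per_block,
        "blocks": blocks,
        "total_threads": blocks * threads_per_block,
        "warps_per_block": warps_per_block,
        "idle_lanes_per_block": idle_lanes_per_block,
    }


first = launch_stats((43, 43), (3, 3, 33))
second = launch_stats((129, 129), (1, 1, 33))

for name, stats in (("first", first), ("second", second)):
    print(name, stats)
```

Both configurations launch the same total number of threads (549153), so the difference is only in how those threads are grouped: 1849 blocks of 297 threads (10 warps each) versus 16641 blocks of 33 threads (2 warps each).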
Any comments? Thanks in advance.