Hi, (I would appreciate it if NVIDIA employees could help with this question.) (Hardware: 8800 GTX)
The kernel launch parameters <<<Dg, Db, Ns>>> are hard to configure. The best way is to tune them, but for a big system we would like to avoid tuning. We need a principle for choosing them so that performance is not far from the best case, say >80% of the best performance.
One of my current experiences is: Ns should be at least <8KB, and ideally <4KB.
For Dg, I have two conflicting opinions:
A: “Given a fixed Ns, Dg * Ns <= 256KB is a safe and reasonably good principle for determining Dg (the number of blocks).” For example, given Ns = 4KB, the best Dg is 64.
B: “I have never seen a crash with Dg * Ns > 256KB, for example with Dg.x = 256 and Ns = 4KB.”
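To make rule A concrete, here is a minimal host-side sketch of the arithmetic. The helper names `max_blocks_for_budget` and `within_budget` are hypothetical, made up only for illustration. The 256KB figure is taken from rule A as a fixed budget; my assumption is that it corresponds to 16 multiprocessors x 16KB of shared memory each on the 8800 GTX, but the rule itself does not depend on where the number comes from.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the 256 KB total from rule A, treated here as a fixed budget.
// (Presumably 16 multiprocessors x 16 KB shared memory on an 8800 GTX.)
constexpr std::size_t kSharedBudgetBytes = 256 * 1024;

// Hypothetical helper: largest Dg (block count) with Dg * Ns <= budget.
inline std::size_t max_blocks_for_budget(
        std::size_t ns_bytes,
        std::size_t budget_bytes = kSharedBudgetBytes) {
    assert(ns_bytes > 0 && "rule A only applies when Ns is non-zero");
    return budget_bytes / ns_bytes;  // integer division rounds down
}

// Hypothetical helper: does a given (Dg, Ns) pair satisfy rule A?
inline bool within_budget(
        std::size_t dg,
        std::size_t ns_bytes,
        std::size_t budget_bytes = kSharedBudgetBytes) {
    return dg * ns_bytes <= budget_bytes;
}
```

With Ns = 4KB, `max_blocks_for_budget(4 * 1024)` gives 64, which matches the example in A. The configuration in B (Dg.x = 256, Ns = 4KB) makes `within_budget(256, 4 * 1024)` return false, so it is exactly the case the questions below are about.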
1, Is Dg > 256KB/Ns dangerous?
2, Is Dg > 256KB/Ns pointless?
In my tests, a large Dg did not bring a significant performance gain over setting Dg to 256KB/Ns. I also found that Dg doesn’t matter much as long as it’s not too small (say, <32).
3, Is Dg * Ns <= 256KB a good heuristic?