Another simple question:
When I call two different kernels sequentially (their grid and block dimensions differ), I observe a serious slowdown.
Kernel 1 only => 0.001 sec
Kernel 2 only => 0.130 sec
But kernel 1 followed by kernel 2 takes 0.450 sec.
Can I have some advice on this?
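
For reference, here is a minimal sketch of how the three timings above could be measured with CUDA events. Kernel launches are asynchronous with respect to the host, so a host-side timer without synchronization can attribute time to the wrong call. Everything named here is a hypothetical placeholder rather than the actual code from this question: `kernel1`, `kernel2`, their bodies, the launch configurations, and the helper `time_launches`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the two kernels in the question.
__global__ void kernel1(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

__global__ void kernel2(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Returns the elapsed GPU time in milliseconds for the selected launches.
// The grid/block sizes are assumptions chosen only so that they differ.
float time_launches(float *d_data, int n, bool run1, bool run2) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaDeviceSynchronize();  // earlier GPU work must not leak into the timing
    cudaEventRecord(start);
    if (run1) kernel1<<<(n + 127) / 128, 128>>>(d_data, n);
    if (run2) kernel2<<<(n + 511) / 512, 512>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until both kernels have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up run so one-time initialization cost is not counted below.
    time_launches(d_data, n, true, true);

    printf("kernel1 only:      %.3f ms\n", time_launches(d_data, n, true, false));
    printf("kernel2 only:      %.3f ms\n", time_launches(d_data, n, false, true));
    printf("kernel1 + kernel2: %.3f ms\n", time_launches(d_data, n, true, true));

    cudaFree(d_data);
    return 0;
}
```

If the combined time measured this way is close to the sum of the individual times, the slowdown was probably a measurement artifact. If it is still much larger, the interaction between the two kernels is worth a closer look.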