I have a strange problem that…
when I execute the code sequentially (I mean KerFun<<<1,1,0>>>()), it takes some X ms,
and when I execute it using multiple threads (I mean KerFun<<<10,30,0>>>()), it takes X+Y ms.
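
To make the comparison concrete, here is a minimal sketch of how the two launches could be timed with CUDA events. The body of KerFun, the helper timeLaunch, and the buffer d_out are placeholders made up for illustration, since the real kernel is not shown here; only the two launch configurations come from the description above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (assumption): every thread does the same fixed amount
// of work and writes one result. The real KerFun is not shown in the post.
__global__ void KerFun(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = 0.0f;
    for (int k = 0; k < 1000; ++k) {
        v += k * 0.5f;
    }
    out[i] = v;
}

// Hypothetical helper: times a single launch with the given grid and block
// sizes and returns the elapsed time in milliseconds.
static float timeLaunch(int blocks, int threads, float *d_out)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    KerFun<<<blocks, threads, 0>>>(d_out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    // One output slot per thread of the larger configuration (10 * 30).
    float *d_out = nullptr;
    if (cudaMalloc((void **)&d_out, 10 * 30 * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Warm-up launch so one-time initialization is not counted
    // in the first measurement.
    KerFun<<<1, 1, 0>>>(d_out);
    cudaDeviceSynchronize();

    float tSeq = timeLaunch(1, 1, d_out);    // KerFun<<<1,1,0>>>
    float tPar = timeLaunch(10, 30, d_out);  // KerFun<<<10,30,0>>>

    printf("<<<1,1,0>>>   : %f ms\n", tSeq);
    printf("<<<10,30,0>>> : %f ms\n", tPar);

    cudaFree(d_out);
    return 0;
}
```

In this sketch every thread repeats the same loop, so the second launch performs 300 times the total work of the first; whether that matches the real KerFun depends on how it uses blockIdx and threadIdx.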
What could be the reason for this? Is it because of the kernel launch configuration?
Please discuss this…