Sequential execution takes less time than parallel execution. Why?


I have a strange problem:

When I execute the code sequentially (I mean KerFun<<<1,1,0>>>()), it takes some X ms,
and when I execute it using threads (I mean KerFun<<<10,30,0>>>()), it takes X+Y ms.

What could be the reason for this? Is it because of the kernel function configuration?

Please discuss this…
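Before comparing X and X+Y, it helps to make sure both numbers measure only kernel execution. Kernel launches are asynchronous, and the first launch in a program can include one-time setup. The sketch below times each configuration with CUDA events after a warm-up launch. The body of `KerFun` here is a hypothetical stand-in (a fixed per-thread loop), since the real kernel isn't shown, and `timeLaunch` is an illustrative helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for KerFun: each thread runs the same fixed loop
// and writes its result so the compiler cannot discard the work.
__global__ void KerFun(float *out) {
    float acc = 0.0f;
    for (int i = 0; i < 100000; ++i) acc += i * 0.5f;
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Times one launch configuration with CUDA events and returns milliseconds.
float timeLaunch(int blocks, int threads) {
    float *d_out = nullptr;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    // Warm-up launch so one-time initialization is not counted.
    KerFun<<<blocks, threads>>>(d_out);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    KerFun<<<blocks, threads>>>(d_out);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block the host until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main() {
    printf("<<<1,1>>>:   %.3f ms\n", timeLaunch(1, 1));
    printf("<<<10,30>>>: %.3f ms\n", timeLaunch(10, 30));
    return 0;
}
```

Note that in this stand-in every thread repeats the full loop, so the two configurations are not doing the same total work; the timings only become comparable once that is accounted for.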


Well, it is 10 blocks instead of 1 block, so there is more to schedule, I guess. I don’t know, though.

Are you sure they’re doing the same amount of work?
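If the kernel body doesn't use threadIdx/blockIdx to divide its work, every one of the 10 × 30 = 300 threads repeats the entire computation, so the parallel launch does 300 times the work rather than the same work faster. A minimal sketch, assuming a hypothetical loop over n independent items (the original kernel body isn't shown): `work`, `kerFunEveryThread`, `kerFunSplit` and `countWork` are illustrative names, and the atomic counter tallies work units rather than measuring time. The split version uses a grid-stride loop so the total stays n for any launch configuration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical unit of work: here it only counts how often it runs.
__device__ void work(int i, unsigned long long *counter) {
    (void)i;  // a real kernel would process item i
    atomicAdd(counter, 1ULL);
}

// Every thread runs the whole loop: total work = n * (blocks * threads).
__global__ void kerFunEveryThread(int n, unsigned long long *counter) {
    for (int i = 0; i < n; ++i) work(i, counter);
}

// Grid-stride loop: the n items are divided among all launched threads,
// so the total work is n regardless of the launch configuration.
__global__ void kerFunSplit(int n, unsigned long long *counter) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        work(i, counter);
}

// Launches one of the kernels and returns the number of work units executed.
unsigned long long countWork(bool split, int blocks, int threads, int n) {
    unsigned long long *d_counter = nullptr;
    unsigned long long h_counter = 0;
    cudaMalloc(&d_counter, sizeof(h_counter));
    cudaMemset(d_counter, 0, sizeof(h_counter));

    if (split) kerFunSplit<<<blocks, threads>>>(n, d_counter);
    else       kerFunEveryThread<<<blocks, threads>>>(n, d_counter);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&h_counter, d_counter, sizeof(h_counter), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main() {
    const int n = 1000;
    printf("<<<1,1>>>   every thread: %llu\n", countWork(false, 1, 1, n));
    printf("<<<10,30>>> every thread: %llu\n", countWork(false, 10, 30, n));
    printf("<<<10,30>>> split:        %llu\n", countWork(true, 10, 30, n));
    return 0;
}
```

On a working GPU this should print 1000, 300000 and 1000: only the split version keeps the total work equal to the <<<1,1>>> case.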