Hi everybody,
I’ve been timing some kernels in a large project. One kernel is called 5 times in a loop.
Does anybody know why the first call always takes longer than the others (only a little bit, but it is…)?
tx
My guess is that it is because of caching.
Caching between different kernel calls, maybe…
I must investigate!
tx
There is some driver overhead in loading a new kernel binary onto the GPU. This happens automatically the first time you run the kernel, which is why the first execution is slightly longer.
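If you want the timings to exclude that one-time cost, a common approach is to do one untimed warm-up launch before the measured loop. Here is a minimal sketch, assuming CUDA events for timing; the `scale` kernel, the buffer size, and the launch configuration are hypothetical stand-ins (the original kernel isn't shown), and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace with the kernel you are measuring.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Untimed warm-up launch: absorbs first-launch costs such as
    // loading the kernel binary onto the GPU.
    scale<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    // Timed loop: five launches, each measured with a pair of events.
    for (int run = 0; run < 5; ++run) {
        cudaEventRecord(start);
        scale<<<blocks, threads>>>(d_data, n, 1.0f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the kernel has finished

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("run %d: %.3f ms\n", run, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If you comment out the warm-up launch, the first timed run should show the extra overhead again, which is a quick way to check that this is really what you are seeing.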