Kernel time when we do not write data to global memory


Let’s say I have a kernel with two parameters: one global memory buffer for input data and another global memory buffer for output data.

If I just read data from the input and do some computing, but do not write the calculated result to the output, why is the kernel running time almost zero?

If I write the result to the output, the kernel takes some time. Does that mean that when I do not write data to global memory, the kernel is optimized so that all the computation inside it is skipped?

Thanks a lot.

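The compiler might be treating the read-only code as “dead code”: since the computed result is never stored anywhere, nothing the kernel does is observable, so the compiler can remove the loads and the arithmetic and emit a nearly empty kernel…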

You should dump the read-only kernel with cuobjdump (for example, cuobjdump -sass on the compiled binary) and compare it to the read/write kernel.
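
To make the suggestion concrete, here is a minimal sketch of the kind of comparison being described. It is not the original poster's code: the kernel names read_only and read_write, the loop body, the buffer size, and the file name deadcode.cu are all made up for illustration. The two kernels are identical except for the final store, so any difference in timing or in the dumped machine code should come from that one line.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels for illustration only (not the poster's code).

// Reads the input and computes, but never stores the result. Nothing this
// kernel does is observable, so the compiler is allowed to remove the work.
__global__ void read_only(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = in[i];
        for (int k = 0; k < 1000; ++k)
            acc = acc * 1.0001f + 0.5f;
        (void)acc;  // result is discarded
    }
}

// Same computation, but the result is written to global memory, so the
// loads and arithmetic feeding the store have to be kept.
__global__ void read_write(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float acc = in[i];
        for (int k = 0; k < 1000; ++k)
            acc = acc * 1.0001f + 0.5f;
        out[i] = acc;  // observable side effect
    }
}

typedef void (*kernel_t)(const float *, float *, int);

// Times a single launch with CUDA events and returns milliseconds.
static float time_kernel(kernel_t kernel, const float *in, float *out, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 22;  // arbitrary size chosen for the sketch
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // Warm-up launch so one-time startup cost is not part of the timings.
    read_write<<<1, 1>>>(in, out, 1);
    cudaDeviceSynchronize();

    printf("read_only : %.3f ms\n", time_kernel(read_only, in, out, n));
    printf("read_write: %.3f ms\n", time_kernel(read_write, in, out, n));

    cudaFree(in);
    cudaFree(out);
    return 0;
}

// Build and inspect (file name is an assumption):
//   nvcc -O3 -o deadcode deadcode.cu
//   cuobjdump -sass deadcode
```

If the compiler has eliminated the dead code, the SASS listing for read_only should be only a handful of instructions, while read_write should contain the global loads, the arithmetic, and a global store. Actual timings and instruction counts will depend on the GPU, the toolkit version, and the optimization flags.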

Thanks a lot. I cannot agree with you more.

Hey scorpio,
will you share the code?