Let’s say I have a kernel with two parameters: one global memory buffer for the input data and another global memory buffer for the output data.
If I only read data from the input and do some computation, but do not write the calculated result to the output, why is the kernel's running time almost zero?
If I do write the result to the output, the kernel takes a measurable amount of time. Does that mean that if I do not write any data to global memory, the kernel is optimized so that all the computation inside it is skipped?
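
To make the scenario concrete, here is a minimal sketch of the two variants being compared. It assumes CUDA (the post does not say whether this is CUDA or OpenCL), and the names `compute`, `compute_no_write`, `compute_and_write`, and `time_kernel_ms` are hypothetical placeholders chosen for illustration, not the actual code.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for "some computing": a short arithmetic loop.
__device__ float compute(float x) {
    for (int i = 0; i < 256; ++i) {
        x = x * 1.0001f + 0.5f;
    }
    return x;
}

// Variant A: reads the input and computes, but never stores the result.
__global__ void compute_no_write(const float* in, float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float r = compute(in[idx]);
        (void)r;  // the result is unused and 'out' is never touched
    }
}

// Variant B: same computation, but the result is stored to global memory.
__global__ void compute_and_write(const float* in, float* out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = compute(in[idx]);
    }
}

// Times a single launch of one variant with CUDA events and returns
// the elapsed time in milliseconds. Error checking is omitted for brevity.
float time_kernel_ms(bool write_output, int n) {
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);
    if (write_output) {
        compute_and_write<<<blocks, threads>>>(d_in, d_out, n);
    } else {
        compute_no_write<<<blocks, threads>>>(d_in, d_out, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms;
}
```

Calling `time_kernel_ms(false, 1 << 20)` and `time_kernel_ms(true, 1 << 20)` from `main` and printing both values would reproduce the comparison in question: the first corresponds to the case without the store, the second to the case with it.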
Thanks a lot.