I’ve got a simple kernel that reads a few values from global memory, multiplies them, and stores the results back in global memory.
When I run my program through the CUDA profiler, individual executions of the kernel often report zeros in every counter except GPU time (and that time is comparable to the other, ‘proper’ executions). That means 0 instructions, 0 loads, 0 stores, and so on.
What is going on?
Thanks in advance.
Hm, I guess I might be writing outside the array boundaries, although I don’t have access to the code now, so I can’t check :).
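One way to test the out-of-bounds guess once the code is available again is to compare it against something like the sketch below. The original kernel isn’t shown, so this assumes one element per thread; `multiplyKernel` and `multiplyOnDevice` are hypothetical names. The kernel guards its index against the array length, because the grid is usually rounded up to whole blocks, and the host helper checks both `cudaGetLastError()` (launch errors) and `cudaDeviceSynchronize()` (errors raised while the kernel runs), so a failed or faulting launch is reported instead of passing silently.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel matching the description: read values from global
// memory, multiply them, and store the product back to global memory.
__global__ void multiplyKernel(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may contain threads whose index is past the end of the
    // arrays; they must not read or write anything.
    if (i < n) {
        out[i] = a[i] * b[i];
    }
}

// Hypothetical host helper: copies inputs to the device, launches the kernel,
// copies the result back, and returns the first CUDA error encountered.
cudaError_t multiplyOnDevice(const float* hostA, const float* hostB, float* hostOut, int n)
{
    if (n <= 0) return cudaSuccess;  // nothing to do; avoids a zero-block launch

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *a = nullptr, *b = nullptr, *out = nullptr;

    cudaError_t err = cudaMalloc(&a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(a, hostA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(b, hostB, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // round up to cover all n
        multiplyKernel<<<blocks, threads>>>(a, b, out, n);
        err = cudaGetLastError();                                // launch/config errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();   // errors during execution
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, out, bytes, cudaMemcpyDeviceToHost);

    // cudaFree(nullptr) is a no-op, so this is safe after a partial failure.
    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return err;
}
```

Calling `multiplyOnDevice` with an `n` that is not a multiple of 256 exercises the bounds guard; if it returns anything other than `cudaSuccess`, `cudaGetErrorString(err)` gives a readable message. Running the program under `compute-sanitizer` (`cuda-memcheck` in older toolkits) is another way to have out-of-bounds global memory accesses reported directly.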