Hello,
I am working on a kernel that finds the minimum of each 3x3 sub-array of an array.
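To make the task concrete, here is a rough CPU sketch of the computation. It rests on assumptions that are not stated above: the array is stored row-major as `float`, the 3x3 sub-arrays are overlapping (sliding) windows, and only windows that fit entirely inside the array are evaluated. The function name `min3x3_reference` is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (assumed layout: row-major, float elements).
// For every 3x3 window that fits entirely inside a rows x cols array,
// store the minimum of its nine elements.
// The result has (rows - 2) x (cols - 2) elements, also row-major.
std::vector<float> min3x3_reference(const std::vector<float>& in,
                                    std::size_t rows, std::size_t cols) {
    // Return an empty result for inputs too small to hold one window
    // or whose size does not match the stated dimensions.
    if (rows < 3 || cols < 3 || in.size() != rows * cols) {
        return {};
    }
    const std::size_t out_rows = rows - 2;
    const std::size_t out_cols = cols - 2;
    std::vector<float> out(out_rows * out_cols);

    for (std::size_t r = 0; r < out_rows; ++r) {
        for (std::size_t c = 0; c < out_cols; ++c) {
            // (r, c) is the top-left corner of the current window.
            float m = in[r * cols + c];
            for (std::size_t dr = 0; dr < 3; ++dr) {
                for (std::size_t dc = 0; dc < 3; ++dc) {
                    m = std::min(m, in[(r + dr) * cols + (c + dc)]);
                }
            }
            out[r * out_cols + c] = m;
        }
    }
    return out;
}
```

A reference like this, run on a tiny input such as a 4x4 array, gives known values that the kernel's output can be compared against element by element.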
The problem is that I am having a hard time debugging the kernel, even though it is not long. It has begun to feel like working on a 1970s computer, where you have to review your code over and over by hand, desperately searching for the possible cause. I have manually calculated the index numbers something like 100 times for every grid, along with the indexes for every shared memory transfer, and I still end up with a meaningless result file.
So I am pretty sure this is not how the pros do it. Can you help me with a methodology for debugging on GPUs?
Thanks in advance.