I was wondering what a better approach would be in general (no specific problem in mind) when an output value is not necessarily produced for every input value.
e.g.
    if (input[i] < 10)
        output[42] += 1;
Again, this is not a practical piece of code; it is only there to demonstrate what I mean by selective output. In this example the idea is 'increment a counter in global memory (at location 42) for every input element that is less than 10'.
Obviously, with a condition like this in a kernel, memory accesses are very inefficient (ignoring the fact that it is only incrementing a value).
Redundant output, by contrast, would (for example) write an array of true/false flags, one per input element (less than 10 or not), and then leave it to the host to filter these.
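To make the redundant-output idea concrete, here is a minimal host-side sketch in C++. It assumes the kernel has already written one flag byte per input element and that the flags have been copied back to the host. The names `filter_by_flags`, `count_flags`, `input` and `flags` are made up for illustration, and the one-byte-per-flag layout is just an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side filter. flags[i] is assumed to be 1 when the
// kernel found input[i] < 10, and 0 otherwise.
std::vector<int> filter_by_flags(const std::vector<int>& input,
                                 const std::vector<std::uint8_t>& flags) {
    std::vector<int> kept;
    // Only walk the range that both arrays cover.
    const std::size_t n =
        input.size() < flags.size() ? input.size() : flags.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            kept.push_back(input[i]);  // keep elements the kernel flagged
        }
    }
    return kept;
}

// Counting the set flags gives the same number that the selective
// version would have accumulated at output[42].
std::size_t count_flags(const std::vector<std::uint8_t>& flags) {
    std::size_t count = 0;
    for (std::uint8_t f : flags) {
        count += (f != 0);
    }
    return count;
}
```

With this layout the kernel writes exactly one flag per input element regardless of the condition, and all of the selection work moves to the host.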
Which of these approaches typically works best? Bear in mind that if you wanted to record more than one piece of information per element, the output could need 4+ times the space of the input (which feels very wasteful, although obviously the bigger picture is the more important one).
Any thoughts on the subject?