I am using a Monte Carlo approach to solve a particular problem, and my trouble is writing the output to a buffer.
Each thread handles only one tiny part of the problem, and most threads terminate early because they have nothing to write,
i.e. the output generated by that combination is invalid. I want to collect all the valid results in a single buffer.
The simplest way I can think of is a struct that holds a result counter and the results, used like this:
The thread reaches the output stage.
location = buffer.NumberOfResults;
The thread increments the buffer's counter (NumberOfResults).
The thread writes its output to the buffer at location.
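The three steps above can be sketched in C++ roughly as follows. Result, kCapacity and append_unsafe are hypothetical names, the fixed-size array is an assumption, and the capacity check is an addition so a full buffer is not overrun. Run by one thread at a time this is correct, but with concurrent threads two of them can read the same NumberOfResults before either increments it, and both then write to the same slot.

```cpp
#include <cstddef>

// Hypothetical result type; stands in for whatever one valid
// Monte Carlo combination produces.
struct Result {
    int combination;
    double value;
};

// Hypothetical fixed capacity for the output buffer.
constexpr std::size_t kCapacity = 1024;

struct ResultBuffer {
    std::size_t NumberOfResults = 0;
    Result results[kCapacity];
};

// The steps exactly as listed. NOT thread-safe: the read in step 1 and
// the increment in step 2 are separate operations, so concurrent threads
// can obtain the same location.
bool append_unsafe(ResultBuffer& buffer, const Result& r) {
    std::size_t location = buffer.NumberOfResults;  // 1. read the counter
    if (location >= kCapacity) return false;        // buffer is full
    buffer.NumberOfResults = location + 1;          // 2. increment it
    buffer.results[location] = r;                   // 3. write the output
    return true;
}
```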
I know this can be done with ordinary CPU threads,
by holding a lock from the moment NumberOfResults is read until it has been incremented.
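For reference, a minimal sketch of that CPU version with std::mutex, assuming a fixed capacity chosen up front. LockedResultBuffer, run_workers and the "multiple of 3 means valid" rule are hypothetical stand-ins. Only the read-and-increment sits inside the lock; once a thread owns its location, the write to that slot can happen outside it because no other thread will touch that element.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type for one valid combination.
struct Result {
    int combination;
    double value;
};

struct LockedResultBuffer {
    explicit LockedResultBuffer(std::size_t capacity) : results(capacity) {}

    bool append(const Result& r) {
        std::size_t location;
        {
            // The lock covers reading NumberOfResults and incrementing it,
            // so no two threads can be handed the same location.
            std::lock_guard<std::mutex> guard(mutex);
            location = NumberOfResults;
            if (location >= results.size()) return false;  // buffer is full
            ++NumberOfResults;
        }
        // This thread now exclusively owns results[location].
        results[location] = r;
        return true;
    }

    std::mutex mutex;
    std::size_t NumberOfResults = 0;
    std::vector<Result> results;
};

// Hypothetical driver: each worker tries its own slice of combinations and
// appends only the valid ones (multiples of 3 stand in for "valid").
void run_workers(LockedResultBuffer& buffer, int num_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&buffer, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                int combination = t * per_thread + i;
                if (combination % 3 != 0) continue;  // invalid: nothing to write
                buffer.append(Result{combination, combination * 0.5});
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

With 4 threads of 100 combinations each, the combinations 0 to 399 contain 134 multiples of 3, so the buffer ends up holding 134 results.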
What I don't know is how to do this with GPU threads. It is essentially a very simple append-only queue.
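One common way to avoid the lock entirely is an atomic fetch-and-add: it reads the old counter and increments it in one indivisible step, and the old value it returns is the calling thread's slot. The sketch below shows the pattern with std::atomic on CPU threads so it can be run anywhere; AtomicResultBuffer and run_workers are hypothetical names, and it assumes the capacity is known up front. In a CUDA kernel the equivalent line would be `unsigned int location = atomicAdd(&buffer->NumberOfResults, 1u);` (CUDA's atomicAdd returns the old value), and OpenCL's atomic_inc also returns the old value.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type for one valid combination.
struct Result {
    int combination;
    double value;
};

struct AtomicResultBuffer {
    explicit AtomicResultBuffer(std::size_t capacity) : results(capacity) {}

    bool append(const Result& r) {
        // Read-and-increment as a single atomic operation; the returned
        // old value is a slot no other thread can receive.
        std::size_t location = NumberOfResults.fetch_add(1);
        if (location >= results.size()) return false;  // full: drop the result
        results[location] = r;
        return true;
    }

    // Number of valid entries once all writers have finished. The raw
    // counter can overshoot the capacity, because calls on a full buffer
    // still increment it, so clamp it here.
    std::size_t size() const {
        std::size_t n = NumberOfResults.load();
        return n < results.size() ? n : results.size();
    }

    std::atomic<std::size_t> NumberOfResults{0};
    std::vector<Result> results;
};

// Hypothetical driver: multiples of 3 stand in for "valid" combinations;
// every other thread iteration exits without writing anything.
void run_workers(AtomicResultBuffer& buffer, int num_threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&buffer, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                int combination = t * per_thread + i;
                if (combination % 3 != 0) continue;
                buffer.append(Result{combination, combination * 0.5});
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

The order of results in the buffer is not deterministic, only the set of results is. If the buffer can fill up, reading the count through size() (or clamping the counter on the host after the kernel) avoids reading past the end.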