I am using a Monte Carlo approach to solve a particular problem, and the difficulty is writing the output to a buffer.
Each thread handles only one tiny part of the problem, and most threads end up terminating early because they have nothing to write,
i.e. the output generated by that combination is invalid. I want to collect all of the valid results in a single buffer.
The simplest layout I can think of is a struct like this:
struct buffer
{
    int numberOfResults;  // number of results written so far
    int values[2048];     // storage for the results
};
When a thread reaches the output stage, it reads numberOfResults:
location = buffer.numberOfResults;
Then it increments the counter:
buffer.numberOfResults++;
Finally it writes the output to the buffer:
buffer.values[location] = output;
Now, I know this can be done with normal threads on the CPU,
with a lock held between reading numberOfResults and incrementing it, so that no two threads get the same location.
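For reference, here is a rough sketch of the CPU version I have in mind, written in C++ with std::mutex and std::thread. The names Buffer, kCapacity, appendResult and runWorkers are made up for this example, and the "candidate % 7 == 0" check is only a stand-in for my real validity test. It is a sketch of the idea, not my actual code.

```cpp
#include <mutex>
#include <thread>
#include <vector>

constexpr int kCapacity = 2048;  // assumed capacity, matching values[2048]

// Same layout as the struct above, with the counter starting at zero.
struct Buffer {
    int numberOfResults = 0;
    int values[kCapacity];
};

// Hypothetical helper: reserve a slot while holding the lock, then write to it.
// Returns false if the buffer is already full.
bool appendResult(Buffer& buffer, std::mutex& lock, int output) {
    int location;
    {
        std::lock_guard<std::mutex> guard(lock);
        if (buffer.numberOfResults >= kCapacity) return false;
        location = buffer.numberOfResults;  // read the counter
        buffer.numberOfResults++;           // increment it before unlocking
    }
    // The slot belongs to this thread only, so the write needs no lock.
    buffer.values[location] = output;
    return true;
}

// Hypothetical driver: each thread checks its own slice of candidates and
// only the valid ones reach the output stage.
int runWorkers(Buffer& buffer, int numThreads, int candidatesPerThread) {
    std::mutex lock;
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&buffer, &lock, t, candidatesPerThread] {
            for (int i = 0; i < candidatesPerThread; ++i) {
                int candidate = t * candidatesPerThread + i;
                if (candidate % 7 == 0) {  // stand-in validity test
                    appendResult(buffer, lock, candidate);
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return buffer.numberOfResults;
}
```

The results end up in whatever order the threads happen to reach the output stage, which is fine for my purposes.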
I don’t know how this can be done with GPU threads. It’s a very simple queue.
Regards