No, because the only changing variables are private per thread.
The three kinds of possible race conditions (read after write, write after read, write after write) are described in the Wikipedia article on data hazards. (Never mind that the article talks about pipeline design. Just think of any instructions whose order is not guaranteed.)
Obviously this won’t work as phi and label are both read and written without synchronization. Use different arrays for input and output of your computation.
P.S.: Code posted to the forums is a lot more readable when posted between [code]…[/code] tags.
To be honest, I tried it and it works… so why? Moreover, outside the function (compute_pmin_var), each thread updates the value of phi and then reads it, but the index for each read and write is the same. Does this matter?
It probably works (most of the time) if you launch it with a small enough number of blocks that all blocks run concurrently. I’d guess it fails with a larger number of blocks. Are you checking results against a serial implementation on the CPU?
Having the same thread write and read an array element is no problem; memory access ordering is guaranteed within a thread. The problems appear if one thread writes and a different thread reads (or the other way around).
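To make the distinction concrete, here is a minimal sketch. It is not the code from this thread: the kernel names, the host helper and the arithmetic are made up for illustration. The first kernel only ever touches the calling thread's own element, so it is safe in place. The second reads an element that another thread writes, which is exactly the hazard described above, so its result depends on scheduling.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Safe: thread i is the only thread that ever reads or writes phi[i].
__global__ void update_own_element(float *phi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        phi[i] = phi[i] + 1.0f;   // write ...
        phi[i] = 2.0f * phi[i];   // ... then read back; ordered within one thread
    }
}

// Racy, shown only as a counterexample: thread i reads phi[i + 1], which
// thread i + 1 may or may not have overwritten already. The outcome depends
// on how warps and blocks happen to be scheduled.
__global__ void update_from_neighbour_racy(float *phi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + 1 < n) {
        phi[i] = phi[i + 1] + 1.0f;
    }
}

// Hypothetical host helper: runs the safe kernel on a copy of `host`
// (assumes host.size() > 0) and returns the result.
std::vector<float> run_update_own_element(const std::vector<float> &host)
{
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    update_own_element<<<blocks, threads>>>(dev, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dev, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dev);
    return out;
}
```

The racy kernel may well produce the expected numbers on a small launch, which is why a test that passes once does not prove the code is correct.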
Yes, I tried a comparison with a sequential implementation and it works. So, if I use two different data structures for both phi and label, do I avoid both race conditions and synchronization issues?
Does the __threadfence() function help in this case?
And do you think it would be better to use different memory buffers for the input and output of the computation, or to use atomic operations?
Could you please explain whether I have synchronization problems or race conditions in these cases?
In the two examples I posted, when do I need atomic operations? Is it better to use atomic operations, or to duplicate the structures so that there is one memory location for the input and one for the output?
Usually it is better to have separate memory for input and output, as atomic operations are expensive. And for floating point data, atomic operations create the additional nuisance that rounding errors suddenly depend on the specific timing of each execution.
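As a rough sketch of the separate-buffer approach (again with made-up names, and a simple neighbour-minimum step standing in for the real computation, which was not shown in this form), each step reads only from `in` and writes only to `out`, and the host swaps the two pointers between launches. A serial CPU version is included so the results can be compared.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <utility>
#include <vector>

// One step: out[i] = min(in[i-1], in[i], in[i+1]), clamped at the ends.
// `in` is only read and `out` is only written, so no thread can see a
// neighbour that has already been updated in this step.
__global__ void neighbour_min_step(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int left  = (i > 0)     ? i - 1 : i;
    int right = (i < n - 1) ? i + 1 : i;
    out[i] = fminf(in[i], fminf(in[left], in[right]));
}

// Hypothetical host driver: runs `steps` iterations with two device buffers.
// Assumes host.size() > 0.
std::vector<float> neighbour_min_gpu(const std::vector<float> &host, int steps)
{
    int n = static_cast<int>(host.size());
    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int s = 0; s < steps; ++s) {
        neighbour_min_step<<<blocks, threads>>>(d_in, d_out, n);
        std::swap(d_in, d_out);  // this step's output is the next step's input
    }

    std::vector<float> result(n);
    cudaMemcpy(result.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

// Serial reference with the same double-buffered update, for checking.
std::vector<float> neighbour_min_cpu(std::vector<float> cur, int steps)
{
    int n = static_cast<int>(cur.size());
    std::vector<float> next(n);
    for (int s = 0; s < steps; ++s) {
        for (int i = 0; i < n; ++i) {
            int left  = (i > 0)     ? i - 1 : i;
            int right = (i < n - 1) ? i + 1 : i;
            next[i] = std::min(cur[i], std::min(cur[left], cur[right]));
        }
        cur.swap(next);
    }
    return cur;
}
```

Kernel launches on the same stream run in order, so the swap on the host is enough to separate one step from the next. No atomics or fences are needed in this pattern, and the result does not depend on the number of blocks.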