Hello all~ :rolleyes:
I have a data_array of size 256.
Every element of data_array is 1.
I launched 256 threads, each reading one element of the data.
The values read by the 256 threads are summed and stored in Result_array[1] (Result_array is in global memory).
When I test it, the result is 1 instead of the expected 256. :blink:
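Here is a minimal sketch of the setup described above. It assumes each thread does a plain `+=` on Result_array[1]. The kernel names, the single 256-thread block, and the `run` helper are hypothetical illustrations, not the actual code from this post. An `atomicAdd` variant is included for comparison.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256

// Racy version (hypothetical): every thread does an unsynchronized
// read-modify-write on Result_array[1].
__global__ void sum_racy(const int *data_array, int *Result_array) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        // Load, add, store: threads that load the same old value
        // overwrite each other's updates.
        Result_array[1] += data_array[i];
    }
}

// Atomic version: atomicAdd makes each read-modify-write indivisible.
__global__ void sum_atomic(const int *data_array, int *Result_array) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        atomicAdd(&Result_array[1], data_array[i]);
    }
}

// Hypothetical helper: fills data_array with ones, launches one kernel
// with a single block of N threads, and returns Result_array[1].
static int run(bool use_atomic) {
    int h_data[N];
    for (int i = 0; i < N; ++i) h_data[i] = 1;
    int h_result[2] = {0, 0};

    int *d_data = nullptr, *d_result = nullptr;
    cudaMalloc(&d_data, N * sizeof(int));
    cudaMalloc(&d_result, 2 * sizeof(int));
    cudaMemcpy(d_data, h_data, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_result, h_result, 2 * sizeof(int), cudaMemcpyHostToDevice);

    if (use_atomic) {
        sum_atomic<<<1, N>>>(d_data, d_result);
    } else {
        sum_racy<<<1, N>>>(d_data, d_result);
    }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_result, d_result, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_result);
    return h_result[1];
}

int main() {
    printf("racy:   %d\n", run(false));  // nondeterministic, usually well below 256
    printf("atomic: %d\n", run(true));   // expected 256
    return 0;
}
```

In this sketch, the racy kernel's result depends on how the threads happen to interleave. The atomic kernel should always report 256.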
What is the main cause of this problem?
Does CUDA provide any way for users to set the priority of different threads?