I have an algorithm in which every thread holds a float array of 400 + C elements (where C < 10).
Every iteration should do something like this:
[list=1]
[*]send some data to adjacent threads (adjacency is defined by a graph)
[*]receive data from adjacent threads, analyze it, and append the results to the end of the array (at most C elements)
[*]sort every array
[/list]
At the beginning all arrays are set to 1.0f.
So every sort operates on an already sorted array with a small unsorted tail.
What would you propose as the most efficient algorithm for doing that?
Maybe inserting those items in sorted order?
Oh, the number of threads may vary between 32 and 1 024 000 (or maybe more).
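
To make the insertion idea concrete, here is a minimal sketch of what I mean. It is only an illustration under my own assumptions: ascending order, the tail already written directly behind the sorted part, and a hypothetical helper name `insert_tail` (plain C++, not tied to any particular threading API).

```cpp
#include <cstddef>

// Hypothetical helper: `data` holds `sorted_len` ascending values followed by
// `tail_len` unsorted values (tail_len <= C in my case). On return the whole
// range [0, sorted_len + tail_len) is ascending.
inline void insert_tail(float* data, std::size_t sorted_len, std::size_t tail_len)
{
    for (std::size_t t = 0; t < tail_len; ++t) {
        std::size_t i = sorted_len + t;   // index of the next tail element
        const float key = data[i];
        // Shift larger elements one slot to the right until key fits.
        while (i > 0 && data[i - 1] > key) {
            data[i] = data[i - 1];
            --i;
        }
        data[i] = key;
    }
}
```

Each thread would call this once per iteration on its own array, e.g. `insert_tail(arr, 400, c)` where `c` is the number of values received. In the worst case every tail element is shifted past the whole array, so the cost is roughly C * 400 moves per thread, but no extra memory is needed.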