I am trying to understand the algorithm that thrust::set_intersection uses. It seems to me that at first it applies a partition similar to the merge path algorithm (so I assume a binary search based algorithm is used to segment the arrays). This produces pairs of segments that can be handled independently.
I would like to know if then each such pair of segments is handled by a thread block that does once again binary search, but this time on shared memory.
I found a source about balanced path algorithm in multisets, but i am not sure if it used always for set_intersection. Also, after the first partition (i suppose balanced path algo does partitions as well) is a second binary search being employed on thread block level to find common elements? This is my main question.
As thrust is open source, you could take a look at the implementation. After balanced path, each thread applies the set operation sequentially to its elements. A prefix sum determines the final output positions.