Writing a kernel to operate on/combine all pairs of objects in a sequence

I’ve got a sequence of, say, five unique objects (e.g. coordinates) indexed 0 … 4 in device memory. I’d like to create a kernel that operates on all pairs of objects as shown below:

(1,0)
(2,0) (2,1)
(3,0) (3,1) (3,2)
(4,0) (4,1) (4,2) (4,3)

Note that I do not wish to consider pairs such as (2,2) or (2,3). In the latter case, this is because (3,2) is equivalent to (2,3).

How would I write a kernel to work like this? The key of course is to map a thread/block id to a pair of numbers.

Question: do you know all of the pairs beforehand?
Could you pre-calculate all of the pairs?

If so, why not just create a 2-D array with one row per pair, where each row holds the two indices of that pair? That way you end up with a single array that you pass to the GPU, and every thread or block (or whatever granularity you end up needing) operates on one pair.
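A minimal sketch of that pre-computed table idea, in case it helps. The names `IndexPair`, `build_pairs` and `process_pairs` are made up for illustration, the objects are assumed to be `float2` coordinates, and the squared distance is only a placeholder for whatever the real per-pair operation is:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical pair record: two indices into the object array, with i > j.
struct IndexPair { int i; int j; };

// Host side: enumerate every unordered pair (i, j) with i > j, row by row.
std::vector<IndexPair> build_pairs(int n)
{
    std::vector<IndexPair> pairs;
    if (n < 2) return pairs;
    pairs.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            pairs.push_back({i, j});
    return pairs;
}

// Device side: one thread per table entry. Which thread gets which pair
// does not matter; each thread just reads its own row of the table.
__global__ void process_pairs(const float2* objs, const IndexPair* pairs,
                              int num_pairs, float* out)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= num_pairs) return;          // guard the partially filled last block
    IndexPair p = pairs[k];
    float dx = objs[p.i].x - objs[p.j].x;
    float dy = objs[p.i].y - objs[p.j].y;
    out[k] = dx * dx + dy * dy;          // placeholder operation
}
```

For five objects the table has 10 entries. After copying the objects and the table to the device with `cudaMemcpy`, a launch along the lines of `process_pairs<<<(num_pairs + 255) / 256, 256>>>(d_objs, d_pairs, num_pairs, d_out)` would cover them, where the `d_` pointers are assumed device allocations.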

I don’t think mapping a thread/block id to a particular pair of numbers is really the point here. Whether thread 0 works on (4,3) and thread 1 works on (2,0), or any other order, doesn’t matter. It shouldn’t matter which thread or block works on which pair, just that all pairs are laid out the same way (data-wise).

Therefore, in this situation you would end up with, say, 10 threads working on these 10 pairs. It doesn’t matter which thread is working on which pair, just that each thread is working on its own pair. If you bumped it up to, say, 15 or 21 pairs, the same concept applies.

The bigger problem you are trying to solve is ensuring that all of the pairs have the same layout and can be accessed in the same way (i.e. the code inside the thread shouldn’t care which pair it is working on). That way you get the parallelization you are looking for.

At least that is my interpretation of your question.

Thanks for your reply. I know I can generate the pairs beforehand, but that’s just extra work for the CPU, more CPU-to-device data transfers and more memory consumption on the GPU, especially when the pairs can be generated by each thread. Fortunately, I’ve been able to find help finding such a formula. Here is the link to my Stack Overflow question, where you will find references/links to the formulas.
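For anyone landing here without following the link, below is a rough sketch of one such mapping. It is the usual triangular-number inversion and not necessarily the exact formula from the linked answers. It assumes the pairs are numbered row by row, so k = 0 is (1,0), k = 1 is (2,0), k = 2 is (2,1) and so on, which means row i starts at k = i(i-1)/2. The names `index_to_pair` and `process_all_pairs` are made up, the objects are assumed to be `float2` coordinates, and the squared distance is only a placeholder operation:

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Map a linear index k in [0, n(n-1)/2) to a pair (i, j) with i > j.
// Row i holds the i pairs (i,0) ... (i,i-1) and starts at k = i(i-1)/2,
// so i is the largest integer with i(i-1)/2 <= k.
__host__ __device__ inline void index_to_pair(long long k, int* i, int* j)
{
    long long r = static_cast<long long>(
        (1.0 + sqrt(1.0 + 8.0 * static_cast<double>(k))) / 2.0);
    // Correct for possible floating-point rounding in the square root.
    while (r * (r - 1) / 2 > k) --r;
    while (r * (r + 1) / 2 <= k) ++r;
    *i = static_cast<int>(r);
    *j = static_cast<int>(k - r * (r - 1) / 2);
}

// One thread per pair; no pair table is stored or transferred.
__global__ void process_all_pairs(const float2* objs, int n, float* out)
{
    long long num_pairs = static_cast<long long>(n) * (n - 1) / 2;
    long long k = static_cast<long long>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (k >= num_pairs) return;          // guard the partially filled last block
    int i, j;
    index_to_pair(k, &i, &j);
    float dx = objs[i].x - objs[j].x;
    float dy = objs[i].y - objs[j].y;
    out[k] = dx * dx + dy * dy;          // placeholder operation
}
```

The two correction loops are there because the square root is computed in floating point and could land one row off for very large k. For small inputs like the five-object example they never run.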

This has been discussed (in the context of operating on the upper triangle of a matrix) in an older thread on this forum.