Looking for CUDA developer to rewrite implementation of one paper (ranking from pairwise comparisons)

I have a problem where I have up to 10,000 entities and I’m trying to obtain rankings of these entities from pairwise comparisons (in each comparison one entity is a winner, the other is a loser), doing the least amount of pairwise comparisons.

There is a paper proposing an algorithm for such a problem and I want to have a working implementation for it [1]. The paper does have a code example on GitHub but the provided example implementation on GPU doesn’t work when there are more than 200 entities, it gives an out-of-memory error.

So your job would be to

  1. Evaluate whether there is a memory usage issue related to the complexity of the algorithm, meaning can this algorithm work for me if I have 10,000 entities and I don’t have a supercomputer? I could get 10-20 rented GPUs from vast.ai if each iteration of the algorithm would last up to 10 min)
  2. Fix the GPU implementation to make it properly work when there are many entities, plus add support for distributed computing (e.g. maybe via Ray).

[1] “Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization” https://arxiv.org/pdf/2004.05691.pdf

[2] GitHub - gfxdisp/asap: Fast and accurate Active SAmpling method for Pairwise comparisons