Are there any recent GPU implementations (e.g., CUDA or OpenCL) of pairwise distance calculation, either standalone or included in a library?
The only thing I could find was this paper from 2008, which shows two implementations in CUDA.
In fact, I need not only the pairwise distances but also their sum, though I could get that with a parallel reduction after calculating the pairwise distances.
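To make the target quantity concrete, here is a minimal CPU reference sketch in Python. It assumes Euclidean distance and points given as coordinate tuples; neither is stated above, and the function name `pairwise_distance_sum` is hypothetical. A GPU version would compute the same per-pair distances in parallel and replace the final sum with a parallel reduction.

```python
import math
from itertools import combinations


def pairwise_distance_sum(points):
    """Sum of Euclidean distances over all unordered pairs (i < j).

    Assumption: Euclidean metric; swap math.dist for another metric if needed.
    """
    # Step 1: one distance per unordered pair.
    # On a GPU, each pair would typically be handled by its own thread.
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]

    # Step 2: reduce the distances to a single value.
    # On a GPU, this is where the parallel reduction would go.
    return math.fsum(distances)


# Three collinear points: the pair distances are 5, 10, and 5.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
total = pairwise_distance_sum(points)
print(total)
```

This is only meant as a correctness baseline to check a GPU result against for small inputs. It is O(n^2) in time and memory, so it is not practical for large point sets.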
Thanks in advance :P