I’m happy to announce the release of the Random Ball Cover (RBC)
package for performing nearest neighbor search on the GPU. The RBC
is new randomized metric data structure designed explicitly for
massively parallel systems. The basics of the data structure, along
with some GPU implementation details, are described in a paper that
will apear at the ADMS workshop (at the Very Large Databases
conference):
L. Cayton, A nearest neighbor data structure for graphics hardware.
The performance of the RBC is quite promising. One experiment that is
in the paper: I ran the code on a database from robotics that contains
1 million points, each of which is 21-dimensional. On the database, I
was able to find the nearest neigbor for 1 million queries in under 4
seconds.
I tried to keep the code as simple and clean as possible. At present,
the CUDA kernels are not highly optimized, but I plan to release a
more tuned version sometime soon. I am also actively expanding the
functionality of the code. If anyone is interesting in getting
involved, please contact me!
I just released a new version of the RBC library. This update adds support
for k-nearest neighbor (NN) search (as opposed to just 1-NN search).
This addition required a substantial rewrite of a couple of basic
kernels. The code now contains a manually-implemented
(Batcher) sorting network which operates in shared memory.
Additionally, the reduction phase of NN search (both in the basic
brute-force version, and in the data structure-based search) is now
implemented as a sorting network merge.
This method is quite fast, and is also memory-efficient; in
particular, it does not require a full set of distances to be stored
in memory, which can be problematic on large data sets.
I just released a new version of the RBC library. This update adds support
for k-nearest neighbor (NN) search (as opposed to just 1-NN search).
This addition required a substantial rewrite of a couple of basic
kernels. The code now contains a manually-implemented
(Batcher) sorting network which operates in shared memory.
Additionally, the reduction phase of NN search (both in the basic
brute-force version, and in the data structure-based search) is now
implemented as a sorting network merge.
This method is quite fast, and is also memory-efficient; in
particular, it does not require a full set of distances to be stored
in memory, which can be problematic on large data sets.
That benchmark remains the same; there is new functionality in this release, but this functionality doesn’t affect that benchmark. With that said, I’ll probably release more tuned versions of the kernels at some point. The current performance is pretty good, but can be improved somewhat using the techniques that Volkov & Demmel applied to matrix multiply. On the downside, these optimizations complicate the code.
That benchmark remains the same; there is new functionality in this release, but this functionality doesn’t affect that benchmark. With that said, I’ll probably release more tuned versions of the kernels at some point. The current performance is pretty good, but can be improved somewhat using the techniques that Volkov & Demmel applied to matrix multiply. On the downside, these optimizations complicate the code.
I’m releasing version 0.2.4 of the RBC code for fast nearest neighbor retrieval on a GPU. This is a fairly minor update: since version 0.2, a few bugs have been fixed, memory leaks caught, and, most importantly, the driver has been simplified considerably. This should make it a bit easier to use.
I recently submitted a paper on the RBC:
L. Cayton, Accelerating nearest neighbor search on manycore systems.
This paper contains the mathematical details underlying the method, and some detailed experimental evaluations, both on a multicore CPU and a GPU. Take a look if you want to understand the method a bit better.
I’m releasing version 0.2.6 of the Random Ball Cover package. This is another fairly minor release: I found and corrected a few bugs, simplified the driver, cleaned up some code, and added some example input files.