I think I have found the fastest kNN algorithm in the world. It uses Tensor Core acceleration, and hardware utilization exceeds 90%.

Below are benchmark comparisons with Faiss and cuBLAS that show the throughput at different vector dimensions.

The test platform is an RTX 2080 Ti. With half-precision (FP16) input and output, its theoretical peak throughput is 107 TFLOPS; with int8 input and int output, it is 214 TOPS.
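To make the utilization figure concrete, here is a minimal sketch of how achieved throughput and utilization can be derived from a measured kernel time. It assumes the distance calculation is dominated by a matrix product of the queries against the database, which the post does not state explicitly; the function names and the example numbers are hypothetical and are not measurements from the original benchmark.

```python
def gemm_throughput_tflops(num_queries, db_size, dim, elapsed_ms):
    """Achieved throughput of a (num_queries x dim) by (dim x db_size) product.

    Assumption: each output element needs `dim` multiply-add pairs,
    counted as 2 operations each.
    """
    ops = 2.0 * num_queries * db_size * dim
    return ops / (elapsed_ms * 1e-3) / 1e12


def utilization(achieved_tflops, peak_tflops):
    """Fraction of the theoretical peak that was actually reached."""
    return achieved_tflops / peak_tflops


# Hypothetical example values, chosen only to show the arithmetic.
achieved = gemm_throughput_tflops(num_queries=1024, db_size=2_000_000,
                                  dim=512, elapsed_ms=25.0)
print(f"{achieved:.2f} TFLOPS, {utilization(achieved, 107.0):.1%} of FP16 peak")
```

For int8 input the same formula applies, with the 214 TOPS peak as the denominator.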

[Figure: comparison with cublasHgemm and cublasSgemmEx]

[Figure: comparison with cublasGemmEx]

The screenshot above shows one run of the search algorithm. The database holds 2 million vectors, the data type is half, and TopK is 128. The search finishes after 13 loop iterations.

The first red box shows the execution time (in ms) and the achieved throughput of the distance calculation function;

The second red box shows the time spent on result processing and sorting;

The third red box shows the amount of data processed in the current loop iteration;

The fourth red box shows statistics on the number of filtered results: the mean, the mean of the sum of squares, the maximum, and the minimum.
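The post does not describe the algorithm's internals, so the following is only a generic sketch of a chunked brute-force kNN search that produces the same kinds of per-loop figures as the boxes above. It runs on the CPU in plain Python, uses squared L2 distance, and counts a result as "filtered" when it passes the current top-k threshold; all of these choices, and every name in the code, are assumptions for illustration rather than the author's method.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chunked_knn(query, database, k, chunk_size):
    """Scan the database chunk by chunk, keeping the k nearest rows.

    Returns (results, kept_per_chunk): results is a list of
    (distance, index) sorted nearest first; kept_per_chunk counts how many
    candidates in each chunk passed the current top-k threshold.
    """
    heap = []  # max-heap via negated distance; heap[0] is the worst kept candidate
    kept_per_chunk = []
    for start in range(0, len(database), chunk_size):
        kept = 0
        for offset, row in enumerate(database[start:start + chunk_size]):
            d = squared_l2(query, row)
            if len(heap) < k:
                heapq.heappush(heap, (-d, start + offset))
                kept += 1
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, start + offset))
                kept += 1
        kept_per_chunk.append(kept)
    results = sorted((-neg_d, idx) for neg_d, idx in heap)
    return results, kept_per_chunk


def summarize(counts):
    """Mean, mean of squares, maximum, and minimum of the per-chunk counts."""
    n = len(counts)
    return (sum(counts) / n, sum(c * c for c in counts) / n,
            max(counts), min(counts))


# Tiny hypothetical data set: six 1-D vectors, searched in chunks of two.
database = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
results, kept = chunked_knn([2.2], database, k=2, chunk_size=2)
print([idx for _, idx in results], kept, summarize(kept))
```

In a GPU implementation the inner loop would typically be replaced by a batched distance kernel followed by a threshold filter, so that only a small number of candidates per chunk reach the sorting step.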

Source: 史上最快的knn搜索算法介绍 (An introduction to the fastest kNN search algorithm ever) – simba