Even Faster and More Scalable UMAP on the GPU with RAPIDS cuML

jwitsoe · October 31, 2024, 8:24pm

Originally published at: Even Faster and More Scalable UMAP on the GPU with RAPIDS cuML | NVIDIA Technical Blog

UMAP is a popular dimensionality reduction algorithm used in fields like bioinformatics, NLP topic modeling, and ML preprocessing. It works by building a k-nearest neighbors (k-NN) graph, known in the literature as an all-neighbors graph, and using it to construct a fuzzy topological representation of the data, which is then used to embed the high-dimensional data into lower dimensions. …
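To make the first step in the excerpt concrete, here is a minimal sketch of an all-neighbors (k-NN) graph built by brute force with NumPy. It is an illustration only, not how cuML builds the graph: the function name `knn_graph`, the random data, and the choice to exclude each point from its own neighbor list are all assumptions made for the example.

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force k-NN graph (illustrative helper, not a cuML API).

    Returns, for every row of X, the indices of its k nearest other rows
    and the corresponding Euclidean distances, nearest first.
    """
    sq = np.sum(X * X, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)       # clamp tiny negatives from rounding
    np.fill_diagonal(d2, np.inf)      # assumption: a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist

# Synthetic stand-in data: 200 points in 16 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16)).astype(np.float32)

indices, distances = knn_graph(X, k=5)
print(indices.shape, distances.shape)
```

This brute-force version needs the full n × n distance matrix, so it only works for small inputs; that quadratic cost is why approximate methods matter at the scales discussed in the post.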

jdwilliamson41 · November 1, 2024, 4:23pm

All because of my Telescope…

richardedwardhughes · May 1, 2025, 3:22am

I was able to get your example to work for 500k 1024-dimensional embeddings. However, my full sample is about 50M 1024-dimensional embeddings (generated with stella from Hugging Face). Do I have to be able to load the full 50M into host RAM in order to run UMAP on this? If not, how do I do it? I have looked through the various cuDF documentation, and it is not clear how to do this. Can you share the code (or just a snippet) you used for the Wiki-all subsample (since this is 50M 768-dimensional)?
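For context on the sizes in this question, the rough arithmetic below estimates the raw footprint of each dense embedding matrix. It assumes 4-byte float32 values, which the post does not state, and it counts only the matrix itself, not any working memory UMAP needs. The helper name `dense_matrix_bytes` is made up for the example.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=4):
    """Raw size of a dense matrix; float32 (4 bytes) is an assumption."""
    return n_rows * n_cols * bytes_per_value

sizes = {
    "500k x 1024": dense_matrix_bytes(500_000, 1024),
    "50M x 1024": dense_matrix_bytes(50_000_000, 1024),
    "50M x 768": dense_matrix_bytes(50_000_000, 768),
}

for name, n_bytes in sizes.items():
    # Report in decimal gigabytes (1 GB = 1e9 bytes).
    print(f"{name}: {n_bytes / 1e9:.1f} GB")
```

Under that assumption the full 50M × 1024 set is roughly 100 times larger than the 500k sample that already worked, which is why where the data lives becomes the central question.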
