Optimizing Data Transfer Using Lossless Compression with NVIDIA nvcomp

jwitsoe · December 18, 2020, 6:06pm

Originally published at: https://developer.nvidia.com/blog/optimizing-data-transfer-using-lossless-compression-with-nvcomp/

One of the most interesting applications of compression is optimizing communications in GPU applications. GPUs are getting faster every year. For some apps, transfer rates for getting data in or out of the GPU can’t keep up with the increase in GPU memory bandwidth or computational power. Often, the bottleneck is the interconnect between GPUs…

nsakharnykh · December 18, 2020, 7:02pm

Hello CUDA developers!
Hope you enjoyed our blog about compression, and can find time to play with the library. Let us know if you have any questions or comments. Also feel free to submit issues directly to the GitHub page!

fwyzard · December 23, 2020, 2:17pm

@nsakharnykh, very interesting new feature and blog entry!

From the intro:

"Often, the bottleneck is the interconnect between GPUs or between CPU and GPU."

Does this mean we can use nvcomp to compress data on the CPU and decompress it on the GPU (or vice versa)?

nsakharnykh · January 5, 2021, 3:16pm

Currently, nvcomp only provides GPU implementations of its compressors and decompressors. CPU variants can be implemented outside of nvcomp, since the compression format is fully open and explained in the docs. The main use case highlighted in the blog is compressing GPU-to-GPU communications, and in that case we only need GPU-side compressors/decompressors. In the near future we are planning to enable better compatibility with standard LZ4, so one can use existing CPU LZ4 libraries to compress on the CPU and nvcomp to decompress on the GPU. This is tracked in "liblz4 compability · Issue #20 · NVIDIA/nvcomp · GitHub". General CPU implementations are tracked in "CPU compression/decompression implementations? · Issue #12 · NVIDIA/nvcomp · GitHub", but they are not on our roadmap at the moment.

Alturkestani · January 8, 2021, 8:04pm

Impressive work! Thank you for the blog post.

Is there an overlap between the compression (or decompression) and the computation happening on the GPUs?

dlasalle · January 11, 2021, 11:06am

@Alturkestani Thanks for the great question. In this example, we are not overlapping compression/decompression with other computations/operations.

Overlapping CPU computations with compression/decompression is relatively easy, as both are implemented asynchronously in our current API, so you could initiate compression/decompression, and then perform computations on the CPU while the GPU is busy.
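The launch-then-compute pattern described above can be sketched on the CPU alone. This is only an analogy under stated assumptions: it does not use nvcomp or any GPU API, `zlib` on a worker thread stands in for the asynchronous GPU compressor, and `cpu_work` and `compress_while_computing` are hypothetical names. In a real GPU program the final wait would be a stream synchronization rather than `future.result()`.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def cpu_work(n: int) -> int:
    """Hypothetical host-side computation to run while compression is in flight."""
    return sum(i * i for i in range(n))


def compress_while_computing(data: bytes, n: int):
    """Start compression asynchronously, do host work, then wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Stand-in for an asynchronous compress call: returns immediately.
        future = pool.submit(zlib.compress, data)
        # The host keeps computing while the worker compresses.
        result = cpu_work(n)
        # Stand-in for synchronizing with the device before using the output.
        compressed = future.result()
    return compressed, result
```

The key point is the ordering: submit first, compute second, synchronize last. Waiting immediately after the submit would serialize the two steps again.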

Overlapping data transfer with compression/decompression requires splitting the data into smaller chunks, so that while one chunk is being transferred, another can be compressed/decompressed.
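A minimal sketch of that chunked pipeline, again as a CPU-only analogy rather than nvcomp code: `zlib` stands in for the compressor, `transfer` is a hypothetical placeholder for the interconnect copy, and `CHUNK_SIZE` is an arbitrary assumed value. While chunk i is being "transferred" on the main thread, chunk i+1 is already being compressed on a worker thread.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # assumed chunk size (1 MiB); tune for the real workload


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split the payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def transfer(chunk: bytes) -> bytes:
    """Hypothetical stand-in for the interconnect transfer; here it only copies."""
    return bytes(chunk)


def pipelined_send(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress chunk i+1 on a worker while chunk i is being transferred."""
    chunks = split_chunks(data, chunk_size)
    received = []
    if not chunks:
        return received
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(zlib.compress, chunks[0])
        for next_chunk in chunks[1:]:
            compressed = pending.result()                     # chunk i is ready
            pending = pool.submit(zlib.compress, next_chunk)  # start chunk i+1
            received.append(transfer(compressed))             # send chunk i meanwhile
        received.append(transfer(pending.result()))           # send the last chunk
    return received


def receive(received) -> bytes:
    """Decompress the chunks in order and reassemble the original payload."""
    return b"".join(zlib.decompress(c) for c in received)
```

Smaller chunks give more overlap but usually a worse compression ratio and more per-chunk overhead, so the chunk size is a trade-off to measure rather than a fixed choice.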
