Originally published at: https://developer.nvidia.com/blog/encoding-and-compression-guide-for-parquet-string-data-using-rapids/
Parquet writers provide encoding and compression options that are turned off by default. Enabling these options may provide better lossless compression for your data, but understanding which options to use for your specific use case is critical to making sure they perform as intended. In this post, we explore which encoding and compression options work…
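To make the data-dependent behavior concrete, here is a minimal standard-library sketch (not the RAPIDS or Parquet implementation) comparing a plain length-prefixed layout with a simplified dictionary encoding on two hypothetical string columns. The helper names, the fixed 4-byte index width, and the use of `zlib` as a stand-in for a real Parquet codec such as ZSTD are all assumptions made for illustration. Real Parquet writers pack dictionary indices far more tightly.

```python
import zlib


def plain_encode(values):
    """Length-prefixed layout, loosely modeled on Parquet PLAIN for byte arrays."""
    out = bytearray()
    for v in values:
        raw = v.encode("utf-8")
        out += len(raw).to_bytes(4, "little") + raw
    return bytes(out)


def dictionary_encode(values):
    """Store each unique string once, then one fixed-width index per row.

    Simplification: indices are 4 bytes each; real writers bit-pack them.
    """
    dictionary = {}
    indices = []
    for v in values:
        indices.append(dictionary.setdefault(v, len(dictionary)))
    dict_page = plain_encode(list(dictionary))
    index_page = b"".join(i.to_bytes(4, "little") for i in indices)
    return dict_page + index_page


def report(values):
    """Return encoded and zlib-compressed sizes in bytes for both layouts."""
    plain = plain_encode(values)
    dict_enc = dictionary_encode(values)
    return {
        "plain": len(plain),
        "dictionary": len(dict_enc),
        "plain+zlib": len(zlib.compress(plain)),
        "dictionary+zlib": len(zlib.compress(dict_enc)),
    }


# Hypothetical columns: few distinct values vs. every value distinct.
low_cardinality = ["status_ok", "status_warn", "status_fail"] * 1000
high_cardinality = [f"id-{i:08d}" for i in range(3000)]

low_report = report(low_cardinality)
high_report = report(high_cardinality)

print("low cardinality: ", low_report)
print("high cardinality:", high_report)
```

In this sketch, dictionary encoding shrinks the repetitive column before any compression is applied, while on the all-unique column it only adds index overhead. That is the kind of dependence on the data that makes the choice of options worth testing per column.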
We’ve learned a lot from studying the data-dependent behavior of encoding and compression, and we are grateful for the work of Ed Seidl and the broader Parquet community (for example, this mailing list thread: https://lists.apache.org/thread/5jyhzkwyrjk9z52g0b49g31ygnz73gxo). Working through this study has made me a huge fan of ZSTD compression. If you have any questions or comments, please let us know!