Optimizing Access to Parquet Data with fsspec

Originally published at: https://developer.nvidia.com/blog/optimizing-access-to-parquet-data-with-fsspec/

This post details how the filesystem specification’s new parquet module provides a format-aware byte-caching optimization.

Dear Nvidia engineers. Please advise whether you have a C++ library that supports the same approach for accessing Parquet files. Thanks.

Thanks for the question @evgenik !

I am not currently aware of any C++ library that implements all of the optimizations discussed in this article. However, it is my understanding that both Arrow and libcudf implement a subset of them. For example, Arrow uses a pre-buffering strategy, and libcudf coalesces multiple file-system accesses when possible. I suppose the primary difference is that these are all “read-time” optimizations. I don’t believe either library offers an optimized file-opening utility (yet).
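
To make "coalescing" a bit more concrete, here is a minimal Python sketch of the general idea: byte ranges that sit close together are merged so they can be fetched with fewer requests. This is only an illustration and is not taken from libcudf or Arrow; the function name `coalesce_ranges`, the `max_gap` parameter, and the example byte ranges are all hypothetical.

```python
from typing import List, Tuple


def coalesce_ranges(
    ranges: List[Tuple[int, int]], max_gap: int = 0
) -> List[Tuple[int, int]]:
    """Merge (start, end) byte ranges, end exclusive.

    Two ranges are merged when the gap between them is at most
    `max_gap` bytes (hypothetical tuning knob for this sketch).
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Too far away: this becomes a separate request.
            merged.append((start, end))
    return merged


# Hypothetical byte ranges for four column chunks in one file.
requests = [(0, 100), (120, 200), (5000, 5100), (5100, 5200)]
print(coalesce_ranges(requests, max_gap=64))  # [(0, 200), (5000, 5200)]
```

With these made-up ranges, four reads collapse into two. A larger `max_gap` trades some wasted bytes for fewer round trips, which tends to matter most on high-latency remote storage.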