cub::DeviceSelect::Flagged does not work for large num_items

Hello,

This is my first time asking in this forum, so please let me know if I am asking in the right place.

I am currently using the Flagged() function described in the cub::DeviceSelect documentation (CUDA Core Compute Libraries). According to the documentation, num_items is a 64-bit signed integer. However, I checked the source code (I am using CUDA 12.2 on an H100), and it declares num_items as a 32-bit signed integer. Indeed, if I pass a value larger than 2^31-1, I get errors.

Am I doing something wrong? Or is the documentation wrong?

Thank you!

The documentation is correct. The CCCL version shipped with CUDA 12.2 is simply not up to date. The latest CCCL is available on GitHub in the NVIDIA/cccl repository (CUDA Core Compute Libraries).

According to GitHub, support for large num_items was added in October 2024. CUDA 12.2 was released in June 2023, so the CUB headers it ships predate that change.
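If upgrading CCCL is not an option, one possible workaround is to split the input into chunks that each fit in a 32-bit signed num_items and run the selection once per chunk, appending each chunk's output after the previous one. The sketch below only illustrates the chunking arithmetic on the host. The names plan_chunks, select_flagged_chunk and select_flagged_chunked are hypothetical, and select_flagged_chunk is a CPU stand-in for the real device call, not part of CUB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Largest item count that fits in a 32-bit signed num_items parameter (2^31-1).
constexpr std::int64_t kMaxItemsPerCall = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper type: one slice of the input.
struct Chunk {
    std::int64_t offset;  // index of the first item in this chunk
    std::int32_t count;   // number of items, guaranteed to fit in int32
};

// Split [0, total_items) into consecutive chunks of at most max_per_call items.
std::vector<Chunk> plan_chunks(std::int64_t total_items,
                               std::int64_t max_per_call = kMaxItemsPerCall) {
    assert(total_items >= 0);
    assert(max_per_call > 0 && max_per_call <= kMaxItemsPerCall);
    std::vector<Chunk> chunks;
    for (std::int64_t offset = 0; offset < total_items; offset += max_per_call) {
        const std::int64_t count = std::min(max_per_call, total_items - offset);
        chunks.push_back(Chunk{offset, static_cast<std::int32_t>(count)});
    }
    return chunks;
}

// CPU stand-in (assumption) for one flagged-selection call with a 32-bit
// num_items: copies in[i] to out when flags[i] is nonzero, returns the count.
std::int32_t select_flagged_chunk(const int* in, const char* flags, int* out,
                                  std::int32_t num_items) {
    std::int32_t num_selected = 0;
    for (std::int32_t i = 0; i < num_items; ++i) {
        if (flags[i]) out[num_selected++] = in[i];
    }
    return num_selected;
}

// Run the selection chunk by chunk, writing each chunk's output directly
// after the items already selected, so the overall order is preserved.
std::vector<int> select_flagged_chunked(const std::vector<int>& in,
                                        const std::vector<char>& flags,
                                        std::int64_t max_per_call = kMaxItemsPerCall) {
    assert(in.size() == flags.size());
    std::vector<int> out(in.size());
    std::int64_t total_selected = 0;  // 64-bit running total across chunks
    for (const Chunk& c : plan_chunks(static_cast<std::int64_t>(in.size()), max_per_call)) {
        total_selected += select_flagged_chunk(in.data() + c.offset,
                                               flags.data() + c.offset,
                                               out.data() + total_selected,
                                               c.count);
    }
    out.resize(static_cast<std::size_t>(total_selected));
    return out;
}
```

With 3,000,000,000 items, plan_chunks yields two chunks, so two calls would be needed. In device code, each call to select_flagged_chunk would presumably be replaced by a cub::DeviceSelect::Flagged call on the offset pointers, with the selected count read back after each call to advance the output pointer.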
