I propose removing the num_items parameter from the CUB APIs that take a DoubleBuffer (where suitable).
I realized that the num_items parameter of cub::DeviceSegmentedRadixSort::SortPairs (for example) is not required when I use a DoubleBuffer.
It is not obvious from the documentation that it is required only for calculating the (temporary) buffer size.
To get the actual value of num_items, I had to add a cudaDeviceSynchronize() and read the value from device memory between the data preparation and Sort() steps, which caused a significant performance penalty.
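For illustration, here is a minimal sketch of the workaround described above. The helper name sort_segments_via_host_count and the buffer names are hypothetical. The host-to-device copy of the count is only a stand-in for the real data preparation step, which in my case produces the count on the device. The sketch only shows where the blocking read-back sits; it does not show how CUB uses num_items internally.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (not part of CUB): sorts each segment of `keys`
// and returns the sorted keys. offsets has num_segments + 1 entries.
std::vector<int> sort_segments_via_host_count(const std::vector<int>& keys,
                                              const std::vector<int>& offsets)
{
    const int capacity     = static_cast<int>(keys.size());
    const int num_segments = static_cast<int>(offsets.size()) - 1;
    std::vector<int> values(capacity);
    for (int i = 0; i < capacity; ++i) values[i] = i;  // payload: original index

    int *d_key_buf, *d_key_alt, *d_val_buf, *d_val_alt, *d_offsets, *d_num_items;
    cudaMalloc(&d_key_buf, capacity * sizeof(int));
    cudaMalloc(&d_key_alt, capacity * sizeof(int));
    cudaMalloc(&d_val_buf, capacity * sizeof(int));
    cudaMalloc(&d_val_alt, capacity * sizeof(int));
    cudaMalloc(&d_offsets, offsets.size() * sizeof(int));
    cudaMalloc(&d_num_items, sizeof(int));

    cudaMemcpy(d_key_buf, keys.data(), capacity * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val_buf, values.data(), capacity * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(), offsets.size() * sizeof(int), cudaMemcpyHostToDevice);

    // Stand-in for data preparation: assume the item count ends up in device memory.
    cudaMemcpy(d_num_items, &capacity, sizeof(int), cudaMemcpyHostToDevice);

    // The costly part: block the host and read the count back from the device.
    cudaDeviceSynchronize();
    int num_items = 0;
    cudaMemcpy(&num_items, d_num_items, sizeof(int), cudaMemcpyDeviceToHost);

    cub::DoubleBuffer<int> d_keys(d_key_buf, d_key_alt);
    cub::DoubleBuffer<int> d_values(d_val_buf, d_val_alt);

    // First call only queries the temporary storage size.
    void*  d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceSegmentedRadixSort::SortPairs(d_temp, temp_bytes, d_keys, d_values,
                                             num_items, num_segments,
                                             d_offsets, d_offsets + 1);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the sort; num_items has to be passed again.
    cub::DeviceSegmentedRadixSort::SortPairs(d_temp, temp_bytes, d_keys, d_values,
                                             num_items, num_segments,
                                             d_offsets, d_offsets + 1);

    // With DoubleBuffer, the sorted data is in whichever buffer Current() points to.
    std::vector<int> sorted(capacity);
    cudaMemcpy(sorted.data(), d_keys.Current(), capacity * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_temp);      cudaFree(d_num_items); cudaFree(d_offsets);
    cudaFree(d_key_buf);   cudaFree(d_key_alt);
    cudaFree(d_val_buf);   cudaFree(d_val_alt);
    return sorted;
}
```

If num_items were not needed by the sorting call, the cudaDeviceSynchronize() and the device-to-host copy in the middle could be dropped, and the whole pipeline could stay asynchronous.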