Remove num_items from DoubleBuffer CUB API


I propose removing the num_items parameter from the CUB APIs that take a DoubleBuffer (where suitable).

I realized that the num_items parameter of cub::DeviceSegmentedRadixSort::SortPairs (for example) is not required when I use a DoubleBuffer.

It is not obvious from the documentation that it is required only for calculating the (temporary) buffer size.

To get the actual value of num_items, I had to add a cudaDeviceSynchronize() and read the value back from device memory between the data preparation and Sort() steps, which caused a significant performance penalty.
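
For illustration, here is a minimal sketch of the pattern described above. It is not my real code: the prepare kernel, the buffer names, and the capacity bound are hypothetical stand-ins, and the data is a toy example with two segments. It only shows where the blocking read of num_items sits between data preparation and the sort.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical data-preparation step: the item count is only known on the device.
__global__ void prepare(int *keys, int *vals, int *offsets, int *d_num_items)
{
    const int k[7] = {8, 6, 7, 5, 3, 0, 9};
    for (int i = 0; i < 7; ++i) { keys[i] = k[i]; vals[i] = i; }
    offsets[0] = 0; offsets[1] = 3; offsets[2] = 7;   // segments [0,3) and [3,7)
    *d_num_items = 7;
}

int main()
{
    const int capacity     = 16;  // assumed host-side upper bound on the item count
    const int num_segments = 2;

    int *d_keys_a, *d_keys_b, *d_vals_a, *d_vals_b, *d_offsets, *d_num_items;
    cudaMalloc(&d_keys_a, capacity * sizeof(int));
    cudaMalloc(&d_keys_b, capacity * sizeof(int));
    cudaMalloc(&d_vals_a, capacity * sizeof(int));
    cudaMalloc(&d_vals_b, capacity * sizeof(int));
    cudaMalloc(&d_offsets, (num_segments + 1) * sizeof(int));
    cudaMalloc(&d_num_items, sizeof(int));

    prepare<<<1, 1>>>(d_keys_a, d_vals_a, d_offsets, d_num_items);

    // The costly part: the host stalls here only to learn num_items.
    int num_items = 0;
    cudaDeviceSynchronize();
    cudaMemcpy(&num_items, d_num_items, sizeof(int), cudaMemcpyDeviceToHost);

    cub::DoubleBuffer<int> keys(d_keys_a, d_keys_b);
    cub::DoubleBuffer<int> vals(d_vals_a, d_vals_b);

    // First call only queries the temporary storage size.
    void  *d_temp     = nullptr;
    size_t temp_bytes = 0;
    cub::DeviceSegmentedRadixSort::SortPairs(d_temp, temp_bytes, keys, vals,
                                             num_items, num_segments,
                                             d_offsets, d_offsets + 1);
    cudaMalloc(&d_temp, temp_bytes);

    // Second call performs the sort within each segment.
    cub::DeviceSegmentedRadixSort::SortPairs(d_temp, temp_bytes, keys, vals,
                                             num_items, num_segments,
                                             d_offsets, d_offsets + 1);

    // Sorted keys live in whichever buffer Current() points to.
    std::vector<int> h_keys(num_items);
    cudaMemcpy(h_keys.data(), keys.Current(), num_items * sizeof(int),
               cudaMemcpyDeviceToHost);
    for (int k : h_keys) printf("%d ", k);
    printf("\n");

    cudaFree(d_temp);     cudaFree(d_num_items); cudaFree(d_offsets);
    cudaFree(d_keys_a);   cudaFree(d_keys_b);
    cudaFree(d_vals_a);   cudaFree(d_vals_b);
    return 0;
}
```

The cudaDeviceSynchronize() plus device-to-host copy in the middle is the step this proposal would make unnecessary: everything else could be enqueued without the host ever needing the exact item count.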