Thrust `zip_iterator` with arbitrary number of iterators

fredpz · May 15, 2024, 7:58am

Is there any way to make a thrust::zip_iterator on a number of iterators that is known only at runtime?

Currently, I use several thrust::make_zip_iterator on different possible collections of iterators, but the number of possibilities is starting to become unmanageable.

If I carry out the operation on each iterator individually, I notice a significant loss of performance.

striker159 · May 15, 2024, 8:40am

It is not possible. The number of iterators contained in a zip iterator is encoded in the template type, so it must be known at compile time.
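
To illustrate the point with something that compiles without Thrust: as far as I understand, `thrust::zip_iterator` is built from a tuple of the zipped iterator types, so the count is part of the type itself. The sketch below uses `std::tuple` as a stand-in; the aliases `TwoIters` and `ThreeIters` are made up for the example.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Stand-ins for the iterator tuples a zip iterator would be built from.
using TwoIters   = std::tuple<const float*, const float*>;
using ThreeIters = std::tuple<const float*, const float*, const float*>;

// The number of zipped iterators is a compile-time property of the type...
constexpr std::size_t two_count   = std::tuple_size<TwoIters>::value;
constexpr std::size_t three_count = std::tuple_size<ThreeIters>::value;

// ...and two different counts give two unrelated types, so a single
// variable cannot hold "either one" depending on a runtime value.
static_assert(!std::is_same<TwoIters, ThreeIters>::value,
              "different arity means a different type");
```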

fredpz · May 15, 2024, 8:43am

Ok thank you for this explanation. But is there any workaround? Any way to make a custom iterator that would do the same?

striker159 · May 15, 2024, 1:34pm

What do you want to achieve? Why do you need a dynamic number of iterators?

Which thrust algorithms do you use? For the case of multiple calls with individual iterators it could help to use a caching memory allocator in conjunction with the thrust::cuda::par_nosync execution policy.

Curefab · May 15, 2024, 1:36pm

What range are we talking about?
Perhaps with template programming you can cover all of them?
You would then have to select the suitable template instantiation at runtime.
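
A rough sketch of what "select a suitable template at runtime" could look like, in plain C++17 with no Thrust dependency. The names `dispatch_selection` and `count_selected` are hypothetical and not part of any library. The idea is to walk over the candidate columns, branch on a runtime bitmask, and collect the selected column indices as compile-time constants. In a real program the final callable would presumably build the zip iterator from the selected columns; here it only counts them.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: converts a runtime bitmask over N columns into a
// compile-time pack of selected column indices, then calls f(indices...).
template <std::size_t I, std::size_t N, typename F, typename... Picked>
std::size_t dispatch_selection(unsigned mask, F&& f, Picked... picked) {
    if constexpr (I == N) {
        // All columns examined: the pack now describes the selection.
        return f(picked...);
    } else {
        using Index = std::integral_constant<std::size_t, I>;
        if (mask & (1u << I)) {
            // Column I is selected: append its index to the pack.
            return dispatch_selection<I + 1, N>(mask, f, picked..., Index{});
        }
        // Column I is not selected: continue with the pack unchanged.
        return dispatch_selection<I + 1, N>(mask, f, picked...);
    }
}

// Example use with 3 candidate columns: report how many were selected.
inline std::size_t count_selected(unsigned mask) {
    return dispatch_selection<0, 3>(
        mask, [](auto... idx) -> std::size_t { return sizeof...(idx); });
}
```

Note that the compiler instantiates every one of the 2^N possible selections, so each extra column doubles the generated code.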

fredpz · May 15, 2024, 1:56pm

@striker159 I typically use algorithms such as copy_if or sort_by_key, which must be applied to several vectors of equal length. But these vectors are not always the same. There are typically 6 to 12 of them, and users may choose at runtime which ones are needed. Applying the algorithms to all of them would cost too much performance and memory. Concerning your suggestion, I don’t think I can use thrust::cuda::par_nosync for sorting algorithms.

@Curefab As mentioned just above, between 6 and 12 vectors, with any arbitrary selection of those. In practice there are typically about 10 possible selections, but it doubles every time I want to add a new feature. I don’t see how I could use templating in this situation.

fredpz · May 16, 2024, 8:31pm

@striker159 Thanks to your suggestion, I found a way to solve my problem. You were correct that using thrust::cuda::par_nosync allows for efficient calls on multiple iterators.

This technique, however, requires a little rework when using thrust::sort_by_key. The first sort reorders the keys in place, so they cannot be reused to sort the remaining vectors. Furthermore, sorting many times is potentially inefficient, as the same comparisons are repeated for every vector. My solution is to create an index vector using thrust::sequence, then reorder it with thrust::sort_by_key, passing it as the values that go with the keys. All the vectors can then be reordered one by one, and asynchronously, using thrust::gather with this sorted index vector as the map.
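
For anyone who wants to see the idea without a GPU at hand, here is a minimal host-only sketch in standard C++. It is not the Thrust code itself: `std::iota` stands in for thrust::sequence, a sort of the indices by key stands in for thrust::sort_by_key, and a plain loop stands in for thrust::gather. The function names `sorted_order`, `gather_column` and `reorder_selected` are made up for the example, and unlike sort_by_key this version leaves the keys themselves untouched.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Build the permutation that sorts `keys`, without moving any data column.
std::vector<std::size_t> sorted_order(const std::vector<int>& keys) {
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });
    return order;
}

// out[i] = column[order[i]]; the result goes to a separate buffer.
std::vector<float> gather_column(const std::vector<std::size_t>& order,
                                 const std::vector<float>& column) {
    std::vector<float> out(order.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        out[i] = column[order[i]];
    }
    return out;
}

// Sort once, then reorder however many columns were selected at runtime.
void reorder_selected(const std::vector<int>& keys,
                      const std::vector<std::vector<float>*>& selected) {
    const std::vector<std::size_t> order = sorted_order(keys);
    for (std::vector<float>* column : selected) {
        *column = gather_column(order, *column);
    }
}
```

The number of columns is just the size of `selected`, so nothing about it needs to be known at compile time. The keys are compared only once, and each column costs a single gather pass.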

Maybe that can help others. Anyways thank you for your suggestion!

system · May 30, 2024, 8:31pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Robert_Crovella · September 3, 2024, 2:18pm

The situation has changed in the latest Thrust (CUDA 12.4.1 and newer), which can successfully compile a zip_iterator with more than 10 iterators. Also see here.
