throughput of warp vote functions?

_teju · March 18, 2010, 7:12am

There’s not much info out there on warp vote functions, neither in the CUDA Programming Guide nor via a Google search…

So, some questions on warp vote functions:

Is it possible for the predicate being evaluated to differ between threads in a warp? An example predicate would be (arr1[threadIdx.x] == arr2[threadIdx.x]).
Will they internally cause the warp to diverge if the predicate evaluates differently for different threads in the warp?
Is there any example code or an algorithm that could benefit from warp vote functions?
What is the throughput of the warp vote functions, __all and __any?
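
For concreteness, the per-thread predicate from the first question could be exercised roughly as below. This is only a minimal sketch: it assumes a CUDA 9 or later toolkit, where the __all/__any intrinsics are spelled __all_sync/__any_sync and take an explicit lane mask, and the kernel name compareWarp and the host setup are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each lane evaluates its own predicate; the vote reduces the 32 results
// to a single value that every lane receives.
__global__ void compareWarp(const int *arr1, const int *arr2,
                            int *allEqual, int *anyEqual)
{
    int pred = (arr1[threadIdx.x] == arr2[threadIdx.x]);

    // Full mask: assumes the block is exactly one fully populated warp.
    int all = __all_sync(0xffffffffu, pred);
    int any = __any_sync(0xffffffffu, pred);

    if (threadIdx.x == 0) {
        *allEqual = (all != 0);
        *anyEqual = (any != 0);
    }
}

int main()
{
    const int n = 32;  // one warp
    int h1[n], h2[n];
    for (int i = 0; i < n; ++i) { h1[i] = i; h2[i] = i; }
    h2[5] = -1;  // lane 5 disagrees, every other lane matches

    int *d1, *d2, *dRes;
    cudaMalloc(&d1, n * sizeof(int));
    cudaMalloc(&d2, n * sizeof(int));
    cudaMalloc(&dRes, 2 * sizeof(int));
    cudaMemcpy(d1, h1, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2, n * sizeof(int), cudaMemcpyHostToDevice);

    compareWarp<<<1, n>>>(d1, d2, dRes, dRes + 1);

    int hRes[2] = {-1, -1};
    cudaMemcpy(hRes, dRes, sizeof(hRes), cudaMemcpyDeviceToHost);
    printf("all equal: %d, any equal: %d\n", hRes[0], hRes[1]);

    cudaFree(d1); cudaFree(d2); cudaFree(dRes);
    return 0;
}
```

Because one lane disagrees and the rest match, the all-vote is expected to come back false and the any-vote true.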

Even pointers to this information would be much appreciated…

_teju · March 19, 2010, 4:26am

Nobody??

SPWorley · March 19, 2010, 5:31am

Throughput is basically one op, just like a simple add. Can’t be any faster!

The vote operations have been useful to me when a thread needs to load some largish data from global memory: the whole warp can then help load, say, 256 bytes of data into shared memory instead of having one thread issue many, many memory transactions. (In one example case, the vote is on whether any thread needs to look at a voxel’s worth of polygon data.)
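
A whole-warp loader along those lines might look something like the sketch below. It is not SPWorley's actual code: the kernel name sumVoxel, the 64-word voxel layout, and the one-warp-per-block launch are all assumptions made for illustration, and it uses the __any_sync spelling from CUDA 9 and later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32
#define WORDS_PER_VOXEL 64  // 64 ints = 256 bytes; hypothetical layout

// Hypothetical kernel: one warp per block, one voxel per block.
__global__ void sumVoxel(const int *voxelData, const int *needsData, int *out)
{
    __shared__ int tile[WORDS_PER_VOXEL];

    int lane = threadIdx.x;
    int tid  = blockIdx.x * WARP_SIZE + lane;
    int need = needsData[tid];  // differs from lane to lane

    // The vote result is identical in every lane, so this branch is
    // taken by the whole warp or by none of it.
    if (__any_sync(0xffffffffu, need)) {
        const int *src = voxelData + blockIdx.x * WORDS_PER_VOXEL;
        // Consecutive lanes read consecutive words (coalesced).
        for (int i = lane; i < WORDS_PER_VOXEL; i += WARP_SIZE)
            tile[i] = src[i];
        __syncwarp();  // make the tile visible to every lane
    }

    // Only the lanes that asked for the data consume it.
    int sum = 0;
    if (need)
        for (int i = 0; i < WORDS_PER_VOXEL; ++i) sum += tile[i];
    out[tid] = sum;
}

int main()
{
    const int voxels = 2, threads = voxels * WARP_SIZE;
    int hData[voxels * WORDS_PER_VOXEL], hNeed[threads] = {0}, hOut[threads];
    for (int i = 0; i < voxels * WORDS_PER_VOXEL; ++i) hData[i] = 1;
    hNeed[3] = 1;  // a single lane in warp 0 needs its voxel

    int *dData, *dNeed, *dOut;
    cudaMalloc(&dData, sizeof(hData));
    cudaMalloc(&dNeed, sizeof(hNeed));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dData, hData, sizeof(hData), cudaMemcpyHostToDevice);
    cudaMemcpy(dNeed, hNeed, sizeof(hNeed), cudaMemcpyHostToDevice);

    sumVoxel<<<voxels, WARP_SIZE>>>(dData, dNeed, dOut);

    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    printf("out[3] = %d, out[4] = %d\n", hOut[3], hOut[4]);

    cudaFree(dData); cudaFree(dNeed); cudaFree(dOut);
    return 0;
}
```

The point of the vote here is that a warp in which no lane needs the data skips the load entirely, while a warp with even one interested lane loads the tile with all 32 lanes.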

_teju · March 19, 2010, 6:26am

Really appreciate your response…

“Throughput is basically one op, just like a simple add. Can’t be any faster!”
Will this still be the case even if the threads in a warp evaluate the given predicate differently?

allanmac · March 19, 2010, 4:43pm

It’s worth noting that there is an equivalent to “vote.any.pred” documented/discovered here: The Aggregate Magic Algorithms

If your target GPU is at Compute Capability 1.0/1.1 then this can be invaluable in implementing whole-warp loaders like SPWorley describes.

Clever!
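
The linked page is not quoted in this thread, so the following is not its code. It is only a sketch of one way a warp-wide "any" can be emulated through shared memory without a vote instruction: every lane whose predicate is true stores the same value to one word, so it does not matter which store wins. The names warpAnyShared and anyPerWarp are hypothetical. The __syncwarp() calls are an addition for current toolkits; 2010-era code for Compute Capability 1.x would have relied on the warp executing in lockstep instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Hypothetical helper: warp-wide "any" without a vote instruction.
// Must be called by all 32 lanes of a warp; slot is that warp's own
// word in shared memory.
__device__ int warpAnyShared(volatile int *slot, int pred)
{
    if ((threadIdx.x % WARP_SIZE) == 0) *slot = 0;  // one lane clears
    __syncwarp();
    if (pred) *slot = 1;  // several lanes may write, all write the same value
    __syncwarp();
    int result = *slot;
    __syncwarp();  // finish reading before any later call clears the slot
    return result;
}

__global__ void anyPerWarp(const int *flags, int *warpResult)
{
    __shared__ int slots[2];  // one word per warp; assumes blockDim.x == 64
    int warp = threadIdx.x / WARP_SIZE;
    int r = warpAnyShared(&slots[warp], flags[threadIdx.x] != 0);
    if (threadIdx.x % WARP_SIZE == 0) warpResult[warp] = r;
}

int main()
{
    const int threads = 2 * WARP_SIZE;
    int hFlags[threads] = {0}, hRes[2] = {-1, -1};
    hFlags[10] = 1;  // only warp 0 contains a true predicate

    int *dFlags, *dRes;
    cudaMalloc(&dFlags, sizeof(hFlags));
    cudaMalloc(&dRes, sizeof(hRes));
    cudaMemcpy(dFlags, hFlags, sizeof(hFlags), cudaMemcpyHostToDevice);

    anyPerWarp<<<1, threads>>>(dFlags, dRes);

    cudaMemcpy(hRes, dRes, sizeof(hRes), cudaMemcpyDeviceToHost);
    printf("warp 0 any: %d, warp 1 any: %d\n", hRes[0], hRes[1]);

    cudaFree(dFlags); cudaFree(dRes);
    return 0;
}
```

On hardware that has the vote instructions, __any_sync (or the older __any) does the same job in a single instruction and without the shared-memory round trip.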

_teju · March 19, 2010, 7:10pm

Very cool!
Thanks, allan, for sharing the link… That page also has some pretty nice algos!

allanmac · March 29, 2010, 7:39pm

I was reading Appendix G4 Compute Capability 2.0 in the CUDA Programming Guide Version 3.0 and noticed that page 148 states:

So that Aggregate “GPU Any” technique should still work on Compute Capability 2.0 devices.
