Hi there! I would greatly appreciate any help with this computation task:

My raw data is 10000 arrays (or however many will fit in GPU memory). Each array contains 10 million boolean values. There is also one special array with 10 million fp16 values. For every possible combination of 5 of the 10000 boolean arrays (or as high a number as is feasible), I need to compute the intersection of those 5 arrays (i.e. their element-wise product, or logical AND, which gives a new array of 10 million booleans), and then compute the average of the special array over all elements at which the intersection is True.
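To make the task concrete, here is a minimal brute-force sketch in plain Python on toy data (3 arrays of 4 values, combinations of 2). The function name `masked_means` and the choice to return `None` for an empty intersection are assumptions made for illustration only; this is a reference for what should be computed, not a suggestion for how to do it at full scale.

```python
from itertools import combinations

def masked_means(bool_arrays, special, k):
    """Brute-force reference: for each k-combination of the boolean arrays,
    average `special` over the positions where all k arrays are True."""
    results = {}
    for combo in combinations(range(len(bool_arrays)), k):
        total = 0.0
        count = 0
        for i, value in enumerate(special):
            # Intersection: the position counts only if every array in the combo is True
            if all(bool_arrays[j][i] for j in combo):
                total += value
                count += 1
        # Assumption: None marks an empty intersection (nothing to average)
        results[combo] = total / count if count else None
    return results

# Toy data standing in for 10000 arrays of 10 million values each
arrays = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, False],
]
special = [1.0, 2.0, 3.0, 4.0]

means = masked_means(arrays, special, k=2)
for combo, mean in means.items():
    print(combo, mean)
```

Each key of `means` is a tuple of array indices, and each value is the average of `special` over that combination's intersection.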

My questions are:

- Where do you think the bottleneck is for this task on a consumer desktop? E.g. would a 24GB Titan RTX perform much better than an 11GB 2080 Ti, if the smaller 11GB memory forces me to shuffle raw data arrays between GPU and CPU memory while the GPU computes combinations?
- Generally, how well suited is this task to GPU computation rather than CPU? (I assume very well.)
- I will begin learning CUDA once I’ve built the PC, so any suggestions or tips regarding functions or ideas are most welcome.
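A rough sizing of the problem may help frame the memory question. The sketch below is plain arithmetic in Python; the storage layouts (1 byte per boolean versus 1 bit per boolean via bit-packing) are assumptions, since the post does not say how the booleans are stored.

```python
import math

N_ARRAYS = 10_000         # boolean arrays
N_ELEMENTS = 10_000_000   # values per array
K = 5                     # arrays per combination

# Assumption: one byte per boolean (a common unpacked layout)
unpacked_gb = N_ARRAYS * N_ELEMENTS / 1e9

# Assumption: booleans bit-packed, 8 per byte
packed_gb = N_ARRAYS * (N_ELEMENTS // 8) / 1e9

# The special array: fp16 is 2 bytes per value
special_mb = N_ELEMENTS * 2 / 1e6

# Number of distinct K-combinations of the boolean arrays
n_combinations = math.comb(N_ARRAYS, K)

print(f"unpacked booleans: {unpacked_gb:.1f} GB")
print(f"bit-packed booleans: {packed_gb:.1f} GB")
print(f"special fp16 array: {special_mb:.1f} MB")
print(f"combinations of {K}: {n_combinations:.3e}")
```

Under these assumptions, the bit-packed data comes to 12.5 GB, which is above 11GB but below 24GB, while the unpacked layout (100 GB) fits in neither card. The number of 5-combinations is on the order of 10^17, so the count of combinations, not only the memory footprint, bears on what is feasible.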

My apologies if this isn’t the correct place for this question. Note that above I use the word "array" simply to mean a list or vector of values.