I am writing a program that parallelises the scan algorithm. The exclusive scan operation takes a binary associative operator ⊕ with identity I, and an array of n elements
[a0, a1, …, an-1],
and returns the array
[I, a0, (a0 ⊕ a1), …, (a0 ⊕ a1 ⊕ … ⊕ an-2)].
Example: If ⊕ is addition, then the exclusive scan operation on the array
[3 1 7 0 4 1 6 3]
returns
[0 3 4 11 11 15 16 22].
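To make the definition concrete, here is a minimal sequential reference sketch in C++. It is not the parallel algorithm from the paper, only a baseline that a parallel version could be checked against. The function name `exclusive_scan_ref` is my own, chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Sequential exclusive scan (hypothetical helper, for illustration only).
// out[0] is the identity; out[i] combines a[0] .. a[i-1] with op.
template <typename T, typename Op>
std::vector<T> exclusive_scan_ref(const std::vector<T>& a, T identity, Op op) {
    std::vector<T> out(a.size());
    T running = identity;              // combination of all elements seen so far
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = running;              // excludes a[i] itself
        running = op(running, a[i]);   // fold a[i] in for the next position
    }
    return out;
}
```

Calling it with addition, identity 0, and the array [3 1 7 0 4 1 6 3] gives [0 3 4 11 11 15 16 22], matching the example.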
More detail on the parallel algorithm by Mark Harris for scanning is here. The algorithm as described in this paper utilizes shared memory. My question is:
Is it possible to get any performance gain for this algorithm by using pinned (page-locked) host memory, texture memory, or a combination of both?
Thanks for your time