Hi,
I am writing a program that parallelises the scan algorithm. The exclusive scan operation takes a binary associative operator ⊕ with identity I, and an array of n elements
[a0, a1, …, an-1],
and returns the array
[I, a0, (a0 ⊕ a1), …, (a0 ⊕ a1 ⊕ … ⊕ an-2)].
Example: If ⊕ is addition, then the exclusive scan operation on the array
[3 1 7 0 4 1 6 3],
returns
[0 3 4 11 11 15 16 22].
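To make the definition concrete, here is a minimal sequential sketch of the exclusive scan in C++. It is only a reference for checking results, not the parallel version; the name `exclusive_scan_ref` is a hypothetical helper made up for illustration, and the operator and identity are passed in by the caller.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference for exclusive scan (hypothetical helper, not a library call).
// out[0] is the identity; out[i] combines in[0] .. in[i-1] with `op`.
template <typename T, typename Op>
std::vector<T> exclusive_scan_ref(const std::vector<T>& in, T identity, Op op) {
    std::vector<T> out(in.size());
    T running = identity;              // value combined so far, before element i
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;              // exclusive: element i itself is not included
        running = op(running, in[i]);  // fold element i in for the next position
    }
    return out;
}
```

Calling it with addition and identity 0 on the array above should reproduce the example output, which gives a baseline to compare the GPU result against.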
More detail on Mark Harris's parallel scan algorithm is here. The algorithm as described in that paper uses shared memory. My question is:
Is it possible to get any performance gain for this algorithm by using pinned memory, texture memory, or a combination of both?
Thanks for your time.