Is there an efficient way to implement a function similar to NumPy's np.add.at()?

yanghang162 · September 9, 2024, 3:42am

I want to implement a function similar to np.add.at() in NumPy, which accumulates data based on another index array. However, due to the randomness of the indices, memory access efficiency is very low. Additionally, I need to perform this operation on large datasets, so using shared memory doesn’t seem reliable.
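For reference, a minimal sketch of the semantics in question, using a small made-up index and value array (the names `idx`, `vals`, `buffered`, and `unbuffered` are illustrative only). It shows how np.add.at() differs from plain fancy-indexed `+=` when an index repeats:

```python
import numpy as np

# Hypothetical example data: target positions (with repeats) and values to add.
idx = np.array([0, 3, 0, 1, 3, 3])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Buffered update: for a repeated index, contributions are NOT accumulated;
# only one of the writes to each repeated position survives.
buffered = np.zeros(5)
buffered[idx] += vals

# Unbuffered update: every contribution is accumulated, duplicates included.
unbuffered = np.zeros(5)
np.add.at(unbuffered, idx, vals)

print(unbuffered)
```

Position 3 receives 2 + 5 + 6 and position 0 receives 1 + 3 in the unbuffered version. On a GPU the same accumulation needs atomic adds or a reformulation that avoids conflicting writes.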

njuffa · September 9, 2024, 4:39am

I don’t have first-hand experience in this direction, but it seems the first place where you would want to look for this functionality is in PyCUDA and CuPy. If you cannot find anything relevant there, maybe look at Thrust.

The low memory-access efficiency caused by random indices is a fundamental issue. You may be able to mitigate it depending on the specifics of the use case, but it is here to stay. The old joke applies: “Doctor, it hurts when I push here.” “Don’t push!”

yanghang162 · September 9, 2024, 6:57am

Thank you for your suggestion. I found a function called thrust::scatter to implement similar functionality. Following this clue, I used the keywords ‘CUDA scatter and gather’ and found some related papers.
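One point to check when following the scatter route: as far as the Thrust documentation describes it, thrust::scatter only copies values to the mapped positions and does not accumulate when an index repeats. A common way to get np.add.at()-style accumulation without random conflicting writes is to sort by index and then sum each run of equal indices. The sketch below illustrates that idea in NumPy; it is not taken from the thread, and the function name `scatter_add_sorted` and its parameters are hypothetical. On the GPU the analogous building blocks would presumably be thrust::sort_by_key followed by thrust::reduce_by_key.

```python
import numpy as np

def scatter_add_sorted(n_out, idx, vals):
    """Accumulate vals into a zero array of length n_out at positions idx,
    summing duplicates, via sort + segmented reduction."""
    out = np.zeros(n_out, dtype=vals.dtype)
    if idx.size == 0:
        return out
    # Sort contributions so that equal target indices become contiguous.
    order = np.argsort(idx, kind="stable")
    sorted_idx = idx[order]
    sorted_vals = vals[order]
    # Start offset of each run of equal indices.
    is_start = np.r_[True, sorted_idx[1:] != sorted_idx[:-1]]
    starts = np.flatnonzero(is_start)
    # Sum each run; every target position is then written exactly once.
    sums = np.add.reduceat(sorted_vals, starts)
    out[sorted_idx[starts]] = sums
    return out

# Hypothetical example data, checked against np.add.at().
idx = np.array([0, 3, 0, 1, 3, 3])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
expected = np.zeros(5)
np.add.at(expected, idx, vals)
result = scatter_add_sorted(5, idx, vals)
print(result, np.array_equal(result, expected))
```

The sort is extra work, so whether this beats atomic adds depends on the data: it tends to help when many contributions land on the same positions or when the indices can be sorted once and reused across several accumulations.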
