Controlling where a thread writes its data

I was wondering whether it is possible to control where a thread writes its data within an array. I'm dealing with a sparse matrix and need to add together the handful of nonzero (non-sparse) elements at the end. They have to be added in a specific order. If each thread that holds a nonzero element could write it to the next free slot in a temporary array, using atomicAdd on a counter both to claim that slot and to record how many threads wrote, I would only have to add a couple of numbers rather than walk the entire width of the matrix.
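
To make the idea concrete, here is a rough sketch of what I have in mind, written in CUDA. It is only an illustration under my own assumptions: one thread per column of a single dense-stored row, and the names `Entry`, `compactNonZeros`, and `compactRow` are made up for this post. Since the slot each thread gets back from atomicAdd depends on scheduling, the sketch stores the column index next to the value and sorts by column on the host so the additions can still happen in a fixed order. Error checking on the CUDA calls is left out for brevity.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical record: the column is kept so the original order can be restored.
struct Entry {
    int   col;
    float val;
};

// One thread per column. A thread holding a nonzero value claims the next
// free slot in `out` by atomically incrementing `count`.
__global__ void compactNonZeros(const float* row, int width,
                                Entry* out, int* count)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= width) return;

    float v = row[col];
    if (v != 0.0f) {
        // atomicAdd returns the value before the increment, so every
        // writing thread receives a unique slot index.
        int slot = atomicAdd(count, 1);
        out[slot].col = col;
        out[slot].val = v;
    }
}

// Host helper: returns the nonzero entries of `row`, sorted by column.
std::vector<Entry> compactRow(const std::vector<float>& row)
{
    int width = static_cast<int>(row.size());
    if (width == 0) return {};

    float* dRow   = nullptr;
    Entry* dOut   = nullptr;
    int*   dCount = nullptr;
    cudaMalloc(&dRow,   width * sizeof(float));
    cudaMalloc(&dOut,   width * sizeof(Entry));   // worst case: all nonzero
    cudaMalloc(&dCount, sizeof(int));
    cudaMemcpy(dRow, row.data(), width * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCount, 0, sizeof(int));

    int threads = 256;
    int blocks  = (width + threads - 1) / threads;
    compactNonZeros<<<blocks, threads>>>(dRow, width, dOut, dCount);

    // This blocking copy also waits for the kernel to finish.
    int count = 0;
    cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost);
    assert(count >= 0 && count <= width);

    std::vector<Entry> entries(count);
    if (count > 0) {
        cudaMemcpy(entries.data(), dOut, count * sizeof(Entry),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(dRow);
    cudaFree(dOut);
    cudaFree(dCount);

    // Slot order is nondeterministic, so restore column order before summing.
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.col < b.col; });
    return entries;
}
```

With something like this, the final sum would loop over only `entries.size()` values instead of the full row width. I'm not sure whether the single shared counter becomes a bottleneck when many threads hit it at once, which is part of what I'm asking.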

Thanks in advance for any suggestions.