Admittedly, I’m a complete beginner when it comes to parallel algorithms, so I apologize in advance if this question is overly simplistic. Here’s the situation I’m facing:

I have a triangular mesh that has both nodal forces and face-based forces. For simulation purposes, I need to compute the total force on each node. Handling the nodal forces is easy with one thread per node. However, I’m stuck on how to efficiently distribute the face-based forces to the nodes. Options I have considered include:

- Sparse matrix multiplication
- A data structure listing, for each node, the faces it has to read from, followed by a simple read from those faces (which would likely result in many read conflicts unless there is some sort of smart sorting algorithm).
- Some sort of segmented reduction. The keys would never be properly ordered, though (is this allowed?). Also, since it’s a mesh, there would likely be no more than 12 (or maybe 16 in the worst case) connections per node.
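
To make the third option concrete, here is a rough sketch in plain Python (not GPU code) of how I picture it: expand each face into (node, face) pairs, sort the pairs by node so that equal keys become contiguous, then sum each run. The names `faces`, `face_forces`, and `reduce_by_key_sum` are made up for illustration, and I'm assuming each face force is applied in full to each of the face's three nodes.

```python
from itertools import groupby
from operator import itemgetter

def reduce_by_key_sum(faces, face_forces, num_nodes):
    """Sum face forces onto nodes: sort (node, face) pairs, then reduce each run."""
    # Expand every face into three (node, face) pairs; the keys arrive unordered.
    pairs = [(node, f) for f, tri in enumerate(faces) for node in tri]
    # Sorting by node makes equal keys contiguous, which a reduction by key relies on.
    pairs.sort(key=itemgetter(0))
    # Nodes that touch no face keep a zero force.
    totals = [(0.0, 0.0, 0.0)] * num_nodes
    for node, run in groupby(pairs, key=itemgetter(0)):
        fx = fy = fz = 0.0
        for _, f in run:
            fx += face_forces[f][0]
            fy += face_forces[f][1]
            fz += face_forces[f][2]
        totals[node] = (fx, fy, fz)
    return totals

# Hypothetical example: two triangles sharing the edge between nodes 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
face_forces = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
node_totals = reduce_by_key_sum(faces, face_forces, num_nodes=4)
```

If the mesh connectivity does not change between time steps, the sort would presumably only need to be done once and the sorted pairs reused.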

Thoughts?

EDIT: To clarify, this is merely a mapping from faces to nodes. Therefore, every nonzero element in the sparse matrix would be 1.
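
Since every nonzero is 1, I assume the matrix values would not need to be stored at all. Below is a minimal Python sketch (again, not GPU code) of a CSR-style layout holding only row offsets and face indices, where the product reduces to a per-node gather. The helper names `build_node_to_face_csr` and `gather_face_forces` are hypothetical, and each face force is assumed to be added in full to each of its nodes.

```python
def build_node_to_face_csr(faces, num_nodes):
    """Build row offsets and face indices for the node-by-face 0/1 matrix."""
    # Count how many faces touch each node.
    counts = [0] * num_nodes
    for tri in faces:
        for node in tri:
            counts[node] += 1
    # An exclusive prefix sum of the counts gives the row offsets.
    offsets = [0] * (num_nodes + 1)
    for n in range(num_nodes):
        offsets[n + 1] = offsets[n] + counts[n]
    # Scatter each face index into the next free slot of each of its nodes' rows.
    cursor = offsets[:-1]
    face_ids = [0] * offsets[-1]
    for f, tri in enumerate(faces):
        for node in tri:
            face_ids[cursor[node]] = f
            cursor[node] += 1
    return offsets, face_ids

def gather_face_forces(offsets, face_ids, face_forces):
    """Each node sums the forces of its adjacent faces; no stored matrix values."""
    totals = []
    # On a GPU, one thread per node could run the body of this outer loop.
    for node in range(len(offsets) - 1):
        fx = fy = fz = 0.0
        for k in range(offsets[node], offsets[node + 1]):
            gx, gy, gz = face_forces[face_ids[k]]
            fx += gx
            fy += gy
            fz += gz
        totals.append((fx, fy, fz))
    return totals

# Hypothetical example: two triangles sharing the edge between nodes 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
face_forces = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
offsets, face_ids = build_node_to_face_csr(faces, num_nodes=4)
node_totals = gather_face_forces(offsets, face_ids, face_forces)
```

Each node writes only its own total here, so there are no write conflicts; the reads of `face_forces` are shared between nodes but never modified.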