Reduction on Triangular Mesh

Admittedly, I’m a complete beginner when it comes to parallel algorithms, so I apologize in advance if this question is overly simplistic. Here’s the situation I’m facing:

I have a triangular mesh that has both nodal forces and face-based forces. For simulation purposes, I need to compute the total force on each node. Handling the nodal forces is easy with one thread per node. However, I’m stuck on how to efficiently distribute the face-based forces to the nodes. Options I have considered include:

  1. Sparse matrix multiplication
  2. A data structure listing, for each node, the faces it has to read from, followed by a simple read from those faces (which would likely result in many read conflicts unless the data is sorted in some smart way).
  3. Some sort of segmented reduction. The keys would never be properly ordered, though (is this allowed?). Also, since it’s a mesh, each node would likely have no more than 12 connections (or maybe 16 in the worst case).
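
To make option 3 concrete, here is a minimal serial sketch of what I have in mind: emit one (node, face) pair per triangle corner, sort the pairs by node so that each node's contributions form one contiguous segment, then reduce each segment. The names `faces`, `face_forces`, and `sum_by_sorted_keys` are my own placeholders, and I'm assuming each face carries a single 3-component force that is added in full to each of its three nodes. On a GPU the sort and the per-segment sum would presumably be replaced by parallel primitives; this only illustrates the data flow.

```python
from itertools import groupby
from operator import itemgetter


def sum_by_sorted_keys(faces, face_forces, num_nodes):
    """Sum face forces onto nodes via sort-by-key plus a segmented reduction.

    faces:       list of (n0, n1, n2) node indices, one tuple per triangle
    face_forces: list of (fx, fy, fz) tuples, one per face
    """
    # One (node, face) pair per triangle corner; the node keys are unordered here.
    pairs = [(node, f) for f, tri in enumerate(faces) for node in tri]

    # Sorting by node groups each node's contributions into one contiguous segment.
    pairs.sort(key=itemgetter(0))

    # Nodes that touch no face keep a zero force.
    node_forces = [(0.0, 0.0, 0.0)] * num_nodes

    # Reduce each segment independently.
    for node, segment in groupby(pairs, key=itemgetter(0)):
        fx = fy = fz = 0.0
        for _, f in segment:
            gx, gy, gz = face_forces[f]
            fx += gx
            fy += gy
            fz += gz
        node_forces[node] = (fx, fy, fz)
    return node_forces


# Two triangles sharing the edge between nodes 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
face_forces = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
result = sum_by_sorted_keys(faces, face_forces, num_nodes=4)
```

If the mesh connectivity does not change between time steps (an assumption on my part), the sort would only need to happen once, and only the reduction would be repeated every step.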

Thoughts?

EDIT: To clarify, this is merely a mapping from faces to nodes. Therefore, every nonzero element in the sparse matrix would be 1.
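
Since every nonzero entry is 1, I think options 1 and 2 collapse into the same thing: a CSR-style matrix that stores only the pattern (row offsets and column indices) with no values array. Below is a rough serial sketch under that assumption. `build_node_to_face_csr` and `gather_face_forces` are hypothetical names of mine, not from any library, and the outer loop over nodes stands in for the one-thread-per-node model.

```python
def build_node_to_face_csr(faces, num_nodes):
    """Build the pattern of the node-by-face matrix (all nonzeros equal 1)."""
    # Count the faces touching each node, shifted by one slot.
    row_ptr = [0] * (num_nodes + 1)
    for tri in faces:
        for node in tri:
            row_ptr[node + 1] += 1
    # A running sum turns the counts into row start offsets.
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]

    # Fill in the face index for every (node, face) incidence.
    col_idx = [0] * row_ptr[num_nodes]
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for f, tri in enumerate(faces):
        for node in tri:
            col_idx[cursor[node]] = f
            cursor[node] += 1
    return row_ptr, col_idx


def gather_face_forces(row_ptr, col_idx, face_forces):
    """Multiply the pattern matrix by the face forces, one row per node."""
    node_forces = []
    for node in range(len(row_ptr) - 1):  # would be one thread per node
        fx = fy = fz = 0.0
        for k in range(row_ptr[node], row_ptr[node + 1]):
            gx, gy, gz = face_forces[col_idx[k]]  # implicit matrix entry of 1
            fx += gx
            fy += gy
            fz += gz
        node_forces.append((fx, fy, fz))
    return node_forces


# Two triangles sharing the edge between nodes 1 and 2.
faces = [(0, 1, 2), (1, 3, 2)]
face_forces = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
row_ptr, col_idx = build_node_to_face_csr(faces, num_nodes=4)
node_forces = gather_face_forces(row_ptr, col_idx, face_forces)
```

In this layout each node writes only its own output and only reads the face data, so as far as I can tell no atomic operations would be needed; whether the scattered reads are fast enough is exactly what I'm unsure about.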