Implementing a particle filter with multithreaded C++

Hi, I am writing a particle filter on a TX2 in C++. The code looks like this:

// LandMarker
struct LandMarker{
    double x, y, theta;
};

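// Particle class
class Particle{
public:
    // The vectors are taken by const reference so they are not copied on every call
    void update(double move, double steer, const vector<LandMarker>& cones, const vector<LandMarker>& maps){
        for (const LandMarker& l1 : cones){
            for (const LandMarker& l2 : maps){
                // do something
            }
        }
    }
};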

// Particle filter main function
// This loop is the most time-consuming part
vector<LandMarker> cones = ....
vector<LandMarker> maps = ....
for (Particle& p : particleArray) {
    double move = ...
    double steer = ...

    p.update(move, steer, cones, maps);
}

I am thinking of running p.update on multiple threads, since the TX2 has several CPU cores. The code needs to wait until every Particle has been updated before it leaves the for loop.
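A minimal sketch of that "split the loop, then wait for everyone" pattern using plain std::thread is below. It is untested on the TX2 and rests on several assumptions: p.update only writes to its own particle, it only reads cones and maps, and move and steer are the same for every particle (if they differ, compute them inside the worker loop). The weight field, the body of update, and the name updateAllParallel are hypothetical placeholders, not part of the real filter.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct LandMarker {
    double x, y, theta;
};

class Particle {
public:
    double weight = 0.0;  // hypothetical field so the sketch has something to write

    // Placeholder body standing in for the real "do something".
    void update(double move, double steer,
                const std::vector<LandMarker>& cones,
                const std::vector<LandMarker>& maps) {
        weight = move + steer;
        for (const LandMarker& l1 : cones) {
            for (const LandMarker& l2 : maps) {
                weight += (l1.x - l2.x) * (l1.x - l2.x)
                        + (l1.y - l2.y) * (l1.y - l2.y);
            }
        }
    }
};

// Updates every particle, giving each thread one contiguous chunk of the array.
// Returns only after all worker threads have finished.
void updateAllParallel(std::vector<Particle>& particles, double move, double steer,
                       const std::vector<LandMarker>& cones,
                       const std::vector<LandMarker>& maps) {
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    std::size_t numThreads = std::max<std::size_t>(1, hw);
    numThreads = std::min(numThreads, std::max<std::size_t>(1, particles.size()));
    std::size_t chunk = (particles.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        std::size_t begin = std::min(t * chunk, particles.size());
        std::size_t end = std::min(begin + chunk, particles.size());
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&particles, &cones, &maps, move, steer, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                particles[i].update(move, steer, cones, maps);
        });
    }
    // join() blocks until that thread is done, so this loop is the "wait for all" step.
    for (std::thread& w : workers) w.join();
}
```

Calling updateAllParallel(particleArray, move, steer, cones, maps) would then replace the for loop. On Linux this typically needs to be built with -pthread. Creating threads on every filter step has some overhead, so a thread pool or an OpenMP "parallel for" may be worth comparing if the loop runs at a high rate.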

I have no experience with multithreaded code myself. Could anyone help and show some sample code?

Also, would it be a good idea to implement p.update on the GPU instead?

Thank you very much.