Utilize GPU for ROS node

Hi, I am working on a robot project on a Jetson TX2. One important component of the project is a particle filter that matches a LiDAR scan against an existing map. My implementation currently runs on the CPU and is very slow, so I am thinking about using the GPU. The background information is as follows:

  1. There is a map, which is a 1024x1024 8-bit image. All particles share this same map.
  2. Each particle has LiDAR scan data, which is an array of 1024 floats.
  3. Each particle needs to access the map and do some processing (matching its LiDAR scan to the map).
  4. Each particle updates its weight based on the matching result.

I am not sure what the high-level strategy is for doing this on the GPU. One particle per GPU core?

Also, I am pretty new to GPUs and CUDA. If you know of any CUDA examples specifically for the Jetson TX2, please let me know.



The TX2 has only one GPU.

Maybe you can use different streams for the particles.
Different streams can execute simultaneously, but tasks assigned to the same stream are executed in order.
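
To make the stream idea concrete, here is a rough sketch rather than a tuned implementation. It assumes one thread block per particle, with the threads of a block sharing that particle's 1024 beams, and it splits the particles into batches that are issued on separate streams. The `Pose` struct, the `angleStep` parameter, the names `scoreParticles` and `scoreOnGpu`, and the scoring rule (summing the map values at the beam endpoints) are all placeholders I made up for illustration. Your real matching model would replace the loop body.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int kMapSize = 1024;  // map is 1024x1024, 8-bit
constexpr int kBeams   = 1024;  // floats per particle scan
constexpr int kThreads = 256;   // threads per block, must be a power of two

// Hypothetical particle pose: position in map pixels, heading in radians.
struct Pose { float x, y, theta; };

// One block per particle. Threads stride over the beams, then reduce
// their partial scores in shared memory into a single weight.
__global__ void scoreParticles(const uint8_t* map, const float* scans,
                               const Pose* poses, float* weights,
                               float angleStep)
{
    __shared__ float partial[kThreads];
    const Pose p = poses[blockIdx.x];
    const float* scan = scans + blockIdx.x * kBeams;

    float sum = 0.0f;
    for (int b = threadIdx.x; b < kBeams; b += blockDim.x) {
        const float a = p.theta + b * angleStep;
        const int mx = __float2int_rn(p.x + scan[b] * cosf(a));
        const int my = __float2int_rn(p.y + scan[b] * sinf(a));
        // Placeholder score: map value at the beam endpoint, if inside the map.
        if (mx >= 0 && mx < kMapSize && my >= 0 && my < kMapSize)
            sum += map[my * kMapSize + mx];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction over the block (assumes blockDim.x == kThreads).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) weights[blockIdx.x] = partial[0];
}

// Host side: copy the shared map once, then issue each batch of particles
// on its own stream. Error checking is omitted to keep the sketch short.
std::vector<float> scoreOnGpu(const std::vector<uint8_t>& map,
                              const std::vector<float>& scans,
                              const std::vector<Pose>& poses,
                              float angleStep, int numStreams = 2)
{
    const int n = static_cast<int>(poses.size());
    std::vector<float> weights(n, 0.0f);
    if (n == 0 || numStreams < 1) return weights;

    uint8_t* dMap; float* dScans; Pose* dPoses; float* dWeights;
    cudaMalloc(&dMap, map.size());
    cudaMalloc(&dScans, scans.size() * sizeof(float));
    cudaMalloc(&dPoses, n * sizeof(Pose));
    cudaMalloc(&dWeights, n * sizeof(float));
    cudaMemcpy(dMap, map.data(), map.size(), cudaMemcpyHostToDevice);

    std::vector<cudaStream_t> streams(numStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    const int chunk = (n + numStreams - 1) / numStreams;
    for (int i = 0; i < numStreams; ++i) {
        const int first = i * chunk;
        const int count = std::min(chunk, n - first);
        if (count <= 0) break;
        // Work in one stream runs in order; different streams may overlap.
        // Real copy/compute overlap needs pinned host memory (cudaMallocHost).
        cudaMemcpyAsync(dScans + first * kBeams, scans.data() + first * kBeams,
                        count * kBeams * sizeof(float),
                        cudaMemcpyHostToDevice, streams[i]);
        cudaMemcpyAsync(dPoses + first, poses.data() + first,
                        count * sizeof(Pose),
                        cudaMemcpyHostToDevice, streams[i]);
        scoreParticles<<<count, kThreads, 0, streams[i]>>>(
            dMap, dScans + first * kBeams, dPoses + first,
            dWeights + first, angleStep);
        cudaMemcpyAsync(weights.data() + first, dWeights + first,
                        count * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    cudaDeviceSynchronize();

    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(dMap); cudaFree(dScans); cudaFree(dPoses); cudaFree(dWeights);
    return weights;
}
```

From the ROS node you would call something like `scoreOnGpu(map, scans, poses, angleStep)` once per filter update and then normalize the returned weights on the CPU. With a few hundred particles or more, a single launch (`numStreams = 1`) may already keep the GPU busy, so it is worth profiling before adding streams.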

Here is our document for your reference: