How can I use the GPU to transform the coordinates of a point cloud?

The tf transform is very slow on the CPU. I’m thinking about transforming the coordinates of the point cloud on the GPU instead, since the whole operation comes down to a single matrix multiplication applied to every point.

However, how do I add a .cu file to a ROS package, and how do I call a function defined in that .cu file?
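To make the question concrete, here is a minimal sketch of the kind of .cu file in question, assuming the transform is a row-major 4x4 homogeneous matrix and the points are plain float x, y, z triples. The names Point3, transformPoint, transformKernel and transformCloudGpu are hypothetical and are not part of ROS, tf or PCL. The CUDA-only parts are guarded so the per-point math can also be compiled with an ordinary C++ compiler.

```cpp
// transform_points.cu (hypothetical file name): sketch only, not a tuned implementation.
#include <cstddef>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HD __host__ __device__   // usable from both host and device code
#else
#define HD                       // plain C++ build: no CUDA qualifiers
#endif

// Assumed point layout: three packed floats.
struct Point3 { float x, y, z; };

// Apply a row-major 4x4 homogeneous transform T to one point (w is taken as 1).
HD inline Point3 transformPoint(const float* T, Point3 p) {
    Point3 q;
    q.x = T[0] * p.x + T[1] * p.y + T[2]  * p.z + T[3];
    q.y = T[4] * p.x + T[5] * p.y + T[6]  * p.z + T[7];
    q.z = T[8] * p.x + T[9] * p.y + T[10] * p.z + T[11];
    return q;
}

#ifdef __CUDACC__
// One thread per point.
__global__ void transformKernel(const float* T, const Point3* in,
                                Point3* out, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = transformPoint(T, in[i]);
}

// Host wrapper with C linkage so a regular .cpp file can declare and call it.
// T, in and out are host pointers. Returns 0 (cudaSuccess) on success.
// For brevity only the status of the final copy is reported.
extern "C" int transformCloudGpu(const float* T, const Point3* in,
                                 Point3* out, size_t n) {
    if (n == 0) return 0;  // nothing to do; a zero-block launch would be invalid
    float* dT = nullptr;
    Point3* dIn = nullptr;
    Point3* dOut = nullptr;
    cudaMalloc(&dT, 16 * sizeof(float));
    cudaMalloc(&dIn, n * sizeof(Point3));
    cudaMalloc(&dOut, n * sizeof(Point3));
    cudaMemcpy(dT, T, 16 * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dIn, in, n * sizeof(Point3), cudaMemcpyHostToDevice);

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    transformKernel<<<blocks, threads>>>(dT, dIn, dOut, n);

    // This blocking copy waits for the kernel and surfaces earlier errors.
    cudaError_t err = cudaMemcpy(out, dOut, n * sizeof(Point3),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dT);
    cudaFree(dIn);
    cudaFree(dOut);
    return static_cast<int>(err);
}
#endif
```

A C++ node would then only need a matching declaration of transformCloudGpu and would link against the object built from the .cu file. On the build side, CMake can compile .cu files either through the older FindCUDA module (cuda_add_library) or through its native CUDA language support (enable_language(CUDA)); how best to combine either with a catkin package is the open part of the question. Note also that the host-to-device copies may outweigh the kernel time for small clouds.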

Has anyone done this, or is there an example of it somewhere? As far as I can tell, PCL has no API for this.

Thank you so much for helping me.