Running multiple ML algorithms (XGBoost, LGBM) concurrently on GPU

I am training multiple XGBoost models in a for loop in Python. The reason for the loop is that each XGBoost model is trained on a different feature set. The whole exercise takes almost two hours, and I want to reduce this time. Using Python's multiprocessing library brought it down to about 25 minutes. I would like to reduce it further by utilizing the many cores of my GPU instead of just the 16 cores of my CPU. The current setup looks roughly like the sketch below.
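(The data, hyperparameters, and number of tasks here are placeholders for the real feature subsets.)

```python
from multiprocessing import Pool

import numpy as np
import xgboost as xgb


def train_one(task):
    """Train one XGBoost model on a single feature subset."""
    X, y = task
    model = xgb.XGBRegressor(n_estimators=200)  # hyperparameters are placeholders
    model.fit(X, y)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the real per-task feature subsets.
    tasks = [(rng.normal(size=(1000, 20)), rng.normal(size=1000)) for _ in range(8)]
    with Pool(processes=16) as pool:
        models = pool.map(train_one, tasks)
```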

I have looked at Numba, but I cannot wrap the XGBoost training loop in a Numba-compiled function; it gives an error. I think Numba only supports a limited subset of Python.

The next option left is PyCUDA, but how would I use XGBoost inside a PyCUDA SourceModule? I need to run multiple XGBoost models on different DataFrames. What would be the best strategy to deal with this problem?

XGBoost ships with GPU-accelerated training, so you can use it from your ordinary Python code without Numba or PyCUDA. One way to keep the data on the GPU as well is via RAPIDS (cuDF DataFrames can be passed to XGBoost directly).
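A minimal sketch of the built-in GPU training, using synthetic data as a stand-in for your feature sets (the parameter names assume XGBoost 2.x; on 1.x use `tree_method="gpu_hist"` instead of `device="cuda"`):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# Stand-ins for the per-task feature subsets (replace with your real data).
feature_sets = [(rng.normal(size=(1000, 20)), rng.normal(size=1000)) for _ in range(8)]

models = []
for X, y in feature_sets:
    dtrain = xgb.DMatrix(X, label=y)  # also accepts cuDF DataFrames from RAPIDS
    params = {
        "objective": "reg:squarederror",
        "tree_method": "hist",
        "device": "cuda",  # XGBoost >= 2.0; on 1.x use tree_method="gpu_hist"
    }
    models.append(xgb.train(params, dtrain, num_boost_round=200))
```

The models still train one after another, but each individual training run uses the GPU, which is usually where most of the two hours is going.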