(PyCUDA) How would I use something like Python's itertools.permutations in CUDA to make it faster?

I’ve seen people generate permutations with CUDA (though not from Python), but I don’t understand the C++ code involved or how to call it from Python.
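
To make the question concrete: as far as I can tell, the CUDA versions I've seen don't iterate the way itertools does. Instead, each GPU thread seems to compute "the k-th permutation" directly from its thread index. Below is a minimal CPU-only sketch of that idea in plain Python. This is my assumption about the approach, not working PyCUDA code, and `kth_permutation` is a name I made up for illustration. What I'm missing is how to turn the body of this function into a kernel and launch it from Python.

```python
from itertools import permutations
from math import factorial


def kth_permutation(items, k):
    """Return the k-th permutation of items (0-based), in the same
    order that itertools.permutations would produce them."""
    pool = list(items)
    n = len(pool)
    if not 0 <= k < factorial(n):
        raise ValueError("k must be in range(0, n!)")
    result = []
    # Decode k in the factorial number system: each digit selects
    # which of the remaining items comes next.
    for i in range(n, 0, -1):
        idx, k = divmod(k, factorial(i - 1))
        result.append(pool.pop(idx))
    return tuple(result)


# On a GPU, k would presumably be the global thread index; here it is a loop.
items = "abc"
perms = [kth_permutation(items, k) for k in range(factorial(len(items)))]
assert perms == list(permutations(items))
```

Since every k is independent of the others, this looks like the part that could run in parallel, one k per thread.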