The software I’m migrating to CUDA uses OpenMP, so I have a question: can I run it with two or more host threads, with each thread launching CUDA kernels on the same GPU?

Thank you.

The use of OpenMP (especially #pragmas applied to loops) is often a pretty good indication that this particular chunk of code should be re-implemented as a CUDA kernel.
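
To illustrate, here is a rough sketch of how an OpenMP loop might map onto a kernel. The SAXPY loop and the names `saxpy_omp`, `saxpy_kernel` and `saxpy_gpu` are made up for the example and are not taken from the original poster's code. The idea is that the loop body becomes the kernel body and the loop index becomes the global thread index.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// CPU version: OpenMP splits the loop iterations across host threads.
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// GPU version: one CUDA thread handles one former loop iteration.
__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the grid may be larger than n
        y[i] = a * x[i] + y[i];
}

// Hypothetical host wrapper: copy in, launch, copy out.
// Returns false if an allocation or the final copy fails.
bool saxpy_gpu(int n, float a, const float *x, float *y)
{
    float *dx = nullptr, *dy = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&dx, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dy, bytes) != cudaSuccess) { cudaFree(dx); return false; }
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, a, dx, dy);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return err == cudaSuccess;
}

int main()
{
    const int n = 1 << 16;
    std::vector<float> x(n, 1.0f), y_cpu(n, 2.0f), y_gpu(n, 2.0f);
    saxpy_omp(n, 3.0f, x.data(), y_cpu.data());
    bool ok = saxpy_gpu(n, 3.0f, x.data(), y_gpu.data());
    for (int i = 0; ok && i < n; ++i)
        ok = (y_cpu[i] == y_gpu[i]);
    printf("%s\n", ok ? "CPU and GPU results match" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Something like `nvcc -Xcompiler -fopenmp saxpy.cu` should build it with GCC as the host compiler (the file name is arbitrary). Whether the port pays off depends on how much work each iteration does compared with the cost of the host/device copies.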

Yes. Take a look at the cudaOpenMP sample that ships with the CUDA SDK.
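
For a rough idea of the pattern, here is a minimal sketch. It is not the SDK sample itself, and it assumes a single GPU (device 0), a CUDA runtime recent enough that host threads in one process share the device context (4.0 or later), and a made-up kernel `add_constant` with a made-up helper `run_on_shared_gpu`. Each OpenMP thread selects the same device and launches the kernel on its own slice of one device buffer.

```cuda
#include <cstdio>
#include <vector>
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical kernel: adds `value` to every element of a slice.
__global__ void add_constant(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += value;
}

// Hypothetical helper: splits a buffer into `num_slices` slices and lets
// OpenMP host threads launch one kernel per slice, all on device 0.
// Returns true if every element ends up with the expected value.
bool run_on_shared_gpu(int num_slices, int per_slice)
{
    const int n = num_slices * per_slice;
    std::vector<int> host(n, 0);
    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // A parallel for (rather than a bare parallel region) keeps the result
    // correct even if OpenMP provides fewer threads than requested.
    #pragma omp parallel for num_threads(num_slices)
    for (int s = 0; s < num_slices; ++s) {
        cudaSetDevice(0);  // every host thread uses the same GPU
        int threads = 128;
        int blocks = (per_slice + threads - 1) / threads;
        add_constant<<<blocks, threads>>>(dev + s * per_slice, per_slice, s + 1);
        cudaDeviceSynchronize();  // wait for the work issued so far
    }

    cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    // Slice s should now hold s + 1 in every element.
    for (int s = 0; s < num_slices; ++s)
        for (int j = 0; j < per_slice; ++j)
            if (host[s * per_slice + j] != s + 1) return false;
    return true;
}

int main()
{
    bool ok = run_on_shared_gpu(4, 1000);
    printf("%s\n", ok ? "all slices correct" : "mismatch or CUDA error");
    return ok ? 0 : 1;
}
```

Build with something like `nvcc -Xcompiler -fopenmp` when GCC is the host compiler. Note that with the legacy default stream, kernels launched from different host threads are serialized on the device; if you want them to overlap, give each thread its own stream or compile with `--default-stream per-thread`.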