Invoking different CUDA modules from different CPU threads/processes in parallel

Dear all
I have some CUDA modules that I'm planning to invoke from two different threads within one process, each with different input data to work on.

Within each CPU thread the CUDA work runs in parallel on its own, but I'm not sure how CUDA at the device/driver level handles multiple CUDA modules (whether the same module or different ones) operating on independent data sets. Do kernels launched from different CPU threads run concurrently on the GPU, or does the driver serialize them?
Can someone point me to any source of information on this?
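For what it's worth, since CUDA 4.0 all host threads in a process share a single CUDA context per device, so launching from two CPU threads is safe. Whether the kernels actually overlap on the GPU depends on the streams used: launches from different threads onto the legacy default stream serialize against each other, while launches on separate non-default streams may run concurrently if the device has free resources. Below is a minimal sketch of that pattern, where each CPU thread creates its own stream and works on its own buffer (the kernel `scale` and the thread function `worker` are placeholder names, standing in for whatever your modules actually do):

```cuda
#include <thread>
#include <cuda_runtime.h>

// Trivial stand-in kernel; each "module" would launch its own kernels here.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Work done by each CPU thread: its own stream and its own device buffer,
// so the two threads' kernels are free to overlap on the GPU.
void worker(float factor) {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaStream_t stream;
    cudaStreamCreate(&stream);           // per-thread non-default stream
    cudaMalloc(&d, n * sizeof(float));
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, factor, n);
    cudaStreamSynchronize(stream);       // wait only for this thread's work
    cudaFree(d);
    cudaStreamDestroy(stream);
}

int main() {
    std::thread t1(worker, 2.0f);
    std::thread t2(worker, 0.5f);
    t1.join();
    t2.join();
    return 0;
}
```

Alternatively, compiling with `nvcc --default-stream per-thread` gives each host thread its own default stream without creating streams explicitly. The relevant background is in the "Streams" and "Asynchronous Concurrent Execution" sections of the CUDA C Programming Guide.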

Thank You