I have an MPI+OpenACC code that uses acc device_num(N) to select a GPU for each rank
before later calling routines in a C code that uses CUDA library calls.
Do the CUDA calls automatically use the right device because they are made after
acc device_num(N), or do I need to pass the device number
N into the C code and set the device manually with
a call to cudaSetDevice(N)?