Some things are not clear.
If you have an OpenACC code that compiles correctly, and you have decorated your for loops with either the
#pragma acc kernels
or the
#pragma acc parallel
directive, then nothing else should be needed to get those accelerated regions to run on the GPU. If you are using the PGI compiler toolchain, the -Minfo=accel compiler switch is useful: it reports what the compiler did with each accelerator region, and it is worth learning to read its output.
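As a rough illustration of what such a decorated loop might look like, here is a minimal sketch. The function name saxpy and the data clauses are my own choices for the example, not something from your code. With PGI you would typically build it with something like pgcc -acc -Minfo=accel; a compiler without OpenACC support simply ignores the pragma and runs the loop on the host.

```c
/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* Ask the compiler to offload this loop. copyin/copy describe which
       array sections move to the device and which come back afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The point is that the source has to be compiled with OpenACC enabled for the pragma to have any effect; the loop body itself is ordinary C.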
When you say “It appears that OpenACC won’t run a pre-compiled C program.”, I’m not sure what you mean.
Yes, it's correct that a pre-compiled program (say, one that does not use OpenACC) will not automatically use the GPU. You have to compile the code specifically to use the GPU. It is also correct that if you are calling a function from a pre-compiled host-code library, such calls can’t be made from accelerator regions (i.e. those regions decorated as I described above), because no device version of that function exists.
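To make the distinction concrete, here is a sketch of the case that does work: a function whose source you have, compiled together with your OpenACC code. The names scale_and_shift and transform are hypothetical. The acc routine directive asks the compiler to also generate a device version of the function, which is exactly what a pre-compiled host library cannot give you.

```c
/* Hypothetical helper. "acc routine seq" requests a device-callable,
   sequential version of this function in addition to the host one. */
#pragma acc routine seq
float scale_and_shift(float v)
{
    return 2.0f * v + 1.0f;
}

/* Hypothetical caller: applies the helper to every element in place. */
void transform(int n, float *restrict data)
{
    #pragma acc parallel loop copy(data[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] = scale_and_shift(data[i]);
    }
}
```

If scale_and_shift instead lived in a library that was built for the host only, the compiler would have no device code to call from inside the parallel region.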
It might be useful just to go through an introductory course on OpenACC.