CUDA+OpenMP+non-GNU compiler: having build problems with this combination

I have a multi-GPU code composed of Fortran90 files and CUDA .cu files. Both contain OpenMP directives; this is how I do my multi-GPU task splitting. Until recently I have only used the GNU compiler suite to build this software, and it works fine on my local machine. NCSA’s Lincoln cluster has the PGI as well as Intel compilers, and I would like to try those. I learned, though, that nvcc will only work with a GNU C compiler as its host compiler, which means that I must use “--compiler-options -fopenmp” to compile the .cu files (I compile them all into object files and assemble them into a library before the final linking). But the rest of the code is compiled with, say, pgf90, which would use its own internal OpenMP implementation via “-mp=allcores”. If my final pgf90 link statement does not contain “-L/usr/local/gcc-4.2.4/lib64 -lgomp”, I get a linking error similar to the following for every OpenMP directive in my .cu files:

./libMystuff.a(CudaResources.o)(.text+0x223): In function `findnumcudagpus_': : undefined reference to `GOMP_parallel_start'

But if I include libgomp in that final linking statement, my program hangs the first time it enters a parallel section in the Fortran/pgf90 code.
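To make the setup concrete, here is a rough sketch of the build sequence described above. Only libMystuff.a, CudaResources.o, the quoted compiler flags, and the libgomp path come from my actual build. The source file names (CudaResources.cu, main.f90), the output name myprog, and the CUDA_LIB variable are placeholders for illustration, and the real link line may need further libraries.

```shell
# Sketch of the build described above; placeholder names are marked.
CUDA_LIB=/path/to/cuda/lib    # placeholder: directory containing libcudart

# 1. CUDA sources: nvcc passes -fopenmp to its gcc host compiler, so the
#    resulting objects reference GNU OpenMP runtime symbols (GOMP_*).
nvcc --compiler-options -fopenmp -c CudaResources.cu -o CudaResources.o
ar rcs libMystuff.a CudaResources.o

# 2. Fortran sources: pgf90 uses PGI's own OpenMP runtime.
#    (main.f90 is a placeholder file name.)
pgf90 -mp=allcores -c main.f90 -o main.o

# 3a. Link without libgomp: fails with
#     "undefined reference to `GOMP_parallel_start'".
pgf90 -mp=allcores main.o -L. -lMystuff -L"$CUDA_LIB" -lcudart -o myprog

# 3b. Link with libgomp: links, but the program hangs at the first
#     parallel section in the Fortran code.
pgf90 -mp=allcores main.o -L. -lMystuff -L"$CUDA_LIB" -lcudart \
      -L/usr/local/gcc-4.2.4/lib64 -lgomp -o myprog
```

In variant 3b the executable ends up with two OpenMP runtimes linked in (PGI's and GNU's libgomp), which is the part I suspect is causing the hang.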

Has anybody had success using a non-GNU compiler alongside nvcc with OpenMP, or did I shoot myself in the foot by putting OpenMP directives in my .cu files?

[Computer on which this all works: Fedora 10, CUDA 2.2, GCC/gfortran 4.3.2;

Computer on which this fails: RHEL5, CUDA 2.2, GCC/gfortran 4.2.4, PGI 9.0-4 or Intel 10.1.017]