Cannot compile OpenMP directives to offload to Nvidia GPU from Windows 10


I hope someone can help me. I am new to CUDA, though not as new to OpenMP. I am trying to compile this file:

void cuda_matricesSubstract(int pIntCols, int pIntRows, float* pMatrixA, float* pMatrixB, float* pMatrixC)
{
#pragma omp parallel
    {
        int i, j;
#pragma omp target teams distribute parallel for map(to: i, j, pIntRows, pIntCols, pMatrixA, pMatrixB) map(tofrom: pMatrixC)
        for (i = 0; i < pIntRows; i++)
        {
            for (j = 0; j < pIntCols; j++)
            {
                int lIntPos = i * pIntCols + j;
                pMatrixC[lIntPos] = pMatrixA[lIntPos] - pMatrixB[lIntPos];
            }
        }
    }
}

With this CMD command:

nvcc -c -o kernel.o -Xcompiler " -openmp"

I get this error (loosely translated from Spanish): error C3001: ‘target’: expected a name of an OpenMP directive

I have tried different approaches, but none worked.

I have:

  • CUDA v11.7
  • Windows 10
  • GPU NVIDIA GeForce GTX 1060.

I look forward to your help so that I can get deeper into CUDA and NVIDIA.

Thanks in advance and best regards.

Moved to CUDA Setup forum

CUDA and OpenMP (including target offload) are mostly orthogonal (they don’t relate to each other).

nvcc is not the correct compiler to use, nor is the CUDA toolkit intended to support OpenMP target offload to a GPU.

You won’t be able to use the CUDA toolkit for what you are trying to do here.

If you have a Windows compiler that supports OpenMP target offload to a GPU, use that, and follow the instructions provided by the provider of that compiler.
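Some context on the C3001 error itself: on Windows, nvcc hands host code to the MSVC compiler, and that compiler's -openmp switch implements OpenMP 2.0, which predates the target directive (introduced in OpenMP 4.0). One way to see what a given compiler reports is the standard _OPENMP macro, which holds the yyyymm date of the supported specification. The following is only a minimal sketch; the helper names openmp_version_date, openmp_has_target, and report_openmp_support are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the yyyymm date of the OpenMP specification
   the compiler reports, or 0 when OpenMP is not enabled at compile time. */
long openmp_version_date(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* The target construct first appeared in OpenMP 4.0 (_OPENMP == 201307).
   OpenMP 2.0 for C/C++ reports 200203. */
int openmp_has_target(long version_date)
{
    return version_date >= 201307L;
}

/* Hypothetical helper: prints what the compiler claims to support. */
void report_openmp_support(void)
{
    long v = openmp_version_date();
    printf("_OPENMP = %ld, target directive %s\n", v,
           openmp_has_target(v) ? "is in the reported spec"
                                : "is not in the reported spec");
}
```

A value of 201307 or later only means the directive is part of the specification the compiler reports. Whether the compiler can actually generate GPU code for it is a separate question that its documentation has to answer.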

Hello Robert, thank you for your answer. I will find that compiler. Any idea of such a compiler?

Thanks in advance.

For Windows? No. Google may help you locate something.

For Linux, you could try the NVIDIA HPC SDK. See here.

Hi Robert,

Is it on the roadmap for the HPC SDK to support OpenMP offloading to GPUs?

Should I assume that the performance of OpenMP offloading to NVIDIA GPUs is comparable to OpenACC’s?

Outside CUDA, what would be the best offload programming model in terms of performance? OpenACC?

OpenMP codes are pervasive, so OpenMP offloading support would probably be a lower-cost way to move existing OpenMP codes to NVIDIA GPUs.


For Linux, it’s already available. Please reread my previous comment. Also see here:

Use -mp=gpu to parallelize OpenMP regions for offload to an NVIDIA GPU.
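As a rough sketch of how the function from the original question might be written for such a compiler (an assumption, not something tested in this thread): the pointers are mapped with explicit array sections so the runtime knows how many elements to copy, the scalars are left to the default data-sharing rules, and the enclosing parallel region is dropped. The two nested loops are flattened into one, which gives the same row-major index. The function name matrices_subtract is a placeholder.

```c
#include <stddef.h>

/* Element-wise c = a - b for two rows x cols matrices stored row-major.
   With an offload-capable compiler the loop runs on the GPU; if the pragma
   is ignored, the same loop simply runs serially on the host. */
void matrices_subtract(int cols, int rows,
                       const float *a, const float *b, float *c)
{
    int n = rows * cols; /* total element count, used for the array sections */

    /* a and b are copied to the device, c is copied back afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int k = 0; k < n; k++) {
        c[k] = a[k] - b[k];
    }
}
```

On Linux with the HPC SDK installed, and assuming the code is saved as kernel.c (a placeholder file name), this would be compiled with something like nvc -mp=gpu -c -o kernel.o kernel.c.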

If you have questions about the HPC SDK and related compilers, I suggest asking those on the HPC Compilers forum.

Thanks Robert!