Is it possible to call a CUDA kernel from PGI compiled code?

Hi,

I’m just curious whether this statement in PGI’s OpenACC FAQ is still valid:

“PGI is working on the design of a feature to allow you to call kernel functions written in CUDA or PTX or other languages directly from your C or Fortran program. We will announce this feature when it is available.”

If this is still up to date, could you please tell me when this feature is likely to be added?

I’m looking forward to this feature since it would allow me to “hand tune”
some “hot” kernels.

Best,
Paul

Hi Paul,

This is part of the proposed OpenACC 2.0 Spec. See: http://www.openacc.org/node/173 and http://www.openacc.org/sites/default/files/Proposed%20Additions%20for%20OpenACC%202.pdf. In particular, see the “routine” directive.

We expect to have these features implemented by the middle of next year.

  • Mat

Hi Mat,

As always, thank you :)

EDIT: A closer look at the OpenACC_2.pdf reveals that we might be talking about two different things. You are talking about routines that can be called within an OpenACC region, right?

I was talking about a CUDA kernel that can be called outside of an acc region.
E.g.:

some_cuda_kernel<<<num_blocks, threads_per_block>>>(in_out);
// do something before the acc region
#pragma acc parallel deviceptr(in_out)
{
    // do something with in_out
}
// copy in_out back to the C host code

Best,
Paul

Hi Paul,

“You are talking about routines that can be called within an OpenACC region, right?”

Correct. I’ll see if we can make the FAQ clearer that it’s about calling CUDA device functions from within an OpenACC compute region.

“I was talking about a CUDA kernel that can be called outside of an acc region.”

With Fortran you can do this now by using PGI CUDA Fortran.

As for C, the problem lies in NVIDIA’s header files, which need some changes before PGCC can be used as a host compiler. We’ve asked several times, but NVIDIA hasn’t made them yet. Feel free to put in a request to NVIDIA. Not sure it will help, but it wouldn’t hurt.

In the meantime, what you need to do is have your PGCC-compiled code call your NVCC-compiled CUDA C code.

  • Mat

My project is structured as follows:

  • main.c
  • compute.c (this file issues calls to the CUDA runtime API)
  • kernels.cu

I can build this project by compiling the *.c files with Intel’s compiler and the *.cu file with NVIDIA’s nvcc, and then linking everything with Intel’s compiler.

If I try exactly the same with the PGI compiler, I get the following error while compiling the compute.c file:

PGC-F-0249-#error --  --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! --- (/usr/local/cuda/5.0.35/include/host_defines.h: 128)

That’s what you are talking about, right?

“In the meantime, what you need to do is have your PGCC-compiled code call your NVCC-compiled CUDA C code.”

Please correct me if I’m mistaken: my compute.c file is not allowed to include cuda_runtime.h or make CUDA API calls? So I have to move all CUDA API calls to a different file, which needs to be compiled with nvcc?

Best,
Paul

“That’s what you are talking about, right?”

Yes. The CUDA header files are configured to use GNU or MSC (I’m guessing icc sets some macros making it appear like gcc).

“Please correct me if I’m mistaken: my compute.c file is not allowed to include cuda_runtime.h or make CUDA API calls? So I have to move all CUDA API calls to a different file, which needs to be compiled with nvcc?”

That’s correct. All CUDA API calls and headers need to be put in a separate source file, which can then be compiled by NVCC.
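
For what it’s worth, a minimal sketch of that split might look like the following. It assumes a kernels.cu file like the one in the project above; the kernel body and the wrapper names (device_alloc_copy, launch_some_cuda_kernel, device_copy_back_free) are hypothetical and only illustrate the pattern. The idea is that only the nvcc-compiled file includes the CUDA headers, and it exposes plain C functions that the PGCC-compiled code can call.

```cuda
// kernels.cu: compiled with nvcc; the only file that includes CUDA headers.
#include <stddef.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for a hand-tuned "hot" kernel: doubles each element.
__global__ void some_cuda_kernel(float *in_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        in_out[i] *= 2.0f;
}

// extern "C" turns off C++ name mangling so that C code can link to these.

// Allocate device memory and copy n floats from the host into it.
// Returns the device pointer, or NULL on failure.
extern "C" float *device_alloc_copy(const float *host, int n)
{
    float *dev = NULL;
    if (cudaMalloc((void **)&dev, n * sizeof(float)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(dev, host, n * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev);
        return NULL;
    }
    return dev;
}

// Launch the kernel on a device pointer. Returns 0 on success, -1 on error.
extern "C" int launch_some_cuda_kernel(float *dev_in_out, int n)
{
    if (n <= 0)
        return 0;  // nothing to do; avoids launching with zero blocks
    const int threads_per_block = 256;
    const int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    some_cuda_kernel<<<num_blocks, threads_per_block>>>(dev_in_out, n);
    if (cudaGetLastError() != cudaSuccess)  // launch configuration errors
        return -1;
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
}

// Copy n floats back to the host and free the device buffer.
// Returns 0 on success, -1 on error.
extern "C" int device_copy_back_free(float *host, float *dev, int n)
{
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}
```

On the C side, compute.c would then declare just the prototypes, e.g. `float *device_alloc_copy(const float *host, int n);`, without including cuda_runtime.h, and could hand the returned pointer to an OpenACC region via `deviceptr`. The object file produced by `nvcc -c kernels.cu` is linked in together with the CUDA runtime library; the exact link flags depend on the installation.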

  • Mat

Thanks for the clarification. It would be nice if either PGI or NVIDIA could do something about this in the future, since the current workaround is somewhat inconvenient.

Best, Paul

“It would be nice if either PGI or NVIDIA could do something about this in the future”

Please send in a request to NVIDIA. We keep asking, but maybe having a few users ask as well would help.

  • Mat