How to produce an OpenCL executable on an NVIDIA card?

I recently started using the pgcc compiler, and I have two questions:

  1. Is it possible to produce OpenCL code for an NVIDIA card?
  2. Is it possible to obtain the kernel code (.cu or .cl file) generated by pgcc?

Hi Jing Li,

For NVIDIA devices, we only target CUDA C or LLVM. For AMD devices, we target OpenCL or LLVM.

To see the generated device code, use the “keep” sub-option: “-ta=tesla:keep”.

The kernels will be located in the “filename.*.gpu” files.

Hope this helps,
Mat

FYI, here’s the current 14.6 list of “-ta” sub-options:

% pgfortran -help -ta
-ta=tesla:{[no]autocollapse|[no]fma|[no]flushz|keep|llvm|loadcache:{L1|L2}|[no]unroll|maxregcount:<n>|[no]rdc|[no]required|cc1x|tesla|cc1+|tesla+|cc2x|fermi|cc2+|fermi+|cc3x|kepler|cc3+|kepler+|fastmath|pin|cuda5.5|cuda6.0}|nvidia|radeon:{keep|llvm|[no]unroll|[no]required|tahiti|capeverde|spectre|buffercount:<n>}|host
                    Choose target accelerator
    tesla           Select NVIDIA Tesla accelerator target
     [no]autocollapse
                    Automatically collapse tightly nested loops
     [no]fma        Generate fused mul-add instructions (default at -O3)
     [no]flushz     Enable flush-to-zero mode on the GPU
     keep           Keep kernel files
     llvm           Use LLVM back end; disables cc1x
     loadcache      Choose what hardware level cache to use for global memory loads
      L1            Use L1 cache
      L2            Use L2 cache
     [no]unroll     Enable automatic inner loop unrolling (default at -O3)
     maxregcount:<n>
                    Set maximum number of registers to use on the GPU
     [no]rdc        Generate relocatable device code
     [no]required   Issue compiler error if the compute regions fail to accelerate
     cc1x|tesla     Compile for compute capability 1.x
     cc1+|tesla+    Compile for compute capability 1.x and above
     cc2x|fermi     Compile for compute capability 2.x
     cc2+|fermi+    Compile for compute capability 2.x and above (default)
     cc3x|kepler    Compile for compute capability 3.x
     cc3+|kepler+   Compile for compute capability 3.x and above
     fastmath       Use fast math library
     pin            Set default to pin host memory
     cuda5.5        Use CUDA 5.5 Toolkit compatibility
     cuda6.0        Use CUDA 6.0 Toolkit compatibility
    nvidia          nvidia is a synonym for tesla
    radeon          Select AMD Radeon GPU accelerator target
     keep           Keep kernel source files
     llvm           Use LLVM/SPIR back end
     [no]unroll     Enable automatic inner loop unrolling (default at -O3)
     [no]required   Issue compiler error if the compute regions fail to accelerate
     tahiti         Compile for Radeon Tahiti architecture (default)
     capeverde      Compile for Radeon Capeverde architecture
     spectre        Compile for Radeon Spectre architecture
     buffercount:<n>
                    Set max number of device buffers used by OpenCL kernel
    host            Compile for the host, i.e., no accelerator target

Hi Mat,
Thanks for your quick response. So the PGI compiler does not support generating OpenCL code that can run on a CUDA-enabled device, am I correct about this?

Correct. Since CUDA is available on NVIDIA devices and the underlying OpenACC device code should be transparent to the user, there was no reason to support an OpenCL target for them.

  • Mat