I recently started using the pgcc compiler, and I have two questions:
- Is it possible to produce OpenCL code for an NVIDIA card?
- Is it possible to obtain the kernel code (.cu or .cl file) that pgcc generates?
Hi Jing Li,
For NVIDIA devices, we only target CUDA C or LLVM. For AMD devices, we target OpenCL or LLVM.
To see the generated device code, use the “keep” sub-option of your target, e.g. “-ta=tesla:keep” (or “-ta=radeon:keep” for AMD).
The kernels will be located in the “filename.*.gpu” files.
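As a rough illustration of the “keep” sub-option, here is a minimal sketch of an OpenACC loop you could compile to inspect the generated kernel. The file name saxpy.c, the routine saxpy, and the exact build line are assumptions chosen for the example, not something from this thread; only the “keep” sub-option and the “filename.*.gpu” naming come from the reply above.

```c
/* saxpy.c: a small OpenACC loop for trying out the "keep" sub-option.
 * Assumed build line (PGI 14.x, NVIDIA target):
 *   pgcc -acc -ta=tesla:keep -Minfo=accel -c saxpy.c
 * The generated kernels should then be kept in files matching
 * saxpy.*.gpu next to the source. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical example routine: computes y = a*x + y element-wise. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* Offload the loop; x is copied to the device, y is copied in and out.
     * A compiler without OpenACC support ignores the pragma and runs
     * the loop on the host. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Adding -Minfo=accel is optional; it asks the compiler to report how each loop was scheduled, which is handy when matching the messages against the kept kernel file.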
Hope this helps,
Mat
FYI, here’s the current 14.6 list of “-ta” sub-options:
% pgfortran -help -ta
-ta=tesla:{[no]autocollapse|[no]fma|[no]flushz|keep|llvm|loadcache:{L1|L2}|[no]unroll|maxregcount:<n>|[no]rdc|[no]required|cc1x|tesla|cc1+|tesla+|cc2x|fermi|cc2+|fermi+|cc3x|kepler|cc3+|kepler+|fastmath|pin|cuda5.5|cuda6.0}|nvidia|radeon:{keep|llvm|[no]unroll|[no]required|tahiti|capeverde|spectre|buffercount:<n>}|host
Choose target accelerator
  tesla                 Select NVIDIA Tesla accelerator target
    [no]autocollapse    Automatically collapse tightly nested loops
    [no]fma             Generate fused mul-add instructions (default at -O3)
    [no]flushz          Enable flush-to-zero mode on the GPU
    keep                Keep kernel files
    llvm                Use LLVM back end; disables cc1x
    loadcache           Choose what hardware level cache to use for global memory loads
      L1                Use L1 cache
      L2                Use L2 cache
    [no]unroll          Enable automatic inner loop unrolling (default at -O3)
    maxregcount:<n>     Set maximum number of registers to use on the GPU
    [no]rdc             Generate relocatable device code
    [no]required        Issue compiler error if the compute regions fail to accelerate
    cc1x|tesla          Compile for compute capability 1.x
    cc1+|tesla+         Compile for compute capability 1.x and above
    cc2x|fermi          Compile for compute capability 2.x
    cc2+|fermi+         Compile for compute capability 2.x and above (default)
    cc3x|kepler         Compile for compute capability 3.x
    cc3+|kepler+        Compile for compute capability 3.x and above
    fastmath            Use fast math library
    pin                 Set default to pin host memory
    cuda5.5             Use CUDA 5.5 Toolkit compatibility
    cuda6.0             Use CUDA 6.0 Toolkit compatibility
  nvidia                nvidia is a synonym for tesla
  radeon                Select AMD Radeon GPU accelerator target
    keep                Keep kernel source files
    llvm                Use LLVM/SPIR back end
    [no]unroll          Enable automatic inner loop unrolling (default at -O3)
    [no]required        Issue compiler error if the compute regions fail to accelerate
    tahiti              Compile for Radeon Tahiti architecture (default)
    capeverde           Compile for Radeon Capeverde architecture
    spectre             Compile for Radeon Spectre architecture
    buffercount:<n>     Set max number of device buffers used by OpenCL kernel
  host                  Compile for the host, i.e., no accelerator target
Hi Mat,
Thanks for your quick response. So the PGI compiler does not support generating OpenCL code that can run on a CUDA-enabled device. Am I correct on this?
Correct. Given that CUDA is available on NVIDIA devices and the underlying OpenACC device code should be transparent to the user, there was no reason to support OpenCL target generation.