Is there any way to force NVIDIA’s OpenCL compiler to use the LDG instruction?
I’ve tried to:
- use const restrict;
- explicitly use ld.global.nc via inline assembly.
Neither approach works for me.
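A minimal sketch of what these two attempts look like, with the kernel source held as C strings on the host side. The kernel and argument names are illustrative only, and the "l" pointer constraint in the inline assembly assumes 64-bit addressing:

```c
/* Attempt 1: mark the read-only input as const restrict and hope the
   compiler picks the read-only data cache path (LDG) by itself. */
static const char kernel_restrict_src[] =
    "__kernel void copy_restrict(__global const float * restrict in,\n"
    "                            __global float * restrict out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n";

/* Attempt 2: request the non-coherent load explicitly via inline PTX.
   ld.global.nc needs PTX ISA 3.1 or later, so this is rejected when the
   compiler emits PTX 3.0. */
static const char kernel_asm_src[] =
    "__kernel void copy_ldg_asm(__global const float *in,\n"
    "                           __global float *out)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    float v;\n"
    "    asm volatile(\"ld.global.nc.f32 %0, [%1];\" : \"=f\"(v) : \"l\"(in + i));\n"
    "    out[i] = v;\n"
    "}\n";
```

Either string would be passed to clCreateProgramWithSource and then built with clBuildProgram as usual.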
Moreover, it seems that NVIDIA’s OpenCL compiler generates only PTX 3.0 code, and there is no way to force it to use PTX 3.1. So every attempt to use ld.global.nc via inline assembly fails, because that instruction is only available starting from PTX 3.1. I’ve tried passing the “-cl-nv-arch sm_35” and “-cl-nv-arch compute_35” options to the OpenCL compiler, but the first had no effect, while compute_35 ended in a ptxas error about PTX ISA 3.0 being incompatible with the compute_35 arch.
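To check what the compiler actually emitted, the generated code can be retrieved with clGetProgramInfo and CL_PROGRAM_BINARIES (on NVIDIA’s implementation this is, as far as I know, PTX text) and then inspected. A rough sketch of such a check follows; the helper names are hypothetical, and it only does a simple text search, so a match inside a PTX comment would also count:

```c
#include <stdio.h>
#include <string.h>

/* Parse the ".version MAJOR.MINOR" directive from PTX text.
   Returns 1 on success, 0 if no such directive was found. */
static int ptx_isa_version(const char *ptx, int *major, int *minor)
{
    const char *p = strstr(ptx, ".version");
    if (p == NULL)
        return 0;
    return sscanf(p, ".version %d.%d", major, minor) == 2;
}

/* Return 1 if the PTX text contains a non-coherent global load
   (ld.global.nc, the PTX-level form of LDG), 0 otherwise. */
static int ptx_has_ldg(const char *ptx)
{
    return strstr(ptx, "ld.global.nc") != NULL;
}
```

With the PTX text in a NUL-terminated buffer, ptx_isa_version reports the ISA version the compiler targeted, and ptx_has_ldg shows whether any non-coherent load made it into the output.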
I’m a bit confused, because NVIDIA’s OpenCL compiler is most likely based on LLVM, and the corresponding backend (NVPTX) supports PTX ISA 3.1.
Does anyone know how to use LDG in OpenCL kernels?
BTW, I’m using CUDA 5.5 and the NVIDIA 327 driver on Ubuntu Linux 13.04.