When I dump the PTX that the OpenCL toolkit generates internally, the reported version is PTX 1.5. I cannot find a specification for this version of PTX. Does anyone know where it is published?
The PTX generated for an OpenCL function is also much more complicated than the PTX for the equivalent CUDA function. I hope this does not mean that the OpenCL code is much less optimized.
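For anyone who wants to check the version their own dump reports: a PTX module starts with a `.version` directive followed by a `.target` directive, so both can be read straight from the text. Below is a minimal sketch, assuming the dump is already available as a string. The `SAMPLE_PTX` excerpt and the `ptx_header` helper are made up for illustration and are not part of any toolkit.

```python
import re

# Hypothetical excerpt of a PTX dump; the header values are illustrative only.
SAMPLE_PTX = """\
//
// Generated by some compiler (placeholder comment)
//
.version 1.5
.target sm_11
"""


def ptx_header(ptx_text):
    """Return ((major, minor), target) read from the .version and .target
    directives of a PTX dump. Either element is None if its directive is missing."""
    version_match = re.search(r"^\s*\.version\s+(\d+)\.(\d+)", ptx_text, re.MULTILINE)
    target_match = re.search(r"^\s*\.target\s+(\S.*)$", ptx_text, re.MULTILINE)

    version = None
    if version_match:
        version = (int(version_match.group(1)), int(version_match.group(2)))

    target = target_match.group(1).strip() if target_match else None
    return version, target


if __name__ == "__main__":
    print(ptx_header(SAMPLE_PTX))
```

Running the same helper on the OpenCL dump and on the CUDA dump of the equivalent function would show whether the two toolchains emit the same PTX version and target, which may explain part of the difference between them.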