Hello,
I have an AVX-compatible CPU. The NVIDIA OpenCL header files bundled with the latest Ubuntu (12.10) don’t compile with GCC 4.7.2 when using the ‘-march=native’ compile flag, because the ‘/usr/include/nvidia-current/CL/cl_platform.h’ header includes gmmintrin.h instead of immintrin.h.
I hope you will be adding OpenCL 1.2 support soon. As it is, I am offloading some computations to the CPU, mainly because I can’t invoke popcount in the NVIDIA OpenCL implementation, although it works fine in the CPU implementations from both Intel and AMD, and your hardware supports it internally.
I don’t understand why it is taking so long for NVIDIA to release a product compliant with OpenCL 1.2, given that even when we had Cg you still provided top-notch GLSL support. Some of us aren’t interested in using a single-vendor programming language which probably won’t be in use a decade from now.
So I spent some time reading the documentation, and it turns out you can use inline PTX assembly in OpenCL. I then wrote a wrapper so I can invoke the NVIDIA hardware population count instruction inside OpenCL.
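For anyone who wants to try the same thing, here is a minimal sketch of what such a wrapper might look like. It assumes the GCC-style asm construct used in oclInlinePTX and the PTX popc.b32 instruction. The names popcnt, count_bits, POPCNT_KERNEL_SRC and popcount_ref are made up for illustration. The kernel is embedded as a C string, the way a host program would hand it to clCreateProgramWithSource, and the plain C function is only a reference for checking the GPU results on the host.

```c
#include <stdint.h>

/* OpenCL C source with an inline PTX wrapper (sketch; NVIDIA-only).
 * popc.b32 counts the set bits of a 32-bit register operand. */
static const char *const POPCNT_KERNEL_SRC =
    "inline uint popcnt(const uint x)\n"
    "{\n"
    "    uint n;\n"
    "    asm(\"popc.b32 %0, %1;\" : \"=r\"(n) : \"r\"(x));\n"
    "    return n;\n"
    "}\n"
    "\n"
    "__kernel void count_bits(__global const uint *in, __global uint *out)\n"
    "{\n"
    "    const size_t gid = get_global_id(0);\n"
    "    out[gid] = popcnt(in[gid]);\n"
    "}\n";

/* Portable host-side reference: clears the lowest set bit until
 * none remain, counting one iteration per set bit. */
static uint32_t popcount_ref(uint32_t x)
{
    uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}
```

Comparing the buffer written by count_bits against popcount_ref for the same inputs is a quick way to confirm that the inline assembly does what is expected on a given driver.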
What I wrote works as is. I am using the Linux NVIDIA OpenCL implementation bundled with Ubuntu.
I think it should work on any other system which uses nvcc to compile OpenCL source code.
Of course, the ‘asm’ construct might not work on OpenCL implementations other than NVIDIA’s.
What did I read? The NVIDIA OpenCL example oclInlinePTX and the PTX documentation. nvcc supports the GCC inline assembly construct.