OpenCL Linux header files; OpenCL status

Hello,
I have an AVX-compatible CPU. The NVIDIA OpenCL header files bundled with the latest Ubuntu 12.10 don’t compile with GCC 4.7.2 when the ‘-march=native’ flag is used, because the ‘/usr/include/nvidia-current/CL/cl_platform.h’ header includes gmmintrin.h instead of immintrin.h.

I hope you will be adding OpenCL 1.2 support soon. As it is, I am offloading some computations to the CPU, mainly because I can’t invoke popcount in the NVIDIA OpenCL implementation. It works fine in the CPU implementations from both Intel and AMD, and your hardware supports it internally.

I don’t understand why it is taking so long for NVIDIA to release a product compliant with OpenCL 1.2, given that even when we had Cg you still provided top-notch GLSL support. Some of us aren’t interested in using a single-vendor programming language which probably won’t be in use a decade from now.

So I spent some time reading the documentation and it turns out you can use inline PTX assembly in OpenCL. Then I wrote this wrapper so I can invoke the NVIDIA hardware population count instruction inside OpenCL:

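// Wraps the PTX popc.b32 instruction (32-bit population count).
// NVIDIA-specific: relies on inline PTX assembly being accepted by the compiler.
inline uint popcnt(const uint i) {
  uint n;
  asm("popc.b32 %0, %1;" : "=r"(n) : "r" (i));
  return n;
}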

Cumbersome and non-portable, but hey, it works. Now I just need to implement reduce so I can count the number of bits in a 1D array.
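
For anyone following along, here is a minimal CPU-side sketch in plain C of what such a reduce could look like. It is not the poster's code. The names popcount_loop, tree_reduce_sum, count_bits_group and GROUP_SIZE are made up for illustration, and a simple loop stands in for the popc.b32 wrapper so the sketch runs anywhere. Each slot of the scratch buffer plays the role of one work-item, and each pass of the tree sum corresponds to one step that a kernel would separate with barrier(CLK_LOCAL_MEM_FENCE).

```c
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 256  /* hypothetical work-group size; must be a power of two */

/* Kernighan's loop: clears the lowest set bit until none remain.
   Stands in for the hardware population count. */
static uint32_t popcount_loop(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* In-place pairwise (tree) sum over a power-of-two sized buffer.
   The stride halves on every pass, as in a local-memory reduction. */
static uint64_t tree_reduce_sum(uint64_t *buf, size_t n) {
    for (size_t stride = n / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; ++i)
            buf[i] += buf[i + stride];
    return buf[0];
}

/* Counts the set bits in words[0..n). Slot i accumulates words i,
   i + GROUP_SIZE, i + 2 * GROUP_SIZE, ... before the tree sum runs. */
static uint64_t count_bits_group(const uint32_t *words, size_t n) {
    uint64_t scratch[GROUP_SIZE];
    for (size_t i = 0; i < GROUP_SIZE; ++i) {
        uint64_t acc = 0;
        for (size_t j = i; j < n; j += GROUP_SIZE)
            acc += popcount_loop(words[j]);
        scratch[i] = acc;
    }
    return tree_reduce_sum(scratch, GROUP_SIZE);
}
```

Calling count_bits_group(words, n) returns the total number of set bits. A real kernel would run the outer loop in parallel across work-items and would still need a second pass (or an atomic add) to combine the results of several work-groups.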

Amazing! How did you solve this? Which documentation have you read, NVIDIA’s or OpenCL’s?

Do you have to enable some OpenCL pragma extension, do you look at the compiled code and insert it there, or is it something else entirely?

What I get is the following error: implicit declaration of function ‘asm’ is invalid in OpenCL

Can you point us to your full solution? Other people are interested: cuda - popcnt in OpenCL? - Stack Overflow

What I wrote works as is. I am using the Linux NVIDIA OpenCL implementation bundled with Ubuntu.
I think it should work in any other system which uses nvcc to compile OpenCL source code.

Of course, the ‘asm’ construct might not work on OpenCL implementations other than NVIDIA’s.
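
For implementations that reject ‘asm’, a portable fallback can do the same count with plain integer arithmetic. The following is a sketch in plain C using the well-known SWAR bit trick. The name popcnt_portable is made up for illustration, and in OpenCL C the same body should work with uint in place of uint32_t. On implementations that support OpenCL 1.2, the built-in popcount function makes both this and the PTX wrapper unnecessary.

```c
#include <stdint.h>

/* Portable 32-bit population count without inline assembly.
   Hypothetical stand-in for the popc.b32 wrapper. */
static inline uint32_t popcnt_portable(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* counts per 2-bit field */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* counts per 4-bit field */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* counts per byte */
    return (x * 0x01010101u) >> 24;                   /* sum of the four bytes */
}
```

This costs a handful of shifts, masks and one multiply instead of a single instruction, so it is slower than popc.b32 on NVIDIA hardware, but it gives the same result everywhere.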

What did I read? The NVIDIA OpenCL example oclInlinePTX and the PTX documentation. nvcc supports the GCC inline assembly construct.

NVIDIA driver release 350 just added OpenCL 1.2 support, according to the heise newsticker. Yay.