My _mm_extract_epi16() function is the same except for a type conversion (although removing it had no effect):
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_extract_epi16 (__m128i const __A, int const __N)
{
  return (unsigned short) __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, __N);
}
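For what it is worth, here is a minimal sketch of how I would exercise that intrinsic from plain host code, assuming an x86 target with SSE2 and a gcc-compatible host compiler. The helper names extract_lane3 and lane3_of_all_ones_example are made up for illustration and are not part of any header. It shows two things: the lane index has to be a compile-time constant, and the 16-bit lane comes back zero-extended, which is what the (unsigned short) conversion above expresses.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_extract_epi16 */

/* Hypothetical helper: returns 16-bit lane 3 of v as an int.
   The lane index passed to the intrinsic must be a compile-time constant. */
static int extract_lane3(__m128i v)
{
    return _mm_extract_epi16(v, 3);
}

/* Hypothetical helper: builds a vector whose lane 3 holds 0xFFFF
   and extracts that lane again. */
static int lane3_of_all_ones_example(void)
{
    /* _mm_set_epi16 takes its arguments from the highest lane (7)
       down to the lowest lane (0), so the fifth argument is lane 3. */
    __m128i v = _mm_set_epi16(7, 6, 5, 4, -1, 2, 1, 0);
    return extract_lane3(v);
}
```

Calling lane3_of_all_ones_example() should give 65535 rather than -1, because the extracted lane is zero-extended into the int result.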
I do not follow the question about the builtin function being defined in the CUDA compiler. I ran strings and objdump on nvcc just to check, but I would expect the builtin functions to be defined by the host compiler, to which nvcc delegates compilation of host code.
If you compile with 'nvcc -Xcompiler=-v ssetest.c', you should get verbose output from the host compiler, which will include the path to the compiler executable. In my case, the host compiler is gcc, and the corresponding executable is /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1.
I can then find the referenced builtin in that executable:
strings /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1 | grep builtin_ia32_vec_ext_v8hi
Hope that helps.