NVCC compile errors when using SSE intrinsic functions, with GCC as host compiler

Hello, I’m getting compile errors when compiling this code:

#include <emmintrin.h>
int main()
{
  __m128i x;
  int y;
  y = _mm_extract_epi16(x, 1);
  return 0;
}

The error message is:

a.cu(13): error: identifier "__builtin_ia32_vec_ext_v8hi" is undefined

I’m using the CUDA 5 toolkit on SUSE Linux Enterprise 11 (gcc 4.3), which should be a supported platform. I also tried gcc 4.6, but the same error occurs.

Can this problem be fixed or will I just have to avoid this combination?
Also, is this compile error coming from NVCC itself or from gcc?

I am aware that there were some issues in the past with the use of SSE intrinsic header files in the host portion of CUDA programs (.cu files). I am not sure what the issue was; it may have had to do with certain #ifdefs inside the header files, but that is just speculation. The easy workaround was to put all host code containing SSE intrinsics into separate C or C++ source files.

However, more recently I have not encountered any problems when using xmmintrin.h (SSE) in the host portion of my CUDA programs (across Linux, Windows, and Mac OS X), although I have not tried emmintrin.h (SSE2). You probably need to enable SSE2 by passing the -msse2 command line flag to gcc, which may also take care of the necessary #ifdefs in the intrinsic header files. To do so from the nvcc command line, use -Xcompiler -msse2.
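For example, assuming the source file is named ssetest.cu, the invocation would look like this:

nvcc -Xcompiler -msse2 ssetest.cu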

I tried -msse2, but it didn’t help. Besides, SSE2 is part of the baseline x86-64 instruction set, so it is enabled by default when targeting x86-64. I also tried other intrinsics that do not map to __builtin_* functions, and those compile fine.
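For what it’s worth, you can see that by dumping gcc’s predefined macros on an x86-64 box; __SSE2__ shows up even without any -m switches:

gcc -dM -E -x c /dev/null | grep -i sse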

I have a feeling it’s a problem with the NVIDIA compiler. It shouldn’t be compiling that host code anyway, especially when it uses compiler intrinsics.

If I don’t see any solution soon, I’m going to file a bug report.

Host code must be pre-processed by the CUDA compiler before it is passed to the host compiler. There could be an issue with that pre-processing, or nvcc may not be passing some flag to the host compiler that is required for the successful handling of SSE intrinsics.
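If it helps narrow this down, nvcc’s --dryrun option should list the individual compilation steps, including the exact host compiler command line, without executing them, so you can see which flags do or do not get forwarded to gcc:

nvcc --dryrun -Xcompiler -msse2 ssetest.cu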

Filing a bug report (through the registered developer website) with a self-contained repro program is the best approach to getting this resolved. Thank you for your help.

For now, as a workaround, you can simply move host code containing SSE intrinsics into a separate file that is compiled directly by the host compiler, along the lines of the sketch below.
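Here is a minimal sketch of that split (the file names and the helper function are just placeholders): keep every SSE type and intrinsic inside a .cpp file compiled by gcc, and expose only plain C types to the .cu file.

// sse_part.cpp: compiled directly by the host compiler, so nvcc never sees emmintrin.h
#include <emmintrin.h>

int extract_second_word(const short *p)
{
  __m128i v = _mm_loadu_si128((const __m128i *)p);  // load 8 shorts
  return _mm_extract_epi16(v, 1);                   // element 1, zero-extended
}

// main.cu: no SSE headers here, just a declaration of the helper
int extract_second_word(const short *p);

int main()
{
  short data[8] = {10, 20, 30, 40, 50, 60, 70, 80};
  return extract_second_word(data) == 20 ? 0 : 1;   // expect 20
}

Build and link, for example:

g++ -c -msse2 sse_part.cpp
nvcc main.cu sse_part.o -o ssetest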

FWIW, I was able to compile the above without error (using ‘nvcc ssetest.cpp’) with the CUDA 5.0 toolkit and gcc versions 4.5.1 and 4.6.3 as the host compilers.

tbenson, can you post the code for _mm_extract_epi16() in your emmintrin.h?

Mine is like this:

#ifdef __OPTIMIZE__
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_extract_epi16 (__m128i const __A, int const __N)
{
  return __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, __N);
}

Then I noticed this builtin does not follow the naming pattern of the others, which use the Intel machine instruction mnemonic. I searched for the symbol __builtin_ia32_vec_ext_v8hi in the CUDA compiler and didn’t find it, but sure enough, I did find __builtin_ia32_pextrw, where pextrw is the actual instruction mnemonic.

I tried #define __builtin_ia32_vec_ext_v8hi __builtin_ia32_pextrw, but that didn’t work because __builtin_ia32_pextrw takes an __m64 argument rather than an __m128i.
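In the meantime I can sidestep the builtin altogether with something like this (just a sketch; it relies on union type punning, which gcc supports, and it obviously won’t generate a pextrw):

#include <emmintrin.h>

/* builtin-free stand-in for _mm_extract_epi16: go through a union instead */
static inline int extract_epi16_fallback(__m128i v, int n)
{
  union { __m128i vec; unsigned short s[8]; } u;
  u.vec = v;
  return u.s[n];  /* _mm_extract_epi16 also zero-extends the element */
}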

Also, can you grep to see if __builtin_ia32_vec_ext_v8hi is defined in the CUDA compiler?

Thank you

Uncle Joe,

My _mm_extract_epi16() function is the same except for a type conversion (although removing it had no effect):

#ifdef __OPTIMIZE__
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_extract_epi16 (__m128i const __A, int const __N)
{
  return (unsigned short) __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, __N);
}

I do not quite follow the question about the builtin function being defined in the CUDA compiler. I ran strings and objdump on nvcc just to check, but I would expect the builtin functions to be defined by the host compiler to which nvcc delegates the host code.

If you compile with ‘nvcc -Xcompiler=-v ssetest.c’, then it should give verbose output from the host compiler, which will include the path to the compiler executable. In my case, the host compiler is gcc with corresponding executable /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1.
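Something like the following should surface that path directly (gcc writes its -v output to stderr, and your paths will of course differ):

nvcc -Xcompiler=-v ssetest.c 2>&1 | grep cc1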

I can then find the referenced builtin in that executable:

strings /usr/libexec/gcc/x86_64-redhat-linux/4.6.2/cc1 | grep builtin_ia32_vec_ext_v8hi
__builtin_ia32_vec_ext_v8hi

Hope that helps

I’ve filed a bug report with NVIDIA, and a helpful representative was able to reproduce the problem; they say it has been fixed in CUDA 5.5, which will be out soon.

Right, I’m puzzled too as to why nvcc complains about code it isn’t even supposed to compile, but apparently it has stubs for all the GCC builtin functions.