I’ve been trying to convert heavily SSE/MMX-optimized code to CUDA.
I specifically want to target the one loop where most of the time is spent.
Unfortunately, at this point I can’t even compile my code under CUDA (there is no kernel code yet - I’m just trying to compile the original code using nvcc).
I run into problems with SSE/MMX instructions.
I get tons of compile errors like this:
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(48): error: identifier “__builtin_ia32_emms” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(61): error: identifier “__builtin_ia32_vec_init_v2si” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(90): error: identifier “__builtin_ia32_vec_ext_v2si” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(114): error: identifier “__builtin_ia32_packsswb” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(129): error: identifier “__builtin_ia32_packssdw” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(144): error: identifier “__builtin_ia32_packuswb” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(158): error: identifier “__builtin_ia32_punpckhbw” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(172): error: identifier “__builtin_ia32_punpckhwd” is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(186): error: identifier “__builtin_ia32_punpckhdq” is undefined
…
I’m developing under Linux using CUDA 2.0.
Any suggestions? Is SSE/MMX support broken in nvcc?
(I know that one option would be to remove all the SSE/MMX code, but I don’t want to go that way. There is a lot of SSE/MMX code, called outside the main loop, that I want to keep as it is.)
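One workaround I’m considering is sketched below. It is only a rough idea and I have not confirmed that it works with CUDA 2.0. The idea is to hide the intrinsic headers from nvcc with a preprocessor guard and keep the real SSE paths in plain .cpp files built by gcc. The names HAVE_HOST_SSE and add4 are made up for illustration. The sketch assumes that __CUDACC__ is defined whenever nvcc processes a CUDA source file and that gcc defines __SSE__ when SSE is enabled.

```cpp
// Hypothetical guard: only pull in the SSE header when the compiler reports
// SSE support and the file is NOT being processed by nvcc as a CUDA source.
#if defined(__SSE__) && !defined(__CUDACC__)
#include <xmmintrin.h>
#define HAVE_HOST_SSE 1
#else
#define HAVE_HOST_SSE 0
#endif

// Hypothetical helper: element-wise sum of two 4-float arrays.
// Uses SSE intrinsics when the guard allows it, plain scalar code otherwise.
void add4(const float* a, const float* b, float* out) {
#if HAVE_HOST_SSE
    __m128 va = _mm_loadu_ps(a);   // unaligned load of 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

With a guard like this, anything that nvcc compiles would fall back to the scalar path, so the SSE versions would only be used in files that gcc compiles directly and that are linked in afterwards.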
Thanks