SSE/MMX compilation using nvcc

I’ve been trying to convert heavily SSE/MMX-optimized code to CUDA.
I specifically want to target the one loop where most of the time is spent.
Unfortunately, at this point I can’t even compile my code with nvcc (there is no kernel code yet; I’m just trying to compile the original code using nvcc).
I run into problems with the SSE/MMX intrinsics.
I get tons of compile errors like this:
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(48): error: identifier "__builtin_ia32_emms" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(61): error: identifier "__builtin_ia32_vec_init_v2si" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(90): error: identifier "__builtin_ia32_vec_ext_v2si" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(114): error: identifier "__builtin_ia32_packsswb" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(129): error: identifier "__builtin_ia32_packssdw" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(144): error: identifier "__builtin_ia32_packuswb" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(158): error: identifier "__builtin_ia32_punpckhbw" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(172): error: identifier "__builtin_ia32_punpckhwd" is undefined
/usr/lib/gcc/x86_64-linux-gnu/4.1.2/include/mmintrin.h(186): error: identifier "__builtin_ia32_punpckhdq" is undefined

I’m developing on Linux using CUDA 2.0.
Any suggestions? Is SSE/MMX support broken in nvcc?
(I know that one option would be to remove all the SSE/MMX code, but I don’t want to go that way. A lot of SSE/MMX code is called outside the main loop that I want to optimize.)

Thanks

Just compile your performance-tuned SSE code with the normal host compiler (gcc), compile the GPU kernel code with nvcc, and link the objects together.
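A minimal sketch of that split, assuming a hypothetical routine `add_sat_i16` stands in for the real SSE code: the file below is compiled only by gcc, so nvcc never parses `emmintrin.h` or the `__builtin_ia32_*` builtins behind it. Any header shared with the `.cu` file should contain only plain C declarations, and because nvcc compiles `.cu` files as C++, the declaration there needs C linkage, e.g. `extern "C" void add_sat_i16(const short *, const short *, short *, size_t);`.

```c
/* sse_part.c: host-only SSE2 code, compiled with gcc and never seen by nvcc.
 * add_sat_i16 is a hypothetical example, not taken from the original code. */
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Saturating add of two int16 arrays: out[i] = clamp(a[i] + b[i]). */
void add_sat_i16(const short *a, const short *b, short *out, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

    /* Vector part: 8 x int16 per iteration in 128-bit XMM registers. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epi16(va, vb));
    }

    /* Scalar tail for the remaining n % 8 elements, same saturation rule. */
    for (; i < n; ++i) {
        int s = a[i] + b[i];
        out[i] = (short)(s > 32767 ? 32767 : (s < -32768 ? -32768 : s));
    }
}
```

A possible build, with hypothetical file names: `gcc -O2 -msse2 -c sse_part.c`, then `nvcc -c kernel.cu`, then `nvcc -o app kernel.o sse_part.o`.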

Does your code compile with regular gcc? I think the problem is that MMX is not really supported on x86_64. You should convert your code to true SSE; since the SSE2 integer instructions work on 128-bit registers instead of MMX’s 64-bit ones, you’ll get a nice performance boost from that.
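A minimal sketch of what such a conversion might look like, using a hypothetical routine `clamp_i16_to_u8` that clamps int16 values to 0..255 (the `packuswb` operation that appears in the error list). The MMX form is kept in a comment for comparison; the SSE2 form handles 16 elements per iteration instead of 8 and needs no `_mm_empty()` (`emms`) afterwards, because the XMM registers do not alias the x87 floating-point registers.

```c
/* Hypothetical example: clamp int16 samples to uint8 with saturation.
 *
 * MMX version, for comparison (8 elements per iteration):
 *   __m64 lo = *(const __m64 *)(src + i);        // 4 x int16
 *   __m64 hi = *(const __m64 *)(src + i + 4);    // 4 x int16
 *   *(__m64 *)(dst + i) = _mm_packs_pu16(lo, hi); // 8 x uint8
 *   ...and _mm_empty() after the loop.
 */
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

void clamp_i16_to_u8(const short *src, unsigned char *dst, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (src != NULL && dst != NULL));

    /* SSE2 version: 16 elements per iteration, no emms required. */
    for (; i + 16 <= n; i += 16) {
        __m128i lo = _mm_loadu_si128((const __m128i *)(src + i));     /* 8 x int16 */
        __m128i hi = _mm_loadu_si128((const __m128i *)(src + i + 8)); /* 8 x int16 */
        /* packuswb: low 8 bytes from lo, high 8 bytes from hi, clamped to 0..255 */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packus_epi16(lo, hi));
    }

    /* Scalar tail for the remaining n % 16 elements. */
    for (; i < n; ++i)
        dst[i] = (unsigned char)(src[i] < 0 ? 0 : (src[i] > 255 ? 255 : src[i]));
}
```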