A question: LLVM IR for CUDA code

Hi All,

This is my first post, and I am not sure whether this is the correct place for this question.

Here is what I want to do:
Say I have a *.cu source file.
I want to obtain the LLVM bitcode (*.bc) files for it, one for the host code and one for the device code, and then compile and link those two bitcode files into an executable.

For example, my source file is axpy.cu.
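
To make the setup concrete, a minimal axpy.cu could look like the sketch below. It is an illustrative stand-in rather than the exact file in question: the kernel name, the problem size, and the input values are all hypothetical, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: y[i] = a * x[i] + y[i], one thread per element.
// (Hypothetical kernel; the real file may differ.)
__global__ void axpy(float a, const float *x, float *y, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
  const int n = 4;  // hypothetical problem size
  float hx[n] = {1.0f, 2.0f, 3.0f, 4.0f};
  float hy[n] = {0.0f, 0.0f, 0.0f, 0.0f};

  // Allocate device buffers and copy the inputs over.
  float *dx, *dy;
  cudaMalloc(&dx, n * sizeof(float));
  cudaMalloc(&dy, n * sizeof(float));
  cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

  // Launch one block of n threads.
  axpy<<<1, n>>>(2.0f, dx, dy, n);

  // This blocking copy on the default stream runs after the kernel completes.
  cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
  for (int i = 0; i < n; ++i) printf("y[%d] = %f\n", i, hy[i]);

  cudaFree(dx);
  cudaFree(dy);
  return 0;
}
```

In a file like this, the `__global__` function is what ends up in the device-side bitcode, while `main` and the kernel launch end up in the host-side bitcode.
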
I then run the following commands:

clang++ -g -emit-llvm axpy.cu -c
(The command above generates two *.bc files, which I name axpy.bc and kernel.bc.)

llc axpy.bc -filetype=obj -o axpy.o
llc kernel.bc -filetype=asm -o kernel.ptx
nvcc --device-c kernel.ptx -o kernel.o
nvcc -arch=sm_20 -dlink axpy.o kernel.o -o gpucode.o
g++ gpucode.o axpy.o -L/usr/local/cuda/lib64 -lcudart -o exe

Everything goes well until the last command, which fails with this error:
gpucode.o: In function `__cudaRegisterLinkedBinary_[something]_kernel_o_680fffcc':
link.stub:(.text+0x50): undefined reference to `_fatbinwrap[something]_kernel_o_680fffcc'

Could someone point out where the problem is?
Any help or suggestion is appreciated!

Thank you,