How to execute a CUDA kernel from C code

In my program I want to run CUDA kernels from my C code. I have tried several ways, but each time I end up with a lot of errors. Can anyone help me with this? (My program has 12 CUDA kernels that need to be called through a standard C header file.)

For example, to add two numbers: in the C file I allocate device memory and pass those pointers to the CUDA kernel (in a .cu file). After the addition, I print the result using printf, which is also in the C file.

How do I do that? If anyone can give sample code fragments or any clue, it would be a great help to me.

Thanks in advance.

When you say “a lot of errors”, what do you mean? Are you having problems compiling the code, or problems running it?

The errors are at compile time. I use the nvcc command to compile my whole program, but it reports that cuda.h cannot be found (it also does not recognize cutil.h, and there is a set of errors about CUDA functions it cannot find), and it does not recognize the <<< >>> syntax either. When I compile and run only the CUDA kernels, they compile properly and give correct results, so the kernels themselves are fine.

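Rename the file containing the CUDA code so it ends with a .cu extension. nvcc uses the extension to determine the compilation rules: a .c file is handed to the host compiler as plain C, which is why the <<< >>> launch syntax is rejected there.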


In my source code structure, the main function is in a C file, which is why I am trying to combine the C source code and the CUDA kernel code.

The kernel code is already in a file with a .cu extension, but the device memory pointers are in a C header, because my application needs to keep track of how the CUDA memory pointers are used. If I convert that to .cu as well, how do I link the .cu file with my C file, which contains the main function?

Thanks for your great help, avidday. Please help me solve this problem.

If I understand correctly, you're just looking to have your .c code call your .cu kernels, right? If so, I think the easiest method is to define your kernels in your .cu file, along with a regular extern "C" wrapper function that launches the kernel, then call that wrapper function from your host C code. (This assumes you're using the Runtime API.)


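in your .cu file:

__global__ void yourKernel(...){ ... }

extern "C" void yourKernelWrapper(...){
    /* choose the launch configuration and launch the kernel here */
    yourKernel<<<..., ...>>>(...);
}

then in your .c file:

/* plain C declaration: extern "C" is C++ syntax and will not compile in a .c file */
void yourKernelWrapper(...);

int main(){
    yourKernelWrapper(...);
    return 0;
}
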

Hope I got what you were asking correctly, no offense but it took a few reads to decipher…
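
For the add-two-numbers case from the question, a fuller sketch might look like the following. It is only an illustration under some assumptions: the file names add.cu and main.c, the names addKernel and addKernelWrapper, and the one-thread launch are all made up for the example, and the build commands assume nvcc is on the PATH. The device pointers stay in the C file, as asked; that works because cuda_runtime_api.h is the part of the Runtime API that can be included from plain C.

```cuda
// add.cu (hypothetical file name): compiled by nvcc as CUDA, so the
// __global__ qualifier and the <<< >>> launch syntax are allowed here.
#include <cuda_runtime.h>

// Kernel: adds two integers that already live in device memory.
__global__ void addKernel(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}

// C-callable wrapper. extern "C" disables C++ name mangling so the linker
// can match the plain C declaration in main.c.
// Returns 0 on success, otherwise the cudaError_t code cast to int.
extern "C" int addKernelWrapper(const int *d_a, const int *d_b, int *d_c)
{
    addKernel<<<1, 1>>>(d_a, d_b, d_c);      // one block, one thread
    cudaError_t err = cudaGetLastError();    // catches launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();       // catches execution errors
    return (int)err;
}

/* ---- main.c (hypothetical file name): plain C, no CUDA syntax ----------

#include <stdio.h>
#include <cuda_runtime_api.h>   // C-compatible Runtime API declarations

// Declaration of the wrapper defined in add.cu (no extern "C" in C code).
int addKernelWrapper(const int *d_a, const int *d_b, int *d_c);

int main(void)
{
    int a = 2, b = 3, c = 0;
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;

    // Device memory is allocated and tracked on the C side.
    // Return codes of these calls are ignored here to keep the sketch short.
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));
    cudaMalloc((void **)&d_c, sizeof(int));

    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    if (addKernelWrapper(d_a, d_b, d_c) != 0) {
        fprintf(stderr, "kernel failed\n");
        return 1;
    }

    cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d + %d = %d\n", a, b, c);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}

---- build (assumed commands; nvcc hands main.c to the host C compiler) ----

nvcc -c add.cu -o add.o
nvcc -c main.c -o main.o
nvcc add.o main.o -o add

------------------------------------------------------------------------ */
```

With 12 kernels, the same pattern would presumably repeat: one wrapper per kernel in the .cu file, and one ordinary C header that declares the wrappers for main.c to include. Linking with nvcc rather than the host compiler avoids having to spell out the CUDA library paths by hand.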