Why is function cudaLaunchKernel passed a host-code function pointer?

I use the following command to preprocess axpy.cu:

nvcc --cuda axpy.cu -o axpy.cu.cpp.ii

However, in axpy.cu.cpp.ii, I don’t understand why the function

void __device_stub__Z4axpyfPfS_(float __par0, float *__par1, float *__par2)

passes the function pointer (void ( *)(float, float *, float *))axpy to

 cudaLaunchKernel(const T *func, dim3 gridDim, dim3 blockDim, void ** args, size_t sharedMem = 0, cudaStream_t stream = 0)

Shouldn’t cudaLaunchKernel accept a function pointer to the kernel function instead?
According to The CUDA Compilation Trajectory, axpy.cu.cpp.ii should include .cudafe1.stub.c, in which the kernel function (axpy) is defined. However, axpy.cu.cpp.ii itself defines a host function with the same name as the kernel function:

void axpy(float __cuda_0, float *__cuda_1, float *__cuda_2) {
   __device_stub__Z4axpyfPfS_(__cuda_0, __cuda_1, __cuda_2);
}

So my questions are:

  1. Why does axpy.cu.cpp.ii define a function with the same name as the kernel function? Is it possible to overload the kernel function?
  2. What is the calling sequence from host code to device code, and how does the launch complete?
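
To make the second question concrete, here is how I currently picture the host side, written as a plain C++ sketch. Everything in it is an assumption on my part: register_kernel, fake_launch_kernel, kernel_table, LaunchRecord and device_stub_axpy are hypothetical names I made up, not CUDA runtime functions, and nothing here touches a GPU. The idea is that the host function named axpy is never executed by the launch call; its address only serves as a key that maps to the device symbol (the mangled name _Z4axpyfPfS_ seen in the stub's name), while the stub packs the addresses of its parameters into the void ** args array.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Everything below is a hypothetical stand-in, not CUDA runtime code.
using KernelFn = void (*)(float, float*, float*);

// Records what the fake runtime was asked to launch, so the flow can be inspected.
struct LaunchRecord {
    std::string device_symbol;  // which device entry point was selected
    float first_arg = 0.0f;     // value read back through the packed args array
};

// Table from host function address to device symbol name.
static std::map<std::uintptr_t, std::string>& kernel_table() {
    static std::map<std::uintptr_t, std::string> table;
    return table;
}

static LaunchRecord last_launch;

// Turn the host function's address into an opaque integer handle.
static std::uintptr_t handle_of(KernelFn fn) {
    return reinterpret_cast<std::uintptr_t>(fn);
}

// Stand-in for a registration step that would run before main().
void register_kernel(KernelFn host_fn, const std::string& device_symbol) {
    kernel_table()[handle_of(host_fn)] = device_symbol;
}

// Stand-in for cudaLaunchKernel: host_fn is only a lookup key and is never called.
bool fake_launch_kernel(KernelFn host_fn, void** args) {
    auto it = kernel_table().find(handle_of(host_fn));
    if (it == kernel_table().end()) return false;  // unknown handle
    last_launch.device_symbol = it->second;
    last_launch.first_arg = *static_cast<float*>(args[0]);
    return true;
}

void axpy(float a, float* x, float* y);  // host function sharing the kernel's name

// Mirrors the shape of __device_stub__Z4axpyfPfS_:
// pack the addresses of the parameters and pass axpy itself as the key.
bool device_stub_axpy(float par0, float* par1, float* par2) {
    void* args[] = { &par0, &par1, &par2 };
    return fake_launch_kernel(&axpy, args);
}

void axpy(float a, float* x, float* y) {
    device_stub_axpy(a, x, y);
}
```

In this sketch, calling register_kernel(&axpy, "_Z4axpyfPfS_") once and then axpy(3.0f, x, y) leaves last_launch holding the device symbol and the first argument. If the real runtime works along these lines, that would explain why the pointer passed to cudaLaunchKernel is a host-code pointer, but I would appreciate confirmation.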