I use the following command to preprocess this file.
nvcc --cuda axpy.cu -o axpy.cu.cpp.ii
However, in axpy.cu.cpp.ii, I don't understand why the function
void __device_stub__Z4axpyfPfS_(float __par0, float *__par1, float *__par2)
passes the function pointer (void (*)(float, float *, float *))axpy
to
cudaLaunchKernel(const T *func, dim3 gridDim, dim3 blockDim, void **args, size_t sharedMem = 0, cudaStream_t stream = 0)
Shouldn't cudaLaunchKernel have accepted a function pointer to the kernel function?
According to The CUDA Compilation Trajectory, axpy.cu.cpp.ii should have included .cudafe1.stub.c, in which the kernel function (axpy) is defined. However, axpy.cu.cpp.ii itself defines a function with the same name as the kernel function:
void axpy(float __cuda_0, float *__cuda_1, float *__cuda_2) {
    __device_stub__Z4axpyfPfS_(__cuda_0, __cuda_1, __cuda_2);
}
So my questions are:
- Why does axpy.cu.cpp.ii define a function with the same name as the kernel function? Is it possible to overload the kernel function?
- What is the calling logic from host code to device code? How does it finish?