Hi, I’m new here, so please excuse any mistakes, but I really need your help: my job depends on it.
I am currently working on a large C# project with a lot of calculations that take a few minutes. The idea is to use CUDA to reduce that calculation time.
I created a CUDA DLL in C that I import into C#. The call to the DLL function and the linking both work; I checked this while debugging by stepping into the file. The DLL contains a file, Distribution.cu, and in that file I call another function with CUDA using this code:
Distribution_Kernel<<<1, block_size, z_max * sizeof(double)>>>(
    dev_Alpha,
    dev_Beta_j,
    dev_Cf,
    dev_Dcc,
    dev_Dccm1,
    dev_Dccm2,
    dev_Dcr,
    dev_Dm,
    dev_Dpcc,
    dev_Dpcc_temp,
    dev_Dpe,
    dev_Dpi,
    dev_Dpmax,
    dev_Dpp,
    dev_Dw,
    dev_Fac_Nk,
    dev_Frc_Nk,
    dev_Fon,
    dev_Fon_max,
    dev_J_max,
    dev_L_Am,
    dev_L_Am_max,
    dev_L_da,
    dev_Leff,
    dev_Mt_Nk,
    dev_Nb_r,
    dev_Pmax,
    dev_Pon,
    dev_Pon_extr,
    dev_Pon_max,
    dev_Pon_maxIndex,
    dev_Rg,
    dev_Rp,
    dev_SROg,
    dev_SROp,
    dev_TValues,
    dev_Type_cr,
    dev_X0r,
    dev_Xbpe,
    dev_Xm1,
    dev_Xm2,
    dev_Ybe,
    dev_Yccm1,
    dev_Yccm2,
    dev_Ycr,
    dev_Z,
    dev_Zccm1,
    dev_Zccm2,
    dev_Zcr,
    Beta[nk],
    Da[nk],
    Dr[nk],
    *Eps_eq,
    *Gamma,
    *GammaM,
    ringType,
    nk,
    31,
    z_max,
    nb_Rmax,
    *N_co,
    nb_cas
);
cudaStatus = cudaGetLastError();
if (cudaStatus != cudaSuccess)
{
    printf("Distribution_Kernel launch failed : %s\n", cudaGetErrorString(cudaStatus));
}
To be clear, the variables passed as arguments were all created correctly beforehand. So yes, I am calling a kernel from within a kernel; I know this has been possible since compute capability 3.5. I’m sorry I can’t share more, but the code is strictly confidential to my company.
Unfortunately, when I run this, cudaStatus is 8: cudaErrorInvalidDeviceFunction, which means “The requested device function does not exist or is not compiled for the proper device architecture” according to http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__TYPES_g3f51e3575c2178246db0a94a430e0038.html. But the function does exist. I even tried putting it in the same file, and that doesn’t work either. When building, I tried several architectures: compute_50,sm_50 or compute_35,sm_35 (I can’t go lower, otherwise I can’t call a kernel from another kernel). I have a GeForce GTX 760 with its latest update installed. “Generate Relocatable Device Code” is set to “Yes(-rdc=true)”.
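To make the architecture part of the error message concrete, here is a minimal host-side sketch of the compatibility rules as I understand them from the CUDA documentation. It is not taken from my project. The helper names are hypothetical, and compute capabilities are encoded as major*10 + minor (3.5 becomes 35), the same way they appear in compute_XY and sm_XY. The device value is assumed to come from the `major` and `minor` fields that `cudaGetDeviceProperties` fills in.

```cpp
// Host-side sketch in plain C++ (also valid inside a .cu file).
// All function names are hypothetical helpers for illustration.
// Compute capabilities are encoded as major*10 + minor, e.g. 3.5 -> 35.

// Binary code built for sm_XY loads only on a device with the same
// major version and an equal or higher minor version.
constexpr bool sass_runs_on(int sm_target, int device_cc) {
    return sm_target / 10 == device_cc / 10 && sm_target <= device_cc;
}

// PTX built for compute_XY can be JIT-compiled by the driver for any
// device whose compute capability is at least X.Y.
constexpr bool ptx_runs_on(int compute_target, int device_cc) {
    return compute_target <= device_cc;
}

// A "compute_XY,sm_XY" build setting embeds both PTX and binary code
// for X.Y, so either path is enough for the kernel to be found.
constexpr bool build_runs_on(int target, int device_cc) {
    return sass_runs_on(target, device_cc) || ptx_runs_on(target, device_cc);
}

// Launching a kernel from inside a kernel (dynamic parallelism)
// requires a device of compute capability 3.5 or higher.
constexpr bool supports_dynamic_parallelism(int device_cc) {
    return device_cc >= 35;
}
```

For example, with `device_cc` computed as `prop.major * 10 + prop.minor`, `build_runs_on(35, device_cc)` would indicate whether a compute_35,sm_35 build can be loaded on the installed card at all, and `supports_dynamic_parallelism(device_cc)` whether a kernel can launch another kernel on it.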
I’m stuck; I don’t know why I get this “Invalid device function” error. Can you please help me?