Hello,
I’m trying to convert a C++ program over to use CUDA for a research project.
The code compiles and builds, but when I run the program the kernel never launches and the program exits with an “Invalid device function” error.
However, since the kernel doesn’t launch, I can’t debug what is wrong. After digging around, I found that there is a limit on the total size of the parameters you can pass in a kernel call. Currently I have 46 double parameters for input (all are required for the program), which comes to 368 bytes, yet the limit is 256 bytes. I’m trying to pass the parameters in a struct instead, since that was the option suggested by others online.
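To put numbers on that limit, here is a minimal host-only C++ sketch of the arithmetic. ByValueParams, kParamLimitBytes, fitsInParamSpace and checkSizes are hypothetical names used only for illustration, and the 256-byte figure is simply the limit quoted above, which as far as I can tell depends on the GPU architecture.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for passing all 46 doubles directly as kernel arguments.
struct ByValueParams {
    double values[46];
};

// Assumption: the 256-byte limit quoted in the post (architecture dependent).
constexpr std::size_t kParamLimitBytes = 256;

// Size of the arguments when every value is passed by value.
constexpr std::size_t kByValueBytes = sizeof(ByValueParams);

// Size of the arguments when only a pointer to a struct is passed.
constexpr std::size_t kByPointerBytes = sizeof(ByValueParams *);

// True if an argument list of the given size fits under the limit.
constexpr bool fitsInParamSpace(std::size_t bytes) {
    return bytes <= kParamLimitBytes;
}

// Runtime check of both cases.
inline void checkSizes() {
    assert(!fitsInParamSpace(kByValueBytes));   // 46 doubles are too large
    assert(fitsInParamSpace(kByPointerBytes));  // one pointer fits easily
}
```

Calling checkSizes() should pass on a typical build where a double is 8 bytes, which is why passing one pointer to a struct looked like the way around the limit.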
However, after implementing the struct I still get the same error message. I’m not sure whether I have set up the struct correctly for CUDA use. Could someone help me with the struct and with getting it to work the way I need it to?
I have been using the example at http://forums.nvidia.com/index.php?showtopic=106708 as a model of my own code. The code for it that I am using now is:
//structure
#include <curand_kernel.h> //for curandState
struct inputStruct{ //plain struct: the original typedef had no alias name
double *dev_beta; //temp K, beta=1/kT (kcal/mole)
int *dev_nTotal; //number of iterations
double *dev_xVn, *dev_yVn, *dev_zVn; //xyz values for n'th iteration
double *dev_thetaXn, *dev_thetaYn, *dev_thetaZn; //xyz angle values
double *dev_dXn, *dev_dYn, *dev_dZn; //dx, dy, dz values for n'th iteration
double *dev_dthetaXn, *dev_dthetaYn, *dev_dthetaZn; //angle values for n'th iteration
double *dev_dXmax, *dev_dYmax, *dev_dZmax; //max diff between xyz values for molecule 1 and 2
double *dev_dthetaXmax, *dev_dthetaYmax, *dev_dthetaZmax; //max diff between angles for molecule 1 and 2
double *dev_dispMC, *dev_dispTheta, *dev_maxDistance;
//double *dev_results[blocks * threads];
//electrostatic
double *dev_I1x, *dev_I1y, *dev_I1z; //Isomer 1 xyz values 1
double *dev_SP1x, *dev_SP1y, *dev_SP1z; //Isomer 1 xyz values 2
double *dev_I2x, *dev_I2y, *dev_I2z; //Isomer 2 xyz values 1
double *dev_SP2x, *dev_SP2y, *dev_SP2z; //Isomer 2 xyz values 2
double *dev_QI1, *dev_QSP1, *dev_QI2, *dev_QSP2; //charges: Isomer 1 - 1, Isomer 1 - 2, Isomer 2 - 1, Isomer 2 - 2
//van der Waals
double *dev_rStarI1, *dev_rStarI2; //rStar for both isomers
double *dev_epsilonI1, *dev_epsilonI2; //epsilon for both isomers
double *dev_seed;
curandState *state;
curandState *dev_state;
};
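//the kernel is defined below main, so declare it before main uses it
__global__ void kernel(inputStruct *dev_struct);
int main(){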
cudaSetDevice(0);
check_cuda_errors(__FILE__, __LINE__);
inputStruct host_struct;
inputStruct *dev_struct;
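double maxDistance = 20.01; //max allowed distance between analyte and SP
//maxDistance is a single double, so allocate and copy exactly sizeof(double);
//copying more than that from &maxDistance would read past the variable
cudaMalloc((void**)&host_struct.dev_maxDistance, sizeof(double));
cudaMemcpy(host_struct.dev_maxDistance, &maxDistance, sizeof(double), cudaMemcpyHostToDevice);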
//other cudaMalloc/cudaMemcpy calls
cudaMalloc((void**)&dev_struct, sizeof(inputStruct));
cudaMemcpy(dev_struct, &host_struct, sizeof(inputStruct), cudaMemcpyHostToDevice);
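kernel<<<blocks,threads>>>(dev_struct);
check_cuda_errors(__FILE__, __LINE__); //check right after the launch, same helper as above
cudaDeviceSynchronize(); //wait for the kernel to finish before exiting
return 0;
}
//kernel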
__global__ void kernel(inputStruct *dev_struct){
//do stuff
}
Inside the kernel I read the variables in the struct by dereferencing them, e.g. *dev_struct->dev_maxDistance.
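For clarity on what that expression does, here is a minimal host-only C++ sketch, not the real kernel. MiniInput, readMaxDistance and roundTrip are hypothetical names, and the pointers here are ordinary host pointers. In the kernel, both dev_struct and the member it holds would have to be device addresses.

```cpp
#include <cassert>

// Hypothetical one-field cut-down of inputStruct, used on the host only.
struct MiniInput {
    double *dev_maxDistance;  // on the GPU this would point to device memory
};

// -> binds tighter than unary *, so *s->dev_maxDistance parses as
// *(s->dev_maxDistance): follow the struct pointer, then the member pointer.
inline double readMaxDistance(const MiniInput *s) {
    return *s->dev_maxDistance;
}

// Hypothetical helper: wraps a value in the struct and reads it back both ways.
inline double roundTrip(double value) {
    MiniInput input{&value};
    const double viaShortForm = readMaxDistance(&input);
    const double viaExplicit = *(input.dev_maxDistance);
    assert(viaShortForm == viaExplicit);  // both spellings read the same double
    return viaShortForm;
}
```

So the expression itself reads the pointed-to double as intended. That suggests the access syntax is probably not what triggers the error.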
It compiles and builds but still exits with the same error.
Any suggestions? I really need the help.
Thank you!