//~ Timer 1
StartTimer(&timer1);
numLoops = iDivUp(numLoops,16)*16;
int numPts = data->numPts;
int numPtsUp = iDivUp(numPts, 16)*16;
float *d_coord, *d_homo;
int *d_randPts, *h_randPts;
int randSize = 7*sizeof(int)*numLoops;
cudaMalloc((void **)&d_randPts, randSize); // <=================================== Statement I
CUDA_SAFE_CALL(cudaMalloc((void **)&d_homo, 9*sizeof(float)*numLoops)); // <====== Statement II
h_randPts = (int *)malloc(randSize);
int *validPts = (int *)malloc(sizeof(int)*numPts);
int numValid = 0;
// validPts is a list of the indices of all valid pts
numValid = numPts;
gpuTime1 = StopTimer(timer1);
//~ -----------------
printf("\n%f ", gpuTime1);
Now, when I run my program without Statements I and II, I get gpuTime1 == 0.013000.
When I run it with Statement I only: 44.78
When I run it with Statement II only: 43.34
When I run it with Statements I and II: 44.98
My question is: why don't the times add up, and is this normal behaviour (about 40 ms for a cudaMalloc)? Am I doing something wrong?
The very first substantial CUDA call incurs the cost of initializing the runtime. The suggestion to insert a dummy “cudaFree(0);” is interesting. This won’t crash anything, now or in the future?
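A minimal sketch of that warm-up trick, reusing the StartTimer/StopTimer helpers from the snippet above: cudaFree(0) is documented as a no-op when passed a null pointer, but it still forces the runtime to create its context, so the ~40 ms initialization cost is paid before the timed region instead of inside it.

// Warm up: force CUDA context creation outside the timed region.
// cudaFree(0) does nothing when given a null pointer, so it is safe.
cudaFree(0);

StartTimer(&timer1);
cudaMalloc((void **)&d_randPts, randSize); // now measures only the allocation
gpuTime1 = StopTimer(timer1);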
Nope, it doesn't. cudaSetDevice was changed in CUDA 2.1 to return an error when you call it after a context has been created. In your case, I think the context never went away at all (are mex files executed on the same thread as the main MATLAB computation thread?), so successive calls broke things; calling cudaThreadExit destroyed the context, and life was good on subsequent calls.
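For anyone hitting the same thing, here is a minimal sketch of the pattern described above (hypothetical mex entry point; cudaThreadExit was the current API in the 2.x runtime, later superseded by cudaDeviceReset): tearing the context down at the end of each mex call means the next invocation on the same MATLAB thread starts clean.

#include "mex.h"
#include <cuda_runtime.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // ... cudaMalloc / kernel launches / cudaMemcpy back to mxArrays ...

    // Destroy the CUDA context so the next mex call on this thread
    // starts fresh instead of reusing (and breaking on) a stale one.
    cudaThreadExit();
}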
Yeah, mex files seem to live in the same thread as MATLAB. And I was indeed not thinking straight; I should put a block on forum posting after 23:00 ;)