I have the following code:
int sum = 0;

//Create Arrays for CPU
float *cpuA; //freed
float *cpuB; //freed
float *cpuC; //freed

//Create Arrays for the GPU
float *gpuA; //freed
float *gpuB; //freed
float *gpuC; //freed

//Create Vectors for various functions
float *vector; //freed
float *vector2; //freed
float *meanVectorGPU; //freed
float *meanVectorCPU; //freed

//Create i, j for various loops
int i, j;

//Declare sizes for the arrays
int nRows = 8100;
int nColumns = 8100;

//Used for call to kernel, so that threads does not exceed 512
dim3 threads2(nColumns);
dim3 grid2(nColumns);
dim3 threads(nRows,nColumns);
dim3 grid(nRows,nColumns);
const dim3 dimBlock(1);
float divisor = ceil((float)nRows*(float)nColumns/256.0f)+1;
int dim = ceil(sqrt((float)(nColumns*nRows)/divisor));
const dim3 dimGrid(dim, dim);

//Create the items for the timer
unsigned int timer = 0;
unsigned int elapsed = 0;
CUT_SAFE_CALL(cutCreateTimer(&timer));
CUT_SAFE_CALL(cutStartTimer(timer));

//Initialize cutil
CUT_DEVICE_INIT();
The problem is that my code works wonderfully with CUT_DEVICE_INIT() commented out (there is more code below this, but this is where the error is). However, if I call CUT_DEVICE_INIT(), nRows and nColumns can be at most 1581 each. The same thing happens if I use cublasInit(). Could this be caused by the slow initialization of either call, so that parts of the code try to run before initialization is complete?
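In case the array sizes matter, here is a minimal host-only C sketch of the memory arithmetic for the two sizes mentioned above. It is not part of my program: the helper names array_bytes and total_bytes are made up for illustration, and it assumes a 4-byte float and that every matrix holds nRows x nColumns floats (the allocation code is not shown above, so that is an assumption).

```c
#include <stddef.h>

/* Hypothetical helper: bytes needed for one n_rows x n_columns array of float. */
size_t array_bytes(int n_rows, int n_columns)
{
    return (size_t)n_rows * (size_t)n_columns * sizeof(float);
}

/* Hypothetical helper: bytes needed for `count` arrays of that size,
 * e.g. count = 3 for gpuA, gpuB and gpuC (assuming all three are full matrices). */
size_t total_bytes(int count, int n_rows, int n_columns)
{
    return (size_t)count * array_bytes(n_rows, n_columns);
}
```

With a 4-byte float, array_bytes(8100, 8100) is 262,440,000 bytes per matrix and total_bytes(3, 8100, 8100) is 787,320,000 bytes for three of them, while array_bytes(1581, 1581) is 9,998,244 bytes, just under 10,000,000. I do not know whether that is related to the limit I am seeing.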