cudaError at memory location ... HELP

Hey,

I am getting this error:

Microsoft C++ exception: cudaError at memory location …

And this is my .cu file:

.
.
.
.
.
dim3 dimBlock(1, NUM_THREADS_Y);
dim3 dimGrid(1, grid_size);

// [1.M]
CUDA_SAFE_CALL(cudaMalloc((void**)&esc_d, no_elements_Y * sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(esc_d, hmm->esc, no_elements_Y * sizeof(int), cudaMemcpyHostToDevice));

//[0.MAXCODE-1][1.M]
CUDA_SAFE_CALL(cudaMalloc((void**)&msc_d, (MAXCODE-1) * no_elements_Y * sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(msc_d, hmm->msc, (MAXCODE-1) * no_elements_Y * sizeof(int), cudaMemcpyHostToDevice));

//[0.MAXCODE-1][1.M-1]
CUDA_SAFE_CALL(cudaMalloc((void**)&isc_d, (MAXCODE-1) * no_elements_Y * sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(isc_d, hmm->isc[0], (MAXCODE-1) * no_elements_Y * sizeof(int), cudaMemcpyHostToDevice));

//[0.6][1.M-1]
CUDA_SAFE_CALL(cudaMalloc((void**)&tsc_d, (no_elements_Y - 1) * sizeof(int) * 6));
CUDA_SAFE_CALL(cudaMemcpy(tsc_d, hmm->tsc[0], (no_elements_Y - 1) * sizeof(int) * 6, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&imx_d, size_TOTAL));
CUDA_SAFE_CALL(cudaMemcpy(imx_d, imx[0], size_TOTAL, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&mmx_d, size_TOTAL));
CUDA_SAFE_CALL(cudaMemcpy(mmx_d, mmx[0], size_TOTAL, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&dmx_d, size_TOTAL));
CUDA_SAFE_CALL(cudaMemcpy(dmx_d, dmx[0], size_TOTAL, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&xmx_d, size_X));
CUDA_SAFE_CALL(cudaMemcpy(xmx_d, xmx[0], size_X, cudaMemcpyHostToDevice));

//[1.L]
CUDA_SAFE_CALL(cudaMalloc((void**)&dsq_d, sizeof(unsigned char) * no_elements_X));
CUDA_SAFE_CALL(cudaMemcpy(dsq_d, dsq, sizeof(unsigned char) * no_elements_X, cudaMemcpyHostToDevice));

//xsc[4][2]
CUDA_SAFE_CALL(cudaMalloc((void**)&xsc_d, 8 * sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(xsc_d, hmm->xsc[0], 8 * sizeof(int), cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&sc_d, sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(sc_d, sc, sizeof(int), cudaMemcpyHostToDevice));

//[1.M]
CUDA_SAFE_CALL(cudaMalloc((void**)&bsc_d, (no_elements_Y) * sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(bsc_d, hmm->bsc, (no_elements_Y) * sizeof(int), cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&no_elements_Y_device, sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(no_elements_Y_device, &no_elements_Y, sizeof(int), cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaMalloc((void**)&no_elements_X_device, sizeof(int)));
CUDA_SAFE_CALL(cudaMemcpy(no_elements_X_device, &no_elements_X, sizeof(int), cudaMemcpyHostToDevice));


cudaEventRecord(start, 0);
P7Viterbi_cuda_device_loop_one<<<dimGrid, dimBlock>>>(imx_d, mmx_d, dmx_d, no_elements_Y_device, no_elements_X_device);
cudaEventRecord(stop, 0);

while (cudaEventQuery(stop) == cudaErrorNotReady);

//ASSERT(AfxCheckMemory());
// printf("2here = %i", mmx[0][2]);
// system("PAUSE");

dim3 dimBlock1(NUM_THREADS_X, NUM_THREADS_Y);
dim3 dimGrid1(get_grid_size_X(no_elements_X), get_grid_size_Y(no_elements_Y));

//** Initialize stop and start variables
cudaEvent_t start1, stop1;
CUDA_SAFE_CALL(cudaEventCreate(&start1));
CUDA_SAFE_CALL(cudaEventCreate(&stop1));

cudaEventRecord(start1, 0);
P7Viterbi_cuda_device_loop_two<<<dimGrid1, dimBlock1>>>(imx_d, mmx_d, dmx_d, no_elements_Y_device, no_elements_X_device, xmx_d, dsq_d, sc_d, xsc_d, tsc_d, bsc_d, msc_d, isc_d, esc_d);

//print_error(cudaGetErrorString(cudaGetLastError()));

cudaEventRecord(stop1, 0);

while (cudaEventQuery(stop1) == cudaErrorNotReady);

// printf("PAUSE");
// system("PAUSE");

CUDA_SAFE_CALL(cudaMemcpy(imx[0], imx_d, size_TOTAL, cudaMemcpyDeviceToHost));
CUDA_SAFE_CALL(cudaMemcpy(mmx[0], mmx_d, size_TOTAL, cudaMemcpyDeviceToHost));
CUDA_SAFE_CALL(cudaMemcpy(dmx[0], dmx_d, size_TOTAL, cudaMemcpyDeviceToHost));
CUDA_SAFE_CALL(cudaMemcpy(xmx[0], xmx_d, size_X, cudaMemcpyDeviceToHost));

CUDA_SAFE_CALL(cudaFree(xmx_d));
CUDA_SAFE_CALL(cudaFree(imx_d));

.
.
.
.
.

The error happens in the second call to the device kernel.

Is there something wrong with this? I have been debugging it for hours…

Thanks!

Put print_error(cudaGetErrorString(cudaGetLastError())); after the first kernel as well (after the while loop). Your first kernel might have failed, and that only gets noticed when the second kernel is started.
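A minimal sketch of that check, written against the current runtime API rather than the thread's own helpers. The names report_cuda, launch_and_check and dummy_kernel are hypothetical stand-ins, and cudaDeviceSynchronize is used here instead of the event polling in the original post:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): print a CUDA status with a label.
// Returns true when the status is cudaSuccess.
static bool report_cuda(cudaError_t status, const char *label)
{
    if (status == cudaSuccess) return true;
    fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(status));
    return false;
}

// Placeholder kernel standing in for the real kernels in the post.
__global__ void dummy_kernel(int *out)
{
    out[threadIdx.x] = threadIdx.x;
}

// Launch the kernel, then check the launch and the execution separately.
static bool launch_and_check(int *out_d, dim3 grid, dim3 block)
{
    dummy_kernel<<<grid, block>>>(out_d);
    // Problems with the launch itself (such as a bad grid or block size)
    // are reported by cudaGetLastError right after the launch.
    if (!report_cuda(cudaGetLastError(), "launch")) return false;
    // Errors raised while the kernel runs surface once the device is done.
    return report_cuda(cudaDeviceSynchronize(), "execution");
}
```

Calling a helper like this after each kernel separately makes it clear which of the two launches actually failed.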

Hi!

Thank you for your response.

When I print it, I get this after the second kernel call:

no error global function call is not configured out of memory initialization error unspecified launch failure unspecified launch failure in prior launch the launch timed out and was terminated too many resources requested for launch invalid device function invalid configuration argument invalid device ordinal invalid argument invalid pitch argument invalid device symbol mapping of buffer object failed unmapping of buffer object failed invalid host pointer invalid

I don't know what it means, because it first says no_error and then it lists a bunch of errors…

I just need the code to enter into

P7Viterbi_cuda_device_loop_two<<<dimGrid1, dimBlock1>>>(imx_d, …

so that I can start debugging there, but for some reason it is not even entering there… it just skips it!

This happens in both emu and real modes.

I have tested the first kernel call in isolation and it works perfectly, but the second one is not even called; it just jumps over it…

I need to finish this as soon as possible. I need to write a paper about all the performance improvements that I made, and I need to have the software working.

Thank you !!

Also, I have deleted the first call and replaced it with CPU code, and it still skips the second call to the kernel.

I just need to enter into it and then I can start debugging.

Apparently it is printing all the error codes, so that code does not work.

Put a CUT_CHECK_ERROR() before and after both kernel calls, and compile in debug mode. These macros will print the real error message (if there is one), but only in debug mode.

The error is:

…/…/fast_algorithms.cu’ in line 282 : invalid configuration argument.

I figured it out. It was not the code; the number of treats I was using was too high.

Thank you very much!

You saved me.
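For readers wondering how high is too high: the limits depend on the device and can be read from cudaDeviceProp. The sketch below assumes "treats" means threads per block; query_limits and describe_limits are hypothetical helper names, not part of the thread's code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fetch the properties of `device`.
// Returns false when the query fails (for example, when no GPU is present).
static bool query_limits(int device, cudaDeviceProp *prop)
{
    return cudaGetDeviceProperties(prop, device) == cudaSuccess;
}

// Hypothetical helper: write the launch-related limits into `buf`.
// Returns what snprintf returns (the number of characters required).
static int describe_limits(const cudaDeviceProp &prop, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "threads per block <= %d\n"
                    "block dims <= %d x %d x %d\n"
                    "grid dims <= %d x %d x %d\n",
                    prop.maxThreadsPerBlock,
                    prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2],
                    prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
}
```

With a block declared as dim3 dimBlock1(NUM_THREADS_X, NUM_THREADS_Y), the product NUM_THREADS_X * NUM_THREADS_Y has to stay within the threads-per-block limit, in addition to each dimension staying within its own bound.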

Hi,

I’m struggling with the same error message that started your thread of inquiry in March.

That is: cudaError at memory location …

Can you tell me what a treat is? How do you know when it is too high?


I just came across this problem, and sort of fixed it.

For a kernel call kernel<<<a,b>>>, where a and b are of type dim3, the problem was that one of the dim3 dimensions was out of bounds for the device. In my case a.z was more than 1.

Hope that helps.
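A check along these lines can be run before the launch. This is only a sketch: launch_config_fits is a hypothetical helper, and it covers only the grid and block sizes, not other possible causes of an invalid configuration:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true when `grid` and `block` fit within the limits
// reported in `prop` (as filled in by cudaGetDeviceProperties).
static bool launch_config_fits(dim3 grid, dim3 block, const cudaDeviceProp &prop)
{
    const unsigned int g[3] = { grid.x, grid.y, grid.z };
    const unsigned int b[3] = { block.x, block.y, block.z };
    for (int i = 0; i < 3; ++i) {
        // A zero-sized dimension is never a valid launch.
        if (g[i] == 0 || b[i] == 0) return false;
        // Each grid and block dimension has its own upper bound.
        if (g[i] > (unsigned int)prop.maxGridSize[i]) return false;
        if (b[i] > (unsigned int)prop.maxThreadsDim[i]) return false;
    }
    // The total thread count per block is limited separately.
    unsigned long long threads =
        (unsigned long long)block.x * block.y * block.z;
    return threads <= (unsigned long long)prop.maxThreadsPerBlock;
}
```

If the function returns false for the a and b about to be passed to kernel<<<a,b>>>, the launch can be expected to fail before the kernel body ever runs, which matches the "it just skips it" behavior described earlier in the thread.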