CUDA code randomly works, and returns wrong results

Hi. Is it possible that I am having problems due to faulty hardware (specifically the motherboard)?

I checked and my CUDA toolkit is correctly installed (SDK version, drivers, etc.). I just bought a new video card this week, with compute capability 3.5 (Kepler), but my program still works only intermittently, and when it does run, it returns wrong results. I have an OpenMP version of the code (which works correctly) and have checked the “translation” many times; apparently there is nothing wrong, and I really cannot find any kernel errors. A sample CUDA code that sums two arrays, each of size 105000000, works correctly, though. I would appreciate any enlightenment.

I would suspect your code first, before focusing on HW or system/infrastructure. That’s just a general statement; obviously I have no knowledge of your code. But the CUDA sum of two arrays working correctly is a reasonable test of the system.

Good practices here are to make sure your code uses proper CUDA error checking (google that phrase, take the first hit, apply it to your code) and also run your code with cuda-memcheck or compute-sanitizer. If any errors are reported by any of that, start your debug focus there.
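For reference, the kind of error-checking macro that search turns up looks roughly like the sketch below (the exact form on the page you land on may differ in details):

```cuda
// Sketch of a typical CUDA error-checking macro; details may differ
// from the version found via the search mentioned above.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define cudaCheckErrors(msg)                                       \
    do {                                                           \
        cudaError_t __err = cudaGetLastError();                    \
        if (__err != cudaSuccess) {                                \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n",     \
                    msg, cudaGetErrorString(__err),                \
                    __FILE__, __LINE__);                           \
            exit(1);                                               \
        }                                                          \
    } while (0)

// Usage: check after every CUDA API call and kernel launch, e.g.
//   kernel<<<grid, block>>>(...);
//   cudaCheckErrors("kernel launch failed");
//   cudaDeviceSynchronize();
//   cudaCheckErrors("kernel execution failed");
```

On recent toolkits the sanitizer is invoked as `compute-sanitizer ./my_app`; on older toolkits it was `cuda-memcheck ./my_app`.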

Thank you for the fast reply.

Actually, with any CUDA call I make, the program just closes randomly. So if I try to run cuda-memcheck, it may or may not work. But I am confident that it is not a memory problem (in the sense of total device storage), as I need much less than 1 GB of memory (about 0.007 GB).

There are any number of possibilities, such as improper use of managed memory, stack corruption in your program, and many others, that we are unlikely to be able to zero in on with a sequence of questions.

Another possible approach would be to strip your code down to a minimal reproducer and share that code here, if you wish. The community may spot something for you. Even if you don’t post it here, this is often a good debugging practice to narrow the scope of your focus.

If your claim is that the program closes randomly on any CUDA call you make, then it should only be necessary to write about 5 lines of code to see if that holds. If those 5 lines work reliably, keep adding more of your code until the problem appears. That is just one possible approach based on what you’ve shared so far.
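A near-minimal test along those lines could look like this sketch (a hypothetical starting point, not taken from your code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int *d_buf = nullptr;
    // About the simplest possible CUDA calls: allocate and free device
    // memory, printing the returned status each time.
    cudaError_t err = cudaMalloc(&d_buf, 1024 * sizeof(int));
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));
    err = cudaFree(d_buf);
    printf("cudaFree: %s\n", cudaGetErrorString(err));
    return 0;
}
```

If that runs reliably, grow it incrementally toward your real program until the crash reappears.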

If you decide to provide an example, I would be sure to do the following for the best possible help:

  1. Provide a minimal but complete code. I should be able to copy, paste, compile, and run the code you post without having to change anything or add anything.
  2. Test to see that the code you post actually demonstrates the problem in your case.
  3. Provide a complete description of how you compile the code.
  4. Provide a complete description of your environment: The exact GPU model, the host operating system, the CUDA version and the driver version.
  5. Provide the actual commands you use to run the code.

Much of this can be done simply by copying and pasting an appropriate portion of a console session. You will find many examples on these forums if you poke around.

If you want to follow my instructions, I’ll take a look. Otherwise perhaps someone else will be able to help you. Good luck!


Yes, please take a look.

If you follow my instructions, I will take a look. So far you have not followed my instructions. You’re welcome to do as you wish of course, perhaps someone else will be able to help you.

Oh I see, you mean providing complete code, etc. Actually it is a very big program, with a GUI programmed with Dear ImGui on Love2D. I think I will have to wait for other answers; maybe I will find the problem myself. Unless you could install Dear ImGui for Love2D on Windows, then it is possible. Very grateful anyway.

Ok, so I am posting part of the code, to check if someone can help me find whether the problem lies there. I am associating each Cartesian index with one position (x,y,z) in a 3D grid (I heard this is considered bad practice, but I have read some papers stating that it makes no difference at all):

__global__ void bounds(float *pnn,float *pn,float *p,float *np,float* vEE,int *tp,int X,int Y,int Z,float lambda,float tau_T){	
	int i = blockIdx.x*blockDim.x + threadIdx.x + 1; // starts at 1
	int j = blockIdx.y*blockDim.y + threadIdx.y + 1; // starts at 1
	int k = blockIdx.z*blockDim.z + threadIdx.z + 1; // starts at 1
	
	float S1, S2, S3, S4, S5, S6;
	
	if ((i < X-1) && (j < Y-1) && (k < Z-1)){

		// ... (remaining computation of S1..S6 elided) ...

		pnn[flat(i,j,k,X,Y)] = (S1 + S2 - S3 + tau_T*(S4 + S5))/S6;
		
		// exchange section
		
		__syncthreads();
		np[flat(i,j,k,X,Y)] = p[flat(i,j,k,X,Y)];
		__syncthreads();
		p[flat(i,j,k,X,Y)] = pn[flat(i,j,k,X,Y)];
		__syncthreads();
		pn[flat(i,j,k,X,Y)] = pnn[flat(i,j,k,X,Y)];
	}
}
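The `flat` helper is not shown above; it is just the usual row-major 3D-to-1D index mapping. Assuming that layout (the original definition is not posted, so this is an illustration only), it would be something like:

```cpp
#include <cassert>

// Typical row-major 3D -> 1D index mapping. Shown as a plain function here;
// in the CUDA code it would be a __device__ function or a macro.
// NOTE: this exact definition is an assumption -- the original flat() is not shown.
inline int flat(int i, int j, int k, int X, int Y) {
    return i + j * X + k * X * Y;
}
```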

Hi.

Just posting to say that the problem was solved. It was a host problem related to file streaming, nothing related to CUDA or the hardware.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.