Memory Checker detected 12 access violations. error = access violation on load (global memory)

When I use the CUDA debugger to debug my code, there is always an error like this:

Memory Checker detected 12 access violations.
error = access violation on load (global memory)
gridid = 1
blockIdx = {2,0,0}
threadIdx = {3,25,0}
address = 0x03f100d8
accessSize = 4

Sometimes it shows “error = access violation on load (shared memory)”, but I don’t use any shared memory.

However, when I decrease the input data size, no error is reported. Why? Is it possible that too large a data size causes this kind of error?

Here is my kernel function:

__global__ void DefferentialSysMatrixKernel(int* k, int** address, float** length, int* kDiffer, int** addressDiffer, float** lengthDiffer)
{
	int x = threadIdx.x + blockDim.x*blockIdx.x;
	int y = threadIdx.y + blockDim.y*blockIdx.y;

	if ((x < detecterN - 1) && (y < frameN))
	{
		int rayIndex = y*detecterN + x;
		int index = y*(detecterN - 1) + x;

		// Sorted index lists and value lists of two adjacent rays,
		// plus the output lists for their difference
		int *addr1 = address[rayIndex];
		int *addr2 = address[rayIndex + 1];
		int *addr3 = addressDiffer[index];
		float *leng1 = length[rayIndex];
		float *leng2 = length[rayIndex + 1];
		float *leng3 = lengthDiffer[index];
		int num1 = k[rayIndex];       // nonzero count of the first ray
		int num2 = k[rayIndex + 1];   // nonzero count of the second ray

		// Merge the two sorted sparse vectors, emitting leng2 - leng1
		int t1 = 0;
		int t2 = 0;
		int t3 = 0;
		while ((t1 < num1) && (t2 < num2))
		{
			if (addr1[t1] == addr2[t2])
			{
				// Index present in both vectors: store the difference
				addr3[t3] = addr1[t1];
				leng3[t3] = leng2[t2] - leng1[t1];
				t1++;
				t2++;
				t3++;
			}
			else if (addr1[t1] < addr2[t2])
			{
				// Index only in the first vector
				addr3[t3] = addr1[t1];
				leng3[t3] = -leng1[t1];
				t1++;
				t3++;
			}
			else
			{
				// Index only in the second vector
				addr3[t3] = addr2[t2];
				leng3[t3] = leng2[t2];
				t2++;
				t3++;
			}
		}
		// Drain whichever input still has entries left
		while (t1 < num1)
		{
			addr3[t3] = addr1[t1];
			leng3[t3] = -leng1[t1];
			t1++;
			t3++;
		}
		while (t2 < num2)
		{
			addr3[t3] = addr2[t2];
			leng3[t3] = leng2[t2];
			t2++;
			t3++;
		}
		kDiffer[index] = t3;   // nonzero count of the difference vector
	}
}

It is a little complicated. I use it to calculate the differences between a set of sparse vectors. detecterN and frameN are macro definitions set according to the input data size.
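For example (a made-up two-ray case, just to show the layout the kernel assumes: each ray i has k[i] nonzeros, address[i] holds its sorted nonzero indices, and length[i] holds the matching values):

// Hypothetical two-ray example of the input layout.
// Ray 0: nonzeros at indices {1, 4} with values {0.5f, 2.0f}
// Ray 1: nonzeros at indices {1, 7} with values {1.5f, 3.0f}
int k[2] = { 2, 2 };
int addr0[] = { 1, 4 };    float leng0[] = { 0.5f, 2.0f };
int addr1[] = { 1, 7 };    float leng1[] = { 1.5f, 3.0f };
int* address[2] = { addr0, addr1 };
float* length[2] = { leng0, leng1 };
// The merge produces the difference (ray 1 minus ray 0):
// indices {1, 4, 7}, values {1.0f, -2.0f, 3.0f}, kDiffer = 3.
// Note each output row needs room for up to k[i] + k[i+1] entries.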

I have checked it many times to make sure the array subscripts are not out of bounds.

Can anyone tell me why?

It could mean you’re dereferencing uninitialized pointers in the unified memory space that (by chance) point to shared memory.

I wouldn’t be surprised if you’re not handling the double pointers correctly:

DefferentialSysMatrixKernel(int* k, int** address, float** length,
                                       ^^               ^^

because it’s not trivial to get that right. But it’s impossible to say without seeing the host code.
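For reference, the usual pattern for building a device array of device pointers looks something like this (a minimal sketch with made-up sizes and no error checking, not necessarily what your code does):

#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
	const int n = 4;        // number of rows (made-up size)
	const int rowLen = 16;  // entries per row (made-up size)

	// One flat device allocation holding all rows back to back
	int* d_data = NULL;
	cudaMalloc((void**)&d_data, n * rowLen * sizeof(int));

	// Build the table of device addresses on the host...
	int** h_rows = (int**)malloc(n * sizeof(int*));
	for (int i = 0; i < n; i++)
		h_rows[i] = d_data + i * rowLen;  // device pointers, computed on the host

	// ...then copy the table itself to the device
	int** d_rows = NULL;
	cudaMalloc((void**)&d_rows, n * sizeof(int*));
	cudaMemcpy(d_rows, h_rows, n * sizeof(int*), cudaMemcpyHostToDevice);
	free(h_rows);

	// d_rows can now be passed to a kernel as an int** parameter
	return 0;
}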

With a little bit of effort you can have the cuda-memcheck tool narrow the error down to a specific line of kernel source code:

https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218
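Roughly, what that answer recommends (my own sketch; the CUDA_CHECK name here is made up) is to build with -lineinfo so cuda-memcheck can attribute errors to kernel source lines, and to check the status of every API call and kernel launch:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line on any CUDA API error.
#define CUDA_CHECK(call) \
	do { \
		cudaError_t err_ = (call); \
		if (err_ != cudaSuccess) { \
			fprintf(stderr, "CUDA error %s at %s:%d\n", \
				cudaGetErrorString(err_), __FILE__, __LINE__); \
			exit(1); \
		} \
	} while (0)

// After a kernel launch, check both the launch and the execution:
//   kernel<<<grid, block>>>(...);
//   CUDA_CHECK(cudaGetLastError());      // launch-time errors
//   CUDA_CHECK(cudaDeviceSynchronize()); // execution-time errors
//
// Build with line info so cuda-memcheck reports source lines:
//   nvcc -lineinfo -o PCMAP PCMAP.cu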

Thanks for your advice. I tried cuda-memcheck, but it succeeded and reported no errors. Here is the output:

E:\Learning\Bei Hang\lab\statistical iteration reconstruction\code\PCMAP_cuda_modify_2\x64\Debug>cuda-memcheck PCMAP.exe
========= CUDA-MEMCHECK
Read file successfully
System Matrix Loaded successfully!
Weight Loaded successfully!
cpu load time: 0.843s
total iteration time: 117.613min

These output lines are printed by my own code. So does that mean there is no error?

However, I still get the error message when I use CUDA debugging in VS2015. Why?

Also, you can see it takes almost 2 hours under cuda-memcheck, but in VS, when I run the code normally (not in CUDA debug), it finishes in just twenty seconds. What causes this?

Also, I have been wondering whether my double pointers are assigned correctly.

Actually, my input double pointers are assigned like this:

cudaMalloc((void**)&d_k, rayN * sizeof(int));
cudaMalloc((void**)&d_addr, numsum * sizeof(int));
cudaMalloc((void**)&d_leng, numsum * sizeof(float));

cudaMemcpy(d_k, k, rayN * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_addr, address, numsum * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy(d_leng, length, numsum * sizeof(float), cudaMemcpyHostToDevice);

// Host-side table of device pointers into the flat arrays
int** addrLocate = (int**)malloc(rayN * sizeof(int*));
float** lengLocate = (float**)malloc(rayN * sizeof(float*));

t = 0;
for (i = 0; i < rayN; i++)
{
	addrLocate[i] = d_addr + t;   // device address where ray i's indices start
	lengLocate[i] = d_leng + t;   // device address where ray i's values start
	t = t + k[i];                 // advance by ray i's nonzero count
}

int** d_addrLocate = NULL;
float** d_lengLocate = NULL;

// Copy the pointer tables themselves to the device
cudaMalloc((void**)&d_addrLocate, rayN * sizeof(int*));
cudaMalloc((void**)&d_lengLocate, rayN * sizeof(float*));

cudaMemcpy(d_addrLocate, addrLocate, rayN * sizeof(int*), cudaMemcpyHostToDevice);
cudaMemcpy(d_lengLocate, lengLocate, rayN * sizeof(float*), cudaMemcpyHostToDevice);

free(addrLocate);
free(lengLocate);

Then d_addrLocate and d_lengLocate are passed as the kernel parameters int** address and float** length.
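By the way, one quick sanity check on this construction (just a sketch using the same variable names) would be to verify after the loop that the offsets consume exactly numsum entries:

// After the offset loop, t must equal numsum, i.e. the sum of all k[i]
// must match the size of the flat d_addr/d_leng allocations; otherwise
// some addrLocate[i]/lengLocate[i] point past the end. (Needs <assert.h>.)
assert(t == numsum);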

I suspect that when the data size (that is, numsum) is too big, the memory that d_addr and d_leng point to is not contiguous, which might cause the double pointers to point to the wrong places.

However, I don’t get any error message when I use cuda-memcheck. Do you know why?

Thank you a lot anyway :)

If cuda-memcheck is working properly, at the end of your program printout it will print a line something like this:

========= ERROR SUMMARY: 0 errors

I don’t see that in your printout. Either you cut it off when you were posting, or something is wrong.

Yes, cuda-memcheck makes your code run much slower. This is documented in the cuda-memcheck manual.

Ah, yes! I added system("pause") at the end of my code, so the program was actually paused there.
Thank you!