illegal memory access - any help appreciated

Hello,

I am a student who has just started learning GPU parallel programming for a project on my own. I am currently using CUDA 8.0. I have encountered an illegal memory access error when running my image-processing program; the code compiles without problems.

I am hoping for some advice on cudaMalloc and whether I did it correctly.
In this code, I am attempting to get a pointer to the data stored in an Eigen ArrayXXf called color, pass it over to another function that does the mathematical manipulation on the GPU in parallel, and finally store the result back into the ArrayXXf.

I believe the issue lies with how I allocate the device pointer and copy from host to device, and whether the device pointer can be mapped back, but I can't figure out where I went wrong.
The problem seems to be on line 50, whenever I attempt to write to dev_color. Reading from dev_color, however, does not result in any illegal memory error, and I am able to obtain 0.5 as initialised.

The two functions, __cudaSafeCall and __cudaCheckError, are taken from https://gist.github.com/ashwin/2652488 as a way to check whether CUDA calls and the kernel launch run properly.
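For completeness, the wrapper macros from that gist, which the code below calls as CudaSafeCall()/CudaCheckError(), are as follows (with CUDA_ERROR_CHECK defined so the checks are actually compiled in):

#define CUDA_ERROR_CHECK
#define CudaSafeCall( err ) __cudaSafeCall( err, __FILE__, __LINE__ )
#define CudaCheckError()    __cudaCheckError( __FILE__, __LINE__ )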

inline void __cudaSafeCall( cudaError err, const char *file, const int line )
{
#ifdef CUDA_ERROR_CHECK
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaSafeCall() failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }
#endif

    return;
}

inline void __cudaCheckError( const char *file, const int line )
{
#ifdef CUDA_ERROR_CHECK
    cudaError err = cudaGetLastError();
    if ( cudaSuccess != err )
    {
        fprintf( stderr, "cudaCheckError() failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }

    err = cudaDeviceSynchronize();

    if( cudaSuccess != err )
    {
        fprintf( stderr, "cudaCheckError() with sync failed at %s:%i : %s\n",
                 file, line, cudaGetErrorString( err ) );
        exit( -1 );
    }
#endif

    return;
}

__global__ void kernel(int dev_size, int width, int height, float* dev_color)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;  
	if(i < dev_size) //check that the kernel thread index is within the problem size
	{

		...//other parts of the code
	

		if((0<=x && x<width) && (0<=y && y<height))
		{
			dev_color[x*width + y] = 12.3; //crash here
		}
	}
}


void test_cu(size_t size, int width, int height, float *c_ptr)
{
	size_t free, total;
	int dev_size = (int) size; //size = 307200

	printf("height= %d, width=%d", height, width); //h=360, w=480

	float *dev_color;						
	int mem_size = sizeof(float) *width *height;  //691200 
	CudaSafeCall(cudaMalloc( (void**) &dev_color, mem_size));	
	cudaMemcpy(dev_color, c_ptr, mem_size, cudaMemcpyHostToDevice);	
	
	cudaMemGetInfo(&free,&total); 
	printf("\n1) %d KB free of total %d KB\n",free/1024,total/1024); //checking memory allocation


	int ThreadsPerBlk = 1024;	
	float blk_div = (float) dev_size/1024;
	int NumBlk = (int) ceil(blk_div);
	
   	kernel<<<NumBlk, ThreadsPerBlk>>>(dev_size, width, height, dev_color); //run this kernel function on gpu

	CudaCheckError();            // check for kernel launch / sync errors
	cudaDeviceSynchronize();
	cudaMemcpy(c_ptr, dev_color, mem_size, cudaMemcpyDeviceToHost);
	cudaFree(dev_color);
}

void main()
{
      ...//other parts of codes
      
      color = ArrayXXf::Constant(width, height, 0.5); //width=480, height=360
      float *c_ptr = color.data();  //obtain pointer to arrayXXf

      test_cu(size, width, height, c_ptr);

      //remap the float pointer back to the ArrayXXf
      Map<ArrayXXf>(c_ptr, width, height) = color;

      ...//other parts of codes
}

The error is shown below:

cudaCheckError() with sync failed at /path/test_cu.cu:339 : an illegal memory access was encountered

Line 339 above refers to the CudaCheckError() call, which is on line 78 of the part of the code shown here.

From some research/googling, I gather that an illegal memory access could be due to allocating too much memory or to an out-of-bounds array access. So perhaps my allocation of memory or my pointer handling is done wrongly.

Any advice/help is appreciated.
Thank you,

Samuel

How did you determine that line 50 is the culprit, i.e. that it is the line that is generating the illegal memory access?

If that is the actual problem line, don't you think it would be important to show how x and y are computed in the kernel code?

Finally, it looks to me like your indexing on that line is wrong:

dev_color[x*width + y]

I think it should be:

dev_color[y*width + x]

However, without seeing the x and y calculations I'm not totally certain of that. Since you are comparing x against the width dimension, I assume that is your "horizontal" index. Similarly, y being compared against the height dimension would be the "vertical" index, i.e. indicating which line of the image. If y indicates which line of the image, then the y value should be multiplied by the line width when computing a 1-D equivalent index.
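For illustration only, here is a minimal sketch of what I mean, assuming x and y are derived from the 1-D thread index and the image is stored one line after another (the x/y derivation is my assumption, since it isn't shown in your post):

__global__ void kernel(int dev_size, int width, int height, float* dev_color)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	if(i < dev_size)
	{
		int x = i % width;  // assumed "horizontal" index, 0..width-1
		int y = i / width;  // assumed "vertical" index (which image line), 0..height-1

		if((0<=x && x<width) && (0<=y && y<height))
		{
			dev_color[y*width + x] = 12.3f;  // line index * line width + column index
		}
	}
}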

Thanks for replying.

As for why line 50: I commented out that line and there was no problem with the kernel function, but that defeats the purpose of my program.

As for x and y, I did some checks to ensure they are within the 480x360 dimensions.
They are both initialised as int.
But the "illegal memory access" error also occurred when I replaced line 50 with:

dev_color[0] = 1.0;

Which I believe should not cause an error, unless my device memory allocation or pointer setup is done incorrectly.
The error occurs whenever I attempt to write to dev_color.

I am new to CUDA, so I couldn't figure out what went wrong.
Pardon my embarrassing indexing…

That is a generally flawed technique either for debugging or for performance analysis. The reason is that eliminating lines of code that write to global state will allow the compiler to optimize your code differently, perhaps removing large sections of “dead” code that no longer have any effect on global state, even if they perform other reads and writes.

Therefore this technique for fault localization should be avoided. Instead, the cuda-memcheck tool provides a useful fault isolation mechanism for illegal address errors.

Please read this:

https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218

apply it to your code, and identify the actual line of code that is generating the fault, per cuda-memcheck analysis.
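In rough terms, assuming you can build a standalone executable from the .cu file (the file and executable names below are just placeholders), it amounts to:

nvcc -lineinfo -o test test_cu.cu
cuda-memcheck ./test

With -lineinfo, cuda-memcheck should then report the kernel name, the offending source line, and the bad address for the illegal access.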

Thanks and sorry for the delay.

Attempting to run cuda-memcheck on the executable has proved slightly troublesome, as I am using nvcc to compile just this single part of the program into a static library, e.g. test_cu.cu.o, which is then linked and compiled together with multiple .cpp files on ROS. I have to add that to my CMakeLists before running the .bag file.

Anyway here’s the result:

cudaCheckError() with sync failed at /path/test_cu.cu:339 : an illegal memory access was encountered
========= Internal error (7)
========= No CUDA-MEMCHECK results found

So it reports internal error (7); how do I view those results? I did include -lineinfo, following the post.

This could happen if your governing application is hitting a seg fault, and probably in other cases as well. I wouldn’t be able to give any further help without a full test case.

You might want to create a simplified test case that just calls the CUDA code from a test harness, not involving ROS or any other libraries/modules.
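For example, a minimal harness could look something like the sketch below, which calls test_cu directly with a plain host buffer instead of Eigen/ROS (the sizes and names here are just placeholders taken from your post, not your exact setup):

// standalone test harness sketch: compile and link against test_cu.cu.o
#include <cstdio>
#include <vector>

void test_cu(size_t size, int width, int height, float *c_ptr);  // defined in test_cu.cu

int main()
{
	const int width = 480, height = 360;
	std::vector<float> color(width * height, 0.5f);  // stand-in for the Eigen array

	test_cu(color.size(), width, height, color.data());

	printf("color[0] = %f\n", color[0]);  // should no longer be 0.5 if the kernel wrote to it
	return 0;
}

Then you can run cuda-memcheck (or a debugger) directly on that executable without ROS in the way.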

Alternatively you could run the code under a debugger such as cuda-gdb. You would need to learn how to use that if you don’t know already.
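As a rough sketch (details depend on your build), that would be something like:

cuda-gdb ./test
(cuda-gdb) set cuda memcheck on
(cuda-gdb) run

When the illegal access occurs, cuda-gdb should stop and report the kernel, block, and thread that triggered it, which you can then inspect.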