Code gives internal error 20: process didn't terminate successfully

The code below gives me this error when run under cuda-memcheck:
Error: process didn't terminate successfully
========= CUDA-MEMCHECK
========= Internal error (20)
========= No CUDA-MEMCHECK results found

Could anyone help me with this? Thanks in advance.

#include <stdio.h>
#include "cuda.h"
#define h 10
#define w 10

__device__ void devFunction(int * arr_d , int *drr_d,int pitch)
{
int i,j;
for(i=0;i<10;++i)
{
arr_d[i]=arr_d[i]+1;
//drr_d[1][i]=drr_d[1][i]+1;
}

for(i=0;i<10;++i)
{
	int *row=(int*)(drr_d+(i*pitch));
	for(j=0;j<10;++j)
	{
		row[j]=row[j]+1;
	}
}

}

__global__ void test(int *arr_d ,int *drr_d,int pitch)
{

devFunction(arr_d,drr_d,pitch);	

}

int main()
{

int *arr_h=(int*)malloc(sizeof(int)*10);                         //   working fine for 1D array
int * arr_d;	
cudaMalloc((void**)&arr_d,sizeof(int)*10);
int i;
for(i=0;i<10;i++)
{
	arr_h[i]=i;
}
cudaMemcpy(arr_d,arr_h,sizeof(int)*10,cudaMemcpyHostToDevice);
//test<<<1,1>>>(arr_d);
//cudaMemcpy(arr_h,arr_d,sizeof(int)*10,cudaMemcpyDeviceToHost);
/*for(i=0;i<10;i++)
{
	printf("%d",arr_h[i]);
}
*/

int ** drr_h=(int**)malloc (sizeof(int*)*10);
/*for(i=0;i<10;i++)
{
	drr_h[i]=(int*)malloc(sizeof(int)*10);
}*/
int j;
for(i=0;i<10;i++)
{
	for(j=0;j<10;j++){
	printf("%d\n",drr_h[i][j]);
}}
int *drr_d;
size_t  pitch;
cudaMallocPitch((void**)&drr_d,&pitch,sizeof(int)*10,10);
cudaMemcpy2D(drr_d,pitch,drr_h,sizeof(int) *10, 10 * sizeof(int), 10, cudaMemcpyHostToDevice);
test<<<1,1>>>(arr_d,drr_d,pitch);

//printf("%d\n",pitch);    //value is 512

cudaMemcpy2D(drr_h,10*sizeof(int),drr_d, pitch, 10 * sizeof(int),10, cudaMemcpyDeviceToHost);
printf("\n");

return 0;
}

If you run your code normally, without cuda-memcheck, it generates a segmentation fault. This is a problem in host code, not device code, and you will need to fix it before you can get cuda-memcheck results. The cause is that you have commented out the loop that allocates each drr_h[i], but the subsequent printf loop still reads drr_h[i][j], so it dereferences uninitialized pointers.
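As a rough illustration of the host-side problem only, here is a minimal sketch in which every row is allocated and initialized before it is printed. The helper names allocRows and freeRows and the fill pattern i*W+j are made up for this example; they are not from the posted code. This removes the crash but does not make the layout suitable for cudaMemcpy2D.

```cuda
#include <stdio.h>
#include <stdlib.h>

#define H 10
#define W 10

// Release the first `count` rows and the pointer table itself.
void freeRows(int **rows, int count)
{
    for (int i = 0; i < count; ++i) free(rows[i]);
    free(rows);
}

// Hypothetical helper: allocate H row pointers, then one W-element row per
// pointer, and initialize every element so later reads are well defined.
// Returns NULL if any allocation fails.
int **allocRows(void)
{
    int **rows = (int **)malloc(sizeof(int *) * H);
    if (rows == NULL) return NULL;
    for (int i = 0; i < H; ++i) {
        rows[i] = (int *)malloc(sizeof(int) * W);
        if (rows[i] == NULL) {
            freeRows(rows, i);   // free the rows allocated so far
            return NULL;
        }
        for (int j = 0; j < W; ++j) rows[i][j] = i * W + j;
    }
    return rows;
}

int main(void)
{
    int **drr_h = allocRows();
    if (drr_h == NULL) return 1;

    // Safe now: every drr_h[i] points at valid, initialized memory.
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j)
            printf("%d\n", drr_h[i][j]);

    freeRows(drr_h, H);
    return 0;
}
```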

Start by isolating the seg fault to the line that it occurs on. You can use printf or a debugger, whatever method you like.

A quick scan of your code also shows that you are confused about what cudaMemcpy2D can be used for. You appear to want to create a double-pointer allocation on the host (**drr_h with nested malloc operations) and then copy that “region” to the device. That won’t work: cudaMemcpy2D expects both source and destination to be single contiguous allocations whose rows are separated by a fixed pitch in bytes, whereas drr_h is an array of pointers to separately allocated rows.
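For comparison, here is a sketch of how cudaMemcpy2D is typically paired with cudaMallocPitch, assuming the host data lives in one contiguous H*W buffer. The names incrementPitched and runPitchedIncrement are invented for this example. Note also that the pitch returned by cudaMallocPitch is in bytes, so the row address is computed on a char pointer; the posted kernel adds i*pitch to an int pointer, which steps sizeof(int) times too far.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define H 10
#define W 10

// A single thread walks the whole pitched array. pitch is in bytes, so the
// row start is computed on a char* before casting back to int*.
__global__ void incrementPitched(int *data, size_t pitch)
{
    for (int i = 0; i < H; ++i) {
        int *row = (int *)((char *)data + i * pitch);
        for (int j = 0; j < W; ++j) row[j] += 1;
    }
}

// Hypothetical helper: copy a contiguous H*W host buffer to pitched device
// memory, add 1 to every element, and copy it back. Returns 0 on success.
int runPitchedIncrement(int *host)
{
    int *dev = NULL;
    size_t pitch = 0;
    const size_t rowBytes = W * sizeof(int);

    if (cudaMallocPitch((void **)&dev, &pitch, rowBytes, H) != cudaSuccess)
        return 1;

    // Host rows are tightly packed (pitch = rowBytes); device rows use `pitch`.
    cudaError_t err = cudaMemcpy2D(dev, pitch, host, rowBytes,
                                   rowBytes, H, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        incrementPitched<<<1, 1>>>(dev, pitch);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(host, rowBytes, dev, pitch,
                           rowBytes, H, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    int host[H * W];
    for (int i = 0; i < H * W; ++i) host[i] = i;

    if (runPitchedIncrement(host) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("%d %d\n", host[0], host[H * W - 1]);
    return 0;
}
```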

I would suggest you flatten your data into a single ordinary malloc allocation, and then use ordinary cudaMemcpy. You can also create a double-pointer but contiguous allocation on the host, and then use a single-pointer to refer to the underlying contiguous allocation, and use cudaMemcpy also for that.
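Both suggestions can be sketched together, under the assumption that a 10x10 int array is wanted: one flat malloc holds the data, an optional table of row pointers into that same block keeps the rows[i][j] syntax on the host, and an ordinary cudaMemcpy moves the flat block. The names incrementFlat and runFlatIncrement and the launch configuration are choices made for this example only.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define H 10
#define W 10

// One thread per element of the flattened array.
__global__ void incrementFlat(int *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] += 1;
}

// Hypothetical helper: copy n contiguous ints to the device, add 1 to each,
// and copy them back with plain cudaMemcpy. Returns 0 on success.
int runFlatIncrement(int *flat, int n)
{
    int *dev = NULL;
    const size_t bytes = (size_t)n * sizeof(int);

    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return 1;

    cudaError_t err = cudaMemcpy(dev, flat, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        incrementFlat<<<(n + 127) / 128, 128>>>(dev, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(flat, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}

int main(void)
{
    // Single contiguous allocation holding all H*W elements.
    int *flat = (int *)malloc(sizeof(int) * H * W);
    // Row pointers into the same block, so rows[i][j] still works on the host.
    int **rows = (int **)malloc(sizeof(int *) * H);
    if (flat == NULL || rows == NULL) return 1;

    for (int i = 0; i < H; ++i) rows[i] = flat + i * W;
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j) rows[i][j] = i * W + j;

    // The copy uses the single pointer to the underlying allocation.
    if (runFlatIncrement(flat, H * W) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("%d %d\n", rows[0][0], rows[H - 1][W - 1]);

    free(rows);
    free(flat);
    return 0;
}
```

On the device, element (i, j) of the flat array is data[i * W + j], so no pitch arithmetic is needed.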