cudaMallocManaged error on my machine

Hi there,

I was playing with cudaMallocManaged using the SDK sample UnifiedMemoryStreams.cu.
However, I wasn’t able to make it run correctly, and I’m wondering whether something is wrong with my driver version. I’m using cuda-6.0.26rc1 with driver version 331.49. My machine has dual-socket Intel E5-2690 CPUs and two K20c GPUs, and the OS is RHEL 6.

Here’s the error reported by the checkCudaErrors macro:

CUDA error at UnifiedMemoryStreams.cu:78 code=71() "cudaMallocManaged((void **)&data, sizeof(T)*size*size)"

Thanks,
Jing

This is pure conjecture, but do you have a single-socket server you can test with? I am curious whether the kernel driver that handles managed memory fails to work on dual-socket servers…
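
A quick check that might help narrow this down, independent of the socket count: the sketch below asks each visible device whether it reports managed-memory support, then tries a small cudaMallocManaged and prints the runtime's error string. The helper name count_managed_devices is made up for illustration, and I am assuming a toolkit whose cudaDeviceProp has the managedMemory field (CUDA 6.0 or later; a release candidate may differ).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns how many visible devices report
// managed-memory support, or -1 if the devices cannot be enumerated.
int count_managed_devices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return -1;
    }
    int supported = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d (%s): managedMemory=%d\n",
               dev, prop.name, prop.managedMemory);
        if (prop.managedMemory) ++supported;
    }
    return supported;
}

int main() {
    printf("devices reporting managed memory: %d\n", count_managed_devices());

    // Try a small managed allocation on the current device and print the
    // runtime's description of whatever comes back.
    void* p = NULL;
    cudaError_t err = cudaMallocManaged(&p, 1024);
    printf("cudaMallocManaged(1024): %s\n", cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return err == cudaSuccess ? 0 : 1;
}
```

If the small allocation fails here too, the problem is probably in the driver/toolkit setup rather than in the sample itself.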

I ran code 1 below and it returned some output without error. But when I run the same code with cudaMallocManaged() (code 2), it throws an unhandled memory exception (like a segmentation fault on Linux) in
Visual Studio 2012. Please tell me what the reason for this is.

code 1
+++++++++++++

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Each thread copies one character from c1 to c2.
__global__ void kernel(char* c1,char* c2){

	int i=threadIdx.x;
	c2[i]=c1[i];

}

int main(){

	char c1[100]={0};	// host source, zero-padded so the full 100-byte copy is valid
	char* c1_d;		// device copy of c1
	char* c2;		// host destination buffer
	char* c2_d;		// device destination buffer

	cudaMalloc((void **)&c1_d,sizeof(char)*100);
	cudaMalloc((void **)&c2_d,sizeof(char)*100);
	c2=(char *)malloc(sizeof(char)*100);

	strcpy(c1,"hellloooo");

	cudaMemcpy(c1_d,c1,sizeof(char)*100,cudaMemcpyHostToDevice);

	kernel<<<1,100>>>(c1_d,c2_d);
	cudaDeviceSynchronize();

	cudaMemcpy(c2,c2_d,sizeof(char)*100,cudaMemcpyDeviceToHost);

	for (int i=0;i<100;i++)
	{
		printf("%c",c2[i]);
	}
	printf("\n");
	getchar();
	return 0;

}

code 2 (this returns an error; what is the reason?)

int main(){

char* c1;
char* c2;

	cudaMallocManaged((void **)c1, sizeof(int)*100);
	cudaMallocManaged((void **)c2, sizeof(int)*100);
	
	c1="hellloooo";

kernel<<<1,100>>>(data,data1,c1_d,c2_d);
cudaDeviceSynchronize();

printf("\n");	
for (int i=0;i<100;i++)
{
	printf("%c",c2[i]);
}
printf("\n");
getchar();
return 0;

}

__global__ void kernel(int* d,int * d1,char* c1,char* c2){

int i=threadIdx.x;
c2[i]=c1[i];
__syncthreads();

}

Your code 2 doesn’t compile for me, so I don’t think that is the code you are running. There are a variety of problems with it; for example, you are allocating c1 and c2 but passing c1_d and c2_d to the kernel.

Also, you are missing some ampersands here:

cudaMallocManaged((void **)c1, sizeof(int)*100);
cudaMallocManaged((void **)c2, sizeof(int)*100);

If you post the code you are actually using, in a form that someone else can compile, you may be able to get better assistance.
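
For what it's worth, here is a minimal sketch of what code 2 was probably aiming for. It is a guess at the intent, not the poster's actual program: the names copy_chars and copy_via_managed are made up, and I am assuming a device and driver that support managed memory. The two changes that matter are the ampersands in the cudaMallocManaged calls and copying the text into the managed buffer instead of assigning a string literal to the pointer, which would replace the managed address with a host-only one.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Each thread copies one character; the bounds check guards extra threads.
__global__ void copy_chars(const char* src, char* dst, int n) {
    int i = threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Hypothetical helper: copies msg through two managed buffers of n bytes
// (n must not exceed the block size limit) and writes the result to out.
// Returns 0 on success, 1 on any CUDA error.
int copy_via_managed(const char* msg, char* out, int n) {
    char* c1 = NULL;
    char* c2 = NULL;

    // Pass the ADDRESS of each pointer so the runtime can fill it in.
    if (cudaMallocManaged((void**)&c1, n) != cudaSuccess) return 1;
    if (cudaMallocManaged((void**)&c2, n) != cudaSuccess) {
        cudaFree(c1);
        return 1;
    }

    // Write into the managed buffers; do not reassign the pointers.
    memset(c1, 0, n);
    memset(c2, 0, n);
    strncpy(c1, msg, n - 1);

    copy_chars<<<1, n>>>(c1, c2, n);
    cudaError_t err = cudaGetLastError();           // launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();              // finish before host reads
    if (err == cudaSuccess)
        memcpy(out, c2, n);

    cudaFree(c1);
    cudaFree(c2);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    char out[100];
    if (copy_via_managed("hellloooo", out, 100) != 0) {
        fprintf(stderr, "managed-memory copy failed\n");
        return 1;
    }
    printf("%s\n", out);
    return 0;
}
```

The cudaDeviceSynchronize() before the host reads c2 matters on the hardware discussed in this thread: the host should not touch managed memory while a kernel may still be using it.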