cudaMemcpy error

Hi,

I am trying to copy an array from host to device memory.
When nx = 50, ny = 50, nz = 50, everything is fine.
However, when nx = 512, ny = 512, nz > 8,
cudaMemcpy crashes. It looks like a segmentation fault.

Is there a maximum size for cudaMemcpy?
As I understand it, 512*512*200*sizeof(float) = 209715200 bytes = 200 MB.
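For reference, the byte count for a 512 x 512 x 200 grid of 4-byte floats is 512*512*200*4 = 209715200 bytes (200 MB), roughly a fifth of a 1 GB card, so the raw size by itself should not be a problem. A minimal sketch of a helper that does this multiplication in size_t, so that large grids cannot overflow a 32-bit int before the result is passed to cudaMalloc/cudaMemcpy (grid_bytes is a hypothetical name, not part of the CUDA API):

```cuda
#include <stddef.h>

// Number of BYTES needed for an nx x ny x nz grid of floats.
// cudaMalloc, cudaMemset and cudaMemcpy all take sizes in bytes.
// Casting to size_t before multiplying keeps nx*ny*nz from
// overflowing an int on large grids.
size_t grid_bytes(int nx, int ny, int nz)
{
    return (size_t)nx * (size_t)ny * (size_t)nz * sizeof(float);
}
```

With this, grid_bytes(512, 512, 200) gives 209715200 and grid_bytes(50, 50, 50) gives 500000.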

My video card is an NVIDIA Optimus GT 540M (1 GB memory). OS: Ubuntu 12.10.

My code:

void Test(float *a)
{
  int nx = 512;
  int ny = 512;
  int nz = 200;
  size_t size = nx * ny * nz * sizeof(float);
  float *cuda_a;
  cudaMalloc((void **) &cuda_a, size);
  cudaMemset(cuda_a, 0, size);
  cudaMemcpy(cuda_a, a, nx*ny*nz, cudaMemcpyHostToDevice);
}

What is wrong with my code?

Thanks!

cudaMemcpy(cuda_a, a, nx*ny*nz*sizeof(float), cudaMemcpyHostToDevice);

Though that might not be causing the problem, the size is wrong: cudaMemcpy takes the count in bytes, not in elements. Also, you do not need the cudaMemset, because you will be copying over that data anyway.
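A sketch of the Test function from the question with the byte count fixed, the cudaMemset dropped, and every CUDA return code checked. Two changes are my own assumptions rather than anything from the thread: it returns the device pointer (the original never freed cuda_a, so the buffer leaked), and the name copy_grid_to_device is hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Copies an nx*ny*nz float array from host memory into a new device buffer.
// Returns the device pointer, or NULL if any CUDA call fails.
// The caller owns the buffer and must release it with cudaFree().
float *copy_grid_to_device(const float *a, int nx, int ny, int nz)
{
    size_t size = (size_t)nx * ny * nz * sizeof(float);  // count in BYTES
    float *cuda_a = NULL;

    cudaError_t err = cudaMalloc((void **)&cuda_a, size);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc(%zu bytes) failed: %s\n",
                size, cudaGetErrorString(err));
        return NULL;
    }

    // No cudaMemset here: the copy below overwrites every byte anyway.
    err = cudaMemcpy(cuda_a, a, size, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        cudaFree(cuda_a);
        return NULL;
    }
    return cuda_a;
}
```

Checking the return values matters here: a failed cudaMalloc or cudaMemcpy reports an error code rather than crashing, so a segmentation fault that persists after these checks points at the host-side array a rather than at the device.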

Sorry, my mistake. Actually, I have nx*ny*nz*sizeof(float), as you wrote.
However, it still does not work.
What do you mean by “you do not need the cudaMemset”?
I need to copy the “a” array to the CUDA device.

Could you give me more tips?

Thanks!

The memset is not causing any issues, you just do not need it. It is like painting a wall white before you paint it green.

My uneducated guess would be that the operating system is reserving/using some of the device memory and will not let you allocate that size. Maybe there are some programs running in the background which also are using device memory.

There may be some OS setting you can modify to free up some space.
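Rather than guessing how much device memory the OS or other programs are holding, you can ask the runtime with cudaMemGetInfo, which reports free and total device memory in bytes. A minimal sketch (device_has_room is a hypothetical helper); note that enough free memory does not strictly guarantee that one large cudaMalloc succeeds, so the cudaMalloc return code should still be checked:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Returns 1 if `bytes` is no larger than the currently free device memory,
// 0 if it is larger, and -1 if the query fails (e.g. no usable CUDA device).
int device_has_room(size_t bytes)
{
    size_t free_b = 0, total_b = 0;
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    // Shift by 20 bits to print the values in MB.
    printf("device memory: %zu MB free of %zu MB\n", free_b >> 20, total_b >> 20);
    return bytes <= free_b ? 1 : 0;
}
```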

Probably the array a that you are passing to the function is not allocated large enough. You should show the complete code, or at least the allocation of a. Are you allocating it on the stack? That likely won’t work: a 200 MB local array is far larger than the default stack size (typically 8 MB on Linux).
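The stack theory fits the symptom: the reported threshold (nz just above 8) lines up closely with the array reaching 512*512*8*sizeof(float) = 8388608 bytes, i.e. 8 MB, which is the common default stack limit on Linux (ulimit -s 8192). That is circumstantial, since the allocation of a was never posted. A minimal sketch of allocating the host grid on the heap instead (alloc_host_grid is a hypothetical name):

```cuda
#include <stdlib.h>

// Allocates a zero-initialised nx*ny*nz float grid on the heap.
// Returns NULL if the allocation fails; release it with free().
float *alloc_host_grid(int nx, int ny, int nz)
{
    size_t n = (size_t)nx * ny * nz;
    // A local array such as float a[512 * 512 * 200] would need about
    // 200 MB of stack and can crash with a segmentation fault, possibly
    // only when cudaMemcpy first reads it. The heap has no such limit.
    return (float *)calloc(n, sizeof(float));
}
```

The returned pointer can then be filled with data and passed to Test() exactly like the original a.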

If possible, please post your full code. The problem is in the memory sizes: you are allocating more memory than you copy.

Error at line 10:
cudaMemcpy(cuda_a, a, nx*ny*nz, cudaMemcpyHostToDevice);
You are copying nx*ny*nz bytes of memory. As you are working with float, you have to copy nx*ny*nz*sizeof(float) bytes.

Replace it with:
cudaMemcpy(cuda_a, a, nx*ny*nz*sizeof(float), cudaMemcpyHostToDevice);
That should work for you.
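One way to confirm that the corrected count really moves the whole array is to copy the data back and compare it with the original. With the old count of nx*ny*nz bytes, only the first quarter of the floats would reach the device (sizeof(float) is 4 here), so a check like this would normally report a mismatch. roundtrip_matches is a hypothetical helper, offered as a sketch:

```cuda
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

// Copies `count` floats back from device buffer d and compares them
// byte-for-byte with the original host data h.
// Returns 1 if they are identical, 0 if they differ or any step fails.
int roundtrip_matches(const float *d, const float *h, size_t count)
{
    size_t bytes = count * sizeof(float);  // cudaMemcpy counts bytes, not elements
    float *back = (float *)malloc(bytes);
    if (back == NULL) return 0;
    int ok = cudaMemcpy(back, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
             && memcmp(back, h, bytes) == 0;
    free(back);
    return ok;
}
```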