cudaMalloc2D

Dear all,

I want to use cudaMalloc2D to allocate a 2D array in main, pass it as a parameter to a kernel, and use it as a 2D array inside the kernel.

I found the cudaMalloc2D function, but it takes an argument called pitch. What value should I set for it?

I didn’t know there is a cudaMalloc2D(), and I can’t find such a function in the CUDA documentation. There is cudaMallocPitch(), is that what you are referring to? The prototype is

cudaError_t cudaMallocPitch (void** devPtr, size_t* pitch, size_t width, size_t height)

So “pitch” is something that the function returns. You need to pass in a pointer to a size_t object for CUDA to store it in. Example:

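size_t width = 100;   // requested row width, in bytes
size_t height = 200;  // number of rows
size_t pitch = 0;     // filled in by cudaMallocPitch: actual row stride in bytes (>= width)
void * data = 0;
cudaError_t err = cudaMallocPitch (&data, &pitch, width, height);
if (err != cudaSuccess) printf ("cudaMallocPitch failed: %s\n", cudaGetErrorString (err));
// %zu is the printf conversion for size_t, so no cast to int is needed
printf ("the allocated data resides at address %p, its pitch is %zu bytes\n", data, pitch);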
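Once you have the pitch, the usual way to locate an element is to step to the start of the row in bytes and then index by column. Here is a minimal host-side sketch of that arithmetic in plain C, so it runs without a GPU. The names padded_pitch and pitched_elem are made up for illustration, and the 128-byte alignment in the usage note is only an assumption; the real pitch is whatever cudaMallocPitch returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the padding cudaMallocPitch may apply:
   rounds a row width up to a multiple of `align` bytes. The real pitch
   is chosen by CUDA and must be read back from the call. */
static size_t padded_pitch(size_t width_bytes, size_t align)
{
    assert(align > 0);
    return (width_bytes + align - 1) / align * align;
}

/* Address of element (row, col) in a pitched buffer of floats.
   The row offset is applied in bytes, the column offset in elements. */
static float *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    return (float *)((char *)base + row * pitch) + col;
}
```

For example, with 25 floats per row (100 bytes) and an assumed alignment of 128, padded_pitch gives 128, so row 1 starts 128 bytes after row 0 rather than 100. In a kernel you would apply the same expression to the device pointer and the pitch passed in as arguments.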

See this link, pages 68-69.

That’s very old.

current docs are here:

http://docs.nvidia.com/cuda

There is (currently) no cudaMalloc2D

And the purpose of cudaMallocPitch is not to enable you to reference a doubly-subscripted C array in the kernel:

int x = my_global_data[i][j];

To do that requires extra effort.
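One form that extra effort can take, if the row length is known at compile time, is to view a flat buffer through a pointer-to-array so the compiler does the index arithmetic. This is only a host-side sketch in plain C; NUM_COLS and read_2d are names invented here, and nothing in this thread confirms that this is the approach the poster had in mind.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_COLS 4  /* assumption: row length fixed at compile time */

/* Reads element [i][j] from a flat buffer by viewing it as rows of
   NUM_COLS ints. The caller must ensure row i exists in the buffer. */
static int read_2d(const int *flat, size_t i, size_t j)
{
    const int (*grid)[NUM_COLS] = (const int (*)[NUM_COLS])flat;
    assert(j < NUM_COLS);
    return grid[i][j];
}
```

If the width is only known at run time, this cast is not available and you are back to explicit index arithmetic.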

I would strongly suggest using the current documentation rather than outdated documentation from 2007. What you have there looks like the very first Programming Guide from before the CUDA 1.0 release; the current CUDA version is 6.5. Please refer to

CUDA Toolkit Documentation

ok, thanks

How can I allocate an NxN array in global memory and use it in a kernel?

CUDA ships with many example programs that demonstrate basic concepts such as memory allocation. I would suggest working through them. You may also find it helpful to work with an introductory CUDA book, such as “CUDA by Example”.

If you are a beginner GPU programmer, I would encourage you to “flatten” your 2D array and handle it in 1D fashion, perhaps using subscript arithmetic to simulate 2D access.

That means:

  1. Allocate an ordinary 1D array of N*N elements (as in your example). Use malloc for the host copy and cudaMalloc for the device copy.
  2. Use an ordinary 1D cudaMemcpy to transfer this array to the device.
  3. Access it as an ordinary 1D array on the device. If you want to simulate 2D access, do something like:

int x = global_data[i*num_cols+j];

where i would be your row index (first subscript in a doubly subscripted array) and j would be your column index. num_cols is just N in your case, the number of columns in your 2D matrix.
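To make the subscript arithmetic concrete, here is a small host-side sketch in plain C. flat_index and make_matrix are hypothetical helper names, and the fill pattern i*10+j is arbitrary; the point is only that one contiguous block plus i*num_cols+j behaves like a 2D array.

```c
#include <assert.h>
#include <stdlib.h>

/* Row-major offset of element (i, j) in a matrix with num_cols columns. */
static size_t flat_index(size_t i, size_t j, size_t num_cols)
{
    assert(j < num_cols);
    return i * num_cols + j;
}

/* Hypothetical helper: allocates an n x n int matrix as one 1D block
   and sets element (i, j) to i*10 + j. Returns NULL if malloc fails. */
static int *make_matrix(size_t n)
{
    int *m = (int *)malloc(n * n * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            m[flat_index(i, j, n)] = (int)(i * 10 + j);
    return m;
}
```

The same block can be copied to the device with a single cudaMemcpy of n*n*sizeof(int) bytes, and a kernel can use the same index expression on the device pointer.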

thank you