accessing a multidimensional array in a kernel


If I use something like the following:

	cudaExtent extent = make_cudaExtent(W*sizeof(float), H, 9);

	cudaPitchedPtr covOut;
	cudaMalloc3D(&covOut, extent); // allocate the space on the device

where W and H and 9 are the dimensions of my 3D array on the host,

can I access the variables in the 3D array on the device with the bracketed notation like


Or do I have to use pointer arithmetic like

covOut[i*W*H+j*W+k] to access it as if it were a one-dimensional array? If I have to access it like this, I thought I read somewhere that CUDA arrays are stored in column-major order, so would that change the access to this:



Malloc2D and Malloc3D are only for allocating arrays for textures; you can’t use them to allocate general-purpose global memory. Just use 3D indexing into linear memory with something like your second indexing scheme.
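To make the second indexing scheme concrete, here is a minimal sketch, assuming the array is stored row-major as D slices of H rows by W columns (D would be 9 in the question). The names flatIndex, scaleKernel and HOST_DEVICE are made up for illustration and are not part of CUDA. The kernel part only compiles under nvcc.

```cpp
#include <cassert>
#include <cstddef>

// Lets the same helper build with a plain C++ compiler and with nvcc.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: maps (i, j, k) to an offset in a flat, row-major
// array of slices, each H rows by W columns. k varies fastest.
HOST_DEVICE inline std::size_t flatIndex(std::size_t i, std::size_t j, std::size_t k,
                                         std::size_t H, std::size_t W)
{
    assert(j < H && k < W);  // catches row/column indices that are out of range
    return i * W * H + j * W + k;
}

#ifdef __CUDACC__
// Hypothetical kernel: one thread per (j, k) position, looping over the D slices.
__global__ void scaleKernel(float *data, int D, int H, int W, float factor)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;  // column
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // row
    if (k >= W || j >= H) return;                   // skip threads outside the array

    for (int i = 0; i < D; ++i)
        data[flatIndex(i, j, k, H, W)] *= factor;
}
#endif
```

Having threadIdx.x run along k, the fastest-varying index, means neighbouring threads touch neighbouring addresses, which is generally what you want for global memory access.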

If this is the case, do I have to flatten my host 3D array into a 1D array before I can do a cudaMemcpy to the device 1D array, or is there a method or something for that?

Flattened arrays are considerably easier to work with, so flattening the array on the host is probably the best strategy.
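A rough sketch of what flattening on the host could look like, assuming the host data is a nested std::vector indexed as host3d[i][j][k] (the original post does not say how the host array is stored, so that is a guess). flatten3D and uploadFlat are hypothetical names. The upload part uses cudaMalloc and cudaMemcpy and only compiles under nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: copies a nested host array host3d[i][j][k] (D x H x W)
// into one contiguous buffer laid out as i*W*H + j*W + k.
inline std::vector<float> flatten3D(
    const std::vector<std::vector<std::vector<float>>> &host3d)
{
    const std::size_t D = host3d.size();
    const std::size_t H = D ? host3d[0].size() : 0;
    const std::size_t W = H ? host3d[0][0].size() : 0;

    std::vector<float> flat(D * H * W);
    for (std::size_t i = 0; i < D; ++i) {
        assert(host3d[i].size() == H);         // every slice must have H rows
        for (std::size_t j = 0; j < H; ++j) {
            assert(host3d[i][j].size() == W);  // every row must have W columns
            for (std::size_t k = 0; k < W; ++k)
                flat[i * W * H + j * W + k] = host3d[i][j][k];
        }
    }
    return flat;
}

#ifdef __CUDACC__
// Hypothetical helper: allocates device memory and uploads the flat buffer.
// Returns nullptr if either CUDA call fails.
inline float *uploadFlat(const std::vector<float> &flat)
{
    float *d_data = nullptr;
    const std::size_t bytes = flat.size() * sizeof(float);
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(d_data, flat.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_data);
        return nullptr;
    }
    return d_data;
}
#endif
```

If the host array is already one contiguous block (for example a plain float a[9][H][W] with compile-time sizes), it is already flat in memory and can be passed to cudaMemcpy directly without the copy loop.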

I think I will go with this strategy for now.

I have one more question - the data that is being used in my kernel is actually using complex

is this going to be a problem within my cuda kernel?

Oh, and one other thing (I guess this may be my C/C++ noobishness showing through): my code is currently C++ and I was just using new complex to allocate complex data. How would I do this with cudaMalloc?

You could use the float2 data type to represent your complex numbers.

So for example:

float2 complexVar;

complexVar.x = 1.0f;
complexVar.y = 3.0f;

So complexVar can now be used to represent 1+3i. About cudaMalloc :

float2 *d_complexArray; // device pointer to the complex array

size_t size = N*sizeof(float2); // number of complex elements times size of float2 (8 bytes)

cudaMalloc((void**)&d_complexArray, size);

Hope this is of some help!

What is float2? Have the operators been overloaded to allow for things like complex addition and multiplication, absolute value, etc.?

Also, what is cuComplex? I found some reference to it, and apparently I have this library in my lib directory. Can I use cuComplex for device and host code?

I split this topic off to

If a mod could move the relevant posts it’d be appreciated. Thanks

A float2 is 2 floats as the name indicates. Basically just:

struct FLOAT_2 {
    float x;
    float y;
};

A cuComplex is the same thing as far as I know. float2 and cuComplex can be used in both host and device code.

You need to perform the complex multiplication, addition, etc. explicitly yourself, i.e.:

float2 x; x.x = 1.0f; x.y = 3.0f;

float2 y; y.x = 1.0f; y.y = 3.0f;

// x = x+y - complex addition

x.x = x.x + y.x;

x.y = x.y + y.y;
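Following the same idea, addition, multiplication and absolute value can be wrapped in small helpers. This is only a sketch: complexAdd, complexMul, complexAbs and HOST_DEVICE are made-up names, and the float2 stand-in is there only so the code also builds without the CUDA headers. As far as I know, cuComplex.h also ships ready-made functions for this (cuCaddf, cuCmulf, cuCabsf).

```cpp
#include <math.h>

#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
// Stand-in so the sketch also builds with a plain C++ compiler;
// under nvcc the real float2 from the CUDA headers is used instead.
struct float2 { float x, y; };
#endif

// (a.x + i a.y) + (b.x + i b.y)
HOST_DEVICE inline float2 complexAdd(float2 a, float2 b)
{
    float2 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}

// (a.x + i a.y) * (b.x + i b.y)
HOST_DEVICE inline float2 complexMul(float2 a, float2 b)
{
    float2 r;
    r.x = a.x * b.x - a.y * b.y;  // real part
    r.y = a.x * b.y + a.y * b.x;  // imaginary part
    return r;
}

// |a| = sqrt(re^2 + im^2); this naive form can overflow for very large values
HOST_DEVICE inline float complexAbs(float2 a)
{
    return sqrtf(a.x * a.x + a.y * a.y);
}
```

With x = 1+3i and y = 1+3i as above, complexAdd(x, y) gives 2+6i and complexMul(x, y) gives -8+6i.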


Yeah, but CudaMath.h isn’t an official part of CUDA; it is just a convenience file shipped in the SDK.

Sorry, avidday, for leaving your post ‘hanging’.
I realized exactly what you said and deleted my post … but too late ;}