Matrix Multiplication -- Why do we 'flatten' matrices into a linear space? Tradeoffs between

Carlo_del_Mundo · May 12, 2011, 7:30pm

Suppose I have two square matrices, A (2x2) and B (2x2). The product AB is stored in C.

If I declare the arrays in C with the notation:

int A[2][2];

int B[2][2];

int C[2][2];

I can easily refer to elements with the [] operator, e.g. C[y][x]. However, I am seeing in numerous examples (including the NVIDIA SDK) that programmers flatten these 2D matrices into a 1D array (in either row-major or column-major order).

Why do we do this? Is it easier to manipulate matrix elements by referring to one index (using x + y * Dim.x)? Performance-wise, would it be faster to compute C[x + y * Dim.x] than C[y][x]?
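For reference, the single-index form asked about here can be sketched in plain C as follows. This is only an illustration of row-major flattening, not code from the SDK; the names flat_index and matmul_flat are made up for the example, and width plays the role of Dim.x.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major flat index: column x, row y, row length width (Dim.x above). */
size_t flat_index(size_t x, size_t y, size_t width)
{
    assert(x < width);
    return x + y * width;
}

/* C = A * B for n x n matrices stored as flat row-major arrays.
 * Hypothetical helper for illustration only. */
void matmul_flat(const int *A, const int *B, int *C, size_t n)
{
    for (size_t y = 0; y < n; ++y) {
        for (size_t x = 0; x < n; ++x) {
            int sum = 0;
            /* Row y of A times column x of B. */
            for (size_t k = 0; k < n; ++k)
                sum += A[flat_index(k, y, n)] * B[flat_index(x, k, n)];
            C[flat_index(x, y, n)] = sum;
        }
    }
}
```

With this layout, element (row y, column x) of a 2x2 matrix sits at position x + 2 * y in a 4-element array.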

Thanks

Gregory_Diamos · May 13, 2011, 12:50am

When we do it with static arrays, the compiler is able to flatten it into a single linear array and do the indexing for you. When you do it with dynamic arrays allocated with malloc/new, the compiler has no idea what the dimensions are (they change at runtime), so it can’t do the indexing for you. You have to flatten it yourself and do the indexing yourself.
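A small C sketch of the two cases described above, under the assumption of a 2x3 int matrix. The names ROWS, COLS, static_element, dynamic_element and make_flat are invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum { ROWS = 2, COLS = 3 };

/* Static case: COLS is a compile-time constant, so the compiler turns
 * M[i][j] into a single offset i * COLS + j from the start of M. */
int static_element(int M[ROWS][COLS], int i, int j)
{
    assert(i >= 0 && i < ROWS && j >= 0 && j < COLS);
    return M[i][j];
}

/* Dynamic case: one malloc'd block whose dimensions are only known at
 * run time, so the same offset arithmetic is written out by hand. */
int dynamic_element(const int *m, int cols, int i, int j)
{
    assert(i >= 0 && j >= 0 && j < cols);
    return m[i * cols + j];
}

/* Allocate rows * cols ints in one contiguous block, filled with
 * 0, 1, 2, ... Returns NULL on failure; the caller frees the block. */
int *make_flat(int rows, int cols)
{
    int *m = malloc((size_t)rows * (size_t)cols * sizeof *m);
    if (m != NULL) {
        for (int k = 0; k < rows * cols; ++k)
            m[k] = k;
    }
    return m;
}
```

Both accessors read the same element for the same (i, j) when the two arrays hold the same values; the only difference is who writes the i * cols + j arithmetic.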

Carlo_del_Mundo · May 13, 2011, 2:35am

Thanks. I forgot about static vs. non-static arrays (created with new/malloc). After some additional research, I also found that in the end it’s simply easier to deal with a 1D array.

hocheung20 · May 13, 2011, 11:39am

I was under the impression that A[i][j] was nothing more than *(*(A+i)+j).

In fact, I do this for dynamically allocated arrays all the time in CPU host code.
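The host-side pattern referred to here presumably looks something like the sketch below: an array of row pointers, each row allocated separately. The helper names alloc_rows, free_rows and get are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a rows x cols matrix as an array of row pointers, each
 * pointing at its own zero-initialised row. Returns NULL on failure. */
int **alloc_rows(size_t rows, size_t cols)
{
    int **A = malloc(rows * sizeof *A);
    if (A == NULL)
        return NULL;
    for (size_t i = 0; i < rows; ++i) {
        A[i] = calloc(cols, sizeof **A);
        if (A[i] == NULL) {
            /* Undo the rows allocated so far. */
            while (i > 0)
                free(A[--i]);
            free(A);
            return NULL;
        }
    }
    return A;
}

/* Free every row, then the array of row pointers. */
void free_rows(int **A, size_t rows)
{
    assert(A != NULL);
    for (size_t i = 0; i < rows; ++i)
        free(A[i]);
    free(A);
}

/* Spelled-out form of A[i][j]: load the row pointer A[i] from memory,
 * then load element j of that row. Two dependent memory reads. */
int get(int **A, size_t i, size_t j)
{
    return *(*(A + i) + j);
}
```

Here the rows need not be contiguous with one another, and every access goes through a stored pointer, which is a different memory layout from a statically declared int A[2][2].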

The real problem is that CUDA does not support pointer indirection. You cannot load an array of pointers from host to device and then load the arrays those pointers are pointing to in one seamless step.

avidday · May 13, 2011, 12:29pm

Not on a statically declared two dimensional array (which is what the question is about). The compiler will compute the total size, allocate the space statically and use linear indexing into that static allocation. Only a single level of pointer indirection is required.

Of course it does. How could CUDA support pointers at all if it didn’t support indirection?

That isn’t pointer indirection. That is portability of pointers between host and device memory spaces, which is a completely different issue.

hocheung20 · May 13, 2011, 1:15pm

Yes, but I was referring more to:

I don’t understand why the compiler needs to know the dimension size. Assuming you could load the pointers and the array correctly into the GPU memory, you could write *(*(A+i)+j) everywhere you want A[i][j]. Why is this not done? It’s ok with 2D arrays, but when you get to 4D arrays, it’s quite a pain in the behind to figure out the indexing.
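One common way to tame the flat indexing for higher dimensions is to hide it in a small helper. The sketch below assumes a row-major layout with the last index varying fastest; Dims4 and idx4 are hypothetical names, not part of any CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: extents of a 4D array, slowest-varying first. */
typedef struct {
    size_t n0, n1, n2, n3;
} Dims4;

/* Row-major offset of element (i0, i1, i2, i3), written in nested
 * (Horner) form so each extent is multiplied in exactly once. */
size_t idx4(Dims4 d, size_t i0, size_t i1, size_t i2, size_t i3)
{
    assert(i0 < d.n0 && i1 < d.n1 && i2 < d.n2 && i3 < d.n3);
    return ((i0 * d.n1 + i1) * d.n2 + i2) * d.n3 + i3;
}
```

An access then reads data[idx4(d, i, j, k, l)] against a single contiguous allocation of n0 * n1 * n2 * n3 elements.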

You are correct that I was abusing “pointer indirection”. Pointer to pointer within the device obviously works. I was trying to point out that you can’t load host pointers into the device and automagically expect them to be properly indirected when on the GPU.

eelsen · May 13, 2011, 5:21pm

It would also be incredibly slow.
