2D matrix in device memory

Here’s the problem:

I want to create a 2D matrix in device memory to avoid frequent memory transfers, but I haven’t found a proper way to do so. The reason I need a 2D structure is that it provides something a 1D structure cannot, for example:

if M is a 2D matrix, then I can use M[i] directly as a vector; this can’t be done with a 1D matrix, where it WILL cost an additional memory copy.

I also can’t use shared memory because it’s too small.

So, is there any way to do this?
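To make the M[i] usage concrete, here is roughly what I have in mind: an array of device row pointers, so that a kernel could index M[i][j] directly. This is just a sketch; ROWS, COLS, and all the names are placeholders I made up.

#include <cuda_runtime.h>

#define ROWS 4        // placeholder sizes
#define COLS 1024

int main(void)
{
    float *h_rows[ROWS];   // host-side copy of the device row pointers
    float **d_M;           // device-side array of row pointers

    for (int i = 0; i < ROWS; ++i)
        cudaMalloc((void **)&h_rows[i], COLS * sizeof(float));   // one device buffer per row

    cudaMalloc((void **)&d_M, ROWS * sizeof(float *));
    cudaMemcpy(d_M, h_rows, ROWS * sizeof(float *), cudaMemcpyHostToDevice);

    // A kernel taking float **M could now use M[i] as a vector.

    for (int i = 0; i < ROWS; ++i)
        cudaFree(h_rows[i]);
    cudaFree(d_M);
    return 0;
}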

If you’re talking about memory exchange between the host and the device, this isn’t a problem, since you can just load everything into device global memory and, when you want to operate on a vector, load that vector into shared memory first.

But this approach requires repeatedly loading a vector from global memory into shared memory, operating on the vector, and then writing the result back to global memory, which means a lot of reads and writes between global and shared memory. I’m wondering about this as well. Would this slow the application down at all, or do threads read shared memory so much faster than global memory that it’s worth all the extra copies?
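To make the pattern concrete, here is a minimal sketch of the kind of staging I mean. It assumes one block processes one vector, the vector fits in shared memory, and the launch uses blockDim.x == VECTOR_LEN; the names and sizes are placeholders.

// Each block copies one vector from global into shared memory,
// operates on it, and writes the result back to global memory.
#define VECTOR_LEN 256

__global__ void scale_vectors(float *g_data, float factor)
{
    __shared__ float s_vec[VECTOR_LEN];

    float *g_vec = g_data + blockIdx.x * VECTOR_LEN;  // this block's vector
    int tid = threadIdx.x;

    s_vec[tid] = g_vec[tid];   // global -> shared
    __syncthreads();           // only needed if threads read each other's elements

    s_vec[tid] *= factor;      // work entirely in shared memory

    g_vec[tid] = s_vec[tid];   // shared -> global
}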

No, it won’t. While you might imagine that you must form a “sliceable” matrix like this:

int vector0[5] = {0, 1, 2, 3, 4};
int vector1[5] = {5, 6, 7, 8, 9};

int *matrix[2];           // array of row pointers
matrix[0] = vector0;      // arrays decay to int * here
matrix[1] = vector1;

int *slice1 = matrix[1];

you can just as easily do this with a chunk of linear memory:

int matrix[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

int *slice1 = &matrix[5];

There is no additional overhead associated with obtaining the slice in the second case compared to the first.
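In CUDA terms, the same trick works with a single flat device allocation: the row “slice” is just pointer arithmetic inside the kernel. Below is a sketch; the sizes, names, and the trivial operation are placeholders.

#include <cuda_runtime.h>

#define ROWS 4        // placeholder sizes
#define COLS 1024

// Treat row i of a flat buffer as a vector; no extra copy is involved.
__global__ void add_one_to_row(float *M, int cols, int i)
{
    float *row = M + i * cols;                       // the "slice"
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < cols)
        row[tid] += 1.0f;
}

int main(void)
{
    float *d_M;
    cudaMalloc((void **)&d_M, ROWS * COLS * sizeof(float));   // one linear chunk
    cudaMemset(d_M, 0, ROWS * COLS * sizeof(float));
    add_one_to_row<<<(COLS + 255) / 256, 256>>>(d_M, COLS, 1);
    cudaDeviceSynchronize();
    cudaFree(d_M);
    return 0;
}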
