Thanks for the replies. If I were to use shared memory, would I be able to copy it over just once? And then have each block that goes through that multiprocessor read from the data already stored there by the previous block? I'm not quite sure how it works.
My program has 2 for loops that go through the array; calculations are done on each element, all the results are summed up, and the sum is stored back in global memory.
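For concreteness, here is a hypothetical sketch of the structure described above (the function name `some_calculation` and the grid-stride layout are my assumptions, not the poster's actual code): each thread walks part of the array in global memory, accumulates a partial sum, and the partials are combined into one result.

```cuda
// Placeholder for whatever per-element math the real program does.
__device__ float some_calculation(float x) { return x * x; }

__global__ void process(const float *data, float *result, int width, int height)
{
    int idx    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    float sum = 0.0f;
    // Grid-stride loop over the whole array (the "2 for loops" flattened).
    for (int i = idx; i < width * height; i += stride)
        sum += some_calculation(data[i]);

    // One atomic add per thread accumulates the global sum.
    atomicAdd(result, sum);
}
```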
No — shared memory is private to a block, and its contents are not guaranteed to survive from one block to the next, even on the same multiprocessor. But maybe you could benefit from constant memory.
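A minimal constant-memory sketch, in case it helps (the array name and size here are made up for illustration): the data is declared `__constant__`, copied from the host once with `cudaMemcpyToSymbol`, and then every thread reads it through the cached constant path.

```cuda
__constant__ float coeffs[256];        // lives in the 64 KB constant space

__global__ void apply(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = coeffs[i % 256];      // cached read, broadcast to the warp
}

// Host side: copy once before launching any kernels that use coeffs.
// cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(coeffs));
```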
I read that you only get 64 KB of constant memory, and my 512x512 array of floats (1 MB) won't fit in that. Should I instead try texture memory, or optimize for global memory? Or stay with constant memory but break my calculations into smaller tiles? Thanks!
I think I’m back to the beginning again. Constant memory won’t fit my 512x512 array (I think >.>) and you said texture memory is rarely useful (any particular reason why? I read in the programming manual that both are cached. Is there any other significant property of texture memory?). So, if I tile my calculations, I think shared memory is best. Each thread needs to perform many calculations on each piece of the 512x512 array, and if I use shared memory, I can just load each new tile over the previous tile at each iteration in the thread. I’m not very experienced with CUDA, but is this a good idea compared to the alternatives? I still don’t get to avoid all that memory copying, but it should work better than global memory I suppose.
Yes, tiling with shared memory is generally the best solution. Other approaches may be slightly better in specific circumstances, but shared memory is always good. If you do many calculations per element (profile to check), the extra copying will hardly matter. (Make sure your copies are nicely coalesced.)
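A hedged sketch of the tiling scheme being discussed, assuming 16x16 thread blocks over a 512x512 array (`some_calculation` is again a placeholder for the real math): each block stages one tile in shared memory with coalesced loads, every thread works on the cached tile, and the next tile then overwrites it.

```cuda
#define TILE 16

__device__ float some_calculation(float x) { return x * x; }  // placeholder

// Launch with blockDim = (TILE, TILE); n is the array width (e.g. 512).
__global__ void tiled_sum(const float *data, float *result, int n)
{
    __shared__ float tile[TILE][TILE];
    float sum = 0.0f;

    for (int ty = 0; ty < n; ty += TILE) {
        for (int tx = 0; tx < n; tx += TILE) {
            // Coalesced load: consecutive threadIdx.x values read
            // consecutive global addresses.
            tile[threadIdx.y][threadIdx.x] =
                data[(ty + threadIdx.y) * n + (tx + threadIdx.x)];
            __syncthreads();

            sum += some_calculation(tile[threadIdx.y][threadIdx.x]);

            __syncthreads();  // don't overwrite while others still read
        }
    }
    atomicAdd(result, sum);
}
```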
This is probably a stupid question, but what happens to variables you don't manually copy to the GPU? In a kernel invocation you pass the pointers you already copied over to global memory, constant memory, etc., but what about parameters like the matrix dimensions?
You can pass ordinary values (i.e. not pointers) and these get copied automatically. In fact, the values of pointers get copied the same way; it's the data they point to that you must copy yourself. You can also pass structs by value (i.e. automatically), up to 256 bytes of arguments in total.
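To illustrate (the struct and kernel names here are invented for the example): scalars and small structs in the argument list are copied to the device automatically at launch; only the memory a pointer refers to needs an explicit cudaMemcpy beforehand.

```cuda
struct Dims { int width; int height; };   // passed by value, no manual copy

__global__ void scale_kernel(float *data,  // pointer VALUE copied; pointee is not
                             Dims dims,    // struct copied by value at launch
                             float scale)  // plain scalar copied by value
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < dims.width * dims.height)
        data[idx] *= scale;   // data itself was cudaMemcpy'd earlier
}
```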