Managing Constant Memory

Hi,
I need some advice on constant memory management in the CUDA architecture.
Throughout the CUDA documentation, the programming guide, and the “CUDA by Example” book, all I can find regarding constant memory is how to assign/copy into a __constant__ declared array using the cudaMemcpyToSymbol() function. But there’s never any mention of how to modify or “free” these allocations. (Unlike texture memory, which can be unbound.)

Regarding modification:

I’m working on a problem where I have to update the values of my constant memory array after each kernel invocation. While searching for answers, I read that it wasn’t possible to modify constant memory once it had been assigned, but I recently found this post in these forums, which shows it’s actually possible to do what I need:
http://forums.nvidia.com/index.php?showtopic=28780&view=findpost&p=163129

My guess is that, by calling cudaMemcpyToSymbol() again, I can modify these values before each call to my kernel. Is this correct?
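Something like this minimal sketch is what I have in mind (the table size, kernel, and names are just placeholders):

```cuda
#include <cuda_runtime.h>

// Hypothetical lookup table in constant memory (up to 64 KB total).
__constant__ int d_table[256];

__global__ void useTable(int *out)
{
    // All threads of a warp reading the same d_table entry are served
    // by a single broadcast from the constant cache.
    out[threadIdx.x] = d_table[threadIdx.x % 256];
}

int main()
{
    int h_table[256];
    int *d_out;
    cudaMalloc(&d_out, 256 * sizeof(int));

    for (int iter = 0; iter < 10; ++iter) {
        // Recompute the table on the host, then overwrite the constant
        // symbol before the next launch.
        for (int i = 0; i < 256; ++i)
            h_table[i] = i + iter;
        cudaMemcpyToSymbol(d_table, h_table, sizeof(h_table));

        useTable<<<1, 256>>>(d_out);
        cudaDeviceSynchronize();
    }
    cudaFree(d_out);
    return 0;
}
```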

Regarding allocation:

What if I need a certain amount of constant memory, say 64 KB for a table of integers at one point, and later on I need another 64 KB table of floats and no longer need the first table of integers. Is there a way to “free” the first table in order to allocate the second one?

As far as I understand, constant memory allocation is done at compile time, which means I can’t allocate different amounts or sets throughout my program.
Is there a way around this?

I was thinking of using texture memory to allow dynamic allocation of my tables. Yet I was really looking for the broadcast benefit of constant memory, not the spatial-locality caching benefit of texture memory.

Thanks in advance,
Rodrigo

If you can target compute capability 2.x devices, you can rely on the L1 and L2 cache and just use global memory for your lookup tables. Otherwise, you are right that constant memory allocation size is “baked into” the kernel by the compiler and can’t be altered at runtime.
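A sketch of that approach (the kernel and names are hypothetical): the table lives in ordinary global memory allocated with cudaMalloc(), so it can be freed and reallocated at any size at runtime, and on compute capability 2.x repeated reads are served from L1/L2.

```cuda
#include <cuda_runtime.h>

__global__ void lookup(const int * __restrict__ table, const int *keys,
                       int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // On compute capability 2.x these global-memory reads go through the
    // L1/L2 caches; a hot table stays resident without using __constant__.
    if (i < n)
        out[i] = table[keys[i]];
}

// Host side: the table is a normal allocation, so "freeing" it is just
// cudaFree(), and a differently sized/typed table can replace it.
// int *d_table;  cudaMalloc(&d_table, 16384 * sizeof(int));
// ...launch lookup<<<blocks, threads>>>(d_table, d_keys, d_out, n);...
// cudaFree(d_table);
```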

Thanks for the information. I’ve been reading a bit about Fermi’s L1 and L2 caches, and as far as I can understand, they will only benefit concurrent or repeated reads from a single address after it’s been fetched (basically how any cache works). That won’t really give me the warp-broadcast feature of constant memory that I need for my table access. I understood that the L1 cache is basically shared memory being used as cache; in that case, won’t I get better results by pre-storing my tables in shared memory, which does feature a broadcast access mechanism?

Keep in mind that constant memory is just global memory being accessed through a small (6-8 kB) cache on the multiprocessor optimized for broadcast. The first fetch of a word still has to go out to global memory, and if your table is larger than the cache, words will be evicted and possibly require refetching. You are correct that if you can fit your entire lookup table into shared memory, that will be as good, if not better, than relying on the L1 cache.

To decide whether constant memory or cached global memory is better for table lookup on Fermi, I think we need a small microbenchmark. I don’t think I’ve seen anyone directly compare those two cases.
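The shared-memory variant mentioned above could be sketched like this (assuming the table fits in shared memory; sizes and names are placeholders):

```cuda
#include <cuda_runtime.h>

#define TABLE_SIZE 2048  // must fit within the SM's shared memory budget

__global__ void lookupShared(const int *g_table, const int *keys,
                             int *out, int n)
{
    __shared__ int s_table[TABLE_SIZE];

    // Cooperatively stage the table from global into shared memory once
    // per block.
    for (int i = threadIdx.x; i < TABLE_SIZE; i += blockDim.x)
        s_table[i] = g_table[i];
    __syncthreads();

    // When all threads of a warp read the same s_table entry, shared
    // memory broadcasts the word in a single transaction, with no bank
    // conflict.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = s_table[keys[idx] % TABLE_SIZE];
}
```

The trade-off is that staging costs one pass over the table per block, and the table permanently occupies shared memory that could otherwise hold working data.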

Constant memory is local to the source file in which it is declared. There is a 64k limit per file.

Put several large tables into different .cu modules, and voilà: you have a separate 64 KB page of cached constant memory available in each module. I am using that strategy without any problems.
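For instance (file names and table contents are hypothetical), each translation unit declares its own __constant__ symbol and copies into it with cudaMemcpyToSymbol() from within that file:

```cuda
// tableA.cu -- this module's constant bank: 16384 floats = 64 KB
__constant__ float c_tableA[16384];

__global__ void kernelA(float *out)
{
    out[threadIdx.x] = c_tableA[threadIdx.x];
}

// --------------------------------------------------------------
// tableB.cu -- a second module with its own independent bank:
// 16384 ints = 64 KB
__constant__ int c_tableB[16384];

__global__ void kernelB(int *out)
{
    out[threadIdx.x] = c_tableB[threadIdx.x];
}
```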

Christian
