copying structure to constant memory?

Hi all-

I am trying to copy a structure into constant device memory. The structure is defined as:

#define NLAM 32 //nevermind what these refer to…
#define NPRM 8
struct KernelParams {
float lc;
float I0;
float IC;
float qs[NLAM];
float wv[NLAM];
float amin[NPRM];
float amax[NPRM];
};

I am trying to use cudaMemcpyToSymbol and cudaMemcpyFromSymbol to first copy the structure from the host to the device, then back to the host to double check that the structure safely arrived in constant device memory. This is after I had discovered (using cuda-gdb) that the structure available to the kernel was populated with all zeros. Does this mean the structure was never successfully copied?

Below is a fragment showing what I am trying to do.

int main()
{

    CUDA_SAFE_CALL( cudaMemcpyToSymbol( "params", &hostParams, sizeof(hostParams) ) );
    CUDA_SAFE_CALL( cudaMemcpyFromSymbol( &dumParams, "params", sizeof(dumParams) ) );


}

However, I get no agreement between the “ToSymbol” structure and the “FromSymbol” structure. Note, I have trawled Google looking for examples of cudaMemcpyTo/FromSymbol, and pretty much every example I found used a different syntax in the function call, so I am not sure the above calls are even correct.

I’ve been trying to resolve this for 2 days now, and I’m thoroughly stumped. Any ideas from the gurus/maestros/jedi out there? I can post whatever other information/code anyone would like. Thank you in advance!

Hi!

You should read the Reference Manual!

The right declarations are

template<class T>
cudaError_t cudaMemcpyToSymbol(const T& symbol, const void* src, size_t count, size_t offset, enum cudaMemcpyKind kind);

template<class T>
cudaError_t cudaMemcpyFromSymbol(void* dst, const T& symbol, size_t count, size_t offset, enum cudaMemcpyKind kind);

The kind parameter is needed so the runtime knows whether src (in …ToSymbol) or dst (in …FromSymbol) points to device or host memory.

BTW: you must declare params using

__constant__ struct KernelParams params;
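Putting those pieces together, a minimal round-trip sketch might look like the following (the host-side names hostParams and dumParams are taken from the original post; everything else is an assumption, not a definitive fix):

```cuda
// Sketch only: assumes KernelParams is defined as in the original post.
#include <cuda_runtime.h>
#include <string.h>

__constant__ KernelParams params;   // the symbol, in constant device memory

int main()
{
    KernelParams hostParams = {0};  // fill with real values here
    KernelParams dumParams  = {0};

    // Explicit offset and kind, matching the declarations quoted above.
    cudaMemcpyToSymbol(params, &hostParams, sizeof(hostParams),
                       0, cudaMemcpyHostToDevice);
    cudaMemcpyFromSymbol(&dumParams, params, sizeof(dumParams),
                         0, cudaMemcpyDeviceToHost);

    // dumParams should now match hostParams byte for byte.
    return memcmp(&hostParams, &dumParams, sizeof(hostParams)) != 0;
}
```

Note that this passes the symbol itself rather than the string "params"; older toolkits accepted the string form as well, which is one source of the syntax variance you saw on Google.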

Regards

Navier

Hi Navier-

Believe you me, I have the reference manual, nvcc manual, and cuda-gdb manual bound together (with an extra copy under my pillow:)

Anyway, I fixed the problem…I had declared the structure params like:

extern __device__ __constant__ KernelParams params;

which, when changed to simply:

__constant__ KernelParams params; (note the absence of “struct” after “__constant__”)

fixed the problem. Oh, BTW, if you take a look at cuda_inline.h, the offset and kind arguments are defaulted to 0 and cudaMemcpyHostToDevice, respectively, for cudaMemcpyToSymbol, so there’s really no need to specify them unless you’re doing something different. This is logical, since the name cudaMemcpyToSymbol already implies that you’re copying to constant device memory. The same holds true for cudaMemcpyFromSymbol.
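So, given those defaults, the short form and the fully explicit form should be equivalent. A sketch (the helper name upload is made up for illustration):

```cuda
// Sketch only: assumes KernelParams and the params symbol as in the thread.
#include <cuda_runtime.h>

__constant__ KernelParams params;

void upload(const KernelParams* hostParams)
{
    // Short form: offset defaults to 0, kind defaults to cudaMemcpyHostToDevice.
    cudaMemcpyToSymbol(params, hostParams, sizeof(KernelParams));

    // Fully explicit form: identical effect.
    cudaMemcpyToSymbol(params, hostParams, sizeof(KernelParams),
                       0, cudaMemcpyHostToDevice);
}
```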

Sorry, I didn’t know that.

cudaMemcpyToSymbol implies copying TO the device – but where from? The same holds true for cudaMemcpyFromSymbol…

Well, the only other kind argument that makes sense for copying to constant memory is cudaMemcpyDeviceToDevice, but I can’t imagine a situation (for me and my application, at least) where I would need to populate some constant memory directly from the kernel, so I have always used the defaults.

Can you think of a reason to do this kind of constant memory write from within the kernel that wouldn’t be improved by just using shared memory instead? I’m just curious…

Thank you for your input, though!

You meant to say “in between kernel calls”, right? Whatever …

Yes, I can think of situations where it is an advantage to use ‘cudaMemcpyDeviceToDevice’. One is mostly that there is no cudaMemcpyToSymbolAsync(). So I copy the input stream to device memory (async), and then from device memory to constant memory (which is also async). The threads all read the same index, so the access pattern should be a good fit for constant memory’s underlying broadcast cache mechanism.
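That pipeline might be sketched like this (the function, buffer, and stream names are assumptions for illustration; on toolkits that do ship cudaMemcpyToSymbolAsync, that call would replace the second copy):

```cuda
// Sketch only: stage an async host->device copy, then copy the staging
// buffer into the constant-memory symbol device-to-device.
#include <cuda_runtime.h>

__constant__ KernelParams params;

void update_params_async(const KernelParams* pinnedHost, // cudaMallocHost'd
                         KernelParams* devStaging,       // cudaMalloc'd
                         cudaStream_t stream)
{
    // Host -> device staging buffer, asynchronous with respect to the host.
    cudaMemcpyAsync(devStaging, pinnedHost, sizeof(KernelParams),
                    cudaMemcpyHostToDevice, stream);

    // Device staging buffer -> constant memory symbol.
    cudaMemcpyToSymbol(params, devStaging, sizeof(KernelParams),
                       0, cudaMemcpyDeviceToDevice);
}
```

The pinned host buffer matters here: cudaMemcpyAsync only overlaps with host execution when the host memory is page-locked.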

Alright, well I am getting agreement between the structure that I copy to constant memory and the structure that I’m reading back from constant memory, but cuda-gdb still shows all the variables defined in the structure (as accessed from the kernel) as all zeroes.

(cuda-gdb) print params
$29 = {lc = 0, IC = 0, I0 = 0, qs = {0 <repeats 32 times>}, wv = {0 <repeats 32 times>}, amin = {0, 0, 0, 0, 0, 0, 0, 0}, amax = {0, 0, 0, 0, 0, 0, 0, 0}}

any ideas from those wiser than myself? Thanks!

I have yet to actually try out the debugger, but given the known issues list I wonder if it is capable of reading constant memory. It would seem that the debugger is only really capable of reading from global memory or from automatic variables (which are dumped to local/global memory when you compile in debug mode).

Have you tried loading values from params into a register variable in your kernel and checked their values with the debugger?

[sinister_voice]

Hello MisterAnderson…42.

[/sinister_voice]

Couldn’t resist. :)

Anyway, thanks for the reply. By “loading values from params into a register variable”, do you mean something like:

[codebox]__constant__ KernelParams params;

__global__ void my_kernel(…)

{

 float lc_reg = params.lc;

 ...

}[/codebox]

or is there some other way to declare a register variable? Thanks!

Ok, so I’ve inserted the following in my kernel:

[codebox]__global__ void my_kernel(…)

{

 register float lc_reg = params.lc;

 ...

}[/codebox]

and set a breakpoint just after the declaration. Printing the value of lc_reg still shows 0.