copying structure to constant memory?

Hi all-

I am trying to copy a structure into constant device memory. The structure is defined as:

#define NLAM 32 //nevermind what these refer to…
#define NPRM 8
struct KernelParams {
float lc;
float I0;
float IC;
float qs[NLAM];
float wv[NLAM];
float amin[NPRM];
float amax[NPRM];
};

I am trying to use cudaMemcpyToSymbol and cudaMemcpyFromSymbol to first copy the structure from the host to the device, then back to the host to double check that the structure safely arrived in constant device memory. This is after I had discovered (using cuda-gdb) that the structure available to the kernel was populated with all zeros. Does this mean the structure was never successfully copied?

Below is a fragment showing what I am trying to do.

int main()
{

    CUDA_SAFE_CALL( cudaMemcpyToSymbol( "params", &hostParams, sizeof(hostParams) ) );
    CUDA_SAFE_CALL( cudaMemcpyFromSymbol( &dumParams, "params", sizeof(dumParams) ) );


}

However, I get no agreement between the “ToSymbol” structure and the “FromSymbol” structure. Note, I have trawled Google looking for examples of cudaMemcpyTo/FromSymbol, and pretty much every example I found used a different syntax in the function call, so I am not sure the above calls are even correct.

I’ve been trying to resolve this for 2 days now, and I’m thoroughly stumped. Any ideas from the gurus/maestros/jedi out there? I can post whatever other information/code anyone would like. Thank you in advance!

Hi!

You should read the Reference Manual!

The right declarations are

template<class T>
cudaError_t cudaMemcpyToSymbol(const T& symbol, const void* src, size_t count, size_t offset, enum cudaMemcpyKind kind);

template<class T>
cudaError_t cudaMemcpyFromSymbol(void* dst, const T& symbol, size_t count, size_t offset, enum cudaMemcpyKind kind);

The kind parameter is needed so the runtime knows whether src (in …ToSymbol) or dst (in …FromSymbol) points to device or host memory.

BTW: you must declare params using

__constant__ struct KernelParams params;
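Putting those pieces together, a minimal round-trip sketch might look like the following (the host-side names hostParams and dumParams are taken from the original post; everything else is an assumption, not a definitive fix):

```cuda
// Sketch only: assumes KernelParams is defined as in the original post.
#include <cuda_runtime.h>
#include <string.h>

__constant__ KernelParams params;   // the symbol, in constant device memory

int main()
{
    KernelParams hostParams = {0};  // fill with real values here
    KernelParams dumParams  = {0};

    // Explicit offset and kind, matching the declarations quoted above.
    cudaMemcpyToSymbol(params, &hostParams, sizeof(hostParams),
                       0, cudaMemcpyHostToDevice);
    cudaMemcpyFromSymbol(&dumParams, params, sizeof(dumParams),
                         0, cudaMemcpyDeviceToHost);

    // dumParams should now match hostParams byte for byte.
    return memcmp(&hostParams, &dumParams, sizeof(hostParams)) != 0;
}
```

Note that this passes the symbol itself rather than the string "params"; older toolkits accepted the string form as well, which is one source of the syntax variance you saw on Google.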

Regards

Navier

Hi Navier-

Believe you me, I have the reference manual, nvcc manual, and cuda-gdb manual bound together (with an extra copy under my pillow:)

Anyway, I fixed the problem…I had declared the structure params like:

extern __device__ __constant__ KernelParams params;

which, when changed to simply:

__constant__ KernelParams params; (note the absence of “struct” after “__constant__”)

fixed the problem. Oh, BTW, if you take a look at cuda_inline.h, the offset and kind arguments are defaulted to 0 and cudaMemcpyHostToDevice, respectively, for cudaMemcpyToSymbol, so there’s really no need to specify them unless you’re doing something different. This is logical, since the name cudaMemcpyToSymbol already implies that you’re copying to constant device memory. The same holds true for cudaMemcpyFromSymbol.
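So, given those defaults, the short form and the fully explicit form should be equivalent. A sketch (the helper name upload is made up for illustration):

```cuda
// Sketch only: assumes KernelParams and the params symbol as in the thread.
#include <cuda_runtime.h>

__constant__ KernelParams params;

void upload(const KernelParams* hostParams)
{
    // Short form: offset defaults to 0, kind defaults to cudaMemcpyHostToDevice.
    cudaMemcpyToSymbol(params, hostParams, sizeof(KernelParams));

    // Fully explicit form: identical effect.
    cudaMemcpyToSymbol(params, hostParams, sizeof(KernelParams),
                       0, cudaMemcpyHostToDevice);
}
```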

Sorry, I didn’t know that.

cudaMemcpyToSymbol implies copying TO the device – but where from? The same holds true for cudaMemcpyFromSymbol…

Well, the only other kind argument that makes sense for copying to constant memory is cudaMemcpyDeviceToDevice, but I can’t imagine a situation (for me and my application, at least) where I would need to populate some constant memory directly from the kernel, so I have always used the defaults.

Can you think of a reason to do this kind of constant memory write from within the kernel that wouldn’t be improved by just using shared memory instead? I’m just curious…

Thank you for your input, though!

You meant to say “in between kernel calls”, right? Whatever …

Yes, I can think of situations where it is an advantage to use ‘cudaMemcpyDeviceToDevice’. One is mostly that there is no cudaMemcpyToSymbolAsync(). So I copy the input stream to device memory (async), and then from device memory to constant memory (which is also async). The threads all read the same index, so the access pattern should be a good fit for constant memory’s underlying broadcast cache mechanism.
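That pipeline might be sketched like this (the function, buffer, and stream names are assumptions for illustration; on toolkits that do ship cudaMemcpyToSymbolAsync, that call would replace the second copy):

```cuda
// Sketch only: stage an async host->device copy, then copy the staging
// buffer into the constant-memory symbol device-to-device.
#include <cuda_runtime.h>

__constant__ KernelParams params;

void update_params_async(const KernelParams* pinnedHost, // cudaMallocHost'd
                         KernelParams* devStaging,       // cudaMalloc'd
                         cudaStream_t stream)
{
    // Host -> device staging buffer, asynchronous with respect to the host.
    cudaMemcpyAsync(devStaging, pinnedHost, sizeof(KernelParams),
                    cudaMemcpyHostToDevice, stream);

    // Device staging buffer -> constant memory symbol.
    cudaMemcpyToSymbol(params, devStaging, sizeof(KernelParams),
                       0, cudaMemcpyDeviceToDevice);
}
```

The pinned host buffer matters here: cudaMemcpyAsync only overlaps with host execution when the host memory is page-locked.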

Alright, well I am getting agreement between the structure that I copy to constant memory and the structure that I’m reading back from constant memory, but cuda-gdb still shows all the variables defined in the structure (as accessed from the kernel) as all zeroes.

(cuda-gdb) print params
$29 = {lc = 0, IC = 0, I0 = 0, qs = {0 <repeats 32 times>}, wv = {0 <repeats 32 times>}, amin = {0, 0, 0, 0, 0, 0, 0, 0}, amax = {0, 0, 0, 0, 0, 0, 0, 0}}

any ideas from those wiser than myself? Thanks!

I have yet to actually try out the debugger, but given the known issues list I wonder if it is capable of reading constant memory. It would seem that the debugger is only really capable of reading from global memory or from automatic variables (which are dumped to local/global memory when you compile in debug mode).

Have you tried loading values from params into a register variable in your kernel and checked their values with the debugger?

[sinister_voice]

Hello MisterAnderson…42.

[/sinister_voice]

Couldn’t resist. :)

Anyway, thanks for the reply. By “loading values from params into a register variable”, do you mean something like:

[codebox]__constant__ KernelParams params;

__global__ void my_kernel(…)

{

 float lc_reg = params.lc;

 ...

}[/codebox]

or is there some other way to declare a register variable? Thanks!

Ok, so I’ve inserted the following in my kernel:

[codebox]__global__ void my_kernel(…)

{

 register float lc_reg = params.lc;

 ...

}[/codebox]

and set a breakpoint just after the declaration. Printing the value of lc_reg still shows 0.