How to define variables in device memory?

I have lots of variables to define and I want to put some of them in device memory, but when I use `__device__` it shows errors. And when I don't add any qualifier, the compiler automatically puts them in registers, but then I run out of register space.
What can I do?

You could try local memory, though the compiler should automatically spill variables to local memory if you use too many registers.
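As a minimal sketch of that spilling behavior (the array name and size here are just illustrative): a per-thread array that is too large for registers is placed in local memory by the compiler automatically, with no qualifier needed.

```cuda
// A large per-thread array will not fit in registers, so the compiler
// places it in local memory (per-thread storage that physically lives
// in device DRAM) -- no memory-space qualifier is required.
__global__ void spillExample(float *out)
{
    float scratch[256];  // likely too large for registers -> local memory
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = 0; i < 256; ++i)
        scratch[i] = i * 0.5f;

    out[tid] = scratch[tid % 256];
}
```

Note that local memory has the same latency as global device memory, so spilling trades register pressure for slower accesses.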

How about using the compiler option that limits the number of registers (`-maxrregcount` in nvcc), forcing the excess into local memory? Granted, you don’t get a great deal of control. Also, whatever number you specify might be rounded up. I forget the rules… it seems like the number is rounded up to the next multiple of four or something like that.

I’ve read the programming guide, which says that by changing a compiler parameter I can put variables in device memory or in registers, but I don’t know what exactly that means. It doesn’t make clear what I should do in detail.

How many variables? (i.e. bytes, kilobytes, megabytes?) Does each thread need its own copy of the variable, or is it constant across the kernel launch? How are they accessed? (i.e. random pattern, predictable pattern) Do the values change over time?

I’m curious as to why you get errors with device memory. It should be a simple matter of cudaMalloc and cudaMemcpy. But device memory might not be the right choice, depending on your answers to the questions above. Constant memory may be the better choice. Or perhaps device memory read via a texture.

I think there are over 100 separate parameters and some arrays with over 4000 elements. Some of them change over time. What’s also important is the input file from the hard disk, which is about 200 MB. So I’m considering moving some arrays to device memory.

But when I use `__device__` to define arrays or parameters it always shows errors. I’ve read the programming guide, which says “`__device__` and `__constant__` variables are only allowed at file scope”. What does that mean?

That means those variables can only be declared as global variables at file scope in a file, outside of any function.
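A minimal sketch of what that looks like (the variable name, array size, and `upload` helper are made up for illustration):

```cuda
// File scope: this is the only place __device__ declarations are legal.
__device__ float d_params[100];   // lives in device memory

__global__ void scale(float *out)
{
    // __device__ float bad[100];  // ERROR: qualifier not allowed inside a function
    out[threadIdx.x] = 2.0f * d_params[threadIdx.x];
}

// Host side: fill the __device__ variable by symbol before launching.
void upload(const float *h_params)
{
    cudaMemcpyToSymbol(d_params, h_params, 100 * sizeof(float));
}
```

Declaring the array inside a kernel body is what triggers the errors you saw; moving the declaration above all functions fixes it.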

For your 100 separate parameters, constant memory and kernel arguments are your best options. If they change constantly over time, kernel arguments are probably better; if they remain fixed, constant memory is probably better. Though note you can still modify constant memory from the host with cudaMemcpyToSymbol.
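A sketch of both options side by side (names and sizes are placeholders, not from the original thread):

```cuda
// Option 1: rarely-changing parameters in constant memory (file scope).
__constant__ float c_params[100];

__global__ void useConstant(float *out)
{
    // Fast when all threads in a warp read the same element.
    out[threadIdx.x] = c_params[threadIdx.x % 100];
}

// Option 2: frequently-changing scalars passed as kernel arguments.
__global__ void useArgs(float *out, float gain, float offset)
{
    out[threadIdx.x] = gain * out[threadIdx.x] + offset;
}

// Host side: constant memory can still be updated between launches.
void update(const float *h_params)
{
    cudaMemcpyToSymbol(c_params, h_params, 100 * sizeof(float));
}
```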

For your arrays of 4000 elements, the best memory depends on your access pattern. If every thread in a warp always accesses the same element, constant memory is an option, as long as you don’t have too many of these arrays (there is only 64 KB of constant memory). Device memory (with or without textures) is the alternative.

As for the 200 MB of data, device memory (read through a texture if your access pattern is random) is going to be your only option. Just allocate the memory on the device with cudaMalloc, copy your data there with cudaMemcpy, and then pass the pointer as a kernel argument for the kernel to use.
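The allocate/copy/launch sequence could look like this (kernel name, sizes, and launch configuration are illustrative; error checking omitted for brevity):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void process(const float *data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // ... read data[i] here ...
    }
}

int main(void)
{
    const size_t n = 200u * 1024u * 1024u / sizeof(float);  // ~200 MB of floats
    float *h_data = (float *)malloc(n * sizeof(float));
    // ... fill h_data from the input file ...

    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));                 // device allocation
    cudaMemcpy(d_data, h_data, n * sizeof(float),
               cudaMemcpyHostToDevice);                     // host -> device copy

    process<<<(n + 255) / 256, 256>>>(d_data, n);           // pointer as argument
    cudaDeviceSynchronize();

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```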

Actually, declaring and using a `__device__` variable is not usually needed.

That’s not the case.
What exactly is the way to define variables in device memory?

Yeah, I’ve done it.