constant and shared memory

ykhong · December 12, 2008, 6:20am

Hello

I have two questions about constant and shared memory in CUDA.

(1) Can I copy a float array from host memory to device constant memory? If so, how? Can anyone give me a simple example that copies a float array from host memory to device constant memory?

(2) I want to create shared memory dynamically. Suppose I want to create two 2D float arrays in shared memory whose sizes depend on a variable, like

  __shared__ float array1;
  __shared__ float array2;

I know this has to be done through dynamic allocation when calling the kernel function, but I ran into problems when compiling. Can anyone give me a very simple example of how to do that?

Thank you very much.

SPWorley · December 12, 2008, 7:19am

Declaration:

__constant__ float dFloats[1024];

in main():

float hFloats[1024]={123.4, 1.0} ; // fill hFloats on host…

CUDA_SAFE_CALL(cudaMemcpyToSymbol(dFloats, hFloats, 1024*sizeof(float), 0, cudaMemcpyHostToDevice));

Your kernel can then read (but not write) from the dFloats array.

#2

No, you can’t dynamically allocate shared memory.
But you could statically allocate a large chunk and use whatever portion of it you need. If the dimensions keep changing, you could switch to a 1D array and do the index arithmetic yourself, i.e. array1[x+y*xsize].
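The flattened-index suggestion above can be sketched roughly as follows. This is only an illustration: `MAX_ELEMS`, the `flipRows` kernel, and the 4-by-3 test size are assumptions made for the example, not names from the thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_ELEMS 1024  // assumed upper bound on xsize * ysize

// Each block stages an xsize-wide tile in a fixed-size shared buffer and
// addresses it as a flattened 2D array: element (x, y) is tile[x + y * xsize].
__global__ void flipRows(const float *in, float *out, int xsize)
{
    __shared__ float tile[MAX_ELEMS];  // statically sized; only part is used

    int x = threadIdx.x;
    int y = threadIdx.y;

    tile[x + y * xsize] = in[x + y * xsize];
    __syncthreads();  // all writes must land before any thread reads

    // Read the mirrored element of the same row, written by another thread.
    out[x + y * xsize] = tile[(xsize - 1 - x) + y * xsize];
}

int main()
{
    const int xsize = 4, ysize = 3;  // must satisfy xsize * ysize <= MAX_ELEMS
    float h_in[xsize * ysize], h_out[xsize * ysize];
    for (int i = 0; i < xsize * ysize; ++i) h_in[i] = (float)i;

    float *d_in, *d_out;
    cudaMalloc((void **)&d_in, sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    // One block with one thread per tile element.
    flipRows<<<1, dim3(xsize, ysize)>>>(d_in, d_out, xsize);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int y = 0; y < ysize; ++y) {
        for (int x = 0; x < xsize; ++x) printf("%g ", h_out[x + y * xsize]);
        printf("\n");
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

With the input 0..11, each row of four values should come back reversed. The shared buffer is always `MAX_ELEMS` floats regardless of how much of it a launch uses, which is the cost of the static approach.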

Gimurk · December 12, 2008, 7:30am

(1) Constant memory does not depend on the type of the data element; just make sure its declaration is placed outside any function. It seems it cannot be dynamically allocated. Copying data into constant memory from another memory region should use cudaMemcpyToSymbol(), for example:

[codebox]__constant__ float cons[256];

float *vals;  // host pointer to at least 256 floats

cudaMemcpyToSymbol(cons, vals, sizeof(float) * 256);[/codebox]

(2) An array in shared memory that is allocated dynamically through the execution configuration must be one-dimensional. I suggest you index it as a 1D array rather than as a 2D one.

ykhong · December 12, 2008, 8:43am

I tried the following code, but the output is very strange:

1.000000000
2.000000000
3.000000000
4.000000000
5.000000000
6.000000000
7.000000000
8.000000000
9.000000000
10.000000000

Does anyone have any idea what is wrong?

[codebox]__global__ void kernel(float *devicearray, const float *arrayD)
{
    for(int loop=0; loop<10; loop++)
        *(devicearray+loop) = *(arrayD+loop)*10;
    return;
};

__constant__ float arrayD[100];

int main(void)
{
    cudaError error;
    float arrayH[10];
    float *devicearray;

    for(int loop=0; loop<10; loop++)
        *(arrayH+loop) = loop+1;

    cudaMemcpyToSymbol(arrayD, arrayH, sizeof(float)*10, cudaMemcpyHostToDevice);
    cudaMalloc((void **)&devicearray, sizeof(float)*10);

    kernel<<<1,1>>>(devicearray, arrayD);

    error = cudaGetLastError();
    if(error!=cudaSuccess)
        printf("%s\n",cudaGetErrorString(error));

    cudaMemcpy(arrayH, devicearray, sizeof(float)*10, cudaMemcpyDeviceToHost);

    for(int loop=0; loop<10; loop++)
        printf("%10.9f\n", *(arrayH+loop));

    cudaFree(devicearray);
    return 0;
};[/codebox]

Reimar · December 12, 2008, 3:03pm

That can’t work: the compiler cannot know that the second function argument points to constant memory.

You have to use the “arrayD” global variable directly in your kernel code.

ykhong · December 12, 2008, 3:21pm

I also tried using the global variable directly, but it gives the same result.

Can anyone help me with this problem?

I would deeply appreciate it. Thanks.

frea1 · December 12, 2008, 3:30pm

You can set the amount of shared memory dynamically, before kernel execution. In other words, array sizes don’t have to be hardcoded.

Examples are in the SDK, search for file sharedmem.cuh.
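As a rough sketch of what this looks like for the original question (two 2D float arrays whose size is only known at run time): a single unsized `extern __shared__` array receives the byte count passed as the third execution-configuration parameter, and the kernel carves it into two logical arrays. The kernel name `combineArrays`, the variable names, and the 8-by-4 size are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One unsized extern array receives the whole dynamic allocation; the kernel
// splits it into two logical width-by-height arrays by pointer arithmetic.
__global__ void combineArrays(float *out, int width, int height)
{
    extern __shared__ float smem[];   // size is set at launch time
    int n = width * height;
    float *array1 = smem;             // first n floats
    float *array2 = smem + n;         // next n floats

    int x = threadIdx.x;
    int y = threadIdx.y;
    int idx = x + y * width;          // flattened 2D index

    array1[idx] = (float)x;
    array2[idx] = (float)y;
    __syncthreads();                  // both arrays fully written past here

    // Combine this thread's element with one written by a different thread.
    out[idx] = array1[idx] + array2[n - 1 - idx];
}

int main()
{
    const int width = 8, height = 4;  // fixed here; could be computed at run time
    const int n = width * height;
    size_t smemBytes = 2 * n * sizeof(float);  // room for both arrays

    float *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    // Third launch parameter: bytes of dynamic shared memory per block.
    combineArrays<<<1, dim3(width, height), smemBytes>>>(d_out, width, height);

    float h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", h_out[i]);
    printf("\n");

    cudaFree(d_out);
    return 0;
}
```

Only one `extern __shared__` array can usefully be declared per kernel, since all such declarations start at the same address; that is why the two arrays are expressed as offsets into one buffer and indexed in 1D.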

SPWorley · December 12, 2008, 4:02pm

You’re not using the right arguments for the copy-to-symbol function. It takes an offset (probably 0) before the copy kind. You’re giving only 4 arguments to a 5-argument function.

See the CUDA reference manual for the function definition.
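For illustration, here is a minimal sketch with every argument of the symbol copy spelled out and every runtime call checked, so a failed copy or kernel does not silently leave the host array unchanged. As far as I can tell from the runtime API declaration, the offset and kind parameters both have defaults, so a four-argument call still compiles and the fourth value is taken as the byte offset. The `check` helper, the `table` symbol, and the `scale` kernel are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__constant__ float table[16];

// Hypothetical helper: print the error and exit if a runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// The kernel names the constant symbol directly instead of taking a pointer.
__global__ void scale(float *out)
{
    out[threadIdx.x] = table[threadIdx.x] * 10.0f;
}

int main()
{
    float h[16];
    for (int i = 0; i < 16; ++i) h[i] = (float)(i + 1);

    // Arguments: symbol, source, byte count, byte offset, copy kind.
    check(cudaMemcpyToSymbol(table, h, sizeof(h), 0, cudaMemcpyHostToDevice),
          "cudaMemcpyToSymbol");

    float *d_out;
    check(cudaMalloc((void **)&d_out, sizeof(h)), "cudaMalloc");

    scale<<<1, 16>>>(d_out);
    check(cudaGetLastError(), "kernel launch");          // launch-time errors
    check(cudaDeviceSynchronize(), "kernel execution");  // run-time errors

    check(cudaMemcpy(h, d_out, sizeof(h), cudaMemcpyDeviceToHost), "cudaMemcpy");
    for (int i = 0; i < 16; ++i) printf("%g\n", h[i]);

    check(cudaFree(d_out), "cudaFree");
    return 0;
}
```

`cudaDeviceSynchronize()` is the name in current toolkits; toolkits from the time of this thread used `cudaThreadSynchronize()` for the same purpose.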

gshi · December 12, 2008, 5:06pm

You made a couple of mistakes.

Here is a modified version of your code that works:

[codebox]#include <stdio.h>

// Declared before the kernel so the kernel can refer to it directly.
__constant__ float arrayD[100];

__global__ void kernel(float *devicearray)
{
    // Read the constant array through its symbol, not through a pointer argument.
    for(int loop=0; loop<10; loop++)
        *(devicearray+loop) = *(arrayD+loop)*10;
    return;
}

int main(void)
{
    cudaError error;
    float arrayH[10];
    float *devicearray;

    for(int loop=0; loop<10; loop++)
        *(arrayH+loop) = loop+1;

    // The fourth argument is the byte offset into the symbol; the copy kind comes fifth.
    cudaMemcpyToSymbol(arrayD, arrayH, sizeof(float)*10, 0, cudaMemcpyHostToDevice);
    cudaMalloc((void **)&devicearray, sizeof(float)*10);

    kernel<<<1,1>>>(devicearray);

    error = cudaGetLastError();
    if(error!=cudaSuccess)
        printf("%s\n",cudaGetErrorString(error));

    cudaMemcpy(arrayH, devicearray, sizeof(float)*10, cudaMemcpyDeviceToHost);

    for(int loop=0; loop<10; loop++)
        printf("%10.9f\n", *(arrayH+loop));

    cudaFree(devicearray);
    return 0;
}[/codebox]

