Initialize with (float) value instead of (int). cudaMemset2D( void* dstPtr, size_t pitch, int value,

Hi all,

I was wondering whether it is possible to use the following CUDA function to initialize memory with a float value, for example FLT_MAX or 52.34f, instead of just an int value (0, 1, etc.):

cudaMemset2D( void* dstPtr, size_t pitch, int value, size_t width, size_t height)

In this case, it would be used as follows:

float * d_idata;

size_t pitch_idata;

cudaMallocPitch( (void**) &d_idata, &pitch_idata,sizeof(float)*nbcol, nbrow);

cudaMemset2D( (void**) &d_idata, pitch_idata, FLT_MAX, sizeof(float)*nbcol, nbrow);

I tried it, but it doesn’t work correctly. Is there another CUDA function that we can use, or should I write a kernel to do this?

I don’t understand the memset functions well enough, but I suspect that you can reinterpret-cast the float to an int. Might be enough to get her going.

cudaMemset2D( (void**) &d_idata, pitch_idata,(int) 12.34f, sizeof(float)*nbcol, nbrow);

Is that what you mean when you say “reinterpret-cast” ?

Unfortunately, with or even without a cast, I always get the same result (in device emulation of course… otherwise I couldn’t see much of what’s going on within the GPU).

And the value that I read, from d_idata is

-0.0013270393	float    

(Copy-Paste from the VS2005 debugger)

As if it hadn’t been initialized at all… This is surely inappropriate since int and float don’t have the same length…

Anyway, even with int values, the only initialization that I get is with zeroes, even though I tried to initialize with 5:

int * d_idata;

size_t pitch_idata;

cudaMallocPitch( (void**) &d_idata, &pitch_idata,sizeof(int)*nbcol, nbrow);

cudaMemset2D( (void**) &d_idata, pitch_idata, 5, sizeof(int)*nbcol, nbrow);

0	int

Can’t you use cudaMemset2D() with your floating point value encoded in an integer? Traditionally, memset() takes char values and does byte level initialization, but the driver API includes half word and word sized versions of Memset2D, so you might imagine it would work.
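To illustrate the byte-level behaviour mentioned above, here is a minimal host-side sketch using the standard memset(). As far as I know, cudaMemset2D() is documented to set bytes in the same way, so a "value" of 5 would not produce ints equal to 5 even when the call is otherwise correct. The helper name memset_int_readback is hypothetical and the sketch assumes a 4-byte int. Note also that the calls quoted earlier pass (void**) &d_idata, the address of the host variable, where the function expects the device pointer d_idata itself.

```cpp
#include <cstring>

// Hypothetical helper (not part of CUDA): fill one int with memset and
// return what it reads back as. Assumes sizeof(int) == 4.
int memset_int_readback(int value) {
    static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
    int x;
    // memset converts `value` to unsigned char and writes it to every byte.
    std::memset(&x, value, sizeof x);
    return x;
}
```

Under those assumptions, memset_int_readback(5) gives 0x05050505 rather than 5, and memset_int_readback(0x1234) gives 0x34343434 because only the low byte of the value is used. Zero is the one value for which byte-wise and word-wise filling agree.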

Nope, that just turns into 12. To get the actual bits of that float variable exactly into an int, you can do it with pointer-munging:

float bleh = 12.34f;

int strangeInt = *(int*) &bleh;

– something like that at least
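For comparison, a value cast such as (int) 12.34f converts the number and yields 12, whereas copying the bytes keeps the bit pattern intact. A minimal host-side sketch of the latter using memcpy, assuming a 32-bit IEEE 754 float; float_bits and bits_to_float are hypothetical helper names, not CUDA functions:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the 32 bits of a float into an integer unchanged.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // byte copy, no numeric conversion
    return bits;
}

// Hypothetical inverse: rebuild the float from its stored bit pattern.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3F800000, and bits_to_float(float_bits(12.34f)) gives back exactly 12.34f. The memcpy form sidesteps the aliasing questions that the pointer cast raises, and compilers typically reduce it to a single register move.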

I prefer to use unions for that sort of thing

union bleh {

	float thefloat;

	int   theint;

};

union bleh u;

u.thefloat = 12.34f;

int i = u.theint;

This code, however, is not portable between 32-bit and 64-bit platforms, I think.

sizeof(float) == sizeof(int) on x86-64 platforms

oh, that was new to me. Thanks for clarifying.

When using a union, the PTX code looks like the following (it uses local memory to store the variable):

.local .align 4 .b8 someLocMem[4];   

....

st.local.s32 	[someLocMem], someIntReg;  <--very expensive

ld.local.f32 	someFloatReg, [someLocMem];  <--very expensive

When using the pointer cast (“int strangeInt = *(int*) &somefloat”), the output looks like this:

mov.b32 	someFloatReg, someIntReg;

That would be much faster on the GPU.