Type conversions on-board the GPU: what's the most efficient way?

If I want to convert an array of chars into an array of floats, or vice versa, on the device, do I have the right idea with the following functions?

Will this work? Is there a more efficient way to do this? I'm not sure I understand how to optimize memory accesses, especially when the data types are different sizes (8-bit vs. 32-bit units).

__global__
void charsToFloats(char* src, float *dest, int numItems)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < numItems) dest[idx] = (float)src[idx];
}

__global__
void floatsToChars(float* src, char *dest, int numItems)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < numItems) dest[idx] = (char)src[idx];
}

A couple of things off the top of my head:

You may want to divide by CHAR_MAX when converting to floats (and multiply by it when converting back) to normalise the values, depending on your app.
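To make the normalisation idea concrete, here is a minimal sketch. It assumes the data is signed char (whether plain char is signed depends on the compiler), so it scales by SCHAR_MAX rather than CHAR_MAX. The kernel names and the normaliseOnDevice host helper are made up for illustration, and error checking is left out.

```cuda
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// Assumption: input is signed char, so values map roughly onto [-1, 1].
__global__ void charsToUnitFloats(const signed char* src, float* dest, int numItems)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numItems)
        dest[idx] = static_cast<float>(src[idx]) / SCHAR_MAX;
}

// Reverse direction: clamp to [-1, 1], scale by SCHAR_MAX, round to nearest.
__global__ void unitFloatsToChars(const float* src, signed char* dest, int numItems)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numItems) {
        float scaled = fminf(fmaxf(src[idx], -1.0f), 1.0f) * SCHAR_MAX;
        dest[idx] = static_cast<signed char>(__float2int_rn(scaled));
    }
}

// Hypothetical host helper: copies the chars to the device, normalises them,
// and copies the floats back. No error checking, for brevity.
std::vector<float> normaliseOnDevice(const std::vector<signed char>& in)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    signed char* dSrc = nullptr;
    float* dDest = nullptr;
    cudaMalloc(&dSrc, n * sizeof(signed char));
    cudaMalloc(&dDest, n * sizeof(float));
    cudaMemcpy(dSrc, in.data(), n * sizeof(signed char), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    charsToUnitFloats<<<blocks, threads>>>(dSrc, dDest, n);

    cudaMemcpy(out.data(), dDest, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDest);
    return out;
}
```

Note that with this scaling -128 lands slightly below -1.0, so the clamp in unitFloatsToChars would bring it back as -127. Whether that matters depends on your app.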

CUDA has a char4 vector type, so I'd use that and map it to a float4. Each thread reads 4 values and writes 4 values, and you should get pretty good speed, I think.

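__global__
void charsToFloats(char4* src, float4 *dest, int numItems)
{
  // numItems counts char4 elements here (groups of four chars), not single chars
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if(idx < numItems) {
    char4 tmpChar4 = src[idx];
    float4 tmpFloat4 = make_float4(tmpChar4.x, tmpChar4.y, tmpChar4.z, tmpChar4.w);
    dest[idx] = tmpFloat4;
  }
}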
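For completeness, here is a rough sketch of how the char4 idea could be launched from the host when the item count is not a multiple of 4. The names charsToFloats4, charsToFloatsTail and convertOnDevice are hypothetical. It relies on cudaMalloc returning pointers aligned well enough for char4 and float4 access, and it skips error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread converts one char4 (4 bytes) into one float4 (16 bytes).
__global__ void charsToFloats4(const char4* src, float4* dest, int numVec4)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numVec4) {
        char4 c = src[idx];
        dest[idx] = make_float4(c.x, c.y, c.z, c.w);
    }
}

// Scalar kernel for the 1 to 3 items left over when numItems % 4 != 0.
__global__ void charsToFloatsTail(const signed char* src, float* dest,
                                  int start, int numItems)
{
    int idx = start + blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numItems)
        dest[idx] = static_cast<float>(src[idx]);
}

// Hypothetical host wrapper: vectorised kernel for the bulk, scalar kernel for the tail.
std::vector<float> convertOnDevice(const std::vector<signed char>& in)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    signed char* dSrc = nullptr;
    float* dDest = nullptr;
    cudaMalloc(&dSrc, n * sizeof(signed char));
    cudaMalloc(&dDest, n * sizeof(float));
    cudaMemcpy(dSrc, in.data(), n * sizeof(signed char), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int numVec4 = n / 4;  // number of complete groups of four
    if (numVec4 > 0) {
        const int blocks = (numVec4 + threads - 1) / threads;
        charsToFloats4<<<blocks, threads>>>(
            reinterpret_cast<const char4*>(dSrc),
            reinterpret_cast<float4*>(dDest), numVec4);
    }
    if (n % 4 != 0) {
        // At most three leftover items, so one small block is enough.
        charsToFloatsTail<<<1, 4>>>(dSrc, dDest, numVec4 * 4, n);
    }

    cudaMemcpy(out.data(), dDest, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSrc);
    cudaFree(dDest);
    return out;
}
```

The reinterpret_cast is only safe because both buffers start at a cudaMalloc base address. If you pass a pointer offset into the middle of a buffer, the char4/float4 loads may be misaligned and you would need the scalar kernel instead.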

This may not be the best way but it shows some ideas I hope!

Good luck and keep us posted, thanks!
