Fast type casting (short -> float)

gct · October 16, 2008, 5:57pm

OK, so I’ve got a buffer of short data that I’m trying to process in several steps: transpose, filter, transpose again, etc. Does anyone know a good way to do a fast type conversion from a short buffer to a float buffer on the GPU? I tried something like this:

typedef struct dual_short {
    short a;
    short b;
} dual_short;

typedef struct dual_float {
    float a;
    float b;
} dual_float;

__global__ void convert_input(short* inp_buffer, float* out_buffer, int width, int height) {
    dual_short* ds_ptr = (dual_short*)inp_buffer;
    dual_float* df_ptr = (dual_float*)out_buffer;
    unsigned int xIndex = 2*(blockIdx.x * CONV_BLOCK_DIM + threadIdx.x);
    unsigned int yIndex = blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;

    if ((xIndex < width) && (yIndex < height)) {
        unsigned int index_in = yIndex * width/2 + xIndex;
        // Convert two shorts at a time
        dual_short temp_ds = ds_ptr[index_in];
        dual_float temp_df;
        {
            temp_df.a = (float)temp_ds.a;
            temp_df.b = (float)temp_ds.b;
        }
        df_ptr[index_in] = temp_df;
    }
}

But the Visual Profiler still reports the accesses as uncoalesced. Note that the dual_float type is there so I can try to coalesce on the 64-bit type. CONV_BLOCK_DIM = 16.

gct · October 16, 2008, 6:54pm

Aha, got it working. I shouldn’t have had that factor of two in the xIndex calculation:

    unsigned int xIndex = blockIdx.x * CONV_BLOCK_DIM + threadIdx.x;
    unsigned int yIndex = blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;
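For reference, here is a minimal sketch of the complete kernel with the factor of two removed. It assumes width is even (so no pair straddles two rows) and swaps the hand-rolled structs for CUDA's built-in short2/float2 vector types. Those are 4- and 8-byte aligned, whereas a plain struct of two floats is only 4-byte aligned, so the compiler may split its store into two 32-bit accesses unless it is declared __align__(8). The bounds check also compares against width/2 pairs per row rather than width. The host helper convert_on_gpu is hypothetical, and this convert_input has a different signature from the one in the question.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define CONV_BLOCK_DIM 16  // block edge length, as in the original post

// One thread per (short, short) pair. short2 is 4-byte aligned and float2 is
// 8-byte aligned, so each thread does one 32-bit load and one 64-bit store,
// and neighbouring threads touch neighbouring addresses.
__global__ void convert_input(const short2* in, float2* out, int width, int height)
{
    int xIndex = blockIdx.x * CONV_BLOCK_DIM + threadIdx.x;  // pair index within the row
    int yIndex = blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;  // row index
    int pairsPerRow = width / 2;

    if (xIndex < pairsPerRow && yIndex < height) {
        int idx = yIndex * pairsPerRow + xIndex;
        short2 s = in[idx];
        out[idx] = make_float2((float)s.x, (float)s.y);
    }
}

// Hypothetical host helper: copies `in` (height rows of width shorts) to the
// GPU, converts it, and copies the floats back. Returns false on a CUDA error.
bool convert_on_gpu(const std::vector<short>& in, std::vector<float>& out,
                    int width, int height)
{
    // Even width so that pairs never straddle rows.
    assert(width % 2 == 0 && in.size() == (size_t)width * height);
    size_t n = in.size();
    short2* d_in = nullptr;
    float2* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(short)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaError_t err = cudaMemcpy(d_in, in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(CONV_BLOCK_DIM, CONV_BLOCK_DIM);
        dim3 grid((width / 2 + CONV_BLOCK_DIM - 1) / CONV_BLOCK_DIM,
                  (height + CONV_BLOCK_DIM - 1) / CONV_BLOCK_DIM);
        convert_input<<<grid, block>>>(d_in, d_out, width, height);
        err = cudaGetLastError();
    }
    out.resize(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

The grid is sized in pairs along x, so a 16×16 block covers 32 shorts per row segment.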

SPWorley · October 16, 2008, 10:29pm

Luckily, you can do better. The hardware supports FREE short → float conversion on read, which is useful for graphics texture maps that store fixed-point image intensities.

Look at section 4.3.4 of the programming guide to see how to set up a texture that lets you stream in your shorts and have them converted to floats.
You’ll be 100% memory-bandwidth limited, of course. To be honest, you would be bandwidth limited even using the (float) conversion directly.
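A minimal sketch of that texture route, written against the texture-object API rather than the texture references the 2008 guide describes (those were removed in CUDA 12). It assumes the shorts sit in ordinary cudaMalloc'd linear memory. With cudaReadModeNormalizedFloat, tex1Dfetch<float> returns each signed 16-bit value already converted to a float in [-1.0, 1.0]. The kernel multiplies by 32767 to recover the original magnitude, which assumes a 32767 divisor for the signed range, so treat the rescaled result as approximate. The names convert_via_texture and convert_on_gpu_tex are hypothetical.

```cuda
#include <cassert>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// One thread per element: the texture unit converts each 16-bit value to a
// float in [-1, 1] (cudaReadModeNormalizedFloat), and the kernel rescales it.
__global__ void convert_via_texture(cudaTextureObject_t tex, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch<float>(tex, i) * 32767.0f;  // assumes an x / 32767 mapping
}

// Hypothetical helper: converts `in` on the GPU via the texture path.
// Returns false if any CUDA call fails.
bool convert_on_gpu_tex(const std::vector<short>& in, std::vector<float>& out)
{
    int n = (int)in.size();
    assert(n > 0);
    short* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(short)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaError_t err = cudaMemcpy(d_in, in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    // Describe the linear buffer of shorts as a 1D texture resource.
    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<short>();
    res.res.linear.sizeInBytes = n * sizeof(short);

    // Ask the texture unit to return normalized floats instead of raw shorts.
    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeNormalizedFloat;

    cudaTextureObject_t tex = 0;
    if (err == cudaSuccess) err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
    if (err == cudaSuccess) {
        int threads = 256;
        convert_via_texture<<<(n + threads - 1) / threads, threads>>>(tex, d_out, n);
        err = cudaGetLastError();
    }
    out.resize(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

The kernel does no arithmetic beyond one multiply, so its speed is set by how fast the shorts can be read and the floats written.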

But you might also ask why you’re converting to float at all. If it’s for some later calculation that uses floats, you might as well use the free texture conversion when that calculation reads the data.
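To fold the conversion into the later computation, the filter kernel itself can read the shorts through a texture set to cudaReadModeNormalizedFloat, so no intermediate float buffer is ever written. Below is a minimal sketch using a hypothetical 3-tap horizontal filter (filter_rows and filter_on_gpu are made-up names; the weights c0..c2 are caller-supplied). It assumes row-major data in linear device memory, edge pixels clamped inside the kernel, and a 32767 divisor for the normalization of signed 16-bit values. Because that normalization is a constant factor, it could equally be pre-multiplied into the weights on the host.

```cuda
#include <cassert>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical 3-tap horizontal filter over a row-major image of shorts.
// The shorts are read through a normalized-float texture, so no float copy
// of the input is ever written to global memory.
__global__ void filter_rows(cudaTextureObject_t tex, float* out, int width, int height,
                            float c0, float c1, float c2)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int row = y * width;
    int xl = max(x - 1, 0);          // clamp at the left edge
    int xr = min(x + 1, width - 1);  // clamp at the right edge

    // Each fetch is already a float in [-1, 1]; the constant 32767 rescale
    // (assumed x / 32767 mapping) is applied once to the weighted sum.
    float sum = c0 * tex1Dfetch<float>(tex, row + xl)
              + c1 * tex1Dfetch<float>(tex, row + x)
              + c2 * tex1Dfetch<float>(tex, row + xr);
    out[row + x] = 32767.0f * sum;
}

// Hypothetical host helper: uploads the shorts, runs filter_rows, downloads floats.
bool filter_on_gpu(const std::vector<short>& in, std::vector<float>& out,
                   int width, int height, float c0, float c1, float c2)
{
    assert(width > 0 && height > 0 && in.size() == (size_t)width * height);
    size_t n = in.size();
    short* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(short)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaError_t err = cudaMemcpy(d_in, in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    cudaResourceDesc res;
    memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<short>();
    res.res.linear.sizeInBytes = n * sizeof(short);

    cudaTextureDesc td;
    memset(&td, 0, sizeof(td));
    td.readMode = cudaReadModeNormalizedFloat;  // conversion happens in the texture unit

    cudaTextureObject_t tex = 0;
    if (err == cudaSuccess) err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        filter_rows<<<grid, block>>>(tex, d_out, width, height, c0, c1, c2);
        err = cudaGetLastError();
    }
    out.resize(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Compared with a separate conversion pass, this skips one full write and one full read of the float buffer.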
