Fast type casting (short -> float)

OK so I’ve got a buffer of short data that I’m trying to do some stuff with, things like transpose, filter, transpose again, etc. Does anyone know a good way to do a fast type conversion between a short buffer and a float buffer on the GPU? I tried something like this:

typedef struct dual_short {
    short a;
    short b;
} dual_short;

typedef struct dual_float {
    float a;
    float b;
} dual_float;

__global__ void convert_input(short* inp_buffer, float* out_buffer, int width, int height) {
    dual_short* ds_ptr = (dual_short*)inp_buffer;
    dual_float* df_ptr = (dual_float*)out_buffer;

    unsigned int xIndex = 2*(blockIdx.x * CONV_BLOCK_DIM + threadIdx.x);
    unsigned int yIndex =    blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;

    if ((xIndex < width) && (yIndex < height)) {
        unsigned int index_in = yIndex * width/2 + xIndex;

        // Convert two shorts at a time
        dual_short temp_ds = ds_ptr[index_in];
        dual_float temp_df;

        temp_df.a = (float)temp_ds.a;
        temp_df.b = (float)temp_ds.b;

        df_ptr[index_in] = temp_df;
    }
}

But the visual profiler still reports the accesses as uncoalesced. Note that the dual_float type is there so I can try to coalesce on a 64-bit type. CONV_BLOCK_DIM is 16.

Aha, got it working. I shouldn't have had that factor of two in the xIndex calculation:

    unsigned int xIndex = blockIdx.x * CONV_BLOCK_DIM + threadIdx.x;
    unsigned int yIndex = blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;
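
For anyone landing here later, this is a rough sketch of what the complete kernel might look like with that change applied. It is not the exact code from the post: the built-in short2 and float2 types are swapped in for dual_short and dual_float, because they carry 4-byte and 8-byte alignment and so let the compiler issue one load and one store per thread. The bounds check is also made against width/2, since xIndex now counts pairs rather than shorts. The host helper convert_on_gpu is hypothetical, width is assumed to be even, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define CONV_BLOCK_DIM 16  // block edge length, as stated in the post

// Each thread converts one pair of shorts.
// width is the row length in shorts and is ASSUMED to be even.
__global__ void convert_input(const short2* in, float2* out, int width, int height)
{
    unsigned int pairsPerRow = width / 2;
    unsigned int xIndex = blockIdx.x * CONV_BLOCK_DIM + threadIdx.x;  // pair index within the row
    unsigned int yIndex = blockIdx.y * CONV_BLOCK_DIM + threadIdx.y;  // row index

    if (xIndex < pairsPerRow && yIndex < (unsigned int)height) {
        unsigned int index = yIndex * pairsPerRow + xIndex;
        short2 s = in[index];                               // one 32-bit load
        out[index] = make_float2((float)s.x, (float)s.y);   // one 64-bit store
    }
}

// Hypothetical host helper: copies the data over, runs the kernel, copies back.
std::vector<float> convert_on_gpu(const std::vector<short>& host_in, int width, int height)
{
    size_t n = (size_t)width * height;
    std::vector<float> host_out(n);

    short* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(short));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    dim3 block(CONV_BLOCK_DIM, CONV_BLOCK_DIM);
    dim3 grid((width / 2 + CONV_BLOCK_DIM - 1) / CONV_BLOCK_DIM,
              (height + CONV_BLOCK_DIM - 1) / CONV_BLOCK_DIM);
    // cudaMalloc returns suitably aligned pointers, so these casts are safe here.
    convert_input<<<grid, block>>>((const short2*)d_in, (float2*)d_out, width, height);

    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

With this indexing, consecutive threads in a row touch consecutive 32-bit inputs and consecutive 64-bit outputs, which is the access pattern the profiler wants to see.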

As luck would have it, you can do better. The hardware supports free short -> float conversion on read, which is useful for graphics texture maps that store fixed-point image intensities.

Look at section 4.3.4 in the programming guide to see how to set up a texture that will let you stream in your shorts and convert to floats.
You’ll be entirely memory-bandwidth limited, of course. To be honest, you would be even if you used the (float) conversion directly.
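
A minimal sketch of the texture route, under some assumptions: it uses the texture object API found in current toolkits, which may not match what the guide section above described when this was written. The names fetch_as_float and convert_via_texture are made up for illustration. One catch worth knowing is that the converting read mode (cudaReadModeNormalizedFloat) returns values normalized to [-1, 1] for signed 16-bit data, so the sketch multiplies by 32767 to get back to the original scale. That should recover the input to within rounding; the behavior at -32768 is not covered here. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Reads each short through the texture unit, which converts it to a
// normalized float on the fly, then rescales to the original range.
__global__ void fetch_as_float(cudaTextureObject_t tex, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = tex1Dfetch<float>(tex, i) * 32767.0f;
    }
}

// Hypothetical host helper: binds a linear buffer of shorts to a texture
// object and runs the kernel above.
std::vector<float> convert_via_texture(const std::vector<short>& host_in)
{
    const int n = static_cast<int>(host_in.size());
    std::vector<float> host_out(n);

    short* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(short));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    // Describe the linear device buffer as a 1D array of signed 16-bit texels.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<short>();
    res.res.linear.sizeInBytes = n * sizeof(short);

    // Ask the hardware to convert integer texels to normalized floats on read.
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeNormalizedFloat;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    const int threads = 256;
    fetch_as_float<<<(n + threads - 1) / threads, threads>>>(tex, d_out, n);

    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

If the later math is happy working on values in [-1, 1], the multiply can be dropped and the read is all there is.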

But you might also ask why you’re converting to a float at all. If it’s for some later calculation that uses floats, you could skip the separate conversion pass and use the free texture conversion when that calculation reads the data.
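
To illustrate folding the conversion into the consumer, here is a hedged sketch of a filter kernel that reads the shorts itself and converts them in registers, so no intermediate float buffer is ever written. It uses a plain (float) cast rather than a texture read, to keep the example short; the same structure applies if the reads go through a texture. The 3-tap smoothing weights, the edge clamping, and the names smooth_rows and smooth_on_gpu are all invented for illustration and are not from the thread.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Horizontal 3-tap smoothing filter (weights 0.25, 0.5, 0.25) that takes
// short input directly. Neighbors are clamped at the row edges.
__global__ void smooth_rows(const short* in, float* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x < width && y < height) {
        int row = y * width;
        int xl = max(x - 1, 0);
        int xr = min(x + 1, width - 1);

        // The short -> float conversion happens here, at the point of use.
        float left   = (float)in[row + xl];
        float centre = (float)in[row + x];
        float right  = (float)in[row + xr];

        out[row + x] = 0.25f * left + 0.5f * centre + 0.25f * right;
    }
}

// Hypothetical host helper: copies the data over, runs the filter, copies back.
std::vector<float> smooth_on_gpu(const std::vector<short>& host_in, int width, int height)
{
    size_t n = (size_t)width * height;
    std::vector<float> host_out(n);

    short* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(short));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(short), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    smooth_rows<<<grid, block>>>(d_in, d_out, width, height);

    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Compared with a standalone conversion pass, this reads 2 bytes per sample from global memory and never writes or re-reads a 4-byte intermediate, which matters when everything is bandwidth limited anyway.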