Question about the CUDA SDK DCT example

Hello,
there is something about the CUDA SDK DCT example that I don't fully understand and would like to confirm.
The CUDA SDK DCT implementation does the following (a short sketch of my understanding follows right after the list):

  • uses byte view of the source data (each data element is 1 byte)
  • converts each byte to float (so 1 byte becomes 4 bytes, for each element)
  • subtracts 128 from each float (presumably to center the values around 0, so that each value x satisfies -128 <= x <= 127)
  • calculates the DCT
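
To make sure I am reading the sample correctly, here is a minimal sketch of how I understand the first three steps (the kernel and variable names below are my own, not the ones from the SDK sample):

  // My understanding of the preprocessing: each 8-bit sample is widened to a
  // 32-bit float and shifted from [0, 255] into [-128, 127] before the DCT runs.
  __global__ void bytesToCenteredFloats(const unsigned char *src, float *dst, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          dst[i] = (float)src[i] - 128.0f;   // 1 byte in, 4 bytes out
  }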

I am pretty sure this is done so that the DCT calculation has enough precision: if the input values were around 1 billion (which is likely if I converted unsigned int values to float instead of byte values), not much room would be left for the digits to the right of the decimal point.
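
To illustrate what I mean: a float only has a 24-bit mantissa, so around 10^9 the gap between adjacent representable values is already 64, while around 128 it is on the order of 1e-5. A quick host-side check I put together (nothing from the SDK, just to show the effect):

  #include <stdio.h>

  int main(void)
  {
      // Near 1e9 the spacing between adjacent floats is 2^(29-23) = 64,
      // so adding 1 changes nothing.
      float big = 1000000000.0f;
      printf("%d\n", (big + 1.0f) == big);              // prints 1: the +1 is lost

      // Near 128 the spacing is only about 1e-5, so the fractional
      // results of a DCT on centered byte values are preserved.
      float centered = 127.0f;
      printf("%d\n", (centered + 0.001f) == centered);  // prints 0: still distinct
      return 0;
  }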

However, I would like to integrate the DCT routines into an encoder project that contains other compression stages such as entropy coding. In its current state the DCT would not be useful there, since the byte-to-float conversion multiplies the data size by 4. So my question is: instead of converting byte (8-bit) values, could I convert unsigned integer (32-bit) values to floats and then calculate the DCT? If not, could anyone give me a hint on how to proceed, and on how the DCT is used in a real-world application? This byte-to-float conversion seems strange to me.

Thanks, Tjark