Multiplying matrices of different element types

Let’s say I have the following C++ API function:

struct matrix;
void matrix_mult( matrix & result, matrix const & left, matrix const & right );

Suppose also that I want to be able to store the matrix elements in various physical formats. Float elements are one option, but I also want to support other formats, for example 16-bit unsigned int elements mapped to the 0…1 floating-point interval.
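A minimal CPU-side sketch of that 0…1 mapping, assuming each unsigned integer element is divided by its type's maximum value (2^n - 1). This matches how the CUDA Programming Guide describes `cudaReadModeNormalizedFloat` for 8- and 16-bit unsigned textures (an 8-bit element of 0xff reads as 1.0). The names `element_format`, `load_as_float` and `fetch_as_float` are hypothetical and not part of any CUDA API. `fetch_as_float` picks the conversion from a runtime format tag, which is roughly the format-agnostic read described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical tag describing how matrix elements are physically stored.
enum class element_format { f32, u8_norm, u16_norm };

// Unsigned integer element -> [0, 1]: divide by the largest representable
// value, so 0 maps to 0.0f and the maximum maps to 1.0f.
template <typename T>
float load_as_float(T v) {
    static_assert(std::numeric_limits<T>::is_integer && !std::numeric_limits<T>::is_signed,
                  "only unsigned integer types are normalized");
    return static_cast<float>(v) / static_cast<float>(std::numeric_limits<T>::max());
}

// Float elements are used as-is (this non-template overload wins for float).
inline float load_as_float(float v) { return v; }

// Read element i of a buffer whose format is only known at run time.
inline float fetch_as_float(const void* data, element_format fmt, std::size_t i) {
    switch (fmt) {
        case element_format::f32:
            return load_as_float(static_cast<const float*>(data)[i]);
        case element_format::u8_norm:
            return load_as_float(static_cast<const std::uint8_t*>(data)[i]);
        case element_format::u16_norm:
            return load_as_float(static_cast<const std::uint16_t*>(data)[i]);
    }
    assert(false && "unknown element_format");
    return 0.0f;
}
```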

I am trying to implement this function in CUDA using texture fetches, and I’m realizing that internally I have to write every possible variation depending on the formats used to store the matrix elements in device memory. For example, I want to be able to multiply a matrix with 8-bit unsigned int elements on the left by a matrix with 16-bit unsigned int elements on the right, and store the result in a matrix with 32-bit float elements.
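One common way to avoid writing those variations by hand is to write the multiplication body once as a template over the three element types and let a runtime dispatch instantiate every combination. The sketch below is a plain C++14 CPU stand-in for the kernel (no textures involved). `element_format`, `to_float`, `from_float`, `mult_impl`, `with_typed` and the layout of `matrix` are hypothetical illustrations. With three formats, the nested dispatch makes the compiler generate 3 × 3 × 3 = 27 versions of `mult_impl`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical storage formats.
enum class element_format { f32, u8_norm, u16_norm };

// Stored element -> float; unsigned ints are normalized to [0, 1].
template <typename T>
float to_float(T v) {
    return static_cast<float>(v) / static_cast<float>(std::numeric_limits<T>::max());
}
inline float to_float(float v) { return v; }

// float -> stored element; integer outputs are clamped to [0, 1], then scaled and rounded.
template <typename T>
T from_float(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<T>(v * std::numeric_limits<T>::max() + 0.5f);
}
template <>
inline float from_float<float>(float v) { return v; }

// One generic body for every (Out, L, R) combination.
// Row-major: left is m x k, right is k x n, out is m x n.
template <typename Out, typename L, typename R>
void mult_impl(Out* out, const L* left, const R* right,
               std::size_t m, std::size_t k, std::size_t n) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += to_float(left[i * k + p]) * to_float(right[p * n + j]);
            out[i * n + j] = from_float<Out>(acc);
        }
}

// Hypothetical runtime description of a matrix (data is not owned).
struct matrix {
    element_format format;
    std::size_t rows, cols;
    void* data;
};

// Invoke f with a pointer of the element type named by m.format.
template <typename F>
void with_typed(const matrix& m, F&& f) {
    switch (m.format) {
        case element_format::f32:      f(static_cast<float*>(m.data)); break;
        case element_format::u8_norm:  f(static_cast<std::uint8_t*>(m.data)); break;
        case element_format::u16_norm: f(static_cast<std::uint16_t*>(m.data)); break;
    }
}

void matrix_mult(matrix& result, matrix const& left, matrix const& right) {
    assert(left.cols == right.rows);
    assert(result.rows == left.rows && result.cols == right.cols);
    // Three nested dispatches -> one mult_impl instantiation per format triple.
    with_typed(result, [&](auto* out) {
        with_typed(left, [&](auto* l) {
            with_typed(right, [&](auto* r) {
                mult_impl(out, l, r, left.rows, left.cols, right.cols);
            });
        });
    });
}
```

Adding a fourth format only needs one more `case` in `with_typed`, but the compiler then generates 4 × 4 × 4 = 64 instantiations. Clamping in `from_float` is just one possible policy for integer outputs, since sums of products of 0…1 values can exceed 1.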

As far as I can tell, the only way to specify the physical format is as the first argument of the texture<> template, since tex2D takes an instance of that template as an argument. This is a real bummer, because otherwise texture fetching does the conversion to float in the 0…1 interval automatically. If there were a way to fetch from a texture without specifying its element type at compile time, the only kernel variations I’d have to implement would be the ones for the different output types.

So, is there no way to read a texture without knowing the exact physical format of its elements?

This is especially puzzling since in GLSL or HLSL I don’t need to write different versions of a shader to support different physical formats of the input (or even of the output).