Using tensor cores to multiply floats


I was using tensor cores to multiply two matrices of float elements. However, I realized that if I transform them to half, and use tensor cores, the precision I obtain is not good enough.

One of the matrices is represented very well in half precision, but the other is not.

Do you know if there is any way to represent the badly represented float matrix as two half precision matrices, and then perform the multiplication twice, one for each of these two matrices, and finally combine the result?

Many thanks for your help.

Hi ocanela,

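Depending on the dynamic range of your data, you can scale the matrix down (or up) before the multiplication and undo the scaling afterwards. The smaller the magnitude of a number, the finer the absolute spacing between representable values, at least until you reach the subnormal range of half precision, where relative precision degrades.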
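To make the scaling idea concrete, here is a minimal sketch in plain Python. It emulates rounding to half with the standard `struct` module and accumulates in double precision as a stand-in for the wide accumulator of a tensor-core GEMM. The function names, the power-of-two scale choice, and the tiny example values are my own assumptions for illustration, not anything from the original post.

```python
import math
import struct

def to_half(x):
    """Round x to the nearest IEEE-754 half value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot_half_inputs(a_row, b_col):
    """Dot product with half-rounded inputs and wide accumulation."""
    return sum(to_half(a) * to_half(b) for a, b in zip(a_row, b_col))

def scaled_dot(a_row, b_col):
    """Scale b by a power of two so its largest entry lands in [0.5, 1),
    multiply with half-rounded inputs, then undo the scaling."""
    max_b = max(abs(b) for b in b_col)
    _, exp = math.frexp(max_b)       # max_b = m * 2**exp with 0.5 <= m < 1
    scale = 2.0 ** (-exp)            # power of two, so scaling itself is exact
    scaled = [b * scale for b in b_col]
    return dot_half_inputs(a_row, scaled) / scale

# Hypothetical data: b is so small that it falls into half's subnormal range.
a_row = [5.0, 3.0]
b_col = [5.1e-6, 3.2e-6]
exact = sum(a * b for a, b in zip(a_row, b_col))

print("error without scaling:", abs(dot_half_inputs(a_row, b_col) - exact))
print("error with scaling:   ", abs(scaled_dot(a_row, b_col) - exact))
```

Using a power of two as the scale factor means the scaling and unscaling steps introduce no rounding error of their own; only the conversion to half does.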

If you cannot do so, you can use a linear combination. Suppose that I am multiplying two 2x2 matrices. Let’s compute one of the coefficients of the result matrix:

c11 = a11*b11 + a12*b21

Let’s suppose that B = U + V, and C = P + Q. You can do, for example:

P = A * U
Q = A * V
C = P + Q

Numerically, suppose that a11 = 5, a12 = 3, b11 = 5.1, b21 = 3.2. We can set u11 = 5, v11=0.1, u21=3, v21=0.2, such that b11 = 5.1 = u11 + v11 = 5 + 0.1, and so on:

Our desired result:
c11 = a11*b11 + a12*b21 = 5 * 5.1 + 3 * 3.2 = 35.1

Now, with the decomposition:
p11 = a11*u11 + a12*u21 = 5 * 5 + 3 * 3 = 34
q11 = a11*v11 + a12*v21 = 5 * 0.1 + 3 * 0.2 = 1.1

c11 = p11 + q11 = 34 + 1.1 = 35.1

You can combine this linear decomposition with other tricks, such as scaling one of the parts, to achieve your goal.
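As a sketch of how the decomposition and the scaling trick can be combined: take U as B rounded to half, and V as the rounding error B - U, scaled up by a power of two before it is rounded to half as well. This is plain Python emulating half rounding with `struct`; the names `split_to_halves`, `split_matmul`, and the factor 2^11 are assumptions of mine (2^11 is chosen because half carries 11 significant bits, so the scaled residual is at most about as large as the original entry). The triple-loop `matmul` only stands in for a tensor-core GEMM with a wide accumulator.

```python
import struct

def to_half(x):
    """Round x to the nearest IEEE-754 half value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

RESIDUAL_SCALE = 2.0 ** 11  # assumption: lifts the residual into half's normal range

def split_to_halves(b):
    """Return (u, v) with b ~= u + v / RESIDUAL_SCALE, both representable in half."""
    u = to_half(b)                           # leading bits of b
    v = to_half((b - u) * RESIDUAL_SCALE)    # rounding error of u, scaled up
    return u, v

def matmul(A, B):
    """Plain product with wide accumulation; stand-in for one tensor-core GEMM."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def split_matmul(A, B):
    """C = A*U + (A*V) / RESIDUAL_SCALE, using two half-input products."""
    A_h = [[to_half(a) for a in row] for row in A]
    U = [[split_to_halves(b)[0] for b in row] for row in B]
    V = [[split_to_halves(b)[1] for b in row] for row in B]
    P = matmul(A_h, U)
    Q = matmul(A_h, V)
    return [[p + q / RESIDUAL_SCALE for p, q in zip(p_row, q_row)]
            for p_row, q_row in zip(P, Q)]

# Same numbers as in the worked example above.
A = [[5.0, 3.0]]
B = [[5.1], [3.2]]
print(split_matmul(A, B)[0][0])
```

The final combination of P and Q has to happen in the wider type (float on the GPU), otherwise the extra bits recovered by V are rounded away again.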

Hope this helps.


Just as an appendix:

Regarding your initial idea of using two half-precision matrices, keep in mind that a (normal) floating-point number, ignoring the sign bit, is nothing more than the following product:

a = 1.f * 2^( e - (2^(E-1) - 1) )

where 1.f is the mantissa written in fixed point, with fractional part f = m / 2^M; m is the integer value stored in the mantissa field, M is the number of bits used to represent the mantissa, e is the integer value stored in the exponent field, and E is the number of bits used to represent the exponent. The term 2^(E-1) - 1 is the exponent bias. For half precision, E = 5 and M = 10, so the bias is 15.
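If it helps to see the formula applied to actual bits, the following small Python sketch extracts the fields of a half-precision value and rebuilds the number from them. The helper names are hypothetical, the sign handling is my addition on top of the formula, and the reconstruction only covers normal numbers (not zero, subnormals, infinities, or NaN).

```python
import struct

E_BITS, M_BITS = 5, 10            # field widths of IEEE-754 half precision
BIAS = 2 ** (E_BITS - 1) - 1      # exponent bias, 15 for half

def half_fields(x):
    """Round x to half and return its (sign, e, m) bit fields as integers."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> (E_BITS + M_BITS)
    e = (bits >> M_BITS) & (2 ** E_BITS - 1)
    m = bits & (2 ** M_BITS - 1)
    return sign, e, m

def rebuild_normal(sign, e, m):
    """Evaluate (-1)^sign * 1.f * 2^(e - bias) with f = m / 2^M."""
    f = m / 2 ** M_BITS
    return (-1) ** sign * (1 + f) * 2.0 ** (e - BIAS)

sign, e, m = half_fields(5.1)
print(sign, e, m, rebuild_normal(sign, e, m))
```

With only M = 10 stored mantissa bits, any bits of a float mantissa beyond that are lost in a single half, which is why the second matrix has to carry the remainder.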

The only path that I see so far is to approximate somehow

There are standard techniques for splitting floating-point numbers into pairs of lower-precision floating-point numbers, but given the extremely limited range of IEEE-754 half precision, I doubt they can be brought to bear upon this use case. But since no details have been given, I could be wrong about this and you might want to give it a try.

Claude-Pierre Jeannerod, Jean-Michel Muller, Paul Zimmermann, “On various ways to split a floating-point number”, in: 25th IEEE Symposium on Computer Arithmetic, June 2018, Amherst (MA), United States, pp. 53-60.

Many thanks for your answer and for taking the time to write it. I guess one of my questions is what the most effective way to split B between U and V is. I tried some simple splits, and the result improves, but not as much as I would like.

@njuffa posted a paper on this topic; I will have a look at it to see whether I can find better ways of performing this split.

Many thanks again both, for your help.

Many thanks for pointing me to this paper, I will have a look at it.