Hi,

Is it possible to scale a matrix (FP64) by the texture on the GPU in parallel with cuda core and tensor core operations?

Cheers

What scaling operation are you referring to?

The texture unit on a GPU has the ability to do certain kinds of multiplications when performing “texture linear filtering”, but these (AFAIK) cannot be used to scale a matrix (e.g. `sA`, where `s` is a more-or-less arbitrary scalar and `A` is a matrix).

The texture unit in a GPU can retrieve 8-byte texels (such as FP64 values), but it is not able to perform any of the usual texture linear filtering on them.

Even if you could do this somehow (&), the texture linear filtering engine has only about 9 bits of resolution, so you’d have to consider carefully the use case for applying it to an FP64 type.

(&) In the 32-bit float case, I suspect it might be possible to have an interleaved realization of the `A` matrix, such that the range of possible scaling is represented by interleaved points. Suppose we have a 1D `A` matrix like so:

```
0.2 0.4 0.6 0.8
```

It might be possible to create an interleaved version of `A`, where the interleaved values represent the maximum range of scaling (let’s assume a maximum multiplier of 10 for `s`, i.e. `s` is in the range of 1 to 10):

```
0.2 2.0 0.4 4.0 0.6 6.0 0.8 8.0
```

You could then use the linear interpolator (perhaps) to scale the `A` matrix at the point of texture fetch, by providing an offset to the sample point that varies from 0 to 1, representing a multiplier from 1 to 10 (in this example).
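As a sanity check on the arithmetic, here is a small software model of the idea (a sketch only, not actual texture code; the helper names, the interleaved layout, and the mapping of `s` to a fetch offset are my assumptions). Linear filtering between a value `v` and `10*v` with weight `f` returns `(1 - f)*v + f*10*v = v*(1 + 9f)`, so choosing `f = (s - 1)/9` yields exactly `s*v` for `s` in [1, 10]:

```python
# Software model of the interleaved-scaling idea (assumptions: layout,
# helper names, and the s -> offset mapping are illustrative, not an API).

A = [0.2, 0.4, 0.6, 0.8]

# Interleave each value with 10x that value: 0.2 2.0 0.4 4.0 0.6 6.0 0.8 8.0
interleaved = [v for a in A for v in (a, 10.0 * a)]

def lerp_fetch(data, i, f):
    """Model linear filtering between elements 2*i and 2*i + 1 with weight f."""
    v0, v1 = data[2 * i], data[2 * i + 1]
    return (1.0 - f) * v0 + f * v1

def scaled_fetch(data, i, s):
    """Fetch A[i] scaled by s in [1, 10] via the interleaved layout."""
    f = (s - 1.0) / 9.0  # map the scale factor to a filtering weight in [0, 1]
    return lerp_fetch(data, i, f)

for s in (1.0, 2.5, 10.0):
    print([scaled_fetch(interleaved, i, s) for i in range(len(A))])
```

In exact arithmetic the scaling is linear and exact over the whole [1, 10] range; the real limitation comes from the precision of the hardware interpolation weight, discussed next.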

This would still be potentially “coarse” scaling due to the limited representation of the scaling factor.
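To get a feel for how coarse, here is a sketch that quantizes the interpolation weight to 8 fractional bits (an assumption consistent with the roughly 9-bit fixed-point resolution mentioned above; the [1, 10] range and the `(s - 1)/9` weight mapping are carried over from the example):

```python
# Sketch of the scaling granularity (assumption: the hardware interpolation
# weight has 8 fractional bits; names and mapping are illustrative).

def quantize(f, frac_bits=8):
    """Round a filtering weight in [0, 1] to the nearest representable step."""
    step = 1.0 / (1 << frac_bits)
    return round(f / step) * step

for s in (1.0, 2.5, 3.14159, 10.0):
    f = (s - 1.0) / 9.0              # requested weight
    s_eff = 1.0 + 9.0 * quantize(f)  # scale factor actually applied
    print(f"requested s = {s:.5f}, effective s = {s_eff:.5f}")
```

With 8 fractional bits the effective multiplier moves in steps of about 9/256, i.e. roughly 0.035 over a [1, 10] range, which is what makes the scaling “coarse”.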