Hi,
Is it possible to scale an FP64 matrix using the texture unit on the GPU, in parallel with CUDA core and tensor core operations?
Cheers
What scaling operation are you referring to, exactly?
The texture unit on a GPU has the ability to do certain kinds of multiplications when performing "texture linear filtering", but these (AFAIK) cannot be used to scale a matrix (e.g. compute sA, where s is a more-or-less arbitrary scalar and A is a matrix).
The texture unit in a GPU can retrieve 8-byte textures but is not able to perform any of the usual texture linear filtering on them.
Even if you could do this somehow (&), the texture linear filtering engine has only about 9 bits of resolution, so you'd have to consider carefully the use-case for applying this to an FP64 type.
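To illustrate what that limited weight resolution means in practice, here is a small sketch (an assumption based on the CUDA Programming Guide's description of linear filtering, where the fractional texture coordinate is stored in 9-bit fixed point with 8 fractional bits, i.e. steps of 1/256):

```python
# Simulate the quantization of the texture linear-filtering weight.
# Assumption: ~9-bit fixed point with 8 fractional bits (1/256 steps),
# per the linear-filtering description in the CUDA Programming Guide.
def quantize_weight(f):
    return round(f * 256) / 256

print(quantize_weight(0.3))  # 0.30078125, not 0.3
```

Compare that granularity to the 52 fraction bits of an FP64 value, and it's clear any scaling done this way would be very coarse.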
(&) In the 32-bit float case, I suspect it might be possible to have an interleaved realization of the A matrix, such that the range of possible scaling is represented by interleaved points. Suppose we have a 1D A matrix like so:
0.2 0.4 0.6 0.8
It might be possible to create an interleaved version of A, where the interleaved values represent the maximum range of scaling (let's assume a maximum multiplier of 10 for s, i.e. s is in the range of 1 to 10):
0.2 2.0 0.4 4.0 0.6 6.0 0.8 8.0
You could then use the linear interpolator (perhaps) to scale the A matrix at the point of texture fetch, by providing an offset to the sample point that varies from 0 to 1, representing a multiplier from 1 to 10 (in this example).
This would still be potentially “coarse” scaling due to the limited representation of the scaling factor.