I’m not sure what you’re asking for. I think you are talking about y := A*x where the elements of x are k-vectors, but what is A and how is A*x defined? Could you write down the mathematics of a well-defined example of the thing you are trying to do?
A is a sparse matrix where each element of A is a k-vector.
Let’s say the operation we want is the dot product, i.e. (k-vector, k-vector) → scalar.
You would take one element from the sparse matrix A and the matching element from the vector x, interact them to get a scalar, and then accumulate the results for each row with your accumulation function (sum).
You could imagine doing the same thing where your interaction function is (3-vector, 3-vector) → 3-vector and it’s the cross product. You accumulate with the accumulation function (sum).
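To make that concrete, here is a minimal pure-Python sketch of the pattern, assuming A is stored in CSR form with a k-vector per nonzero. The names `generalized_spmv`, `interact`, `accumulate`, and `zero` are made up for illustration and are not part of any cuSPARSE API.

```python
def dot(a, b):
    """(k-vector, k-vector) -> scalar."""
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    """(3-vector, 3-vector) -> 3-vector."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(u, v):
    """Sum accumulator that works for scalars and for tuples."""
    if isinstance(u, tuple):
        return tuple(p + q for p, q in zip(u, v))
    return u + v

def generalized_spmv(row_ptr, col_idx, values, x, interact, accumulate, zero):
    """y[i] = accumulate over the nonzeros j of row i of interact(A[i][j], x[j])."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = zero
        for nz in range(row_ptr[i], row_ptr[i + 1]):
            acc = accumulate(acc, interact(values[nz], x[col_idx[nz]]))
        y.append(acc)
    return y

# Hypothetical 2x3 sparse matrix whose nonzeros are 3-vectors (CSR layout).
row_ptr = [0, 2, 3]
col_idx = [0, 2, 1]
values = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 2.0, 3.0)]
x = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]

y_dot = generalized_spmv(row_ptr, col_idx, values, x, dot, add, 0.0)
y_cross = generalized_spmv(row_ptr, col_idx, values, x, cross, add, (0.0, 0.0, 0.0))
print(y_dot)    # [9.0, 32.0]
print(y_cross)  # [(9.0, -3.0, -5.0), (-3.0, 6.0, -3.0)]
```

Only the interaction function and the accumulator's zero element change between the two cases. The sparsity pattern and the row-wise traversal are the same as in a scalar SpMV.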
There is a whole family of interaction functions that combine a k1-vector with a k2-vector to produce a k3-vector (Clebsch-Gordan tensor products). These products are the analog of simple scalar multiplication for a class of equivariant representation learning models.
If cuSPARSE supported k-vectors as valid datatypes, it could be the backbone of efficient message passing implementations for this class of ML model, in the same way that cuSPARSE is used for efficient message passing in networks that use scalar datatypes.
I’m not sure if this really applies to what you are doing, but a = cross(b, c) is the same as a = B * c, where B is a 3x3 cross-product matrix derived from b. That’s an example where the 3x3 BSR structure shows up.
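As a quick check of that identity, the sketch below builds the 3x3 skew-symmetric matrix for b and compares B * c against the cross product computed directly. The helper names `cross_matrix` and `matvec` are illustrative only.

```python
def cross(a, b):
    """Direct cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cross_matrix(b):
    """3x3 skew-symmetric matrix B such that B * c == cross(b, c)."""
    bx, by, bz = b
    return [[0.0, -bz,  by],
            [ bz, 0.0, -bx],
            [-by,  bx, 0.0]]

def matvec(M, v):
    """Dense matrix-vector product for a small row-major matrix."""
    return tuple(sum(m * c for m, c in zip(row, v)) for row in M)

b = (1.0, 2.0, 3.0)
c = (4.0, 5.0, 6.0)
via_matrix = matvec(cross_matrix(b), c)
direct = cross(b, c)
print(via_matrix, direct)  # (-3.0, 6.0, -3.0) (-3.0, 6.0, -3.0)
```

Each nonzero of A then becomes one dense 3x3 block, which is where the BSR layout comes from.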
You definitely could do this, but imagine the overhead of this approach for something like the dot product on 5-vectors. Each 5x5 block would store 20 zeros alongside the 5 scalars that matter, and you would have to instantiate all of the B matrices. My question is less about how to perform the operation and more about how to perform it efficiently.
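A back-of-the-envelope sketch of that overhead, assuming each k-vector dot product is embedded in a square k x k block with a single nonzero row (my reading of the BSR workaround, not something cuSPARSE prescribes). The function name `bsr_dot_overhead` is hypothetical.

```python
def bsr_dot_overhead(k):
    """Return (padding_zeros, storage_ratio) for a k-vector dot product
    stored as a k x k block whose only nonzero row holds the k-vector."""
    stored = k * k   # entries in the dense block
    useful = k       # entries that actually carry data
    return stored - useful, stored / useful

zeros, ratio = bsr_dot_overhead(5)
print(zeros, ratio)  # 20 5.0
```

Under that assumption, both the storage and the multiply-adds per block grow by a factor of k relative to the useful work.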
If this isn’t in the plans, no worries, I was just curious.