Custom DataTypes for CuSparse

aglover1183 · June 3, 2024, 7:00pm

I have an application where I am dealing with a data type which is a 3-vector of scalars, representing a Cartesian coordinate vector. I was thinking that if I defined operations like the dot product or the cross product, I might be able to leverage the CuSparse libraries for handling sparse interactions between collections of this data type.

However a brief review of the documentation makes it look like CuSparseT cannot be extended to work with anything more than a 2-vector (complex number) currently. cuSPARSE . Can someone confirm that my understanding of the capabilities of the CuSparse Library are correct.

As a fall back, I can write everything in a more general linear algebra library, I just wasn’t sure how much performance I would be leaving on the table.

I would appreciate any insight y’all can share.

-Austin

fbusato · June 3, 2024, 7:42pm

Yes, cuSPARSE doesn’t support 3-vector of scalars. We provide SpMM with custom operations cuSPARSE but this doesn’t allow to custom data type. About performance, it depends on how uniform your matrices are. If they are uniform (similar nnz per row) you should get similar performance, while for non-uniform matrices could be several times slower

aglover1183 · June 3, 2024, 8:05pm

Okay, this is inline with my expectations. It’s good to hear that in some cases there shouldn’t be too big of a performance impact using a different library. I’ll focus on that approach.

Do you have any idea if supporting custom data types is on the road map? With the JIT LTO and custom operations support, it seems like a lot of flexibility exists in the system, but I’m sure it would cause a lot of headaches.

Because the data I’m looking at is 3-Vector which is ubiquitous in graphics, is there any chance that there is some Nvidia graphics library with rich support and accelerations for 3-vectors?

fbusato · June 3, 2024, 8:17pm

right now, we don’t have a plan for supporting custom data types. We could add 3-Vector if we see there are enough use cases. I’m not aware of any other libraries working on 3-vector scalars.

eedwards · June 3, 2024, 8:40pm

Hello Austin,

cuSPARSE has some support for Block-Sparse Row (BSR) matrices. A BSR matrix with 3x3 blocks might be a good fit for you. If you have, for example, a mesh with N vertices each of which interacts with its neighbors, then you can build a BSR matrix with 3x3 dense blocks, with one block per edge of the mesh. The 3x3 block encodes the linear interactions between neighboring vertices. I’ve encountered this sort of BSR structure in computer-graphics physics simulations and geometry-processing applications.

That said, cuSPARSE’s support for BSR matrices is not as mature as its support for CSR matrices. You might still find that using cuSPARSE’s CSR format runs faster. You’d have to benchmark your particular application to find out for sure.

aglover1183 · June 20, 2024, 12:09am

I just want to share the use case for your curiosity.

Having N-Vectors of scalars would be useful for accelerating message passing in geometric graph neural networks.

In order to efficiently train on 3D data there are custom interaction and accumulation operators that operate on 3-vectors (regular x,y,z basis) and higher order vectors (5, 7, 9… Vectors).

In the same way that there are “complex numbers” which are managed as 2-vectors, there is the 3D basis for 3 vectors, quaternions as 4-vectors, 2nd order spherical harmonics as 5-vectors, and so on. In geometric deep learning these kinds of data types which are N-vectors will keep appearing in GNNs that operate on 3D and physical data.

In GNN’s, Message Passing is typically viewed as being a spMV operation for efficient implementation. It would be really nice if the library added support for N-vectors (at least a 3 vector would be nice, to enable basic equivariant architectures) so that developers of geometric learning libraries could leverage the basic algorithms and JIT LTO framework.

Or point to a supported workflow or 3rd party library that lowers nicely and provides reasonable performance.

For example, a short coming in the ecosystem is that PyG offers an efficient spMV based implementation for scalar data message passing (Memory-Efficient Aggregations — pytorch_geometric documentation) but falls back onto edge-wise parallelism for more complex cases like discussed above.

Support in cuSparse for such operations would really make things easier for the geometric deep learning community, (in my opinion.)

I’m curious to know your thoughts.

aglover1183 · June 20, 2024, 12:12am

Hello Eedwards,

I think that sort of method would be limiting in the kinds of operations it would be possible to support. For example, how would you define the cross product between two 3-vectors in such a scenario? The data is really a 3-Vector and so I wish I could just store it and operate on it as thought it were a 3-vector.

I appreciate your feedback.

-Austin

eedwards · June 24, 2024, 3:28pm

Hello Austin,

I’m not sure what you’re asking for. I think you are talking about y := A*x where the elements of x are k-vectors, but what is A and how is A*x defined? Could you write down the mathematics of a well-defined example of the thing you are trying to do?

For example, how would you define the cross product between two 3-vectors in such a scenario?

I’m not sure if this really applies to what you are doing, but a = cross(b, c) is the same as a = B * c, where B is a 3x3 cross-product matrix derived from b. That’s an example where the 3x3 BSR structure shows up.

-Essex

aglover1183 · July 16, 2024, 8:49pm

I’m not sure what you’re asking for. I think you are talking about y := A*x where the elements of x are k-vectors, but what is A and how is A*x defined? Could you write down the mathematics of a well-defined example of the thing you are trying to do?

A is a sparse matrix where each element of A is a k-vector.

Let’s say the operations we want is the dot product aka, (k-vector, k-vector) → scalar.

You would get one element from the sparse matrix A, one element from the vector, interact them to get a scalar and then accumulate the result for a row with your accumulate function (sum)

You could imagine doing the same thing where your interaction function is (3-vector, 3-vector) → 3-vector and it’s the cross product. You accumulate with the accumulation function (sum).

There is a whole family of interaction functions which interact one k1-vector with another k2-vector and produce a k3-vector, (clebsch-gordon tensor products). These products are the analog of simple scalar multiplication for a class of equivariant representational learning.

If cuSparse supported k-vectors as valid datatypes, cusparse could be the backbone of efficient message passing implementations for this class of ML model in the same way that cuSparse is used for efficient message passing in networks that use scalar datatypes.

I’m not sure if this really applies to what you are doing, but a = cross(b, c) is the same as a = B * c, where B is a 3x3 cross-product matrix derived from b. That’s an example where the 3x3 BSR structure shows up.

You definitely could do this, but imagine the overhead from this approach for doing something like the dot product on 5-vectors. You would have a block of 20 zeros for 5 scalars, and you have to instantiate all of the B matrices. It’s more about performing the operation efficiently, than how to perform it.

If this isn’t in the plans, no worries, I was just curious.

eedwards · July 16, 2024, 11:43pm

Okay, I think I understand now. The elements of A are k1-vectors. The elements of x are k2-vectors. The equation for A*x is still y_i = sum_j A_ij*x_j, but multiplication A_ij*x_j has been replaced with some arbitrary user-specified function f(A_ij, x_j) that produces k3-vectors. In the examples given (f=dot and f=cross) the function f is a bilinear form, but it sounds like maybe that’s not always the case in this application domain.

This sounds interesting to me, but as @fbusato said, it’s not currently available or on the roadmap.

aglover1183 · July 17, 2024, 6:55pm

Okay, I think I understand now. The elements of A are k1-vectors. The elements of x are k2-vectors. The equation for A*x is still y_i = sum_j A_ij*x_j, but multiplication A_ij*x_j has been replaced with some arbitrary user-specified function f(A_ij, x_j) that produces k3-vectors. In the examples given (f=dot and f=cross) the function f is a bilinear form, but it sounds like maybe that’s not always the case in this application domain.

That’s exactly it! The functions I care about would always be bilinear. And they would fall into a tightly constrained class of functions. Complex number multiplication is an example of this class of functions.

If you think about the existing example with complex numbers, 2-vectors.
func(a, b) -> c is given by (a[0] * b[0] - a[1]*b[1], a[0]*b[1] + a[1]*b[0])

Every scalar element in the output is given by a fixed linear combination of products, each product drawing one element from each of the two input vectors. This condition is also true for the dot product and the cross product and all the other unnamed functions in this class.

Just to illustrate this for the dot and cross product:
func(a, b)->c = (a[0]*b[0] + a[1]b[1] + a[2]*b[2])
func(a, b)->c = (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

Complex multiplication is the 2D analog of these 3D operations

This sounds interesting to me, but as @fbusato said, it’s not currently available or on the roadmap.

All good! Thank you for your answers and info!

-Austin

Topic		Replies	Views
sparse vector and symmetric-sparse matrix-vector multiplication BLAS CUDA Programming and Performance	1	1755	October 26, 2009
CUBLAS Sparse Matrix-vector multiply CUDA Programming and Performance	0	11978	April 9, 2007
Sparse matrix operations inside cuda Kernel CUDA Programming and Performance	5	941	November 10, 2017
CUSPARSE - Add 2 sparse matrices in csr format CUDA Programming and Performance	7	3389	May 15, 2011
Sparse Matrix-Vector Multiplication on CUDA CUDA Programming and Performance	79	314809	November 22, 2010
Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt Technical Blog	10	1476	March 14, 2022
Questions about CUB DeviceSpmv GPU-Accelerated Libraries cub	3	836	July 30, 2021
Cusp v0.1 release (Sparse Matrix Library) Cusp is a high-level library for sparse linear algebra and CUDA Programming and Performance	0	1539	May 4, 2010
multi-threading with cuSPARSE lib GPU-Accelerated Libraries	15	1572	November 10, 2017
use cusparse library from python GPU-Accelerated Libraries	4	2294	July 20, 2019

Custom DataTypes for CuSparse

Related topics