Graph characteristics & taxonomy?


Your customers surely ask you for ALL KINDS of input graphs. Can you give a rough taxonomy of the kind of graphs you actually see in practice? For instance, I would say that one category is social-network graphs that are scale-free-ish, where there’s a wide range of input degrees and a wide variance, and then another category might be road-network graphs that are planar, have a small degree and very little variance in degree. How do you characterize the kind of graphs you see?

We work with a wide range of customer, which means that we see a wide range of graph types. The vast majority could be classified as power-law graph, both with and without
properties. But we also see a lot of bipartite and N-partite graphs. Graphs that look like road networks, high diameter with low average degree. We are also seeing an increase in the number
of multi-graphs and have had to augment our data structure to support edge ids.

I have not taken the time to count the various graph types, but that is a great
thing to do.

Oh, cool, that’s very helpful. At a high level can you talk about the use cases you see for bi/N-partite graphs and multi-graphs? Like, how are people looking at them?

Do you specialize the data structure for partite graphs?

Can you expand on “augment our data structure to support edge ids” for multi-graphs? Are you extending COO/CSR and/or is this in your blocked data structure?

Bipartite graphs come up a lot in retail and cyber. Within retail it is a mapping of customer to products (really n-partite of customers to stores to product types to product
items, etc…). In cyber it typically is computers outside the firewall and computers inside the firewall. For Multigraphs, this comes up in retail and finance
where there are lot of transactions (edges) happening every minute/hour/day.
At the C/CUDA layer, we only have a single structure. AT the Python layer we try and add additional
functions to help users interact with the various types of graphs.

Related to augmenting the CSR. We ran into the problem that if we ran some
type of sampling or pathfinding that returned an edge, we were unable to determine
which edge was selected. We expanded from a simple weight value in the CSR to supporting a tuple so that an edge id
could be passed through (also support passing through the edge types)

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.