Could you expand on the graph layout constraints? I'm working on something that ...

hansvm · on Sept 24, 2024

Sure! The core of the concept is just building up a library of differentiable constraints you might like to apply to a graph and then mixing and matching (including higher/lower weights on some of the constraints) according to the human intuition you have about the graph you'd like to display.

- Suppose you want the graph to roughly fit on a page, being neither too big nor too small. This is the basic force-directed graph constraint (edges pull nodes together, all pairs of nodes push each other apart). It's often sub-par by itself because it doesn't explicitly do anything interesting with labels or edges and because it tends to emphasize local clusters while letting the rest of the graph remain fairly busy. If you _want_ to spot clusters and don't care much about other edges (maybe in disease tracking for example), that might not be a bad final result, but it's more useful as a building block onto which you can tack on other behaviors.

- Suppose the graph has meaningful topology you want to highlight. E.g., you've done a topological level sort on a DAG and want nodes in the same level to be closer to each other than nodes in other levels. Add an error term pulling nodes in the same level together. The above force-directed constraint will then handle the rest of the layout.

- Laying out nodes is a bit different from laying out edges. If you model your edges as something curvy and differentiable (splines aren't a bad choice, though perhaps add a term minimizing wiggliness too), you can add an error term penalizing too much "busyness" in the graph, e.g. by saying that it's good if you have large regions of whitespace (add up the squared areas of all the connected regions created by your edges and the edges of your bounding box, then negate that to get an error term). This will tend to coalesce edges into meaningful clusters.

- You might want to add a "confusion" term to your edges. One I like is maximizing the _average_ (not total, otherwise you'll encourage crossings) cosine distance at edge crossings.

- Force directed layouts naturally minimize edge crossings, but you might want to add an explicit term for that if we're adding in other competing constraints.

- If you have any human intuition as to what's important in the graph (apologies, "social system architecture" isn't parsing in my brain as a cohesive concept, so I can't provide any concrete examples) then you can tag nodes (and/or edges) with an "importance" factor. The error term you'll likely want is some combination of higher repulsive forces between those nodes and the nodes in the rest of the graph (making the important information distinct), and weaker repulsive forces between the nodes in the other part of the graph (so they take up less space relative to the important structure). A fairly clean way to handle that is a repulsive force proportional to the largest importance squared divided by the smallest importance (or some monotonic function on the constituent large/small parts).

- Labels are something else you might want to handle well. As with the rest of this, there are lots of error terms you might choose. An easy one is adding a repulsive force from the edges of the label to the other nodes, an attractive force to the parts of the graph the labels should be labeling, and a repulsive force from the edges of the label to the graph edges.

- An important point is that you can represent basically all of the other interesting graph layouts as just some projected [0] gradient constraint being optimized. E.g., to get a hive plot [1] you impose a radial constraint on nodes according to their assigned groups and just use rules minimizing edge crossings, minimizing confusion at the existing crossings, and maximizing large chunks of whitespace in the graph to meaningfully cluster the edges (also something minimizing wiggliness if your edge description is complicated).

- And so on. This is quite long. Maybe a good general principle is that if you don't like something about a given layout, if you can quantify what you don't like then you can add that as an explicit error term and get a better graph.

[0] If you have "hard" constraints which can't be violated, replace gradient descent with projected gradient descent. Take a step according to the gradient, then "project" the parameter vector being optimized to the nearest location not violating those constraints (or, more simply, just project each node/edge/... individually, though convergence will likely be a bit slower).

[1] https://ggraph.data-imaginist.com/articles/Layouts.html#hive...

CMCDragonkai · on Sept 26, 2024

Thanks for the detailed response, these are all very interesting constraints, but how would you convert this in an algorithm? It sounds a bit like simulated annealing? I'm aware of gradient descent algorithms used in machine learning, but how would you end up applying this to a graph layout problem?

Regarding social system architecture, I just mean system architecture, but applied to social systems.

> - Suppose the graph has meaningful topology you want to highlight. E.g., you've done a topological level sort on a DAG and want nodes in the same level to be closer to each other than nodes in other levels. Add an error term pulling nodes in the same level together. The above force-directed constraint will then handle the rest of the layout.

An example is like a loose graph of nodes, where we have 2 kinds of nodes, identity nodes and authority nodes. Identities are important - because they are publically discoverable. I want the identity nodes to be more important - and I guess that's where things like topological sort is interesting.