I work with the researchers, and I think they'll be interested in knowing which version of the problem practitioners like yourself are most interested in. Do you ever add/remove edges, perhaps in bulk? Or are you mostly interested in finding "best possible" layouts for static datasets?
Practical edge crossing minimization for large graphs (e.g. a billion nodes) is a really hard problem.
An alternative formulation that's seen some use in graph drawing & circuit design software is to describe the embedding problem as an optimization problem: find an assignment of coordinates to nodes such that the edges (line segments) don't cross.
It turns out that estimating a lower bound on the number of edge crossings of an embedded graph is equivalent to solving an SVM-type problem, and one can characterize the optimal layout as the solution to a nonlinear optimization problem. What's typically done, though, is to alternate between computing the bound for a fixed embedding & optimizing the embedding with something like gradient descent.
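The "fixed embedding" half of that loop needs a way to evaluate how many crossings a given layout has. A minimal brute-force sketch (O(m²) over edge pairs, so only for illustration; `pos` and `edges` are hypothetical inputs, not from any particular library):

```python
from itertools import combinations

def segments_cross(p1, p2, p3, p4):
    """True iff open segments p1-p2 and p3-p4 properly cross."""
    def orient(a, b, c):
        v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        return (v > 0) - (v < 0)
    o1, o2 = orient(p1, p2, p3), orient(p1, p2, p4)
    o3, o4 = orient(p3, p4, p1), orient(p3, p4, p2)
    # proper crossing: endpoints straddle each other, no collinear touching
    return o1 != o2 and o3 != o4 and 0 not in (o1, o2, o3, o4)

def crossing_count(pos, edges):
    """Count pairwise crossings of straight-line edges in a layout.
    pos: node -> (x, y); edges: list of (u, v) pairs."""
    total = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if {a, b} & {c, d}:      # edges sharing a node meet, don't "cross"
            continue
        if segments_cross(pos[a], pos[b], pos[c], pos[d]):
            total += 1
    return total
```

E.g. K4 drawn on a unit square has exactly one crossing (the two diagonals), which this counter reports. In the alternating scheme above, this exact count would be replaced by a differentiable surrogate so the embedding step can use gradients.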
It gets a bit more confusing once you realize IC netlists are hypergraphs: each edge (net) is a set of nodes, and the "embedding" of such an edge is sometimes modeled as a rectilinear Steiner tree.
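Since rectilinear Steiner trees are expensive to compute, placers usually score a hyperedge with half-perimeter wirelength (HPWL), which is a lower bound on the Steiner tree length (and exact for 2- and 3-pin nets). A minimal sketch, with hypothetical `pos`/`nets` inputs:

```python
def hpwl(pos, nets):
    """Half-perimeter wirelength of a placement.
    pos: node -> (x, y); nets: iterable of node sets (hyperedges).
    Sums, per net, the half-perimeter of its bounding box, a cheap
    lower bound on that net's rectilinear Steiner tree length."""
    total = 0.0
    for net in nets:
        xs = [pos[n][0] for n in net]
        ys = [pos[n][1] for n in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total
```

For a 3-pin net at (0,0), (2,0), (0,3) the bounding box is 2x3, so HPWL is 5, which here equals the Steiner length.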
For edge crossing/congestion minimization, the authors to look up are Shabeer for edge crossing minimization, Spindler for congestion-driven placement (RUDY), and RePlace for a state-of-the-art academic force-directed placement algorithm.
I don't work on placement problems, though part of my work provides input for the placement problem (producing an optimized logic netlist that then has to be placed and routed). But if I can suggest something in the same ballpark: imagine that you have a large graph and you want to partition it into a small number of partitions (say, 4 to 10) of similar size. You want each partition to be a planar graph, and you want the number of interconnects between these partitions to be minimal. The partitions would correspond to metal layers on an ASIC. There's a lot of fuzziness in the way I stated it; to make it rigorous, a suitable cost function would need to be defined.
Someone correct me, but I think hMetis is one popular software package/algorithm for multi-level graph/hypergraph partitioning. There is also KaHyPar, which is a bit more academic.
I'm not current on this stuff at all, but I think planarity mattered more in the Mead-Conway era than today. Modern chips have lots of separate metallization layers so you can avoid crossings. Vias aren't free, but they exist so you can use them.