One interesting/fun problem with text layout is that:
width(A) + width(B) != width(A+B)
...which some basic text layout engines assume. The post touches on this, and is one reason why line-breaking is so difficult.
If you add some text to a line, the width of the line may be longer or shorter(!) than if you measured (shaped) the parts separately.
This occurs for many reasons. The post mentions splitting a ligature; it can also happen with kerning (spaces can have kerning applied, for example).
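A toy sketch of why widths aren't additive. The advance and kern values here are made up for illustration; a real engine would get both from the font via a shaper such as HarfBuzz:

```python
# Toy measurement model: per-glyph advance widths plus a kerning table
# keyed by glyph pairs. All values are hypothetical, not from a real font.
ADVANCES = {"A": 10, "V": 10, " ": 5, "T": 10, "o": 8}
KERNING = {("A", "V"): -2, ("T", " "): -1, (" ", "o"): -1}

def width(text: str) -> int:
    """Width of `text` shaped as one run: sum of advances plus pair kerns."""
    total = sum(ADVANCES[c] for c in text)
    total += sum(KERNING.get(pair, 0) for pair in zip(text, text[1:]))
    return total

# Measuring the parts separately misses the kern across the join:
assert width("A") + width("V") == 20
assert width("AV") == 18  # the A/V pair kern only appears when shaped together

# Kerning against the space character means even splitting at a space
# changes the total ("T" + " o" vs. "T o"):
assert width("T") + width(" o") != width("T o")
```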
E.g. many text layout engines will incrementally "add" to a line. However, to do this correctly for all cases you need to re-shape (measure) the entire line (or re-shape from a known "safe to break" point in the shape result, see: https://github.com/harfbuzz/harfbuzz/issues/224 ). This is typically fine for Latin text, but begins to become slow for more complex scripts (e.g. Thai).
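A rough sketch of the incremental approach. The `shape` helper is a hypothetical stand-in for a real shaper call; in reality that call is the expensive step, which is why re-shaping the whole line for every added word hurts:

```python
def shape(text: str) -> int:
    # Stand-in for a real shaper (e.g. HarfBuzz): here, one unit per
    # character, so re-shaping is trivially cheap. In a real engine this
    # is the expensive step that "safe to break" points let you skip.
    return len(text)

def break_line_incremental(words: list[str], max_width: int) -> str:
    """Add words one at a time, re-shaping the whole prefix each time,
    since width(line + word) != width(line) + width(word) in general.
    Returns the text that fits on the first line."""
    line = ""
    for word in words:
        candidate = (line + " " + word) if line else word
        if shape(candidate) > max_width:  # full re-shape every iteration
            break
        line = candidate
    return line

print(break_line_incremental("the quick brown fox".split(), 15))
```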
Chromium kinda works backwards. It tries to fit everything in the paragraph on a single line, shapes (measures) it, then finds a line-break opportunity within the (potentially large) shape result which would "fit" on the line without re-shaping.
Then, given that line-break, it reshapes the entire line (using the "safe to break" API) and the remaining content for the next line, sees if it fits, and repeats the process[1].
By always "removing" content from the line you end up with a correct implementation which works for all the crazy things which can happen with text rendering.
[1] The content in the line may become bigger after taking a line-break, hence this needs to happen in a loop until there is one "unbreakable" piece of content in the line. This loop typically only goes through one iteration.
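The removal-based loop described above can be sketched like this. The `shape` and `break_opportunities` helpers are hypothetical simplifications (a real engine shapes once and reuses the result via its safe-to-break points, and finds break opportunities per UAX #14, not just at spaces):

```python
def shape(text: str) -> int:
    # Hypothetical shaper: width = one unit per character.
    return len(text)

def break_opportunities(text: str) -> list[int]:
    """Indices where a break is allowed (after each space, in this toy)."""
    return [i + 1 for i, c in enumerate(text) if c == " "]

def first_line(text: str, max_width: int) -> str:
    """Shape everything, pick the last break opportunity that fits,
    re-shape the shortened line, and loop: removing content can itself
    change the shaped width, so one pass isn't guaranteed to be enough."""
    line = text
    while shape(line) > max_width:
        fits = [i for i in break_opportunities(line)
                if shape(line[:i].rstrip()) <= max_width]
        if not fits:
            break  # one "unbreakable" piece left; let it overflow
        line = line[:max(fits)].rstrip()
        # Loop again in case the re-shaped line is still too wide.
    return line

print(first_line("the quick brown fox jumps", 16))
```

In this toy model widths are additive, so the loop runs once, which matches the footnote: the multi-iteration case only shows up when shaping across the removed boundary changes the remaining line's width.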
No. Most line breaks in English are at spaces, and the Computer Modern fonts don't have any complex shaping behavior across spaces (it's mostly complex scripts such as Nastaliq that have this behavior). I haven't checked to see whether it takes into account the kerning value with the hyphen added.
I doubt Knuth would claim to have "solved" the problem, but I do get the impression that it is a solution. In particular, it does tackle hyphenation of words. It's where I learned of the re-cord versus rec-ord problem.
I'm curious what you mean regarding complex shapes across spaces. If it is related to the displayed bug regarding how words like "office" would be split, I don't think TeX ever had the bug as described. Though, I could be wrong, easily. (Incidentally, what a fun bug. Kudos to whoever found that one!)
Edit: I also had the impression that TeX would adjust intercharacter spacing as a whole to keep things looking a bit more uniform. Essentially, the goal was to act a lot like a human would when setting out the characters. If needed, you could adjust spacing on all characters within a margin of "not going to be noticed" so that you could expand a line of text to take up the full width. Without just having more space between a few words.
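For the interword case, TeX's actual mechanism is "glue": each space has a natural width plus stretch and shrink limits, and justification distributes the line's slack across the spaces. A minimal sketch with illustrative numbers (not TeX's real units or badness calculation):

```python
def justify(word_widths: list[int], space: float, stretch: float,
            shrink: float, line_width: float) -> float:
    """Distribute slack across interword spaces, TeX-glue style.
    Returns the adjusted per-space width, clamped to the glue's
    [space - shrink, space + stretch] range (a simplification)."""
    n_spaces = len(word_widths) - 1
    natural = sum(word_widths) + n_spaces * space
    slack = line_width - natural
    per_space = max(-shrink, min(stretch, slack / n_spaces))
    return space + per_space

# Three words of width 30 each, natural space 10, on a 120-unit line:
# natural width is 110, so each of the two spaces stretches by 5.
print(justify([30, 30, 30], space=10, stretch=6, shrink=3, line_width=120))
```

Character-level adjustment (slightly expanding the glyphs themselves) came later, with pdfTeX's font expansion as used by microtype, rather than classic TeX.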
By "shaping across space" I mean that the shapes of the words might be very different depending on whether there's a space or whether they're on second lines. That basically doesn't happen in Latin script (though one could possibly imagine a very fancy cursive or a stunt). So I'll give an example in Nastaliq. One of the example images[1] on Wikipedia is "خط نستعلیق" (meaning "Nastaliq script") where the "خط" is tucked under the other word when they're on the same line. There's a space between them, so it's a valid line break. If they were on two separate lines, the total width would be a lot wider because of that shaping behavior.
This is not conceptually a difficult problem, you can just shape all possible substrings between candidate line breaks to get their widths, at which point Knuth-Plass will give the optimal line breaks (relative to your objective function). But given that shaping is expensive, you really want to avoid that in the 99.99% or so cases where the width of the word isn't altered by other words beyond space boundaries. That's what the "safe to break" logic is about - letting you know when you can make that assumption, as opposed to needing to reshape to get the precise metrics.
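The "shape all possible substrings" step can be sketched as follows (hypothetical `shape` stand-in; the O(n^2) shaper calls are exactly the cost you want safe-to-break points to let you skip):

```python
def shape(text: str) -> int:
    # Stand-in for an expensive shaper call; width = chars here.
    return len(text)

def candidate_line_widths(words: list[str]) -> dict[tuple[int, int], int]:
    """Shape every contiguous run of words between candidate breaks,
    giving Knuth-Plass the exact width of each possible line, even when
    cross-space shaping makes widths non-additive."""
    n = len(words)
    widths = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            widths[(i, j)] = shape(" ".join(words[i:j]))
    return widths

w = candidate_line_widths(["some", "words", "here"])
print(w[(0, 2)])  # width of "some words"
```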
Sadly, I don't understand the example. Is there a meaning/logic to whether the word is tucked under? I know you said it rarely happens in Latin, so I'm assuming it is almost akin to how 2nd is supposed to be narrower, as the "nd" should not be the same as the 2? Definitely not a typographic convention that happens in traditional text, but almost akin to dropped caps? Feels like it is a convention that is common in logos and such? (Though, scanning, I can't see too many logos that stack words, anymore. Olive Garden is about the best example I can find.)
That said, logos are a good example where TeX is not suited. Odds are high that it will not help stack words in a way that works for slogans and such.
Pedantry: note that "safe to break" optimisations can occur vertically (where I encountered them with Latin scripts) as well as horizontally.
Also, the original HTML tables algorithm was evidently meant for hardcopy output and not relayout-upon-typed-input, as it was horrendously slow without making some safe to break optimisations.
Pretty sure TeX considers vertical optimizations as well? I know it tries not to dangle a single sentence onto a page, at least. (Well, I "know" that... I would be far from shocked to find I'm wrong.)
A lot of folks think that Knuth-Plass is the optimal solution for good looking text, for all text. It really only considers Latin, and even then has restrictions.
Some fonts have the space character in the kerning table, and possibly (although super rare) in the "ligature" table (GSUB). (https://www.sansbullshitsans.com/ is an example of a font with a space in GSUB: "paradigm shift" maps to a single glyph).
Fair, a lot of folks do treat it with probably more praise than makes sense. I'm sure I'm one of those people. :D
That said, my understanding is that there really is no "optimal" way to arrange text. You can, effectively, make an objective function that you can optimize on; but there is no global "this will make text look good" algorithm.
Indeed, even ligatures are... debatable in utility. I personally like them; but I would scoff at any claims of objective superiority for them. They are fun and a bit of a "flex" for laying out text on a computer. Any other claim is going to be tough to hold up.