And yet, semantic diff and merge is a thing exactly because text is not universal when it comes to programming. Text is universal when you want to diff text but not when you want to diff programs.
Speaking as someone who's been working in the space of collaborative editors for a decade or so:
There are universal-ish formats for data (e.g. XML, JSON, etc). There are also sets of pretty standard operations for modifying data - for example, "insert", "move", "remove", etc. The same set of basic operations shows up again and again for a reason - in ShareDB, Automerge, Yjs, etc. And you can use those operations to implement most applications.
I don't think semantic diff is ever what you want. Ideally you want your editor to capture the user's intent directly through the semantics of their actions. (Signal is lost reconstructing that in a diffing tool). But I bet visual programming could be expressed pretty well in a standard language of semantic changes. And then version control is something you could build on top of that in a pretty straightforward and reusable way. (It'd be an awful lot of work - but I doubt there are unknown unknowns lurking out there.)
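To make the "standard language of changes" idea a bit more concrete, here is a minimal sketch in Python. The names `Insert`, `Remove`, `Move`, `apply` and `replay` are made up for illustration; this is not the API of ShareDB, Automerge or Yjs, and it ignores concurrency entirely. It only shows that a document can be treated as a log of small operations, with any past version recovered by replaying a prefix of the log.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Insert:
    index: int
    value: Any


@dataclass(frozen=True)
class Remove:
    index: int


@dataclass(frozen=True)
class Move:
    src: int
    dst: int  # index in the list after the element has been taken out


Op = Union[Insert, Remove, Move]


def apply(doc: List[Any], op: Op) -> List[Any]:
    """Return a new list with one operation applied (the input is not mutated)."""
    out = list(doc)
    if isinstance(op, Insert):
        out.insert(op.index, op.value)
    elif isinstance(op, Remove):
        del out[op.index]
    elif isinstance(op, Move):
        out.insert(op.dst, out.pop(op.src))
    else:
        raise TypeError(f"unknown operation: {op!r}")
    return out


def replay(doc: List[Any], log: List[Op]) -> List[Any]:
    """Rebuild a document state by applying every operation in order."""
    for op in log:
        doc = apply(doc, op)
    return doc


# The edit history is the log itself; any prefix of it is an older version.
log = [Insert(0, "b"), Insert(0, "a"), Insert(2, "c"), Move(0, 2), Remove(0)]
older_version = replay([], log[:3])
latest_version = replay([], log)
```

Version control on top of this is mostly bookkeeping over the log (naming prefixes, branching, merging), which is where the real work would be.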
> Ideally you want your editor to capture the user's intent directly through the semantics of their actions. (Signal is lost reconstructing that in a diffing tool).
I'm trying to understand this part. Would you give an example of semantic diff losing intent signal?
An example is, say you have a counter value. There are two types of operations users can make: either reset the counter to some value or increment it. The counter was 20 and now it’s 25. Did the user set the counter to 25 or did they increment it 5 times? When there’s only one editor, it doesn’t matter - the result is 25 no matter what. But if two users both edited the value at the same time (the other user changed 20 to 30), now we have two different results based on the users’ intent. Either the new value should be 35 (20+5+10), or we should pick one of the results (25 or 30) and just converge to that value.
You can’t tell what the user’s intent was by simply diffing the old and new contents. The right approach is to capture the user’s intent directly from the software that they use to edit the value, and then preserve that intent through the synchronisation system.
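A rough sketch of the counter case in Python, in case it helps. `Increment`, `SetValue` and `merge` are hypothetical names, and the tie-break rules (larger value wins between two sets, a set beats an increment) are assumptions picked only so that both peers converge; a real system would choose its own policy.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Increment:
    amount: int


@dataclass(frozen=True)
class SetValue:
    value: int


CounterOp = Union[Increment, SetValue]


def merge(base: int, a: CounterOp, b: CounterOp) -> int:
    """Merge two concurrent operations that were both made against `base`."""
    if isinstance(a, Increment) and isinstance(b, Increment):
        # Both users meant "add to the counter", so both additions count.
        return base + a.amount + b.amount
    if isinstance(a, SetValue) and isinstance(b, SetValue):
        # Assumed tie-break: the larger value wins, so every peer picks the same one.
        return max(a.value, b.value)
    # Assumed policy for the mixed case: the explicit set wins over the increment.
    return a.value if isinstance(a, SetValue) else b.value


# A diff sees 20 -> 25 and 20 -> 30 in both scenarios; only the ops tell them apart.
both_incremented = merge(20, Increment(5), Increment(10))
both_set = merge(20, SetValue(25), SetValue(30))
```

Here `both_incremented` comes out as 35 and `both_set` as 30, from the same before and after values on each side.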
This problem is also easy to reproduce with edits on lists / strings which contain repeated elements.
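For the repeated-elements case, a small Python sketch (all names here are hypothetical, and the index-shifting rule is a simplified stand-in for what a real sync system does). Inserting an "a" anywhere into "aaa" gives the same text, so a diff cannot recover where the user typed, yet the position changes the outcome once another user's concurrent insert is merged in.

```python
def insert_at(text: str, index: int, ch: str) -> str:
    """Insert `ch` into `text` before position `index`."""
    return text[:index] + ch + text[index:]


before = "aaa"

# Four different edits (insert "a" at index 0, 1, 2 or 3) all produce one result.
results = {i: insert_at(before, i, "a") for i in range(len(before) + 1)}
distinct_results = set(results.values())


def merge_with_concurrent_x(index_of_a: int, index_of_x: int = 1) -> str:
    """Merge our insert of "a" with another user's concurrent insert of "X".

    The other user's "X" is applied first; our index is shifted right by one
    if the "X" landed before it.
    """
    with_x = insert_at(before, index_of_x, "X")
    shifted = index_of_a + 1 if index_of_a > index_of_x else index_of_a
    return insert_at(with_x, shifted, "a")


typed_at_start = merge_with_concurrent_x(0)
typed_at_end = merge_with_concurrent_x(3)
```

`distinct_results` contains only "aaaa", while `typed_at_start` and `typed_at_end` differ ("aaXaa" versus "aXaaa"), so the information a diff throws away is exactly what the merge needs.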
This kind of works, but users don't always express their intent in their edit actions, either because no available edit action can fully capture the intent or because it was simpler to achieve the desired state in a different way.
Changing 999999999 to 1000000001, do you increment by two or re-type the whole number?
It is difficult to make users think in (invisible) state changes.
Small commits are all about trying to preserve an explanation of change intent - but they're not ideal because you can end up writing a lot of incidental code just to keep each commit actually working if it gets merged to master.
Whereas looking at the sum of a big commit, you just get a mess which doesn't tell you much of anything unless it's limited solely to inserting discrete blocks.
Ideally, what you really want to know is "there are 37 actions replacing the use of variable Y with a call to function X being passed Y", and that the types are the same in all cases.
Sure, semantic diff is nice, but it is purely optional. One can diff programs as text and get pretty far -- after all, that's what practically every version control system does today.
The current products do have a bit of semantic knowledge, in the form of syntax highlights and function navigation, but those do not require full language understanding, fail gracefully if they are wrong, and often can be implemented with just a bunch of regexes.
This then makes writing a new text-based tool simpler (even if the semantics are wrong, it is still usable). It also makes writing a new text-based programming language simpler (even though existing tools don't know my language's semantics, they can still work with the text, including diffs).
One would need a “universal visual language representation” file format if they want visual languages to become as popular as text-based ones are.