Hard links prove quite fragile (many tools will remove the link then replace it with a new unlinked file) and occasionally dangerous.
Tagging may work better, though you'd need a tags-aware toolchain to work with them if the metadata are associated with the filesystem. An external tagstore might be resilient but could find itself out-of-sync with filesystem state.
Hmm, do you have some example of these dangerous scenarios? I suspect a lot of tools have come to think of the "file path" as the identifier for a file. So of course they would break if that classification were to be reorganized, like if two parent categories were to swap. I would call that an abuse of the file system though. But the status quo is what it is. Most file systems already have lots of other metadata built in that can be used to access the inode or whatever you call your data structure. My point is, accessing data in a more general case is a search operation.
As far as an external tag store, that is basically what a search index is. And Recoll is full-text, so each file has a shit-ton of tags associated with it. You then just pass the -m flag to the indexer, and it monitors for file modifications, and updates the index accordingly. I have not noticed a significant performance impact there. Mostly just the initial index operation sucks.
Some I know of, some I'm presuming, and there are all but certainly others.
Hardlinked directories create all kinds of mischief; that's the principal issue, and it's why they're often disabled entirely. Recursive directory trees are all kinds of fun. (More so than even the symlinked version.)
Given an existing hardlink, a tool which operates by 1) removing the file (deleting the local directory entry for the hardlinked inode), 2) creating a new file (same name, new inode, no longer hardlinked), and then 3) populating that with new content leaves behind a presumed-identical hardlink where that's no longer the case.
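That remove-and-replace pattern can be sketched in a few lines of Python. A minimal demo in a throwaway temp directory; the filenames are hypothetical:

```python
import os
import tempfile

d = tempfile.mkdtemp()
canonical = os.path.join(d, "config")
alias = os.path.join(d, "config-link")

with open(canonical, "w") as f:
    f.write("original content\n")
os.link(canonical, alias)  # hardlink: two names, one inode
assert os.path.samefile(canonical, alias)

# A "safe save" style tool replaces the file rather than rewriting it:
os.remove(canonical)             # 1) unlink the directory entry
with open(canonical, "w") as f:  # 2) new file, new inode
    f.write("new content\n")     # 3) populate it

# The alias still points at the *old* inode and the old content.
assert not os.path.samefile(canonical, alias)
```

After the replacement, the alias's link count drops back to 1 and the two names silently diverge, which is exactly the trap described above.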
Hardlinks with relative directory references will reference different files, or configurations, or executables, or devices, from different points on the filesystem.
... or within different filesystem chroots.
Hardlinks might be used to break out of a chroot or similar jail. A process which could change the hardlink could affect other processes outside the jail.
As for tags: These are ... generally ... not the same as what most people have in mind as a full-text index, or at the very least, they're a special class of index. I'm thinking of a controlled vocabulary, generally instantiated as RDF triples, though folksonomies and casual tagging systems are also often used.
The problem occurs when you've got a tagged data store that's being modified by non-tag-aware tools. There are reasons why that might be permitted and/or necessary, though also problematic. My sense is that robust tagging probably needs implementing at the filesystem level.
Yeah, not a fan of recursive directory trees. Sysfs, for example, is pretty wonky, especially when you're searching for some specific attribute of a device. Not hard links or real files, for that matter, but same idea. Hard-linked directories break the category system.
Now, the same name for a different inode in two directories is a point well taken, but I would argue that the name does not fully describe the inode; it's just one component of the metadata for that file. People are just so unaware of all that other metadata because the interface rarely shows it to them. So many people have taken to packing data into the filename: version numbers, code names. It's one way to achieve portability, I guess, but what an ugly compromise! And with all the virtual environments now for Python and such, it's quite easy to find yourself using the wrong version of something if you don't really know what you're doing and just look at the filename.
Hard linking links all that metadata, which of course includes the unique ID that open returns, so I think it's okay. I would just like to see our file interfaces better adapted to showing all that important metadata in a comfier way.
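The "name is just one component of the metadata" point shows up directly in `stat`: everything it reports hangs off the inode, and two hardlinked names share all of it. A small sketch with made-up filenames:

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "report-v2-final")   # hypothetical "name as metadata"
b = os.path.join(d, "current-report")    # second name, same inode

with open(a, "w") as f:
    f.write("data\n")
os.link(a, b)

sa, sb = os.stat(a), os.stat(b)
# Only the names differ; inode, device, size, times, permissions,
# and the link count are all attached to the shared inode.
assert (sa.st_ino, sa.st_dev) == (sb.st_ino, sb.st_dev)
assert sa.st_nlink == 2
```

The filename is the one piece of "metadata" that lives in the directory, not the inode, which is why it's the one piece that can lie.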
The same name / different inodes problem isn't a filesystem issue, it's a tools issue. Specifically, the fact that tools which modify files (editors, shells, archival utilities, scripting languages, any random executable) only see the local filehandle, not the fact that it's "supposed" to be a single chained copy across multiple directories.
There might be some way to muck around with that using attributes (at least in theory, I don't know of any that do this now), but presently, the only way to accomplish this is through workflow and integrity-checking systems (e.g., that "filename" at any of numerous specified points should be identical to and/or a hardlink of a specified canonical source).
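An integrity check of that kind is simple to sketch: verify that a path still names the same inode as its canonical source. The helper name and shape below are this sketch's invention, not an existing tool:

```python
import os
import tempfile

def is_hardlink_of(path, canonical):
    """Return True when *path* and *canonical* name the same inode on the
    same device, i.e. a presumed hardlink hasn't been silently replaced.
    (Hypothetical helper for this sketch.)"""
    try:
        a, b = os.stat(path), os.stat(canonical)
    except FileNotFoundError:
        return False
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

d = tempfile.mkdtemp()
source = os.path.join(d, "canonical")
mirror = os.path.join(d, "mirror")
with open(source, "w") as f:
    f.write("x\n")
os.link(source, mirror)
assert is_hardlink_of(mirror, source)

# Simulate a non-link-aware tool replacing the mirror:
os.remove(mirror)
with open(mirror, "w") as f:
    f.write("x\n")
assert not is_hardlink_of(mirror, source)  # same name, same bytes, broken chain
```

The point is that content comparison can't catch this; only the (device, inode) pair distinguishes a true hardlink from a byte-identical impostor.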
Oh, and one more: since hardlinks apply only to a single filesystem, any cross-filesystem references are impossible.
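The single-filesystem constraint is checkable before you attempt the link, since it comes down to the device number. The helper below is hypothetical, just illustrating the check:

```python
import os
import tempfile

def can_hardlink(src, dst_dir):
    """Hardlinks can't cross filesystems, so a link only stands a chance
    when the source and the destination directory sit on the same device.
    (Hypothetical helper for this sketch.)"""
    return os.stat(src).st_dev == os.stat(dst_dir).st_dev

d = tempfile.mkdtemp()
f = os.path.join(d, "f")
open(f, "w").close()
assert can_hardlink(f, d)  # same filesystem: a link could succeed
```

Attempting the link anyway across devices gets you an `OSError` with errno `EXDEV` from `os.link` (GNU `ln` reports it as "Invalid cross-device link").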
I think you also end up with issues in almost all cases of networked filesystems: NFS, SSHFS, etc.
It is mostly a tools problem. It should be way easier than it is to see the metadata for a given file in your shell, file browser, or whatever. Dired, for example, has a pretty darn good visual model for this that could really be taken much further, I think. The reason we don't see more metadata like extended attributes is that they are still not standardized across different file systems, so we get left with the lowest common denominator. But a reasonably designed system could just show them when they're there.
I've just always thought a tree is a very elegant way to represent categorical data. Now that I think of it, placing files in the tree is a way to preindex a search for all objects in a given category, basically the ls command. It really affects how we reason about our data. Huh.
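That "the tree pre-indexes the category query" idea can be sketched by contrasting a directory listing with a tag lookup over a flat store. The layout and tags below are made up for illustration:

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "papers"))
os.makedirs(os.path.join(root, "photos"))
for name in ("a.pdf", "b.pdf"):
    open(os.path.join(root, "papers", name), "w").close()
open(os.path.join(root, "photos", "c.jpg"), "w").close()

# The hierarchy *is* the index: listing papers/ answers "everything in
# the papers category" without scanning the rest of the tree.
papers = sorted(os.listdir(os.path.join(root, "papers")))

# The tag-store equivalent is a search over (file, tags) pairs:
tags = {"a.pdf": {"paper"}, "b.pdf": {"paper"}, "c.jpg": {"photo"}}
tagged_papers = sorted(f for f, t in tags.items() if "paper" in t)

assert papers == tagged_papers == ["a.pdf", "b.pdf"]
```

Same answer either way; the tree just happens to have precomputed one particular partition of the data, which is both its efficiency and its rigidity.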
Soft links honestly seem like a hack to me, to get around our shitty distributed file system model. And then, because oh no, what if my file is on another server, I guess everyone should just use soft links for absolutely everything. Like, why not just concatenate the host string to the file ID, and have the OS figure out how to handle it? Sorta like tramp.