Hacker News

The title is terrible. Here is an excerpt from the article that explains what's going on: "The company's solution was to develop Git Virtual File System (GVFS). With GVFS, a local replica of a Git repository is virtualized such that it contains metadata and only the source code files that have been explicitly retrieved. By eliminating the need to replicate every file (and, hence, check every file for modifications), both the disk footprint of the repository and the speed of working with it were greatly improved. Microsoft modified Git to handle this virtual file system. The client was altered so that it didn't needlessly try to access files that weren't available locally and a new transfer protocol was added for selectively retrieving individual files from a remote repository."

This style of working is needed in large code bases, where not all files are checked out on developer workstations (for performance or privacy reasons).
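The lazy-materialization idea from the excerpt can be sketched in a few lines. This is a hypothetical illustration (the names `VirtualRepo` and `fetch_blob` are invented, and the real GVFS hooks the filesystem driver rather than wrapping reads in Python): keep only path metadata locally, and fetch file contents from the remote on first access.

```python
# Hedged sketch of lazy materialization; not the real GVFS implementation.

class VirtualRepo:
    def __init__(self, file_list, fetch_blob):
        self.known = set(file_list)   # cheap metadata: paths only
        self.cache = {}               # files materialized so far
        self.fetch_blob = fetch_blob  # stands in for a remote request

    def read(self, path):
        if path not in self.known:
            raise FileNotFoundError(path)
        if path not in self.cache:    # first access: hydrate from the remote
            self.cache[path] = self.fetch_blob(path)
        return self.cache[path]

    def status_candidates(self):
        # Only materialized files can have local modifications, so a
        # "git status"-style scan can skip the rest of the tree entirely.
        return sorted(self.cache)
```

This also shows why the disk footprint and status-check speed both improve: a scan over `status_candidates()` touches only what the developer has actually opened, not the whole tree.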



If you missed the earlier discussion, with details about how the technology actually works (rather than the cross-platform angle): https://news.ycombinator.com/item?id=13559662 and the linked Q&A https://www.reddit.com/r/programming/comments/5rtlk0/git_vir...

An open-source cross-platform virtual file system API that was also fast (as opposed to FUSE) would be amazing.


It would indeed, although FUSE has the virtue of simplicity, and it would be nice not to sacrifice that.


Isn't this similar to what was done by Rational (the company that IBM bought, IIRC) as part of their ClearCase product? My experience with that was not so great, unfortunately. It seemed very clunky to use, and I think it required an army of system administrators to run.


Naming things is among the hardest things for programmers... what would you title it?


And good naming is useful to others. So I think we shouldn't discourage useful debates on how article titles could be improved. That helps us improve our naming skills.


That was partially a reference to a popular programming joke (ex: https://twitter.com/codinghorror/status/506010907021828096?l...), but I would think that asking for an example of a better title is the opposite of discouraging debate. Apologies if that seemed sarcastic; it was not meant that way.


Wait... so they turned Git into a centralized abomination and added access control to parts of the source tree (a feature that SVN had from the start but decentralized VCSes cannot provide)? Is this correct?


What do you think is more likely?

(a) the result of the work of a major Git player like GitHub, and a major software company like Microsoft, with tens of excellent engineers devoted to it, and which solves a real pain point they have, is a centralized abomination that merely replicates a feature SVN already had

(b) your description is a crude knee jerk reaction

?


In a similar rhetorical style:

If SVN is a terrible application with no merits to large developers like MS, why did it exist in the first place?

If the features SVN brings over git are not considered detrimental by Torvalds, why did he create his own VCS with them removed, and why hasn't he added them back in?


>If SVN is a terrible application with no merits to large developers like MS, why did it exist in the first place?

Where's the contradiction? All kinds of terrible apps exist. Terribleness and existence are not mutually exclusive qualities.

(Assuming SVN is terrible, of course, which I didn't say. I'd say SVN was an attempt to go beyond CVS, with some shortcomings that mean it's not the best available option today).

>If the features SVN brings over git are not considered detrimental by Torvalds, why did he create his own VCS with them removed, and why hasn't he added them back in?

Lots of possible answers (given the assumption in your "if"):

E.g. he might not consider them detrimental for other people and use cases, but he doesn't need them for his own use case (Linux kernel development).

Or he thinks that while they might be good, they complicate things too much, and he prefers a more minimum feature set.


IIRC, Google and Facebook decided against using Git because it didn't scale to stupidly enormous code bases.

Microsoft decided to stick with Git but add non-strict fetching to it. On the plus side, you keep all the advantages that distributed VCSs bring: easy branching/merging, and working offline, as long as you've already touched the required files. But you still need to be connected to work with parts of the code base you haven't used before.

So I guess if you run the test suite for your current task first then you can work offline since all relevant files will be fetched?


> But you still need to be connected to work with parts of the code base you haven't used before.

Which seems like a fair tradeoff to me. The existing "native solution" for a very very large codebase would be to have it split into multiple, logical repositories. If you fetched one repo you needed to work on, but not all the dependencies or sibling repos, you still wouldn't be able to work on those other parts of the codebase until you connected.


>IIRC, Google and Facebook decided against using Git because it didn't scale to stupidly enormous code bases.

Compared to Mercurial, which Facebook uses?


And Facebook added a lot to Mercurial so that it would scale to their stupidly enormous code base.

https://code.facebook.com/posts/218678814984400/scaling-merc...


Meanwhile, OpenJDK (which is another project composed of separate but interconnected components) wrote a Mercurial extension to manage multiple repositories easily (http://openjdk.java.net/projects/code-tools/trees/). I think this is a much better solution than a monorepo: you can still see the "whole codebase" as if it were a single giant repository, but you don't have to deal with the scaling headaches in quite the same way.

I was pretty surprised, when I made my own build of OpenJDK 9, by how easy it was to work with: `hg tclone blah`, `hg tup jdk9.0.1+11`, `./configure blah; make images`, and done. Even if git submodules were closer in functionality (checking out the same tag across multiple modules at once with ease), the song-and-dance of actually downloading the modules after cloning is annoying.


It's still distributed, just not 100% distributed. It's like the difference between which IPFS nodes have which items pinned, or how many seeders versus downloaders a BitTorrent swarm has (and the resulting file availability in that swarm).


Are these things really comparable? I am not so sure. My understanding of GVFS is that it is hierarchical, not P2P.


In the reference implementation from Microsoft, it most closely resembles a CDN: you give it a list of servers to back the Git object database when it needs to look up an object. You distribute those servers as you see fit, based on logic very similar to CDN distribution. For instance, you probably want at least one available per office to keep content close to the users that need it.

Even if a CDN is more "hierarchical" rather than P2P, it's still distributed, it's just distributed on a different axis than you are perhaps expecting.

Furthermore, to a very large extent that's an implementation detail. The GVFS protocol itself [1] is a very simple REST API, and there is absolutely nothing stopping you from building a GVFS "server" that is literally backed by IPFS or BitTorrent or some other P2P file system.

[1] https://github.com/Microsoft/gvfs/blob/master/Protocol.md
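To make "a very simple REST API" concrete, here is a hedged sketch of building a batch object-download request against that protocol. The endpoint path (`/gvfs/objects`) and JSON field names (`commitDepth`, `objectIds`) are as I recall them from the linked Protocol.md; verify against the spec before relying on them.

```python
import json

def batch_objects_request(cache_server, object_ids, commit_depth=1):
    """Build (url, body) for a GVFS batch object download.

    Field names follow the linked Protocol.md as remembered;
    treat them as illustrative rather than authoritative.
    """
    url = cache_server.rstrip("/") + "/gvfs/objects"
    body = json.dumps({
        "commitDepth": commit_depth,    # how many parents to walk per commit
        "objectIds": list(object_ids),  # SHA-1 names of the wanted objects
    })
    return url, body

# Example: request Git's well-known empty-tree object from a
# hypothetical cache server.
url, body = batch_objects_request(
    "https://gvfs-cache.example.com",
    ["4b825dc642cb6eb9a060e54bf8d69288fbee4904"])
```

Because the server side is just this kind of object-by-ID lookup, any storage that can answer "give me blob X" (including a P2P one) could sit behind it.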


Man, I get sick of the incessant title criticism on HN. I wish the quality of the title weren't such a go-to topic. Unless the title grievously misrepresents the article, let's just discuss the topic.


Yes, the latest crowd has created some memes for themselves. I agree we need to extinguish them for the sake of continued quality HN discussion. Including:

- X software package name reminds/confuses me of Y product with similar name

- X title is garbage. Here is the title I would write...

- I don't understand [basic concept available on Wikipedia]. Someone explain it to me (aka lazyweb)

- I'm not an expert on this, but [completely unqualified and uncited conjecture]

- I also once did X from TFA and [unrelated personal anecdote with no insight] (aka long form "me too" comment)

I come to HN so that I can read informed discussion from people who work in the field of TFA. To read discussion of basic topics between the uninformed, there is everywhere else on the internet.


That meme of "I don't understand [basic concept from Wikipedia, googleable phrase, dictionary word], can you explain it?" has infected the hell out of our company Slack. It's considered very rude not to answer.

I feel like when I was coming up, where IRC or mailing lists were a thing, having done all possible research before asking humans was an absolute cultural requirement.

It sucked at first, but I definitely miss that and long for the old days. It was so much more respectful and efficient.


I mean, unlike a lot of aggregation sites, HN has a specific policy about link titles, which is generally sensible and does the right thing. (https://news.ycombinator.com/newsguidelines.html)

Sure, in some cases, implementation of this policy fails, and something is rewritten that shouldn't be, or something should be rewritten that isn't. But, I'd estimate maybe 80% of all discussions about titles, that I've seen, have been valid concerns, and typically resulted in a rewrite.


I have long been familiar with HN's policies. OP did not suggest a new title, they instead posted a long quote from TFA and added an oversimplified summary -- which I'm not sure adds to quality discussion. Articles which are only clickbait should be flagged. Correcting the title on a good article is different from complaining.


I sometimes wonder if it isn't almost literal bikeshedding: I have nothing to contribute on the topic at hand, but I'm compelled to contribute, so I'll go on about the title. Obviously there are some egregious examples that are correctly called out; in this case, yeah, I saw "the title is terrible" and quit reading.


Other similar types of responses:

- this project/library/company name is terrible

- this website hijacks my scroll bar

- this website doesn’t render well in mobile

- I can’t read this site because it’s too narrow

- I can’t read this site because it’s too wide


The last four I’m willing to forgive as a warning to the large demographic of HN: in case any of you other yahoos think this is a good idea, it’s not.


Maybe we can make this the go-to reply to these comments.


But then we all miss out on the title criticism criticisms.


There is a [-] that you can click to minimize any thread you do not wish to read.


"Just delete it if you don't want to read it", said every contributor and defender of low-quality content ever. I've been hearing this since Usenet was popular, and yet the pattern doesn't change. Can you picture Donald Knuth typing, "just delete it if you don't want to read what I write"? No, because he doesn't have to.


Seems like Microsoft is trying to brand a cached proxy for Git...? This would make more sense as a paid service for those that need it. Possibly, a local appliance that a company would host internally. What am I missing...?


It's a virtualization layer for the Git object database (all the hash-named files in a repository's .git folder) intended for sparsely checking out large git repositories (lots of commit history, huge trees of files, etc), and only downloading the objects you need as you need them.

It defines a "CDN protocol" for downloading those objects as needed (which Bitbucket and GitHub are both supporting in various alpha/beta stages); that is essentially a cache offered as a paid service to big enterprise projects. But the GVFS project also has to make sure that Git operates as efficiently as possible with sparse object databases, and it implements how those sparse object databases work at all (which to this point was not something Git concerned itself with, and is partly why the work is being done as a filesystem proxy using placeholder files on the user's machine).

The project has included work in making sure that git commands touch as few objects from the object database as they can to get their work done (minimizing downloads from a remote server).
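The placeholder-file trick mentioned above can be sketched like this. Everything here is invented for illustration (the marker bytes, the function names): the real GVFS uses filesystem-driver callbacks rather than an in-band marker, but the shape of the idea is the same: a stand-in file on disk gets swapped for real content the first time something reads it.

```python
import os

# Invented marker; not the real GVFS placeholder format.
PLACEHOLDER = b"\x00GVFS-PLACEHOLDER\x00"

def write_placeholder(path):
    """Drop a tiny stand-in file instead of the real blob."""
    with open(path, "wb") as f:
        f.write(PLACEHOLDER)

def read_hydrating(path, fetch):
    """Return file contents, downloading the real blob on first access.

    `fetch` stands in for a request to a GVFS cache server.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data == PLACEHOLDER:
        data = fetch(os.path.basename(path))
        with open(path, "wb") as f:   # materialize for later reads
            f.write(data)
    return data
```

After the first read the file is fully local, so subsequent reads (and offline work) never touch the network, which matches the "run the test suite first, then work offline" observation upthread.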



