
Yes, absolutely. Nonetheless, I think the author may be right, except that by "Google" they really just mean "large," which reflects a fundamental misunderstanding of just how large Google and its peers are. I think it's more interesting to consider three sizes.

If you're small, everything will fit nicely in a monorepo.

If you're large, you'll want lots of repos. There aren't really any off-the-shelf monorepo options that scale super well, so using a bunch of small repos is a great way to deal with the problem. Plus, you probably don't have a full-time staff babysitting the source repos, so you want some isolation. If someone in another org is breaking stuff left and right, you don't want the other orgs to be affected.

If you're GIGANTIC, monorepos are a pretty great option again. You'll probably have to build your own and then have a full-time group of people maintain it, but that's not a huge problem for you because you're a gigantic tech company. You can set up an elaborate build system that takes advantage of the fact that the entire system is versioned together, which can let you almost completely eliminate version dependency hell. You can customize all of your tools to understand the rules for your new system. It's a huge undertaking, but it pays off because you've got a hundred thousand software engineers.



> There aren't really any off-the-shelf monorepo options that scale super well

How can you say this when Perforce on a single machine took Google to absolutely terrifying scale? There is no chance that your mid-sized software company will even slightly tax the abilities of Perforce.

What I believe you meant was there aren't really any good options to make git tolerable for non-trivial projects, and with that I wholeheartedly agree. And that's why these threads are so tiresome: they always boil down to people talking about what git can and cannot do.


Google wrote a whole paper on the fact that, with the help of a beast of a single machine, they were able to get Perforce to work for 10,000 employees averaging about 3 commits per minute (20 million commits over 11 years), plus a much higher volume of other queries. That white paper pointed out that Google had taken performance to the "edge of Perforce's envelope," and that they were only able to do so by treating Perforce's performance limitations as a major concern and striping a fleet of hard drives on that machine in RAID 10.

https://www.perforce.com/sites/default/files/still-all-one-s...

That's hardly an endorsement of Perforce as an easy, off-the-shelf solution for a company as big as Google was then. It'd probably be just fine for a company of hundreds, but so would git.

On the other hand, if you play to its strengths, it's probably a great choice: say, a team of dozens of content developers checking in large assets for videogames. That's a perfectly good use case for Perforce.


An organization with 10,000 software developers is already dangerously large. There's no way to define "mid-sized" as 10k SWEs.


The Linux kernel seems pretty non-trivial...


Google3 is, no joke, 1000x larger than Linux.


Yes, but the Linux kernel is still "non-trivial". You said git is not tolerable for non-trivial projects. I think you just meant that it isn't tolerable for "incredibly large" repos, which I do think is right.

It's just a boring semantic point that I'm making, that "non-trivial" was a hyperbolic word choice.


What’s the story behind google3? What happened to google and google2?


I'd argue that using git's sparse-checkout functionality and enforcing clean commits (e.g. via a patch-stack workflow and a hard-line stance against diff noise) does a lot of the heavy lifting for handling git monorepos.
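As a sketch of what "enforcing clean commits" can look like in practice, git's fixup/autosquash machinery folds follow-up tweaks back into their parent commit so each logical change stays a single commit. The repo, file names, and messages below are invented for illustration; this assumes a reasonably recent git.

```shell
# Sketch: keeping history clean with fixup commits + autosquash.
# All repo contents are illustrative.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name Dev

echo 'v1' > api.txt
git add api.txt && git commit -qm 'api: add endpoint'

# A tweak that logically belongs in the previous commit, not in its own:
echo 'v2' > api.txt
git add api.txt && git commit -q --fixup HEAD

# Fold the fixup back in; GIT_SEQUENCE_EDITOR=true accepts the
# autosquash-generated todo list as-is, no editor needed.
GIT_SEQUENCE_EDITOR=true git rebase -i --autosquash --root

git log --oneline
```

After the rebase, history is back to one clean commit carrying the final content, which is the property that makes per-directory history filtering useful later.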

Sparse checkouts, shallow fetches/clones, partial clones, etc. let you work with an egregiously large repository without ever needing to touch the whole thing. Most existing build tooling can be made to work with these features pretty easily, though some tools are easier than others.

Enforcing clean commits sidesteps the problem of tracking individual project histories; beyond that, git's existing tooling already supports filtering history to show only the commits relevant to specific directories/pathspecs.
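That filtering is just plain `git log -- <path>`: with one clean commit per topic, it effectively gives each project its own history. A tiny sketch (project names invented):

```shell
# Sketch: per-project history via pathspec filtering.
# Assumes any modern git; project names are made up.
set -e
cd "$(mktemp -d)"
git init -q mono && cd mono
git config user.email dev@example.com
git config user.name Dev

mkdir -p billing search
echo 'a' > billing/ledger.txt
git add . && git commit -qm 'billing: add ledger'
echo 'b' > search/index.txt
git add . && git commit -qm 'search: add index'
echo 'c' >> billing/ledger.txt
git add . && git commit -qm 'billing: append entry'

# Only the two billing commits show up; the search commit is filtered out.
git log --format=%s -- billing/
```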

---

The only time I really see an organisation outgrowing a monorepo is when the org is unable or unwilling to maintain strict development and integration policies.

Also worth noting, because I don't see it mentioned enough: not everything has to be in the same monorepo. Putting all closely related products and libraries in the same monorepo is kosher, but there's little reason for unrelated parts of an org's software to all live together. So what might be 50-200 independent projects/repos could instead be 3-20 monorepos with occasional dependencies on specific projects in the other monorepos.



