I agree with this statement, but that home is definitely not with your code. There are tools better suited for that: Nexus and Maven, for instance.
Nexus stores artifacts (which are really any piece of data: a jar, a tar.gz, a zip file); that's where you put your dependencies. Maven manages a project's build; that's where you declare what your dependencies are.
But really, there are many other tools out there to handle that (Ant, NAnt, PyPI, etc.).
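For reference, the Maven half of that is just a declaration in your pom.xml; Maven then resolves the artifact from whatever repository you've configured (e.g. your Nexus instance). The coordinates below are only an example:

```xml
<!-- pom.xml fragment: declare the dependency; Maven fetches it from the
     configured repository at build time, so it never lives in your repo -->
<dependencies>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.1</version>
  </dependency>
</dependencies>
```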
I agree with you - the "not invented here" libs don't belong in your repo. I'm always amazed when I walk up to a "mature" codebase and find third-party and open source dependencies festering all over the repository. This drives me crazy. There is just NO reason this stuff needs to be in source control. It's pure laziness that drives developers to punt and check in whatever version they happened to be using during dev. This makes final packaging, as well as future upgrading, a nightmare (particularly if other projects have glommed onto the same dependency in the meantime).
The lowest-budget solution is simply to store the third-party and open source projects on a reliable network file share (NFS, Samba, etc.). Then you can either sync locally or just build/link directly against the mounted share.
It's also important to keep each project in a directory structure that includes the version number in the path (e.g. /nih_libs/boost/1.44.1/...) so that you can easily drop in a new version and start using it as needed on a project-by-project basis. I'm always amazed how many places neglect this step and then have nothing but pain when they want to upgrade to a new version of a lib.
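A sketch of that versioned layout (all paths here are made up):

```shell
# Stand-in for the network share; each lib version lives side by side,
# so projects opt in to upgrades independently.
root=$(mktemp -d)
mkdir -p "$root/nih_libs/boost/1.44.1" "$root/nih_libs/boost/1.46.0"
# A project then pins the exact version it has qualified, e.g.:
#   g++ -I"$root/nih_libs/boost/1.46.0/include" ...
ls "$root/nih_libs/boost"
```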
I still don't understand why dependencies don't belong in the repo. My naive reasoning: I just have to clone the repo and bam, I already have all the dependencies needed to build the artifact. When I need to upgrade to a new version of a library, I only have to commit the new version and delete the old one (possibly integrating the new library in a separate branch if the process is not trivial).
If you use something like Git submodules, Mercurial subrepos, or Subversion externals, you can get the best of both worlds. Your repo contains just your code, but a fresh clone will set up dependencies automatically.
I've also seen really simple projects get by with just an "install_deps" task in the Makefile, which you run first thing on a new clone. ("The simplicity of Maven meets the dependency management of Make," the wags will say.)
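Something like this, say (the target body is invented; a real one would curl or git-clone pinned versions rather than touch(1)):

```shell
dir=$(mktemp -d); cd "$dir"
# printf keeps the hard tab that make requires in front of recipe lines.
printf 'install_deps:\n\tmkdir -p deps\n\ttouch deps/libfoo-1.2.tar.gz\n' > Makefile
make install_deps                 # first thing you run on a new clone
ls deps
```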
Sure, this is a solution. But my confusion remains: why shouldn't I put all these dependencies in the repository with the code that uses them? In the end, doing so gives me (almost) everything I need to produce the intended software artifact.
There are a few reasons to keep them separate. One is to at least nominally isolate code that's licensed differently from yours (something your lawyers may ask you to do). Another is to make it easier to share your tweaks to the third-party code among several of your own projects.
However, yes, checking in dependencies with your code is the way to go.
Deps that can be reliably included with just a meta-descriptor (e.g. a Gemfile, a pom.xml, etc.) are still exceptionally rare. Thus the natural choice is to put the rest right next to the code they belong to. Well, the natural choice unless you're stuck in a 90s Java mindset...
If you want to do something simple, it takes a 60-line pom.xml to express it. If you want to do something a little trickier, it takes another 60 lines and a minimum of three hours poking around in inadequately written docs. If you want to do something difficult, forget it. Ant, which is a usability nightmare and still makes me wake up screaming in the middle of the night, is at least as complex to configure, but it's better documented and not a quarter as constrained as Maven.
That said, the Maven dependency model isn't too bad. Fortunately you don't have to adopt Maven to get it; you can just use Ivy.
You've got that backwards. The "90s java mindset" is the lib dir. The 2011 approach is Maven, as evidenced by the fact that 100% of my clients use it, and practically every major open source project uses it as well.
Agreed. It's a problem, but VC just isn't built for dependency management.
Since the histories of the dependencies don't get tracked, merging them can be tricky. Especially when dependencies get upgraded separately by multiple people.
This is a legitimate question. But what if you need to support different versions of open source packages, and you need to be sure that they will still exist years later? This is something I'm facing now: I may be using numpy 1.2.x, but suppose someone else is using numpy 1.3.x and we all need to play well together in a distributed environment; the context needs to be stable. If I don't put this in source control, then where's a good place to stash it so that I can reconstruct the various contexts when I have to migrate to a different server?
I disagree. Even back in 1995 it had fatal flaws. More than once it got so confused that we had to restore the source tree from backup. There was also the continual problem of files being locked by others when they should not have been, hampering progress and adding complexity. I was glad when we could kick it out the door.
I'm pretty lucky that I came to version control after fast, reliable, and easy-to-use tools like git and hg were created. Every time I have to interact with a CVS repository I'm like "wow, people actually used to use this every day".
This is wrong. One should be able to commit every little meaningful change he has made, with an equally meaningful commit message. If you wait until everything works, you'll end up with huge, meaningless, conflicting commits. Of course, this is quite annoying if you use a centralized SCM, since everything you commit becomes public. Well, that is actually why you should not use a centralized SCM.
> One should be able to commit every little meaningful change he has made, with an equally meaningful commit message.
If your commit is broken, it's not meaningful, it's just broken.
> If you wait until everything works you'll end with huge meaningless conflicting commits.
Reading comprehension, please. b0sk talked about commits which cannot be built. It's a very clear and simple requirement, and it definitely does not mean every commit has to be feature-complete.
> Of course it is quite annoying if you use a centralized SCM since everything you commit becomes public. Well this is actually why you should not use centralized SCM.
There really is no relation, and a DVCS will not save you when an axe-murderer with a short temper tries to bisect a bug, and you break his bisection because your commits made the whole project unbuildable.
Let's try this again. Speaking as a git user, my workflow is a little different.
> If your commit is broken, it's not meaningful, it's just broken.
Sometimes I make broken commits just to have something for the written record. Sometimes I rebase them away. Sometimes I ask friends to pull from a broken commit (gasp!) when I need their help to fix my bug. Oftentimes I 'git commit --amend' to fix broken commits before I push. In any case, there's no reason to artificially hide these mistakes as long as the result works.
> There really is no relation, and a DVCS will not save you when an axe-murderer with a short temper tries to bisect a bug, and you break his bisection because your commits made the whole project un-buildable.
In my workflow, I usually use a 'master' branch and 'topic' branches. The master must always build, as you say. Topic branches don't -- they're experimental by definition. When a topic is ready to go, we rebase and clean up the commits. This way, we get a traceable, ever-buildable source tree from master and experimental branches when we want them.
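A condensed version of that workflow (throwaway repo, invented messages; a non-interactive squash merge stands in here for the interactive rebase/clean-up step):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@b; git config user.name demo
git commit -q --allow-empty -m "master: always builds"
git checkout -q -b topic                 # experimental branch: may be broken
echo wip > feature; git add feature
git commit -q -m "WIP: checkpoint, does not build"
echo done > feature; git add feature
git commit -q -m "WIP: builds again"
git checkout -q -                        # back to the stable branch
git merge -q --squash topic              # fold the topic into one clean commit
git commit -q -m "feature: add feature"
git log --oneline
```

The stable branch ends up with a single buildable commit; the messy intermediate history stays on the topic branch.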
With modern technologies this is definitely a no-no.
With DVCS you can create your own branch and push it to other systems super easily, so there's little excuse for checking in broken code to a shared branch. Also, with a lot of newer VCSes there's often a feature to "shelve" a changeset to a central DB without committing it.
Not to mention that breaking the build for more than a few minutes raises some serious red flags about the way you're working.
I agree that breaking the build is normally a very bad idea. I've gone after friends with Nerf weapons for doing this. :-)
But there are a few cases where it's a necessary evil. For example, when upgrading a large Rails application from Rails 2.3 to 3.0, you're likely to make hundreds of small changes before everything works again.
In this case, I create a new branch, prepend "BROKEN:" to each commit message, and record the number of unit tests that are currently passing. Once all the tests are fixed, I hand-test, add new tests for any regressions that weren't caught by the automatic tests, and merge back to the main development branch.
I disagree. Get your code updates in there when leaving for the evening/weekend. But, of course, you'll do it in your own project branch that doesn't affect other people.
Even then, I usually comment out partial code then slap on a TODO. And where it makes sense, I'll also add in a "throw NotImplementedException" or some equivalent, so it's explicit that the code isn't expected behavior when executed. It's usually not much effort to do this. Especially when you plan ahead while you write.
There are always repercussions to checking in unbuildable code. Suddenly nobody else can contribute to or pull from that branch without first fixing your bad code, for one. What's worse is when they think it's a mistake and tweak your code themselves; you'll have to backtrack then, adding noise to the file's history. Another is when you need to roll back: if you allow broken commits, there's always a chance of hitting a broken version. In times of emergency, you really don't want to be bogged down by unbuildable code.
I had to dig for it, but he references another article he wrote [1] which says:
There is never a reason to use source control to version your data. This will be painfully obvious to most people, but I've seen it done before, and more than once too. Source control management exists to version, um, source code...
This seems to assume that everyone is committing into the same branch. How about the workflow where you create a new branch for each work package and merge it into the main branch when finished? It lets you check in to your own branch as often as you want (even broken code) without worrying about breaking anything for the other devs. The merges will be bigger though, days or weeks of work. A good point is that the branch merge is a natural time to look through all the diffs.
Compilation output does not belong in source control
I've seen this before, but never found a suitable alternative. Where does it belong? Suppose multiple developers are compiling a C++ .dll which testers are grabbing through websvn and testing. To track down crashes they get, we need the associated .pdb for the right revision. Where should these files be kept? In a plain folder where each revision gets its own subfolder named as the revision number? That means updating two separate locations with each build, using two different interfaces...
> Suppose multiple developers are compiling a C++ .dll which testers are grabbing through websvn and testing.
The DLL is compilation output; it doesn't belong in source control.
> To track down crashes they get, we need the associated .pdb for the right revision. Where should these files be kept?
To track down crashes they need the DLL to start with.
Testers should either have the ability to build the project on their own machines, or they should be able to grab the output from the CI server (just go to the CI server, open the latest good (compiled, tested, green) revision, grab the files from there, and test that).
> The DLL is a compilation output, it's not to be in source control.
Mmm... I've been working in a medium-sized group project, and we've been having pretty good luck with source controlling our driver .lib file. The intermediate object files are discarded, but it's really nice not needing to recompile the library every time somebody updates the library code.
(This is a 'project' that produces a library rather than an executable, for inclusion into other 'projects')
>Where should these files be kept? In a plain folder where each revision gets its own subfolder named as the revision number?
Yes. Although revision-number naming would be problematic if you're using git, as it uses SHA1s to identify revisions; naming them with date.branch.author.sha1 might be better.
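A name in roughly that shape can be generated straight from git (the repo below is a throwaway demo; the format itself is just the suggestion above):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@b; git config user.name alice
git commit -q --allow-empty -m "build me"
d=$(git log -1 --date=short --format=%ad)   # commit date
b=$(git rev-parse --abbrev-ref HEAD)        # branch name
a=$(git log -1 --format=%an)                # author
s=$(git rev-parse --short HEAD)             # abbreviated SHA1
echo "$d.$b.$a.$s"                          # the artifact folder name
```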
> That means updating two separate locations with each build, using two different interfaces...
It just means having your VCS do the build and create the directory in a post-commit script.
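As a toy illustration (git-flavored, since hook mechanics differ per VCS, and every path is made up), a post-commit hook can "build" and file the output under the new revision's id:

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@b; git config user.name demo
share=$(mktemp -d)                 # stand-in for the artifact file share
cat > .git/hooks/post-commit <<EOF
#!/bin/sh
# Toy hook: "build", then drop the output in a per-revision folder.
rev=\$(git rev-parse --short HEAD)
mkdir -p "$share/\$rev"
echo "pretend this is app.dll and app.pdb" > "$share/\$rev/app.dll"
EOF
chmod +x .git/hooks/post-commit
echo 'int main(){return 0;}' > main.c
git add main.c; git commit -q -m "build me"
ls "$share"                        # one folder per committed revision
```

So devs commit once; the per-revision artifact folder appears without anyone touching a second interface.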
Thanks for the ideas. However, in this case the DLL relies on MSVC for binary compatibility with its host, and the version control server is a *nix box.
There are tools to solve this problem. And these tools are not source version control.
In my company we set up the following workflow:
-> commit code
-> code is built and tested using Jenkins
-> generated artifacts are stored on Nexus
On top of that, daily we deploy everything to an internal PyPI (we do mostly Python). Deploying the code to production is then not much more than running a script that easy_installs everything from this internal PyPI...
A CI server solves this issue. We have it set up so that whenever someone commits, the CI server checks it out, builds, runs the unit tests, and then sends nag e-mails to whoever may have inadvertently broken the build. This also gets rid of the "works on my machine" excuse.
"6. You must commit your own changes - you can’t delegate it"
My company does this, and I'm not entirely sure how I feel about it. The motivation for delegating changes is that different groups in our company have different check-in privileges. This results in half of my changes being committed by me and half being committed by someone else, which, of course, presents some coordination problems.
"subsequent commit messages from the same author should never be identical"
I violate this one sometimes. I'm not a VCS magician, so sometimes I get commit errors and wind up having to make a second commit. I figure identical commit messages make it pretty obvious the two commits were intended to be one.
The most useful thing that git has given my workflow is `git stash`. I almost never find myself branching and absolutely hate having to push, pull and pick changes between separate branches.
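`git stash` in thirty seconds (throwaway repo; file contents invented):

```shell
set -e
repo=$(mktemp -d); cd "$repo"; git init -q
git config user.email a@b; git config user.name demo
echo v1 > file; git add file; git commit -q -m "clean state"
echo half-done > file              # uncommitted work in progress
git stash push -q                  # shelve it; the working tree is clean again
cat file                           # v1
git stash pop -q                   # bring the work back
cat file                           # half-done
```

Handy for pulling or switching contexts without committing half-finished work anywhere.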
Git Immersion has been my friend on this. I went through the set of labs several times (like katas) in order to feel minimally competent to use Git. Lab 19 deals with amend:
I know this is a tangent and that this story isn't about dependency management, but it's important to me and I've spent a good amount of time trying to understand it completely.
I spent what I thought to be a generous amount of time trying to wrap my head around Ivy and I just couldn't make it work the way I thought it should work. I use Maven, and while I'm not a Maven...er...maven, I can work with it well enough and I'm certainly adept with its dependency management mechanisms. Ivy just seems to have way too many moving pieces and their documentation always seems to be missing key pieces of information that link theory to example. I'm left with an incomplete understanding of how Ivy works and, as a result, I can't use it effectively.