JNRowe

No more SCMs, just rsync trees please!

A plea for an end to the needless number of SCMs one needs to have installed

This is a simple plea for rsync source mirrors from this user at his wit's end.

Of the external Open Source projects I actively contribute to, three use Subversion, two use CVS, two use git, one uses monotone, one uses darcs and one uses Mercurial. I need all of these installed, and in nearly every instance for absolutely no reason at all.

Unless I have commit access I'm pretty much relegated to a read-only tree, although with modern SCMs I can at least manage local branches against a remote read-only tree and merge across updates. With the Subversion and CVS trees I'm stuck relying on their appalling network support just to pull a read-only tree, one that would be considerably more network efficient without all the baggage both bring.

Just to pull a tree I have to install large packages, often with large untested dependencies. At least for CVS, it is just the well tested and small-ish CVS package. For Subversion I need apr, apr-util, neon and the absolutely massive Subversion package. For monotone it is the behemoth that is boost, along with the monotone package. As for darcs, the most annoying in my opinion, it is ghc, which isn't even ported to two of the platforms I regularly develop on, plus the surprisingly small darcs package. For git it is just the git package itself, but its portability is definitely questionable even across UNIX systems. And for Mercurial it is a Python install and the incredibly small Mercurial package. Every single one of them requires yet more code review if I want them installed on boxes at work, and yet more tool-specific knowledge, for almost zero long or short term gain.

Note

I have to admit to not being all that worried about Mercurial here, because I like it and I'd have it installed for my own projects anyway. And if I weren't skipping dependencies that are already pulled in by other valuable packages, that list would be much, much bigger.

Note

And you can add bzr to that list for one new package I'm working on. bzr requires paramiko as well as ElementTree/cElementTree (with Python 2.4 or lower). And don't get me started on the horrendously slow clone time with bzr.

I'm lucky: I have access to a tool, written by a colleague, which updates Subversion trees to specific revisions without any of the Subversion code or support packages. It is only a small Ruby script, and in benchmarks it performs as fast as the real Subversion client, often using less processor time to boot. And for CVS trees we host converted mirrors where possible for internal developer access, and each one can be synced with rsync. Not everyone is this lucky however, and I may not be in the future (who knows what the future will bring).

Please, please consider offering simple rsync trees for your packages. Most contributors don't have commit access and receive no benefit from having to use your choice of SCM to pull a tree. You can gain here too. I know people who have decided against contributing patches when told to create them against a development tree which required one particular SCM (the same one each time), because it just costs too much. I'm yet to get to that point, but I've been close several times, with a different SCM on each occasion.

Really, although it is much less of a problem with current designs like Mercurial, git, bzr and monotone, it still applies here. Right now not many people have the tools installed, and they may only be looking to fix a few bugs. The barrier is significantly higher if you require them to install your choice of SCM just to create a patch or two. Of course, there is a negative with these tools for minor contributors too, and that is the size of initial clones, even if you can manage your changes against a tree with upstream merging capabilities.

As far as CVS and Subversion are concerned, using the SCM to pull a tree is just plain ridiculous unless you have commit access. All read-only users just end up with heaps of totally unusable metadata, and significantly longer checkout and update times than a simple rsync.

For what it is worth I'm doing my part: from August all our Open Source packages will have rsync source mirrors. Then you will be able to pull any tree, and use your patch management tools to make changes without jumping through all the hoops of $flavour_of_the_month_SCM if you wish. I'm also hoping to get permission to open up access to some of the packages we mirror, so other people don't have to endure the pain of needless SCM pollution if at all possible.
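To make that workflow concrete, here is a minimal Python sketch of what "pull a tree, then make a patch" might look like with no SCM installed. It is only an illustration under stated assumptions: the helper names (rsync_command, mirror, tree_diff) and the example.org URL are made up for this example, it assumes the rsync binary is on your PATH, and it only handles text files that end with a newline.

```python
import difflib
import subprocess
from pathlib import Path


def rsync_command(source, dest):
    """Build an rsync invocation that mirrors a read-only source tree."""
    # -a keeps permissions and timestamps, -z compresses on the wire,
    # --delete drops local files that have vanished upstream.
    # The trailing slash copies the *contents* of source into dest.
    return ["rsync", "-az", "--delete", source.rstrip("/") + "/", str(dest)]


def mirror(source, dest):
    """Run rsync; assumes the rsync binary is installed and on PATH."""
    subprocess.run(rsync_command(source, dest), check=True)


def _read_lines(path):
    # A missing file is treated as empty so added/removed files show up.
    return path.read_text().splitlines(keepends=True) if path.is_file() else []


def tree_diff(pristine, working):
    """Return a unified diff of every file that differs between two trees.

    Assumption: all files are text and end with a newline.
    """
    pristine, working = Path(pristine), Path(working)
    names = sorted(
        {p.relative_to(pristine).as_posix() for p in pristine.rglob("*") if p.is_file()}
        | {p.relative_to(working).as_posix() for p in working.rglob("*") if p.is_file()}
    )
    chunks = []
    for name in names:
        old = _read_lines(pristine / name)
        new = _read_lines(working / name)
        # unified_diff yields nothing at all for identical files.
        chunks.extend(difflib.unified_diff(old, new, "a/" + name, "b/" + name))
    return "".join(chunks)


# Hypothetical usage (the URL is a placeholder, not a real mirror):
#   mirror("rsync://example.org/project", "pristine")
#   ... copy pristine/ to working/ and hack on working/ ...
#   Path("fix.patch").write_text(tree_diff("pristine", "working"))
```

In practice plain diff -ruN or a patch management tool does the second half better. The point is only that nothing here needs the project's SCM of choice.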

Update on 2008-02-09

For some reason I've received five independent mails concerning this page in the last month, all with pretty much the same point. And that point is "if you have a SCM checkout you can use it to see the changelog for a piece of code or annotate a file to see when a bug first appeared", and they are absolutely right about this. The history of a project is very important when you're fixing bugs or adding features. It helps you to find where bugs appeared, understand the process and gauge the expectations of a project's developers.

However, you can also use the project's viewvc, or better yet trac if available, to view the project history. And if they're using git or Mercurial you may be able to use their really cool built-in web interfaces to walk through the repository.

The odd thing about four of the mails is that they included SVN commands as examples of how they check a project's history, and if there is a case that proves my point this is it. You're already forced to have the network up to see history with Subversion, in which case you can just fire up your browser. I could almost see the case if you're suggesting browsing history with a distributed system (like Tony Malone did), but definitely not with a centralised system.

While I'd love to be proved wrong on this, and I do believe I may be wrong because there aren't hundreds of other people bringing this point up, this isn't the winning argument. Arguing that more network round trips against your read-only tree make a better interface than trac, where you can browse the history and then directly file your bug, just doesn't convince me.
