Historical document for amusement value

I wrote the bulk of this a while ago, when I was particularly angry about something (something got corrupted in cvs). I've learned a lot since then.

It seems many of the git advocates object to, or feel slighted by, some of the statements I've made, particularly about workflow models. Rather than make corrections (it is too late now anyway), I'll leave it as-is for amusement value.

It might be worth reading these for more context:

Anyway.. back to the show.

So, why do we need a new VCS anyway?

As described in VersionControl, we need a new VCS. Here's my (biased) take on the subject. Why does my opinion matter? I've been doing this for a while. For the last 13 years, I've been the 'buck stops here' guy for our repository. I've seen it all. I wrote the rules about what we can and can't do in the repository. I did the hacks to the cvs system to prolong its use for us. I came up with or implemented most of the hare-brained ideas that we live with on a daily basis. When the repo breaks badly, I'm the one up at 4am fixing it after my phone starts ringing. I might be the guy in the shadows, but when you feel the hair on the back of your neck stand on end for no apparent reason, that's me. :)

On with the show.

Background

FreeBSD grew out of a patch kit to 386BSD. When the patches grew unwieldy and needed more organization, CVS came to the rescue. Rod Grimes kicked this into motion. FreeBSD went to 1.0 then we got caught in the crossfire with the AT&T/USL lawsuits with Berkeley and BSDI. As part of the settlements, we started from scratch. This was a minor blessing in some ways because we got to redo a bunch of things better this time. CVS was still the tool of choice. We started a new repo, and life was good.

Rod started to burn out and was regularly locking horns with the other 17(!) people on core. Right about then, I blissfully stumbled into the arena and Rod sensed an opportunity. Before I sensed the trap, I somehow ended up taking over the reins from him sometime in 1994, and I became Mr CVS for the project.

Before then, we never really did much in the way of 'cvs import'. More often than not, people just applied patches to code in the tree. Everybody could cd to /home/ncvs and edit the files directly. And they did. Large chunks of machine-generated code were checked in (configure script output, etc). Our patches NEVER went back upstream and people freely hacked everything in sight.

Paul Traina and I came up with the src/contrib thing and made it work. gcc and a bunch of other things got re-imported cleanly in src/contrib. The idea was to keep code in the vendor's preferred layout (vs the grand renames we did before integrating stuff) to more easily track and generate patches relative to the vendor code. The 'hands off src/contrib' and the 'precious' vendor branch stuff is an outgrowth of this effort. (As an aside, the 'precious' status got blown all out of proportion)

I hacked a bunch of other things in there, including the $FreeBSD$ thing, large chunks of the earlier commit mail scripts, access control fixes, etc. At one time adding a committer took 4 or 5 separate changes and usually got screwed up. They had to be manually added to the committers mailing lists. I fixed it so the committers list was in a single place and everything keyed off that.

One of the golden aims we had right from the beginning was to maintain the unified, integrated system as a whole. You had all the basic tools to self build the system, do development work, get current source, and so on. cvs and src/contrib helped us keep that. (at one point, we almost had a version of the ports tree built into the base src/* tree.. that's an interesting story for a conference one day..)

Anyway, CVS has served us well, but it is now VERY old. There are a lot of things it doesn't do that we really need.

What's wrong with CVS, for us

It is a 'death by a thousand paper cuts' thing. All of the problems/quirks/missing bits/etc don't seem too bad on their own. But together, they're a continuous, ongoing drain.

The most fundamental problem is that cvs does not store complete metadata. Particularly involving vendor branches and other branches. When a file is "on" the vendor branch, the rcs 'branch' flag is set to 1.1.1. A random checkout pulls the top of the 1.1.1.x line. When you make the first commit, it clears the default branch, and copies a SECOND set of all the changes from 1.1.1.1 -> 1.1.1.<top> *plus* your change into rev 1.2. It completely forgets to record what point in the 1.1.1.x tree that 1.2 is derived from. When things are simple, something reading the repo can guess based on timestamps, but that is exactly it... a guess. If 'joe' has a pending commit relative to 1.1.1.11 (which would create 1.2), and 'fred' does another import (creating 1.1.1.12), cvs will quite happily let joe commit his fork of 1.1.1.11. The heuristics that guess the parent would mistakenly guess that 1.1.1.12 is the parent. And this is ignoring the cost of duplicating all the deltas from 1.1.1.1 a second time.
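To make the joe/fred scenario concrete, here is a minimal sketch in Python. The "newest vendor revision not later than the commit" rule is only an assumed stand-in for the kind of timestamp heuristic described above, not the logic of any particular tool, and the `Rev` type, the function name, and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rev:
    number: str     # RCS revision number, e.g. "1.1.1.11"
    timestamp: int  # commit time in arbitrary units (hypothetical values)


def guess_vendor_parent(vendor_revs: List[Rev], trunk_rev: Rev) -> Optional[Rev]:
    """Assumed heuristic: pick the newest vendor revision that is not
    later than the trunk commit. The repository itself records nothing
    better, so this is a guess."""
    candidates = [r for r in vendor_revs if r.timestamp <= trunk_rev.timestamp]
    return max(candidates, key=lambda r: r.timestamp) if candidates else None


# joe checks out 1.1.1.11 (imported at t=100) and starts editing.
# fred imports 1.1.1.12 at t=200.
# joe commits his fork of 1.1.1.11 as rev 1.2 at t=300.
vendor = [Rev("1.1.1.11", 100), Rev("1.1.1.12", 200)]
joe_commit = Rev("1.2", 300)
true_parent = "1.1.1.11"

guessed = guess_vendor_parent(vendor, joe_commit)
print("guessed parent:", guessed.number, "| true parent:", true_parent)
```

The guess lands on 1.1.1.12 because that is the newest import before joe's commit, even though his change was made against 1.1.1.11. Nothing in the repository lets a reader tell the difference.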

Branches have a similar problem, but cvs has introduced some hacks to lessen the pain that this causes. However, we have large chunks of pre-hack commits in there with NO RECORD of when a file was added to a RELENG_* branch and when it left. This is why you can't reliably check out a branch given a date string - cvs doesn't really know, especially for the older branches.

The next problem is utterly critical. CVS's concept of branches is woefully inadequate. It has no memory of what deltas have been merged across a branch. This is no big deal for doing a branch, testing it, then doing a one-time merge into HEAD and then forgetting about it. But if you do 2 or 3 rounds of this, you're in for a whole world of hurt. Your changes will conflict with things you've already merged. Each iteration is worse than before. FreeBSD developers completely gave up on branches in cvs, except for release engineering. That's not the way it was supposed to be.
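Here is a toy model of why repeated merges hurt, sketched in Python. It assumes a merge without any memory considers every delta made on the branch since the branch point, while a merge with memory skips deltas it has already applied. The function name and the numbers are hypothetical, and real conflicts depend on the content of the changes, not just their count.

```python
from typing import List


def replayed_deltas(rounds: int, deltas_per_round: int,
                    remember_merges: bool) -> List[int]:
    """Return, for each merge round, how many already-merged deltas
    get applied to HEAD again."""
    branch: List[int] = []  # every delta made on the branch so far
    merged = set()          # deltas that have already reached HEAD
    repeats_per_round = []
    next_id = 0
    for _ in range(rounds):
        # Do some more work on the branch.
        for _ in range(deltas_per_round):
            branch.append(next_id)
            next_id += 1
        # Merge the branch into HEAD.
        if remember_merges:
            to_apply = [d for d in branch if d not in merged]
        else:
            to_apply = list(branch)  # everything since the branch point
        repeats_per_round.append(sum(1 for d in to_apply if d in merged))
        merged.update(to_apply)
    return repeats_per_round


print(replayed_deltas(3, 2, remember_merges=False))
print(replayed_deltas(3, 2, remember_merges=True))
```

In this model the number of re-applied deltas grows with every round when nothing is remembered, which is the "each iteration is worse than before" effect. With merge memory it stays at zero.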

I can go on for hours about problems with CVS. I'll spare you most of it. More details are here: VCSCvsProblems

DETOUR!

At right about this time, we started getting serious about the SMPng project. One of our superstar SMPng developers who was plowing through this stuff was John Baldwin. He worked on a laptop. Every now and then he'd upload a new patch. He kept losing things. He'd re-break stuff he'd fixed before in previous patches. His code drops were like bombs. You never quite knew what he was up to, only that he was coding like a maniac. Huge WIP's were in progress on his laptop or in his head. This scared me. The situation was utterly vulnerable to the 'get hit by a bus' problem. (If he lost that laptop or got hit by a bus or burgled or whatever, we'd have had a huge setback).

I wanted him to keep his code in public reach. Both so we could all see what he was up to (and possibly help him), and as insurance so that when his laptop crashed and burned after booting a test kernel, we wouldn't have to wonder if something was going to go missing. We, as a project, needed that stuff to be within reach.

And yet, we couldn't subject him to the pains of the cvs branch model. It would have utterly destroyed his productivity.

We needed a plan B. I had been tinkering with perforce for a while (indeed, perforce was written on FreeBSD at about that time) and thought it might help. I set up a test perforce repository and arranged for auto-imports to happen. I twisted John's arm and got him to keep his WIPs there. Far from destroying his productivity like CVS would have, p4 helped a lot. He didn't have to remember which code changes were in what checked out cvs tree, etc.

It was going so well, that I figured we had possibly found an ideal replacement for CVS as a whole. I coaxed a few more people into using it as a more public playground. It kept working. Then, trouble hit. We got a bunch of developers who just didn't quite get the p4 concept. It was a radically different "model" to what we were used to with cvs. The binary nature of it also was causing political problems. But for the people who liked it and used it, it helped them a lot.

Things that have been done that wouldn't have been possible without p4.. The amd64 port was done entirely in perforce. The 'hammer' branch was merged 625 times. 625!!. 2185 commits have been made on the branch. Doing merges is still as fast as ever.

But there lies the problem. Our cvs tree has about 160,000 unique commits over the last 15 years. Our perforce server has had 132,000 commits over about 6 years. More activity has happened in perforce over those 6 years than has happened in cvs. That is a heck of a lot of development activity that is off on the side, mostly out of public view. Most of the commits are treated with the same high standards for quality and log messages etc as a cvs commit. But it's not "official". That's a real tragedy.

So now what? What's the answer?

CVS is very long in the tooth, but has some things going for it.

But the serious problems include (and there are many), at one time or another (or regularly):

Adding another playground repo on the side mitigates some problems, but doesn't really solve them. It just delays the inevitable. I put it to you that our cvs efforts would have imploded 3-5 years ago if p4 hadn't taken some of the pressure off.

What alternatives?

There are a couple of serious contenders.

There are others, but the most support within our group is for these three.

I won't describe git/hg. Their web pages do a far better job in selling their advantages than I could hope to cover.

For us to switch to svn would be an evolutionary step. We could use it as a better cvs, with the sharp edges fixed. hg and git require more of a revolution in the way we go about things. The differences are as fundamental as the Cathedral and the Bazaar imagery that ESR paints.

I'll say this right now. Revolution isn't bad. But there are some serious downsides to deal with, which don't often get mentioned in our context. I want to point some of them out.

First, git/hg's model is largely alien to the way we've done things from the beginning. We've always had an integrated build, with a common pool of code. There is only one freebsd. (well, there is a rumour that bdebsd might be real, but that's off on a tangent). A feature isn't "real" till it lands in the public tree.

git/hg make it very easy to take stuff offline. All the good aspects of that are well documented, and they are very good points. But the bad thing is that we undo some of the good that we've achieved by getting stuff off people's laptops and into a more public arena (like p4). Encouraging the taking of stuff further offline is going in the wrong direction for *us*. If anything, we need to make it easier for people to get stuff to us and in the tree in some form.

Linus wrote git to suit his needs for linux. He has one thing going for him that we don't. There is a large cult of personality surrounding Linus. There is intense pressure to "validate" your work by getting it approved (directly or by proxy) by Linus. On the other hand, we already have problems extracting work from people. We can't assume that we'll get the same inward flow that Linus gets.

From http://lwn.net/Articles/246381/ - there are some choice quotes. The topic is the problems the KDE folks had making git work for them.

Linus: "Quite frankly, the way git works (tracking whole trees at a time, never single files), that ends up being very painful, because it's an "all or nothing" approach."

Linus: "So I'm hoping that if you guys are seriously considering git, you'd also split up the KDE repository so that it's not one single huge one, but with multiple smaller repositories (ie kdelibs might be one, and each major app would be its own)"

Uhh, what? We'd have to split src/* into a mess of subtrees to make git work well?

Linus: "To put this in a KDE perspective: it would make tons and tons of sense to have one central place (kde.org) that most developers know about, and where they would fetch their sources from. But for various reasons (and security is one of them), that may not be the main place where most "core developers" really work."

Having "secret" work areas is a good thing???

Linus: "But what's probably worse, a single large repository will force everybody to always download the whole thing. That does not necessarily mean the whole *history* - git does support the notion of "shallow clones" that just download part of the history - but since git at a very fundamental level tracks the whole tree, it forces you to download the whole "width" of the tree, and you cannot say "I want just the kdelibs part"."

So, if we wanted to keep our coherent tree, and use git, then you'd have to clone/branch an entire copy of /usr/src plus metadata just so you can work on a branch of src/bin/ls or src/sys. Not being able to check out src/sys is apparently a design feature. (For what it's worth, I don't buy the submodules thing. There's just too much to go wrong for things getting out of sync with each other.)

We're not Linux. A good number of our best supporters stick with us because we're a coherent tree and not like Linux's chaos.

Do we really like what Xorg did by splitting into hundreds of modules? That seems to be what a git strategy requires in order to work well. Linus certainly seems to be advocating that as the only "correct" way of doing things.

I don't know if hg is as fundamentally a "single unit" repository as git is, but they've been talking about it for a long time now.

A couple of other pros/cons:

Why do you seem to be pushing subversion?

It's because I am. I think the whole hg/git thing is a distraction. We need something NOW. svn will work for us and gains us some huge benefits immediately.

FWIW, if p4 hadn't come along and worked, we would have been using svn for the last 2-3 years.

What now?

I'm putting my money where my mouth is. I'm building a fully functional svn based strawman to prove the concept of how we could use it as VCS-NG for the project.

And no, I'm not intending to play fair. :)

VCSWhy (last edited 2008-06-17 21:37:23 by localhost)