Problems with CVS that are hurting us

A summary of the CVS problems that are hurting us yet may not be immediately obvious to the casual observer.

Lack of branch basepoint information

CVS doesn't record when a branch began. You have to use heuristics to guess exactly when it was. CVS itself doesn't know either, and avoids the problem by simply duplicating the entire branch contents.

Consider vendor branch imports. Begin with imports of revs 1.1.1.1, 1.1.1.2 and 1.1.1.3. If you then edit the file on the mainline, rev 1.2 is created. CVS records this as a delta from 1.1.1.1 -> 1.2. It never records where 1.2 actually came from. It might have been 1.1.1.2 or (most likely) 1.1.1.3. If you had checked out an older revision when you did the commit, cvs wouldn't have warned you and would have just used 1.1.1.2 as the base even though 1.1.1.3 existed and had later code.

This is mostly harmless, until you try to reconstruct the history. Things that mechanically parse the repo will guess the basepoint based solely on the date. And it is just that, a guess.
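To make the guessing concrete, here is a minimal sketch of a date-based heuristic of the kind described above. It is not the algorithm of any particular conversion tool; the function name and the dates are made up for illustration.

```python
from datetime import datetime

def guess_vendor_base(vendor_revs, mainline_date):
    """Guess which vendor revision a mainline commit was based on.

    vendor_revs: list of (revision, commit datetime) on the vendor branch.
    Returns the newest vendor revision committed before mainline_date,
    or None if there is none. This is only a guess: CVS never recorded
    the real basepoint.
    """
    earlier = [(when, rev) for rev, when in vendor_revs if when < mainline_date]
    if not earlier:
        return None
    return max(earlier)[1]

# Hypothetical import dates, invented for this example.
imports = [
    ("1.1.1.1", datetime(1999, 1, 10)),
    ("1.1.1.2", datetime(1999, 6, 1)),
    ("1.1.1.3", datetime(2000, 2, 15)),
]
guess = guess_vendor_base(imports, datetime(2000, 3, 1))
```

Here the heuristic picks 1.1.1.3. If the committer had actually been working from a stale checkout of 1.1.1.2, the guess would be wrong and nothing in the repository would reveal it.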

This is why we go to great lengths to avoid touching vendor branch code (to stop the duplication of rcs deltas), and also why we never put files back on the vendor branch - there would be no way to ever guess that the mainline existed based solely on a 'date' spec.

Ambiguous tagging practices

If 1.5 is HEAD, and we create RELENG_7, then a "magic" tag is created - RELENG_7: 1.5.0.2. ("Magic" is another term for "illegal" in an RCS file. Since RCS wouldn't have created it, cvs knows that it must have.) This means that commits to RELENG_7 will have deltas in 1.5.2.x (remove the illegal .0. and shift up).
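The "remove the .0. and shift up" rule can be sketched in a few lines. The helper names below are hypothetical and exist only for this illustration.

```python
def magic_to_branch(magic):
    """Convert a magic branch number such as '1.5.0.2' to the branch '1.5.2'."""
    parts = magic.split(".")
    # A magic number has an even count of components, with 0 second to last.
    if len(parts) < 4 or len(parts) % 2 != 0 or parts[-2] != "0":
        raise ValueError(f"not a magic branch number: {magic}")
    return ".".join(parts[:-2] + parts[-1:])

def is_on_branch(revision, branch):
    """True if revision (e.g. '1.5.2.1') is a delta directly on branch ('1.5.2')."""
    return ".".join(revision.split(".")[:-1]) == branch

releng_7 = magic_to_branch("1.5.0.2")
```

With this, `releng_7` is `1.5.2`, and a revision such as 1.5.2.1 is recognisably on that branch while 1.5 is not.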

This is fine. Suppose there are commits on the branch, and 1.5.2.1 is the top of RELENG_7 when we release. Then RELENG_7_0_RELEASE will be 1.5.2.1. It is obvious that this is on the branch.

However, if no commits took place, RELENG_7_0_RELEASE can't be tagged as '1.5.0.2' because there isn't really a corresponding rcs revision. So the tag gets put on 1.5 instead. How is one supposed to tell for sure if "1.5" is a mainline or a branch revision? CVS repository parsers guess this using heuristics. They won't know our convention that "RELENG_7_0_RELEASE" really happened on "RELENG_7".
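A small sketch of why a parser cannot decide: given only the symbol table of one RCS file, a tag sitting on a trunk revision is equally consistent with the mainline and with every branch that sprouts from that revision. The function name is hypothetical, and the example assumes the tagged revision is on the trunk.

```python
def branches_sprouting_from(revision, symbols):
    """Names of branches whose magic number puts their branch point at revision."""
    found = []
    for name, number in symbols.items():
        parts = number.split(".")
        is_magic = len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0"
        if is_magic and ".".join(parts[:-2]) == revision:
            found.append(name)
    return sorted(found)

# Symbol table for the case described above: no commits on RELENG_7 yet.
symbols = {"RELENG_7": "1.5.0.2", "RELENG_7_0_RELEASE": "1.5"}
tagged = symbols["RELENG_7_0_RELEASE"]

# Every line of development the release tag could belong to.
candidates = ["HEAD"] + branches_sprouting_from(tagged, symbols)
```

Both HEAD and RELENG_7 come back as candidates. Choosing between them needs knowledge from outside the file, such as our naming convention.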

It is more complicated than that because now we create release branches. RELENG_7_0 would be branched from RELENG_7, and RELENG_7_0_0_RELEASE would be on RELENG_7_0. Except that we didn't do that in the past. Before RELENG_4_4_0_RELEASE or so, this intermediate branch didn't exist.

No merge history

Plenty has been said about this already.

In a nutshell, if you create a branch, implement feature X, and then merge it into the mainline, all is well. Assume that somebody tweaks the deltas (fixes a typo) in the mainline. If you then do more work on the branch for 'X' and merge your new changes, cvs gets confused. It tries to merge your original 'X' changes again, along with the new mods. For simple trees this is usually harmless because cvs detects that the delta is duplicated and skips it. But remember that typo fix? That causes a conflict that you have to resolve.

Suppose you fix the same typo in your branch. Then you do some more work on the feature and attempt to re-merge into mainline. CVS still hasn't forgiven you: you've got that same conflict again with the mainline typo fix. Those deltas will never cleanly apply again. After a few iterations of this, even if you persist and fix it each time, the branch becomes useless. This makes the branch development model really difficult to deal with.

There are workarounds, e.g. you can tag your branch at the merge points and use the tags to merge only the differences between them, but all those tags are exported to the world. Each time you do it, you touch all the files... It doesn't scale well, to say the least. It requires great discipline to do this on an ongoing basis.
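As a rough sketch of that tagging workaround, the helper below only builds the two cvs command lines involved; it does not run anything. The function and tag names are hypothetical. It assumes the previous merge point was already tagged, and relies on `cvs tag` and on `cvs update` with two `-j` options to merge the difference between two revisions.

```python
def tagged_merge_commands(prev_tag, new_tag):
    """Return the cvs invocations for merging only the work since prev_tag."""
    return [
        # Run in the branch checkout: mark the new merge point.
        ["cvs", "tag", new_tag],
        # Run in the mainline checkout: merge only prev_tag..new_tag.
        ["cvs", "update", "-j", prev_tag, "-j", new_tag],
    ]

# Hypothetical tag names for the second merge of feature X.
commands = tagged_merge_commands("X_MERGE_1", "X_MERGE_2")
```

Every merge adds one more tag to every file on the branch, which is exactly the cost complained about above.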

Contrast this to something like perforce. The 'hammer' branch was where I did the amd64 port. There have been 2203 commits on the branch, and 628 merges. I don't have a count for the number of times I've exported a patch to cvs and committed there. With a real VCS, branch development becomes trivially easy. It scales well, and most importantly, it becomes easier to abandon something that isn't working out the way you hoped.

The loss of effective branch development is hurting us in a huge way. The various SMPng / KSE / threading / scheduling / etc fiascos that led to massive ongoing instability might never have hit the tree so prematurely if we'd had an effective branch model.

Adding perforce alongside cvs has mitigated some of the pressure and has generally given better quality code drops, but it isn't widely used by people who should be using it, and we lose (or marginalize) the history that is developed "over there". We need to bring this model out of second class status and into first class.

Primitive client

cvs has to constantly scan the checkout trees extensively. The server has to do tag searches to find out if a tag really exists or not. It has to scan every single file to find out if a tag operation is about to cause a conflict. A 'p4 sync' or 'svn update' on an up-to-date tree takes a moment (generally a second or so). cvs has to crawl the tree. It has to upload a shadow of the tree to the cvs server and compare. This is an unnecessary drain on time.

cvs has absolutely no offline support, except for having an externally maintained complete copy of the repository.

No changesets

Tracking down all parts of a commit is a PITA to say the least. There is no cross referencing. If you want to find all parts of a commit for an MFC, cvs won't tell you. You'd better have a copy of the diff/patch you applied, or have some other means of generating the diff with external tools.
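One way an external tool can approximate changesets is to group per-file commits that share an author and log message and fall close together in time. The sketch below assumes that approach; the function name, the five minute window, and the sample data are all made up for illustration, and the result is still a guess rather than recorded history.

```python
from datetime import datetime, timedelta

def group_into_changesets(file_commits, window=timedelta(minutes=5)):
    """Group (path, author, log, datetime) tuples into guessed changesets."""
    changesets = []
    open_sets = {}  # (author, log) -> most recent changeset with that key
    for path, author, log, when in sorted(file_commits, key=lambda c: c[3]):
        key = (author, log)
        current = open_sets.get(key)
        # Start a new changeset if none is open or the gap is too large.
        if current is None or when - current["last"] > window:
            current = {"author": author, "log": log, "last": when, "files": []}
            changesets.append(current)
            open_sets[key] = current
        current["files"].append(path)
        current["last"] = when
    return changesets

# Hypothetical per-file commit records.
commits = [
    ("src/sys/kern/a.c", "alice", "Fix typo", datetime(2008, 6, 1, 12, 0, 0)),
    ("src/sys/kern/b.c", "alice", "Fix typo", datetime(2008, 6, 1, 12, 0, 4)),
    ("src/bin/ls/ls.c", "bob", "Add flag", datetime(2008, 6, 1, 12, 0, 2)),
    ("src/sys/kern/a.c", "alice", "Fix typo", datetime(2008, 6, 1, 18, 0, 0)),
]
changesets = group_into_changesets(commits)
```

The first two of alice's commits end up in one changeset and her later one in another. Two unrelated commits with the same author and log message inside the window would be merged wrongly, which is why this remains a heuristic.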

repocopy??

Enough said. The fact that we have to go behind the back of the system is bad enough. Besides the turn-around time delays, the bigger problem is that it pollutes the chronological history for files. For example, if you check out src/sys/amd64 as of 1994, you see a bunch of files - long before "amd64" really existed. I never want to hear the word "repocopy" again!

Every day, we lose a little more

CVS might be helping us "get by", but every time we touch a branch, vendor code, import, etc, we have lost a little more information.

Sooner or later, we will have to change. The longer we leave it, the bigger the mess is that cvs leaves us.

I have trouble remembering what was branched from what back in 1999 in order to correct the guesses that the cvs repo parsers come up with. Because of the ambiguous quirks above, they usually get it wrong.

Do you remember if RELENG_5_0_RELEASE was really from HEAD or RELENG_5? When was RELENG_5 created? Before 5.3 or 5.4? Did we have RELENG_4_3, or did we start doing that with RELENG_4_4? The cvs2xxx tools all got this wrong.


VCSCvsProblems (last edited 2008-06-17 21:37:49 by localhost)