Tuesday, 30 April 2013

Documentation and Version Control

In this post I'm going to take a look at the version control requirements for storing and archiving documentation. It's worth considering what those requirements are, because they are not identical to the version control requirements for developing application code. Documentation requires many, but not all, of the features offered by classic developer-oriented revision control systems. On the other hand, many commercial Content Management Systems (CMS) do not offer the kind of flexibility that is required to maintain a professional documentation repository.

Here are the features I consider essential for a documentation-friendly revision control system:
  • Resolving collisions
  • Atomic commits
  • Revert/undo commits
  • Diffs between commits
  • Branching
  • Sub-projects
And here are some features I consider nice-to-have:
  • Merging branches
  • Cherry picking
And, finally, a non-requirement:
  • Tagging

Resolving Collisions

If there is more than one person on your docs team, it is reasonable to suppose that, sooner or later, you are going to collaborate on a document. In this case, it would be supremely annoying if you both happen to submit changes to the same file at the same time, and the documentation system simply over-writes one of the versions. The documentation system must therefore have some means of detecting and resolving such collisions. This is especially important if writers want to work offline (and they usually do), because collisions are then more likely to occur.

Revision control systems usually tackle this problem in one of two ways. Either by locking the file (so that one writer gets exclusive rights to update the file for as long as the lock is held) or by merging (where the revision control system requires you to merge the changes previously made by other users before you can upload your own changes).

Atomic Commits

Atomic commit means that you can submit multiple changes to multiple files in a single commit step. This has several advantages:
  • The commit log is much less cluttered, because you can group several changes into a single commit entry. 
  • It helps to keep the docs in a consistent state. For example, if you change the ID of a link target, this might break one or more cross-references and it might take multiple subsequent changes to multiple files to fix all of the broken links. If you can roll all of these changes into a single commit, you can ensure that the cross-references remain unbroken, both before and after the commit.

Revert/Undo Commits

From time-to-time we all make mistakes, so the capability to undo a commit is a welcome feature. Strictly speaking, Git does not allow you to undo a commit, but it enables you to commit the inverse of a previous commit, which amounts to the same thing.

Diffs between Commits

Diffs between commits are every bit as useful for technical writers as they are for developers. They enable you to keep track of changes made by your collaborators; and they enable you keep track of your own changes. In fact, the ability to make diffs between commits is one of the major reasons for keeping a version history in the first place.

Branching

There are various ways you can put branches to good use in a revision control system. The most important application of branching, from a documentation perspective, is for tracking past versions of the documentation.

In a documentation repository, it is natural to create a separate branch for each release. So, for example, you might have branches for versions 1.0, 1.1, 1.2, 1.3, 2.0, and 2.1 of your documentation. Anytime you need to go back, say, to fix an error or to add a release note to an earlier version, all you need to do is to check out the relevant branch, make the updates, and re-publish the docs from that branch. Moreover, sometimes fixes or enhancements made to an earlier version can also be applied to the current version (or vice versa) and it is particularly nice, if you have the option of cherry-picking the updates between branches.

This is a basic requirement, if you intend to do any maintenance at all of older documentation versions (and given that your customers are likely to have support agreements for the older products, it seems pretty much inevitable).


Sub-Projects

In a complex product, it is likely that you will need to use sub-projects at some point (that is, a mechanism that enables you to combine several repositories into a single repository). This can become necessary, if a product consists of multiple sub-products, so that the corresponding library is created by composing several sub-libraries.

The kind of mechanisms you can use to implement sub-projects include svn:external references in SVN or submodules in Git.

Although Git is, in most respects, a wonderful revision control system, its implementation of submodules does suffer from a couple of drawbacks:

  • You cannot mount an arbitrary sub-directory of the external sub-project in the parent project (like you can do in SVN), only the root directory.
  • Whenever you update the parent directory (for example, by doing a git pull), Git does not automatically update the submodules. This is probably the correct policy for working with application code, where you need to be conservative about updating dependencies, in case you break the code. But in the case of documentation, you normally want the submodules to point at the latest commit in the referenced branch. It is a nuisance to have to constantly update the submodules manually and then check those updates into the parent project.

The fundamental reason why sub-projects are needed is because sub-products naturally evolve at different rates, and you normally need to pick specific versions of the sub-products to assemble a complex product. Using a sub-project mechanism enables you to mix and match sub-product versions at will. (You might think it is also possible to lump all of the sub-products into a single repository, but this has the serious limitation that you can only work with a single combination of product versions. If you also need to release another product that uses a different combination of sub-product versions, this approach becomes completely unworkable.)


Merging Branches

I hesitated before putting merging branches into the nice-to-have category. You might prefer to categorise it as must-have, and I won't argue with you. But if you don't have a merge capability, I think you can mostly work around it, in the context of documentation. The most important use of branches in documentation is for tracking different versions of a library and these kind of branches would normally never need to be merged.

Just because the capability to merge branches is not an absolute necessity, it does not mean that you do not need a merge capability at all. You certainly need to be able to merge commits in order to resolve conflicts between different users working on the same branch.

Cherry-Picking

Cherry-picking is the ability to apply the same commit to more than one branch. In Git, for example, the procedure is incredibly easy. You just make the changes to one branch; commit them; then check out another branch and apply the commit to this branch as well (in my Git UI, I can right-click on a commit and select Cherry Pick to apply it to the currently checked out branch).

Tagging

Contrary to what you might think, tagging is a not really necessary in a documentation repository.

For years, myself and my team-mates dutifully tagged the repository every time a documentation library was released. In the early years, we did it in SVN and now we are doing it in Git. But recently I realised that we never used any of these revision tags, not even once.

This is because, in practice, you are only ever interested in accessing the tip of a branch, not the tagged commit. For example, if I create a branch for a 1.3 release, the tip of this branch will always have the version of the docs that I want to use for re-publishing the 1.3 docs. If I correct some errors in the docs, update the release notes with some patch information, and so on, this will always be available at the tip of the branch. The tag that might have been created the first time the library was released is of absolutely no interest: it references an earlier commit in the branch, which is now out of date.

2 comments:

  1. Thanks for the insight! Document version control feature is really worth for documentation.

    ReplyDelete
  2. Nice article. Loved your post on Documentation. I am using Paperlez for my documentation and office automation process.
    If you want you can check out this, maybe you will find something you like.

    Keep up the good work.

    ReplyDelete