Thu, 20th Dec 2007 (HowDidIDoThat :: LaTeX)
Document Revision System
I've decided to start using a document revision system for my papers. My current ad hoc method is getting seriously strained: between exchanging copies with collaborators, merging their comments with my revisions, and preparing different versions for different places (archives, journals, home pages), it has all become too confusing.
So I had a look around on the web and found a paper (also available as a wikibook) promoting the use of subversion. This is actually a software revision system. However, the general principle is the same: a typical LaTeX paper consists of several text files, which is exactly what a typical piece of software consists of, and thus exactly what software revision systems are designed to track.
There are several different systems around and it's not clear which is the "right" one. Subversion certainly seems better than CVS, but there's also GNU Arch and its variants. I had a quick play with subversion, but am going to give bazaar a serious go.
Stage One: Getting Started
My first attempt at using bazaar involves an already-existing document. I want to store it as a bazaar archive without destroying any information that I currently have. I have several versions of a paper. Schematically, they can be represented as follows.
So, I have a main branch together with some other branches. Real updates will go in the main branch and the other branches contain variants suitable for other purposes. I will also want to take snapshots when I do something significant, such as submitting the paper to a journal.
So we start by creating a new repository. Repositories in bazaar are for projects with shared histories, so we create a new one for each paper.
(You did know that I use
So far, so good. Next step is to add the next version to the main branch.
Okay, now we want to branch this version as the public versions are minor modifications of this one. So we go back up to the top directory and create a new branch, two in fact.
Now we copy across the overlaid versions.
As you may notice, the version for the archive had an extra file, in this case holding some metadata that the archive wants.
Now we continue this process until all of the files in the original directory have been added to the bazaar archive in their correct order and their correct relationship.
When we get a version that we want to tag, we do so:
Stage Two: Line Wrapping
Line wrapping is an issue that comes up with document revision. It is mentioned in the article referenced at the top of this page. Systems such as bazaar are designed primarily to work with source code. This is (usually) in the form of text files and these are constrained somewhat in their format (some more than others). To determine whether two files differ, the usual method is to compare them line by line, with the lines themselves being compared as a whole. That is, no finer distinction is made than whether the lines are the same or not. This is generally fine for source code.
For a document this can cause problems with regard to line wrapping. Consider a long paragraph that begins "And so to bed." and imagine that you decide to change it to "Consequently, one retired to one's bedroom and laid oneself upon one's usual resting place." Using a text editor that does hard line wrapping, this change will probably knock several words on to the next line, and the next, and probably on for several more lines. All of these lines will display as changed when doing a comparison. This is clearly not what is desired.
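To make the hard-wrapping problem concrete, here is a small Python sketch (not part of my original workflow) that uses the standard difflib and textwrap modules to mimic a line-by-line comparison. Only the two opening sentences come from the text above; the rest of the paragraph and the 40-column width are made up for illustration.

```python
import difflib
import textwrap

# Hypothetical paragraph body; only the opening sentences come from the post.
TAIL = ("The house was quiet and the fire had burned low. "
        "Nobody stirred on the stairs or in the hall. "
        "Outside, the rain kept falling on the empty street.")
before = "And so to bed. " + TAIL
after = ("Consequently, one retired to one's bedroom and laid oneself "
         "upon one's usual resting place. " + TAIL)


def unchanged_lines(old, new):
    """Count the lines that a line-by-line comparison treats as identical."""
    matcher = difflib.SequenceMatcher(None, old, new)
    return sum(block.size for block in matcher.get_matching_blocks())


# Hard wrapping: the editor writes a line break every 40 columns or so.
old_wrapped = textwrap.wrap(before, width=40)
new_wrapped = textwrap.wrap(after, width=40)

print(len(old_wrapped), "lines before,", len(new_wrapped), "lines after")
print(unchanged_lines(old_wrapped, new_wrapped), "lines survive unchanged")
```

Although only the first sentence was edited, the re-wrapping shifts every later line break, so the comparison finds nothing it can call unchanged.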
The solution appears to be to use soft line wrapping. The distinction is that hard line wraps are written to the file, whereas soft line wraps only appear in the editor and are removed when the file is written. Unfortunately, this makes the problem worse! If there are no line breaks in the file then the whole document is one line, and that one line is displayed if there are any changes at all.
So the real solution is to take note of what line breaks are for. Within a LaTeX document, a line break is usually simply whitespace (two line breaks denote a paragraph ending). So, apart from doubles, line breaks are irrelevant for LaTeX and can be used for something else. Within an editor, we can use soft line wrapping to make the text easy to read and ignore hard line breaks (except for doubles). So who does use line breaks? From above we see that it is the version software. Essentially, the line breaks are used to determine the context of a change: if I change "And so to bed." to "Consequently to bed.", what should I see later when I want to know what I changed? The line breaks tell the version software how much information around the change should be displayed or recorded.
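As a rough illustration of line breaks setting the context of a change, the following Python sketch compares the same edit under two layouts: the whole paragraph on one line, and one sentence per line. It uses difflib rather than bazaar itself, and the sentences after the first are invented for the example.

```python
import difflib

# Hypothetical paragraph; only the first sentence and its edit are from the post.
sentences = [
    "And so to bed.",
    "The house was quiet and the fire had burned low.",
    "Nobody stirred on the stairs or in the hall.",
]
edited = ["Consequently to bed."] + sentences[1:]


def reported_lines(old, new):
    """Return the removed/added lines of a unified diff with no context lines."""
    diff = list(difflib.unified_diff(old, new, lineterm="", n=0))
    body = diff[2:]  # skip the two '---' / '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


# No line breaks in the file: the paragraph is a single line.
one_line = reported_lines([" ".join(sentences)], [" ".join(edited)])

# One sentence per line: each sentence is its own unit of context.
per_sentence = reported_lines(sentences, edited)

print("single line layout reports:")
print("\n".join(one_line))
print("sentence-per-line layout reports:")
print("\n".join(per_sentence))
```

With the single-line layout the whole paragraph is reported twice (once removed, once added); with one sentence per line only the sentence that actually changed shows up.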
So when writing a LaTeX file, we should insert a line break to separate out context. As a side point, this is consistent with TeX's use of double line breaks to denote the end of a paragraph since a new paragraph should certainly designate a new context. A reasonable set of rules might be:
Note that this advice is contained in the paper where I originally got this idea. As mentioned there, there are other advantages to using such a system.
The exact set of rules is not important. What is important is to choose a set of rules and stick to it. Changing rules mid-project is a Bad Idea.
This produces a problem, though. What if I have a file that is badly formatted? Perhaps I'm doing the initial import in the manner laid out above and I didn't pay attention to line wrapping when I originally wrote it? Or perhaps I made a quick edit in an editor that doesn't understand the difference between hard and soft line wraps and I want to fix it before committing it?
The answer? A
If, like me, you use Emacs for your editor then you should use
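For readers who want to experiment with the idea, here is a minimal Python sketch of such a reformatter. It is not the tool referred to above; the function name and the single rule it applies (break after a full stop, question mark or exclamation mark) are my own assumptions for illustration.

```python
import re


def one_sentence_per_line(text):
    """Re-break text so that each sentence sits on its own line.

    Blank lines (paragraph breaks) are kept; single line breaks are
    treated as ordinary whitespace, as TeX itself treats them.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    rebroken = []
    for paragraph in paragraphs:
        # Undo whatever wrapping the file currently has.
        flat = " ".join(paragraph.split())
        # Assumed rule: start a new line after ., ! or ? followed by a space.
        rebroken.append(re.sub(r"(?<=[.!?])\s+", "\n", flat))
    return "\n\n".join(rebroken) + "\n"


sample = "And so to bed. The house\nwas quiet.\n\nNobody stirred."
print(one_sentence_per_line(sample), end="")
```

This is deliberately naive: it will also break after abbreviations such as "e.g.", and it joins lines without regard to LaTeX comments (a "%" would then swallow the rest of the sentence), so treat it as a starting point and check the result before committing.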
Stage Three: Deciding on a Strategy
This probably ought to be Stage One, but I tend to find it easier to play with a system a little before deciding exactly how I'm going to use it. I've decided to base my system on the "Team collaboration, central style" workflow described in the Bazaar documentation. My reason for this is that in general I am the only person directly editing files, even in collaborative work, but I sometimes work on different computers.
So I store the repositories in a "central" location that I have read and write access to from anywhere on the internet. One advantage of bazaar here is that this central location doesn't have to have bazaar installed. At work, I have direct access to this location (via NFS) whilst elsewhere I get access via ssh. So my workflow is now as follows.
Then repeat steps 2 and 3 until the paper (or whatever; I'm using this for lots of things now) is finished.
Here's a list of bazaar commands that I use (or think I will use) a lot.
Tags and commits on a checkout get sent straight to the central repository.