01 May 2008

Subversion and Filesystem Metadata

I’ve been using Subversion a lot lately, and for the most part I’m pretty floored by it. It’s a huge step up from CVS, and it offers a lot of flexibility, beyond anything I’ve seen in commercial version-control products. Plus, you can’t beat the price.

However, there are a few things that have irked me. One of the biggest is that SVN doesn’t preserve filesystem metadata, particularly document modification times. Apparently this is by design. (‘Why’ isn’t exactly clear, but supposedly has to do with automated build tools.) But to me, filesystem metadata – modification stamps in particular – is fairly important, and I’m not really happy with any tool just blithely throwing it away, as SVN does when you import a folder into version control and then check out a working copy.

As a sort of half-assed solution, I wrote a couple of little scripts to pull the file access and modification times from the filesystem, and store them in SVN as “properties” associated with that particular document. (Since Subversion lets you store as many key:value pairs for each document as you’d like, in many ways it’s superior to most commonly-used disk filesystems … it just doesn’t bother putting much stuff in there by default. Bit of a wasted opportunity.) Although this isn’t as useful as having the metadata actually in the filesystem, it at least ensures that nothing is destroyed when you load files into version control. To me, the idea of never destroying data or context information is important. I like knowing that if I ever need the last modification time a document had before it went into version control, it’s all there.

Due to the mechanics of Subversion, the use of these scripts is a little roundabout. It’s a multistep process:

  1. Import the directory you want to version-control into the Subversion repository. Don’t delete it!

  2. Check out the directory, giving it a name different from the ‘original’ copy. (I like to name it something like “directory-svn”.)

  3. Copy – using your preferred CLI or GUI method – all the files from the old, non-version-controlled directory to the working directory, clobbering the checked-out files as you go. Whatever method you use has to preserve timestamps; with ‘cp’, that means passing ‘-p’.

    [Why? This overwrites all the files in the working directory – which have their atime, ctime, and mtime set to whenever you checked the directory out (not really that useful) – with the original files, which have useful timestamps on them that actually correspond to the data in the logical files.]

    N.B.: You need to copy the files from one directory to another; don’t overwrite one directory with the other. If you do the latter, you’ll wipe out the “.svn” directory in the working directory, and it’ll no longer be a functioning SVN checkout.

  4. Now that you have a version-controlled working directory full of files with useful timestamps (run ‘ls -al’ if you want to check; that’ll show you the mtime), you can run the script below. This will take the ctime, mtime, and atime and copy them into SVN properties (named “ctime”, “mtime”, and “atime” respectively). Run ‘svn commit’ to write these changes to the repository.

  5. When you check out the working directory onto a new computer, you still won’t have the right metadata actually written into the filesystem, but you will have it in the properties. To view the properties associated with a file, run ‘svn proplist --verbose filename’.

Not as good as if SVN just respected and didn’t destroy filesystem metadata by default, but it’s better than nothing. On the system that originally housed the data, your files still have all the correct values stored in the filesystem (since we copied them from the old, non-version-controlled directory), and on other systems, you’ll be able to retrieve the file’s original timestamps using ‘proplist’.
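
For the command-line inclined, steps 1 through 3 might look roughly like the sketch below. The function names and the repository URL in the usage line are made up for illustration, and it assumes a ‘cp’ that accepts ‘-R’ and ‘-p’ (both the GNU and BSD versions do). Copying “directory/.” rather than the directory itself is what keeps the working copy’s “.svn” folder intact.

```shell
#!/bin/bash
# Sketch of steps 1-3. Function names and the repository URL are
# placeholders, not anything Subversion itself defines.

# Copy the *contents* of $1 into $2, keeping timestamps (-p) and leaving
# anything that already exists only in $2 (such as .svn) in place.
copy_preserving_times() {
    cp -Rp "$1"/. "$2"/
}

# Import the directory, check it out under a new name, then copy the
# original files over the freshly checked-out ones.
import_with_timestamps() {
    local src=${1%/} repo_url=$2
    local wc="$src-svn"
    svn import "$src" "$repo_url" -m "Initial import" &&
    svn checkout "$repo_url" "$wc" &&
    copy_preserving_times "$src" "$wc"
}
```

Usage would be something like ‘import_with_timestamps documents https://svn.example.com/repo/documents’, followed by step 4 inside the new “documents-svn” directory.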

Here’s the script for Mac OS X (and probably BSD?):

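#!/bin/bash
# A little script to take each file's modification, change, and access
# times and stick them into Subversion properties (BSD-style stat syntax).
# Note: "ctime" is the inode change time, not a creation time, so it will
# reflect when the file was copied into the working directory.

for file in *
do
   mtime=$(stat -f %Sm "$file")
   svn propset mtime "$mtime" "$file"
   ctime=$(stat -f %Sc "$file")
   svn propset ctime "$ctime" "$file"
   atime=$(stat -f %Sa "$file")
   svn propset atime "$atime" "$file"
done
exit 0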

And on Linux it’s the same, except the syntax differs slightly:

#!/bin/bash
# Same idea, using GNU stat: %y = mtime, %z = ctime, %x = atime

for file in *
do
   mtime=$(stat --format %y "$file")
   svn propset mtime "$mtime" "$file"
   ctime=$(stat --format %z "$file")
   svn propset ctime "$ctime" "$file"
   atime=$(stat --format %x "$file")
   svn propset atime "$atime" "$file"
done
exit 0
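
If you bounce between both kinds of machine, the two scripts could probably be folded into one that picks the right ‘stat’ syntax at runtime. This is only a sketch: it assumes that GNU ‘stat’ accepts ‘--version’ while the BSD one rejects it, and the names ‘stat_time’ and ‘record_times’ are mine. Note that the two platforms print timestamps in different formats, so the property values still won’t look the same across systems.

```shell
#!/bin/bash
# Sketch: choose GNU or BSD stat syntax once, then record all three times.
# Assumption: only GNU stat understands --version.

if stat --version >/dev/null 2>&1; then
    stat_time() { stat --format "$1" "$2"; }   # GNU stat
    fmt_m=%y fmt_c=%z fmt_a=%x
else
    stat_time() { stat -f "$1" "$2"; }         # BSD / Mac OS X stat
    fmt_m=%Sm fmt_c=%Sc fmt_a=%Sa
fi

# Store mtime, ctime, and atime of every file in the current directory
# as SVN properties of the same names.
record_times() {
    local file
    for file in *
    do
        svn propset mtime "$(stat_time "$fmt_m" "$file")" "$file"
        svn propset ctime "$(stat_time "$fmt_c" "$file")" "$file"
        svn propset atime "$(stat_time "$fmt_a" "$file")" "$file"
    done
}
```

Run ‘record_times’ from inside the working directory, then ‘svn commit’ as before.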

At the moment I’m just concentrating on archiving some of my documents and shoving them into SVN – this has the advantage both of getting them in version control, and also putting them on a central server where I can easily back them up – so I’m satisfied with just sticking the original file’s timestamps into SVN properties for archival purposes. Obviously, the stamps don’t get updated as you modify the file, so they’re really just for historical purposes.

What would be nice would be to fix Subversion so that on import, it collected as much metadata as it could about a file and stuck it into the properties, and then used this information to recreate the files on checkout (only if you wanted it to, of course, or perhaps only if the file had a single version in the repo, meaning it hadn’t been modified since being added). That’s a bit beyond both my abilities and level of interest at the moment, but it seems like a useful feature, particularly as more and more non-programmers start to discover Subversion and how useful it can be for managing home directories and other lightweight content-management tasks.
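
Short of changing Subversion itself, a fresh checkout can be patched up after the fact by reading the “mtime” property back and handing it to ‘touch’. The sketch below is Linux-only as written: it assumes GNU ‘touch’ (for ‘-d’) and property values in the format the Linux script produces. The function name ‘restore_mtimes’ is hypothetical. It only restores the modification time, and files without the property are left alone.

```shell
#!/bin/bash
# Sketch: set each file's mtime from its "mtime" SVN property.
# Assumes GNU touch and values written with 'stat --format %y'.

restore_mtimes() {
    local file stamp
    for file in *
    do
        # propget prints nothing when the property is not set
        stamp=$(svn propget mtime "$file" 2>/dev/null)
        if [ -n "$stamp" ]; then
            touch -m -d "$stamp" "$file"
        fi
    done
    return 0
}
```

Run ‘restore_mtimes’ from inside the working directory right after checking it out.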
