Since Pragmatic Guide to Subversion went to press, CollabNet has updated their Subversion server. If you want to follow the instructions in the book, download the previous Subversion server from CollabNet rather than using the link in the book. Make sure you select the “server and client” download for Subversion 1.6, then follow the instructions in the book.
Things have been a bit quiet on my blog recently, so I think I’d better explain myself. Two things have been happening in my life over the last couple of years. Firstly, I am now a proud dad with two lovely kids, Ben and Natalie, who are two-and-a-bit and seven months old, respectively. To say that having kids is life-changing is an understatement. I won’t gush too much, but it has been a wonderful experience. It’s also been exhausting, so my capacity for extra stuff has shrunk quite a lot, hence the lack of activity on the blog.
Secondly, and probably more interesting to those of you who follow my blog, I have been working on a new Subversion book. I’m very happy to announce that Pragmatic Guide to Subversion is now in beta. The new Pragmatic Guide series takes various topics and condenses them into easily digestible guides and quick references; the original idea was a kind of “pocket guide” series. We have both a Subversion and a Git guide, each designed to get someone up to speed quickly on that version control tool. If you have experience with another version control tool (and most developers do), the guide lets you transition to Subversion quickly. Each section of the book starts with an introduction that sets the scene, so those of you who are new to version control will be able to follow what the book is talking about. If you’re more interested in a complete guide to why we use version control tools, take a look at the original Pragmatic Version Control Using Subversion, which spends more time on the basics of source control.
I’m especially excited because this is the first multi-platform book I’ve done, covering command-line Subversion, the Tortoise Windows GUI, and the Cornerstone Mac GUI. A lot of people still use the powerful command-line interface, but for some tasks a GUI is just way better: history browsing and viewing diffs are naturally better with a GUI. All of the tasks in the book contain instructions for the command line, Tortoise and Cornerstone. I’ve also been able to incorporate all the latest Subversion features and techniques, the most important of which is probably merge tracking. Merge tracking was introduced in Subversion 1.5 and can save you a ton of time if you’re working with a branched code base.
The book is a Pragmatic Bookshelf title and has been released as a beta. The beta process means that readers can get an electronic copy of the book before it goes to the paper presses, as well as a good deal on an eventual paper copy should they want one. The beta helps me fix the final few errata in the book and get feedback on other changes that might need to be made. The book is complete and has been through two review cycles, so I think you’ll get a lot out of it even though it’s technically not finished yet.
My current client in Calgary recently switched from CVS to Subversion. Our main goal for switching was to fix performance problems with CVS, but we also hoped to benefit from Subversion’s improved features. The CVS repositories were on a reasonably beefy Sun box, but we’d been seeing “waiting for lock” messages and our CVS clients frequently hung. The server didn’t look like it was under load, and moving the repositories to their own mount point didn’t fix the problem. One of the teams also wanted to clean up their branching structure; after five years of CVS it was in a bit of a mess.
Our first conversion was straightforward. We wanted to convert a recent project from its existing home in a CVS module into Subversion. We used cvs2svn to do the conversion and ended up with around 6,000 revisions in Subversion. This represented about 18 months of effort from about a dozen developers, and the conversion took about two hours to run. The team’s developers had been briefed on the conversion; they all checked in to CVS beforehand, then checked out from Subversion once we were ready. The entire team, including business users, analysts and testers, upgraded from TortoiseCVS to TortoiseSVN and pretty much carried straight on with their work.
We got the performance improvement we had hoped for: a Subversion update takes around ten seconds, compared to one or two minutes with CVS. This is with the repository on the same Sun server; the only thing we needed to do was install the Subversion software.
The second conversion was more complicated. We wanted to take a 5GB CVS repository with five years of history and not only upgrade to Subversion, but sort out some branching problems. One of the branches within CVS had started out as a release branch but evolved into its own product maintained by a separate team. We also had a fairly complicated set of branches we didn’t want to include, tags that were no longer worthwhile, etc. We scripted the conversion by customizing the example cvs2svn-example.options file included with cvs2svn to get exactly what we wanted. The big “Eureka!” moment came when we realized that promoting the CVS branch to its own product was really easy once everything was in Subversion. cvs2svn converts branches and puts them into their own directory, but there’s nothing stopping us from moving a directory within Subversion. We simply copied the branch-that-is-a-product into a higher level, mirroring a regular project’s structure, then deleted its old location so developers wouldn’t get confused about which was the right one.
Converting the 5GB, five-year-old repository took around 16 hours over a weekend. Shuffling directories around once it was converted took only a few minutes; we used the excellent TortoiseSVN Repository Browser, so all our move operations ran directly against the repository and were lightning fast.
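Here’s a rough sketch of that promotion step as command-line operations, assuming the Subversion command-line tools are installed. The scratch repository, the branch name product-x, and the options file name are hypothetical stand-ins; in real life the repository would be the output of cvs2svn. A URL-to-URL svn move is the command-line equivalent of the repository browser moves, and it does the copy and the delete in a single commit.

```shell
#!/bin/sh
# Sketch: promote a converted CVS branch to its own top-level project.
# Assumes svn and svnadmin are on PATH; all names here are hypothetical.
set -eu

WORK=$(mktemp -d)
REPO="$WORK/converted"
URL="file://$REPO"

# In real life this repository comes from cvs2svn, for example:
#   cvs2svn --options=my-conversion.options
# Here we just fake the trunk/branches/tags layout such a conversion produces.
svnadmin create "$REPO"
mkdir -p "$WORK/layout/trunk" "$WORK/layout/tags" "$WORK/layout/branches/product-x"
echo "product-x code" > "$WORK/layout/branches/product-x/README"
svn import -q -m "Simulated cvs2svn output" "$WORK/layout" "$URL"

# Give the product a regular project structure at a higher level...
svn mkdir -q -m "Create top-level home for product-x" "$URL/product-x"

# ...and move the branch there. URL-to-URL moves run directly against the
# repository, so this is a single quick commit (copy plus delete).
svn move -q -m "Promote product-x branch to its own project" \
    "$URL/branches/product-x" "$URL/product-x/trunk"

svn ls -R "$URL"
```

Because the move leaves nothing behind under branches, developers can’t get confused about which copy is the right one.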
I’m very pleased to announce that the second edition of Pragmatic Version Control Using Subversion has been published and is now shipping. As an author, it’s great to get an opportunity to update a published book, and for there to be enough interest that making an update is worthwhile.
Since the book first came out Subversion has come a long way, from version 1.0 to 1.3, adding new features and making improvements. I’ve also had a bunch of feedback on what people did and didn’t like in the book and this was a good opportunity to add some more content and address some of that feedback.
The book is still very much a guide for using version control in a pragmatic fashion, suitable for people who are new to version control as well as those with prior experience, but the new edition adds some more advanced stuff like programmatic access to a repository, path-based security, and file locking.
It’s my continued pleasure to work with Andy and Dave—if you’re an aspiring author with an idea for a book you should seriously drop them a line. The Pragmatic Programmers’ editorial expertise and publishing system is second-to-none, and best of all you won’t have to write your book using Word!
When setting up Subversion within an organization, folks will often ask “How many repositories should I create?”—my advice is to just create one repository until you have a concrete need for more. I take this approach because it’s easy to split an existing repository into two. I also remind people it’s not the end of the world if they create multiple repositories and then they need to merge them, because Subversion has good support for splitting, merging, and reorganizing repositories. I’ve never really gone into any detail on how you actually do this stuff, but since I recently needed to merge two repositories I thought I’d share the technique I used.
Splitting a repository
First off make sure you tell everyone you’re going to split the repository. The ideal situation is where everyone can check in, go home for the night, leave you to organize stuff, and then come in the next day and start on something fresh. If people can’t commit all their changes you may need to help them relocate their working copy. Once everyone’s committed their changes, close down network access to your repository to be sure no-one’s committing further changes. This might be overkill depending on your situation, but it’s nice to be safe.
Next, back up your repository using svnadmin dump to create a dump file. A dump file is a portable representation of a Subversion repository and something you might be using for backups already. We’re going to load the dump file into a new repository, using svndumpfilter to select just the directories we wish to move to the new repository. A typical transcript might look like this:
[mgm@penguin temp]$ svnadmin dump /home/svnroot/log4rss > log4rss.dump
* Dumped revision 0.
* Dumped revision 1.
:
:
* Dumped revision 37.
* Dumped revision 38.
[mgm@penguin temp]$ mkdir tools-repos
[mgm@penguin temp]$ svnadmin create tools-repos
[mgm@penguin temp]$ cat log4rss.dump | svndumpfilter include log4rss/trunk/tools | svnadmin load tools-repos
Including prefixes: '/log4rss/trunk/tools'
Revision 0 committed as 0.
Revision 1 committed as 1.
Revision 2 committed as 2.
:
:
<<< Started new transaction, based on original revision 38
------- Committed revision 38 >>>
In the above sample, I dumped the Log4rss repository into a file called log4rss.dump and created a new directory called tools-repos initialized with an empty repository. Then I piped my dump file through svndumpfilter, told it to include just the tools directory, and piped the result of the filter into svnadmin load to populate the new repository. I’ve trimmed it from the transcript, but the filter also reported which items were included and which were dropped. Now the new tools-repos repository contains just the tools directory.
At this point, I can make the new repository available and tell developers where to find it. It’s probably also wise to delete the log4rss/trunk/tools directory from the original repository, just so people can’t accidentally use the old stuff. Subversion doesn’t have an obliterate command, so the tools directory is still using space in the old repository. If this is an issue, you’ll need to consider loading your dump file into a new repository using svndumpfilter’s exclude command to weed out the directory you no longer want.
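A minimal sketch of that exclude approach, assuming a stand-in repository laid out like the log4rss one above (the scratch paths and file names are made up for the example). The --drop-empty-revs and --renumber-revs options are optional: they drop revisions that become empty once the tools directory is filtered out, and close up the resulting gaps in revision numbers.

```shell
#!/bin/sh
# Sketch: rebuild a repository without one directory using svndumpfilter exclude.
# Assumes svn, svnadmin and svndumpfilter are on PATH; paths are hypothetical.
set -eu

WORK=$(mktemp -d)
OLD="$WORK/log4rss"
NEW="$WORK/log4rss-slim"

# A small stand-in for the original repository.
svnadmin create "$OLD"
mkdir -p "$WORK/import/log4rss/trunk/tools" "$WORK/import/log4rss/trunk/src"
echo "build helper" > "$WORK/import/log4rss/trunk/tools/helper.sh"
echo "app code" > "$WORK/import/log4rss/trunk/src/App.java"
svn import -q -m "Initial import" "$WORK/import" "file://$OLD"

# Dump everything, filter out the tools directory, load the rest.
svnadmin dump -q "$OLD" > "$WORK/log4rss.dump"
svnadmin create "$NEW"
svndumpfilter exclude --drop-empty-revs --renumber-revs log4rss/trunk/tools \
    < "$WORK/log4rss.dump" | svnadmin load -q "$NEW"

svn ls -R "file://$NEW"
```

Once you’re happy with the slimmed-down repository, it can take the place of the original.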
Merging two repositories
My current project recently moved from Chicago to Calgary. For a while we had two teams running, using separate Subversion repositories. When everything moved to Calgary, we needed to merge the Chicago team’s code into our repository. We didn’t want to just import the files, we wanted to include historical information too.
We created a dump file of the Chicago team’s repository and loaded it straight into our repository using svnadmin load. This worked because the load command simply replays a series of commits, simulating what would have happened if the Chicago team had been working with us all along. The key thing to note here is that we had been using different directory paths in the two repositories, so their stuff didn’t conflict with ours. If they had used the same directory structure, we would not have been able to simply load their changes into our repository. In that case we would have had to work some magic with the dump file: it contains plain-text path definitions, so in a pinch we could have munged those path names so they didn’t conflict.
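If the two repositories had clashed, one alternative to hand-editing the dump file is svnadmin load’s --parent-dir option, which roots every path in the dump under a directory of your choosing. Here’s a sketch, assuming two hypothetical scratch repositories that both use a top-level trunk; the parent directory has to exist before the load, hence the svn mkdir.

```shell
#!/bin/sh
# Sketch: merge one repository's history into another under a subdirectory.
# Assumes svn and svnadmin are on PATH; repository names are hypothetical.
set -eu

WORK=$(mktemp -d)
CHI="$WORK/chicago"
CAL="$WORK/calgary"

# Two repositories whose paths would collide (both have trunk/README).
svnadmin create "$CHI"
svnadmin create "$CAL"
mkdir -p "$WORK/chi/trunk" "$WORK/cal/trunk"
echo chicago > "$WORK/chi/trunk/README"
echo calgary > "$WORK/cal/trunk/README"
svn import -q -m "Chicago code" "$WORK/chi" "file://$CHI"
svn import -q -m "Calgary code" "$WORK/cal" "file://$CAL"

# Create the target directory, then replay Chicago's commits underneath it.
svn mkdir -q -m "Home for the Chicago history" "file://$CAL/chicago"
svnadmin dump -q "$CHI" | svnadmin load -q --parent-dir chicago "$CAL"

svn ls -R "file://$CAL"
```

The Chicago history ends up under chicago/ with its commits replayed after ours, and our own trunk is left untouched.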
Organizing a repository
Once we’d loaded the Chicago code into our repository, we used TortoiseSVN’s graphical repository browser to move the new stuff into our existing directory tree. The repo browser is a great tool for this kind of thing and made reorganization very simple. We just used the “rename” command to move everything around in the repository, and once done we all checked out the newly organized directory tree and continued working.
My colleague Clinton Begin asked me whether Subversion supports shelving. This is something that the new Visual Studio may have as part of its “Team” features, and is basically a way for a developer to put aside a set of changes and come back to them later. Storing shelved changes in your version control tool is pretty sensible: your repository is reliable, backed up, and not liable to disappear if someone pinches your laptop.
So can you do this kind of thing with Subversion? You betcha. Here’s roughly how it would work:
- Whilst working on adding the new “frobscottle” feature Alice decides she’d like to shelve her current working copy changes. Her project, codenamed “xyzzy,” is checked out from
- Needing somewhere to store her changes, Alice branches the trunk to create
- Alice uses the Subversion switch command to switch her working copy from the trunk to the new frobscottle branch. When switching, Subversion preserves any changes you’ve made to the working copy.
- Alice checks in her working copy. The changes will be safely stored under the shelves directory.
- Alice switches her working copy back to the trunk and works on something else. In future if she wants the shelved frobscottle changes she can merge from the branch to her trunk working copy, then commit the changes back into the main code line.
There are a few details you’ll need to get right—you may need to create the new branch from an older revision on the trunk rather than from the head—and it’s less pretty than a “shelve” button in a GUI, but it’ll work great and you’ll understand exactly where your changes actually are.
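The steps above might look something like this on the command line. This is a sketch under a few assumptions: the repository is a throwaway local one, the xyzzy/trunk and xyzzy/shelves layout is made up for the example, and the shelf is branched from the revision Alice’s working copy is based on rather than from the head. Bringing the changes back is a cherry-pick merge of just the commit that filled the shelf.

```shell
#!/bin/sh
# Sketch of "shelving" with plain Subversion branches.
# Assumes svn and svnadmin are on PATH; the repository layout is hypothetical.
set -eu
LC_ALL=C; export LC_ALL   # keep svn info field names in English for parsing

WORK=$(mktemp -d)
REPO="$WORK/repo"
URL="file://$REPO"
WC="$WORK/xyzzy-wc"

# A throwaway repository with xyzzy/trunk and a place to keep shelves.
svnadmin create "$REPO"
mkdir -p "$WORK/seed/xyzzy/trunk" "$WORK/seed/xyzzy/shelves"
echo "original" > "$WORK/seed/xyzzy/trunk/app.txt"
svn import -q -m "Seed xyzzy" "$WORK/seed" "$URL"

# Alice checks out trunk and starts on frobscottle (uncommitted changes).
svn checkout -q "$URL/xyzzy/trunk" "$WC"
cd "$WC"
echo "frobscottle work in progress" >> app.txt

# 1. Branch trunk at the revision her working copy is based on.
BASE_REV=$(svn info . | sed -n 's/^Revision: //p')
svn copy -q -m "Create frobscottle shelf" \
    "$URL/xyzzy/trunk@$BASE_REV" "$URL/xyzzy/shelves/frobscottle"

# 2. Switch to the shelf; local modifications come along for the ride.
svn switch -q "$URL/xyzzy/shelves/frobscottle"

# 3. Commit, so the changes live safely in the repository.
svn commit -q -m "Shelve frobscottle work"
SHELF_REV=$(svn info "$URL/xyzzy/shelves/frobscottle" | sed -n 's/^Last Changed Rev: //p')

# 4. Back to trunk to work on something else; the working copy is clean again.
svn switch -q "$URL/xyzzy/trunk"

# 5. Later: merge just the shelving commit into trunk and commit it.
svn merge -q -c "$SHELF_REV" "$URL/xyzzy/shelves/frobscottle" .
svn commit -q -m "Unshelve frobscottle work"
```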
Pragmatic Version Control using Subversion launched Tuesday night in Calgary. I went down to a local bookstore and spent a few minutes talking about version control, Subversion, and what the book covered. The audience had a bunch of questions about Subversion and I took this as a really good sign—people are doing their own research and wanted to find out more.
Here are a few of the questions people are asking about Subversion:
How does Subversion compare to other tools? Is there a feature matrix I can look at to decide what tool to use?
Subversion stacks up really well against CVS, fixing the bugs and fragility of CVS whilst keeping the proven development model. Subversion also adds features like change sets, atomic commits, decent networking performance, and a reliable back end. I’m wary of comparisons that read like a school book report, checking boxes if a tool has a particular feature. Those kinds of comparisons always tend to be biased by the person writing them. If you really want to know whether Subversion is right for you, try it out on a small project. If things don’t work out, you can try something else; if they do, you’ll know more about the tool and will be better able to roll it out to larger projects.
I’ve heard that Subversion’s database can become corrupted. That doesn’t sound good!
Subversion 1.0 uses the Berkeley DB (BDB) for storing your files, and this has been a source of some problems. BDB is very reliable when used properly, but unfortunately it’s quite finicky about permissions on its database files. Suppose you set up a Subversion repository, usually on Unix, and two different users access it: if their umask isn’t quite right, one of them can end up owning database files the other can’t write to. This usually happens when you have more than one access mechanism, say svn+ssh as well as Apache. If BDB can’t write to its files it gets stuck, or “wedged.” People often confuse this with database corruption, which has only happened in a few cases and was traced to hardware problems.
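Two things worth knowing for BDB-backed repositories, shown as a sketch against a hypothetical scratch repository (which uses the default back end, but the commands are the same). svnadmin recover brings a wedged repository back to a consistent state; stop Apache, svnserve and anything else touching the repository first. To stop the problem happening in the first place, the usual approach is to give every access path a group-writable umask, for example via a small wrapper around svnserve, and to make the repository directories group-writable and setgid. The wrapper’s name and the /usr/bin/svnserve path are assumptions; adjust them for your system.

```shell
#!/bin/sh
# Sketch: recovering a wedged repository and avoiding permission mix-ups.
# Assumes svnadmin is on PATH; paths and names below are hypothetical.
set -eu

WORK=$(mktemp -d)
REPO="$WORK/repo"
svnadmin create "$REPO"

# Recovery: make sure nothing else (Apache, svnserve, cron jobs) is using
# the repository, then let svnadmin bring it back to a consistent state.
svnadmin recover "$REPO"

# Prevention, part 1: run svnserve through a wrapper so every file it
# creates is group-writable. /usr/bin/svnserve is an assumed install path.
cat > "$WORK/svnserve-wrapper" <<'EOF'
#!/bin/sh
umask 002
exec /usr/bin/svnserve "$@"
EOF
chmod +x "$WORK/svnserve-wrapper"

# Prevention, part 2: make the repository group-writable and set the setgid
# bit on its directories so new files inherit the directory's group. In
# practice you'd also chgrp -R it to a group shared by all Subversion users.
chmod -R g+w "$REPO"
find "$REPO" -type d -exec chmod g+s {} +
```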
Subversion 1.1 introduced the new “fsfs” back end which doesn’t use Berkeley DB and instead uses plain files on disk. This works much better for people using NFS, for example, and helps avoid some of the permissions problems. Most people can stick with BDB as long as they don’t try to mix network servers for Subversion.
Update: As of late 2007, Subversion has used the FSFS back end by default since version 1.2, so this whole wedged repository thing is not usually a problem any more. FSFS is easier to back up too: you don’t need to dump a database, just copy the repository files like they’re regular files.
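Here’s a quick sketch of creating an FSFS repository and backing it up, using a hypothetical scratch location. The --fs-type flag only matters on older releases that defaulted to BDB. If commits might be happening while you copy, svnadmin hotcopy is the safer way to take that copy, since it won’t catch the repository halfway through a commit.

```shell
#!/bin/sh
# Sketch: create an FSFS repository and take a hot backup of it.
# Assumes svn, svnadmin and svnlook are on PATH; paths are hypothetical.
set -eu

WORK=$(mktemp -d)
REPO="$WORK/repo"

# Explicitly request FSFS (already the default on modern releases).
svnadmin create --fs-type fsfs "$REPO"
svn mkdir -q -m "Create trunk" "file://$REPO/trunk"
cat "$REPO/db/fs-type"                   # prints: fsfs

# Copy the repository safely, even while it is in use.
svnadmin hotcopy "$REPO" "$WORK/repo-backup"
svnlook youngest "$WORK/repo-backup"     # prints: 1
```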
I’ve heard Subversion supports “meta-data.” What’s that?
Using Subversion, you can attach named data to files and directories. Each name defines a property and properties can have textual or binary content. The nifty thing is that properties are version controlled in exactly the same way as files — Subversion tracks how their contents change over time, and can perform merges, deletes, and updates just like file contents. Subversion uses special properties to do stuff like ignoring certain files in a directory or setting the “execute bit” for a file.
Since properties are editable just like file contents, you could write a tool that used them in some special way. An example often given is a system that stores big graphic files — you could store a thumbnail inside a Subversion property for each file, then use that in your system.
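A short sketch of working with properties from the command line, using a throwaway repository. svn:ignore and svn:executable are two of Subversion’s special properties; xyzzy:thumbnail is a made-up custom property name, and the file contents are placeholders standing in for a real image and its thumbnail. The -F option reads a property’s value from a file, which is how you’d store binary data such as a thumbnail.

```shell
#!/bin/sh
# Sketch: special and custom Subversion properties.
# Assumes svn and svnadmin are on PATH; file and property names are hypothetical.
set -eu

WORK=$(mktemp -d)
svnadmin create "$WORK/repo"
svn checkout -q "file://$WORK/repo" "$WORK/wc"
cd "$WORK/wc"

printf 'echo hello\n' > build.sh
printf 'placeholder image data' > poster.psd
printf 'placeholder thumbnail data' > poster-thumb.png
svn add -q build.sh poster.psd

# Special svn: properties change Subversion's own behaviour.
svn propset -q svn:ignore '*.class' .        # ignore compiled files in this directory
svn propset -q svn:executable on build.sh    # set the execute bit on checkout

# Custom properties can hold anything; -F reads the value from a file.
svn propset -q xyzzy:thumbnail -F poster-thumb.png poster.psd

# Properties are versioned, so they are committed along with the files.
svn commit -q -m "Add files with properties"
svn proplist -v poster.psd
```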
Why would someone spend thousands on a commercial tool when they can get Subversion for free?
This is a good question, and one that I think a lot of people are beginning to ask. In the case of a version control tool, it may be that a company is happier using a product for which they can pay for support – if something goes wrong they can call someone and get it fixed. But open-source software is challenging the notion that you must pay for support. Subversion has an extremely active user community and you can often get a response in minutes, for free.
I like Perforce, and I actually think it’s better than Subversion in certain circumstances (usually when your branching has got out of control and you’re in a bit of a mess). But is Perforce several hundred dollars per head better than Subversion? I think probably not.
So what’s in Subversion that’s not in your new book?
This was actually the toughest question I faced during the book launch and I had to think for a long time before answering. I think the book covers 95% of Subversion’s features, and easily covers everything you’ll need when using Subversion on a typical project. I couldn’t cover all the advanced usages of Subversion, but I think having read the book you’ll be able to adapt what’s in there to cover any new situation you face.
The book sticks to the Subversion command line, and only covers GUI tools briefly, so you’ll need to experiment a little to figure out how Tortoise works, for example. I think it’s useful to understand what a GUI is doing “under the hood,” so I don’t see this as a serious omission. The book also doesn’t cover IDE integration because those tools are still evolving rapidly.
It looks like we’ll be doing another print run of the book, so be sure to get a copy of the first printing before we correct the typos!
Update: In addition to further print runs we did a second edition of the book, updated to include new features introduced in Subversion and additional information on IDE integration and programmatic access to Subversion repositories.