Splitting, Merging, and Organizing a Subversion Repository
When setting up Subversion within an organization, folks will often ask “How many repositories should I create?”—my advice is to just create one repository until you have a concrete need for more. I take this approach because it’s easy to split an existing repository into two. I also remind people it’s not the end of the world if they create multiple repositories and then they need to merge them, because Subversion has good support for splitting, merging, and reorganizing repositories. I’ve never really gone into any detail on how you actually do this stuff, but since I recently needed to merge two repositories I thought I’d share the technique I used.
Splitting a repository
First off make sure you tell everyone you’re going to split the repository. The ideal situation is where everyone can check in, go home for the night, leave you to organize stuff, and then come in the next day and start on something fresh. If people can’t commit all their changes you may need to help them relocate their working copy. Once everyone’s committed their changes, close down network access to your repository to be sure no-one’s committing further changes. This might be overkill depending on your situation, but it’s nice to be safe.
Next, back up your repository using svnadmin dump to create a dump file. A dump file is a portable representation of a Subversion repository and something you might be using for backups already. We’re going to load the dump file into a new repository, using svndumpfilter to select just the directories we wish to move to the new repository. A typical transcript might look like this:
[mgm@penguin temp]$ svnadmin dump /home/svnroot/log4rss > log4rss.dump
* Dumped revision 0.
* Dumped revision 1.
    :     :     :
* Dumped revision 37.
* Dumped revision 38.
[mgm@penguin temp]$ mkdir tools-repos
[mgm@penguin temp]$ svnadmin create tools-repos
[mgm@penguin temp]$ cat log4rss.dump | svndumpfilter include log4rss/trunk/tools | svnadmin load tools-repos
Including prefixes:
   '/log4rss/trunk/tools'
Revision 0 committed as 0.
Revision 1 committed as 1.
Revision 2 committed as 2.
    :     :     :
<<< Started new transaction, based on original revision 38
------- Committed revision 38 >>>
In the above sample, I dumped the Log4rss repository into a file called log4rss.dump and created a new directory called tools-repos initialized with an empty repository. Then I piped my dump file through svndumpfilter and told it to include just the tools directory, and piped the result of the filter into svnadmin load into the new repository. I haven’t included it here, but I got a bunch of information about which items were included in the filter and which were dropped. Now the new tools-repos repository contains just the tools directory.
At this point, I can make the new repository available and tell developers where to find it. It’s probably also wise to delete the log4rss/trunk/tools directory from the original repository, just so people can’t accidentally use the old stuff. Subversion doesn’t have an obliterate command so the tools directory is still using space in the old repository—if this is an issue you’ll need to consider loading your dump file into a new repository using an “exclude” command to weed out the directory you no longer want.
Merging two repositories
My current project recently moved from Chicago to Calgary. For a while we had two teams running, using separate Subversion repositories. When everything moved to Calgary, we needed to merge the Chicago team’s code into our repository. We didn’t want to just import the files, we wanted to include historical information too.
We created a dump file of the Chicago team’s repository and loaded it straight into our repository using svnadmin load. This worked because the load command simply replays a series of commits, simulating what would have happened if the Chicago team had been working with us all along. The key thing to note here is that we had been using different directory paths in the two repositories, so their stuff didn’t conflict with ours. If they had used the same directory structure we would not have been able to simply load their changes into our repository. In that case, we would have had to work some magic with the dump file—it contains plain-text path definitions, so in a pinch we could have munged those path names so they didn’t conflict.
Organizing a repository
Once we’d loaded the Chicago code into our repository we used TortoiseSVN’s graphical repository browser to move the new stuff into our existing directory tree. Here’s a screenshot of the repo browser—it’s a great tool for this kind of thing and made reorganization very simple. We just used the “rename” command to move everything around in the repository, and once done we all checked out the newly organized directory tree and continued working.
 
mike on October 19th 2005 in Version Control
