Extract multiple partial directories from a Subversion repository

Subversion almost got me today.

Scenario:

Several subdirectories of an existing Subversion repository should be synced to an exposed Subversion server to give someone access to the code in these directories: only those directories, and read-only. So bear in mind that this solution only works one way (and downhill with a tailwind).

First attempt:

Since Subversion 1.5, partial sync of a repository is possible. This sounded promising, but after some searching in the documentation I threw this plan away because only one subdirectory can be synced to a mirror repository at a time. As I had several to sync, this was never going to work. More information on this can be found here (“Q: Can I mirror a subdirectory of a master repository?”). I’ll spare you the details of creating a synced repository because there are plenty of guides and documentation out there.

Second attempt:

My second attempt was to dump the source repository and use svndumpfilter to include only the directories that should be exposed later on. The size and number of commits aren’t the issue here, but in repositories files and directories tend to get moved, renamed, deleted and so on. For the dump, you have to add every path that was the source of such a change to svndumpfilter’s include list, and this gets really frustrating by the 10th restart. Also, the standard svndumpfilter doesn’t work very well in some cases. Therefore some people reimplemented svndumpfilter in Python (svndumpfilter3), which should help with those problems. You can find it here. But even after trying out various options of svndumpfilter3 I wasn’t able to cleanly export the parts I needed. So I started scripting and came up with what is clearly not the best solution, but a solution.

As every missing directory or file needed a new command line option, it was fairly tedious to run the script, open it, add the option, save and rerun it.

This is where my first script (dump-testrepo.sh) comes in. It runs the svnadmin dump and svndumpfilter3 combination, reads the error output and automatically adds the missing path to itself.

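#!/bin/bash

# Run dump + filter, keep only the error output and pull out the path
# that svndumpfilter3 reports as invalid (the text between the quotes).
newline=$( (svnadmin dump /srv/subversion/repos/source | /usr/bin/svndumpfilter3 \
    include trunk/modules/subpath2 | gzip > testdump.svndump.gz) 2>&1 >/dev/null | grep "svndumpfilter3: Invalid" | awk -F\' '{print $2;}')

if [ -n "$newline" ];
then
    echo "modifying export script to add line: "
    echo "$newline"
    echo

    # Add the reported path as an extra line above the existing one in this script.
    sed -i -r "s#(\s{4}include.*?gzip.*)#    include $newline \\\\\n\1#" "$0"

    echo "### RUN $0 again!"
fi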
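
To see what the script does to itself without touching a real repository, the two moving parts can be tried on a scratch file. This is only a sketch: fake_filter and the wording of its error message are made up for illustration, the path trunk/old/location is an example, and the sed expression is the one from the script minus the `?` (sed’s extended regular expressions have no lazy quantifiers). It assumes GNU sed.

```shell
#!/bin/bash
# Sketch: run the extraction and the self-patching against a scratch file
# instead of $0. fake_filter and its message text are hypothetical.

fake_filter() {
    echo "filtered dump data"   # stdout, would normally go to gzip
    echo "svndumpfilter3: Invalid copy source path 'trunk/old/location'" >&2
}

extract_missing_path() {
    # Print the text between the first pair of quotes of the first error line.
    grep "svndumpfilter3: Invalid" | awk -F\' '{print $2; exit}'
}

patch_include_list() {
    # $1 = script file to patch, $2 = path to add as an extra include line
    sed -i -r "s#(\s{4}include.*gzip.*)#    include $2 \\\\\n\1#" "$1"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
svnadmin dump /srv/subversion/repos/source | /usr/bin/svndumpfilter3 \
    include trunk/modules/subpath2 | gzip > testdump.svndump.gz
EOF

# "2>&1 >/dev/null" sends stderr into the pipe and throws stdout away.
missing=$( (fake_filter) 2>&1 >/dev/null | extract_missing_path )

if [ -n "$missing" ]; then
    patch_include_list "$demo" "$missing"
fi

cat "$demo"
rm -f "$demo"
```

With GNU sed the scratch file should end up with an extra `include trunk/old/location \` line directly above the original include line, which is exactly the edit the real script makes to itself on every run.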

The second script (dumprunner.sh) is just a while loop that calls the first script, which changes on every run, until the dump finishes without reporting any missing paths.

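#!/bin/bash

# Re-run the dump script until it stops asking for another run.
while true
do
    echo "still running"

    # dump-testrepo.sh prints "### RUN ..." only after it has patched itself.
    if ! ./dump-testrepo.sh | grep "RUN"
    then
        break
    fi

    sleep 1
done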
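
One thing to watch: if the sed expression ever fails to match, the first script still prints its RUN line and the loop above never ends. A minimal sketch of a bounded variant follows; run_until_clean and the limit of 50 are my own assumptions, not part of the original scripts.

```shell
#!/bin/bash
# Sketch: same loop idea, but give up after a fixed number of runs.
# run_until_clean is a hypothetical helper name.

run_until_clean() {
    # $1 = maximum number of runs, remaining arguments = command to run
    local max=$1 attempt=0
    shift
    while [ "$attempt" -lt "$max" ]; do
        attempt=$((attempt + 1))
        # No "RUN" in the output means the command did not ask for a rerun.
        if ! "$@" | grep -q "RUN"; then
            echo "clean after $attempt run(s)"
            return 0
        fi
    done
    echo "gave up after $max runs" >&2
    return 1
}

# Only start the loop when the dump script is actually present.
if [ -x ./dump-testrepo.sh ]; then
    run_until_clean 50 ./dump-testrepo.sh
fi
```

The return code makes it easy to chain the import step only when the dump really finished cleanly.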

During the import I hit the next bump, with errors like this:

<<< Started new transaction, based on original revision 646
 svnadmin: File not found: transaction '0-1', path 'some/path/File.java'
 * adding path : some/path …

The explanation for this can be found here. As stated in the linked blog post, you have to create the missing parent directories before loading the dump. This is done like this (I added an svn mkdir line for every missing path):

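#!/bin/bash

# Create the empty target repository.
svnadmin create /srv/subversion/repos/testrepo
# Pre-create the directories the filtered dump expects (one line per missing path).
svn mkdir --parents file:///srv/subversion/repos/testrepo/some/path -m "empty dir"

# Load the gzipped dump written by dump-testrepo.sh.
gunzip -c testdump.svndump.gz | svnadmin load -q /srv/subversion/repos/testrepo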
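
Writing those svn mkdir lines by hand can be partly automated. The following is a rough sketch under the assumption that the directory to create is always the parent of the path named in the error; mkdir_line_for_error is a hypothetical helper, and the sample error line is the one from above with plain ASCII quotes.

```shell
#!/bin/bash
# Sketch: turn a "File not found" error from svnadmin load into the matching
# svn mkdir line. Assumes the missing directory is the parent of the reported path.

mkdir_line_for_error() {
    # $1 = repository URL; the error text is read from stdin
    local repo_url=$1 path dir
    # The reported path is the text between the second pair of quotes.
    path=$(grep "File not found" | awk -F\' '{print $4; exit}')
    [ -n "$path" ] || return 1
    dir=$(dirname "$path")
    # A file directly in the repository root needs no extra directory.
    [ "$dir" != "." ] || return 1
    printf 'svn mkdir --parents %s/%s -m "empty dir"\n' "$repo_url" "$dir"
}

echo "svnadmin: File not found: transaction '0-1', path 'some/path/File.java'" |
    mkdir_line_for_error file:///srv/subversion/repos/testrepo
```

The function only prints the line, so it can be reviewed and pasted into the import script rather than run blindly.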

Now you should hopefully have a shiny new repository with only the needed parts.
