Rewriting the past, and the woes of SVN

Long ago, I wrote a little Bash script — set-properties — for the Wesnoth-UMC-Dev project, to ensure the correctness of SVN properties such as svn:keywords and svn:executable on files. It was pretty simple:

#! /bin/sh
# Set properties on PNG files
for f in $(find . -iname '*.png'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type image/png "$f"
done
# Set properties on Ogg files
for f in $(find . -iname '*.ogg'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type audio/x-vorbis "$f"
done
# Set properties on PCM files
for f in $(find . -iname '*.wav'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type audio/x-wav "$f"
done
# Set properties on JPEG files
for f in $(find . -iname '*.jpg'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type image/jpeg "$f"
done
for f in $(find . -iname '*.jpe'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type image/jpeg "$f"
done
for f in $(find . -iname '*.jpeg'); do
    svn propdel svn:executable "$f"
    svn propset svn:mime-type image/jpeg "$f"
done
# Set properties on CFG files
for f in $(find . -iname '*.cfg'); do
    svn propdel svn:executable "$f"
done
# Set properties on scripts
for f in $(find . -iname '*.sh'); do
    svn propset svn:executable '*' "$f"
done
for f in $(find . -iname '*.cmd'); do
    svn propset svn:executable '*' "$f"
done
for f in $(find . -iname '*.bat'); do
    svn propset svn:executable '*' "$f"
done
for f in $(find . -iname '*.py'); do
    svn propset svn:executable '*' "$f"
done

Some time after I learned Perl with my work on Shikadibot 0314, I rewrote that script in Perl to arrange the “ideal” property values in a neat table (hash), check current properties instead of blindly overwriting them in the working copy, and cover plenty of other file types. It also gained a blinking progress bar to display the search progress for some reason.
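The "check current properties instead of blindly overwriting them" idea can be sketched in shell, the language of the original script, rather than the Perl of the actual rewrite. This is only a minimal sketch under assumptions: ensure_prop is a hypothetical name that does not come from the real script, and an svn command-line client is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the real script):
# set property $2 on file $1 to value $3, but only if the
# current value differs from the wanted one.
ensure_prop() {
    file=$1 prop=$2 want=$3
    # 'svn propget' prints the current value, or nothing if unset.
    have=$(svn propget "$prop" "$file" 2>/dev/null)
    if [ "$have" != "$want" ]; then
        svn propset "$prop" "$want" "$file"
    fi
}
```

A call such as ensure_prop foo.png svn:mime-type image/png would then leave an already correct file alone instead of rewriting its property on every run.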

To get an idea of how known file types are defined in the source, let's take a look at these bits:

#
# proptab:
#   extension => [svn:executable, svn:mime-type, svn:eol-style, svn:keywords]
#   properties set to the empty string '' (except svn:executable) are left unchanged;
#
my %proptab = (
    cfg   => [FALSE, '', 'native', ''],
    ign   => [FALSE, '', 'native', ''],
    "map" => [FALSE, '', 'native', ''],
    # [...]
    pl    => [TRUE, '', 'native', 'Author Date Id Revision'],
    # [...]
    gif   => [FALSE, 'image/gif', '', ''],
    png   => [FALSE, 'image/png', '', ''],
    # [...]
    xcf   => [FALSE, '', '', ''],
    # [...]
);

The table we have used since then (around Sept. 2008) has always contained more than 10 extensions with their minimum required property sets. As of this writing, it covers approximately 57 file types. Keep this in mind.

It would be overkill to fork-exec find processes to discover paths that could require SVN property changes, right? So, instead, I used find2perl to generate File::Find client code and embedded it into set-properties. So far, so good. But how about running that code (scalar keys %proptab) times (i.e. number-of-extensions times) anyway? Overkill?

No! It's plain stupid. But definitely less stupid than what you are about to read.

For some reason, I apparently decided that the matching paths from each cycle should be appended to a plain scalar (a text string, to be exact), with individual paths separated by newlines, of all things. Then another cycle runs at the end, (scalar keys %proptab) times again, to split that string into an array of paths for each extension and store it in another hash. The same cycle then iterates over the newly inserted hash element (the array of paths), checking and fixing SVN properties.

Very roughly summarized as the following pseudocode:

FOREACH extension FROM file_extensions
    FIND IN ./ AS file
        ; all while displaying a cute blinking bar!
        IF file MATCHES extension
            APPEND file TO file_list_string
        END IF
    END FIND
END FOREACH

FOREACH extension FROM file_extensions
    ; split 'file_list_string' every newline
    FOREACH path FROM (SPLIT /\n/ file_list_string)
        INSERT path IN file_index[extension]
    END FOREACH

    FOREACH path FROM file_index[extension]
        fix_properties( extension, path )
    END FOREACH
END FOREACH

Careful readers will quickly realize that something is horribly wrong with this pseudocode. I wish I were making it up. This algorithm has actually been in use by the Wesnoth-UMC-Dev set-properties script for one year and four months! I must have been on something when I wrote this shit. I've honestly never seen any program as awful as this in my life. Not even build-external-archive.sh (a.k.a. “Scrappy”) can compete with this abomination.

So, yesterday, I took a look at set-properties after noticing how much CPU time it ate while working with very sparsely populated directories of the project. At first I blamed the use of backticks (svn propset foo bar baz and such) as a possible cause of overhead. Then I slowly realized what I had written. There isn't any emoticon here for the expression on my face at that moment.

Thus, set-properties got rewritten with a much cleaner and simpler algorithm:

FOREACH file FROM (FIND IN ./)
    ; no more useless cute blinking bar!
    FOREACH extension FROM file_extensions
        IF file MATCHES extension
            fix_properties( extension, file )
        END IF
    END FOREACH
END FOREACH
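
For illustration, here is a minimal shell sketch of the same single-pass idea (the real script is Perl and uses File::Find). The names mime_for_ext and walk_once, the tiny extension table, and the callback convention are all hypothetical. It also departs slightly from the pseudocode: it looks the extension up directly instead of looping over every known extension. Paths containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical lookup table: print the wanted svn:mime-type for a
# lowercase extension, or fail if the extension is unknown.
mime_for_ext() {
    case $1 in
        png) echo image/png ;;
        gif) echo image/gif ;;
        jpg|jpe|jpeg) echo image/jpeg ;;
        *) return 1 ;;
    esac
}

# Walk the tree under $1 exactly once, skipping .svn directories,
# and call the hypothetical callback $2 with (extension, path) for
# every file whose extension is in the table.
walk_once() {
    root=$1 callback=$2
    find "$root" -name .svn -prune -o -type f -print |
    while IFS= read -r path; do
        name=${path##*/}
        case $name in
            *.*) ext=${name##*.} ;;
            *) continue ;;   # no extension at all
        esac
        # Match extensions case-insensitively, like find -iname.
        ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
        if mime_for_ext "$ext" >/dev/null; then
            "$callback" "$ext" "$path"
        fi
    done
}
```

With a callback that fixes properties, something like walk_once . fix_properties (where fix_properties is whatever function does the actual work) visits each file once, no matter how many extensions the table holds.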

So yeah. 🙁

But wait, there's more! While optimizing the script, I also replaced the svn foobar backtick code with invocations of libsvn, via the SVN::Client module. This worked very well in the end, but I discovered a few things in the process:

  • The SVN::Client methods I used choke on non-absolute path specifications for some reason, causing an assertion failure in libsvn's C back-end that terminates Perl and the script with a SIGABRT.
  • Despite the documentation's claim that SVN::Client::url_from_path() returns undef if the specified path is not under version control, it actually makes the module invoke die and terminates the client script. This means that I cannot even safely check whether a file is versioned (i.e. without resorting to .svn/text-base/<FileName>.svn-base existence checks). What the hell?

The assertion failure from the first point looks like this:

$ set-properties
perl: /tmp/buildd/subversion-1.6.9dfsg/subversion/libsvn_subr/path.c:114: svn_path_join: Assertion `svn_path_is_canonical(base, pool)' failed.
Aborted

Turns out the solution is to wrap SVN::Client method calls in eval blocks and handle whatever crap SVN comes up with. Oh, and make sure all paths are absolute using Cwd::realpath so that libsvn doesn't hit an assertion failure killing us, eval or no eval. How nice.

# No point in working with unversioned files.
my $svn_ret = undef;
eval { $svn_ret = $svn->url_from_path($path) };
if ($@) {
    # fucking libsvn dies if $path isn't under version control;
    # the documentation says it should return undef above instead!
    return;
}

With this, I have absolutely lost my faith in Subversion's excellence as a version control system not only as a normal user, but also as a programmer integrating it into my own client applications. And I know I lost my faith as a user after seeing svn sit for two days doing fucking nothing because the damned connection died after 1 minute of running time — and then svn spent the next days ignoring SIGTERMs all the time, no less. It's been like this for so many versions that I'm almost convinced it's intentional.

(Ah, that was relaxing. I should do this more often.)

Wesnoth-UMC-Dev has to continue using SVN because we have several add-on authors using Windows and the few alternatives for using git (I love git, I don't think I could try anything else at the moment) on Windows seem to be rather awkward to install and/or use for the average Non-Computer Person. That's a pity. It's really a pity.

Finally, set-properties got renamed to umcpropfix for a change, to mark its rebirth after I solved the algorithmic mess above last night. It doesn't have a nifty codename like the rest, but the new umc-prefixed name is still something to be celebrated now that we are going to have umcdist, umcstat, umcreg and umcbotd. Yes, I know I'm crazy, thank you.