I submitted the final edit of my Balisage paper, Multilevel Versioning for XML Documents, the other day. While I did try to shorten it (I seem to be unable to produce a short paper) and, of course, correct problems and mistakes pointed out by reviewers, there were no radical changes, and so I am forced to draw one of two possible conclusions:
I am deluded and simply don’t know what I’m talking about. This is an awful feeling and happens to me a lot after submitting papers.
The paper suggests something that might actually work.
(There is a third conclusion, obviously, one that is a mix of the two, but let’s not go there.)
My paper is about a simple versioning scheme for the eXist XML database, built on top of the versioning extension that ships with it. Its main purpose is to provide granularity to versioning, to provide an author of XML documents with a method to recognise significant new versions as opposed to the long series of saves, each of which comprises a new eXist version.
On the surface of it, my scheme is a typical multilevel versioning system,with integers, decimals, centecimals, etc (1, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.2, …) identifying a level of granularity. The idea is that the lowest level (centecimal, in this case) denotes actual edits while the levels above identify significant new versions. Nothing new or fancy, in other words. What is new (to me, at least; I have not seen this suggested elsewhere) is how the scheme is handled in eXist.
I’m proposing that each level is handled in its separate collection, each using eXist’s versioning extension to keep track of new versions in the respective collections. When a level change occurs (for example, if a new centecimal version such as 1.3.1 is created from 1.3), the new version is created using a simple copy operation from the decimal collection to the centecimal collection. The operation itself (in this case, a check-out from a decimal version to a centecimal version) is kept track of using an XML file logging each such operation and mapping the eXist “standard” version to the new integer, decimal or centecimal revision.
A related task for the XML file is to map the name of the resource to its address; the XML file’s other big purpose is to provide the resources with a naming abstraction so a named resource in a specific significant version can be mapped to an address on the system. I propose using URNs, but most naming conventions should work just as well.
Implementation-wise, the XML version map abstraction is very attractive to me as a non-programmer (or rather, someone whose toolkit of programming languages is mostly restricted to those commonly associated with XML technologies), as I believe most of the operations can be implemented in XSLT and XQuery.
But I’m not there yet. I’ve submitted the final paper and now, I have to produce a sufficiently convincing presentation on the subject.
The presentation is on Tuesday, August 5th, and I’d love to see you there.