Category Archives: Balisage

2016

As 2015 draws to a close, I’m thinking of 2016 and specifically these highlights:

  • The Hateful Eight in 70mm. I haven’t bothered booking Star Wars yet, but there’s no way I’m not going to see Tarantino’s 70mm epic the way it was meant to, in 70mm at the Imperial in Copenhagen. Yes, I know, the roadshow is coming to Stockholm, too, but the Danes still know how to run 70mm shows while the Swedes don’t. Sorry.
  • Göteborg Film Festival. Yes, I’m going to spend another 11 days in a dark projection booth, hitting Play at three-hour intervals.
  • XML Prague. I’ve submitted a paper, but I’m also peer-reviewing other people’s papers as I’m now part of the Program Committee. The conference is in February, starting on a Thursday (the 13th) rather than a Friday and ending on a Saturday rather than a Sunday, allowing you and your better half to enjoy Prague on Valentine’s Day. Get your festival passes now, folks.
  • Balisage is between August 1-5. I’m definitely going; a year without Balisage would just be too weird.

Balisage 2015

It’s the first day of Balisage (I missed the pre-conference symposium, sadly), and it’s a lot like a markup holiday. It’s great meeting old friends and new, and the two talks so far promise another great conference.

Mr Smith Goes to Washington

My paper submission to this year’s Balisage conference was accepted. It’s about an eXist implementation I did for the Swedish Federation of Farmers (LRF), and while I may not be completely objective, I think the system is very cool. From the conference blurb:

The Federation of Swedish Farmers (LRF) provides its 170,000 members with a web-based service to check compliance with state and EU farming regulations. These checklists are also produced nightly both as generic checklists with more than 130 pages and as individualised checklists for registered members. The system consists of an eXist database coupled with oXygen Author. The checklists and their related contents are edited, stored, and processed, published as PDFs, and exported to the SQL database which stores member registration, feeds the website, and does various other tasks. The system uses XQuery, XSLT, XInclude modularization, an extended XLink linkbase, and other markup technologies. It currently handles more than 40,000 PDF documents a year and many more than that in the web-based forms.

This is the second version of the LRF system. The first, presented at XML Prague in 2013, was XProc-based and represented my somewhat naive trust in the state of XProc in eXist, The new one I rewrote in XQuery, having tested (and failed miserably at using) the XProc module that is now available. XProc in eXist, sadly, is not yet ready for prime time.

Be as it may, I’m really pleased about both the system and my paper. and hope to see you there.

On Reflection

Having reread my recent post on HTML5, I suppose I’d better stress the fact that it was never meant to be a commentary on HTML5 itself. It was a rant, something that happened because of the attitudes I think have increased in tandem with HTML5’s growing popularity. I really don’t know enough HTML5 to have an informed opinion beyond what I see suggested and discussed about it that is related to markup in general. My comments should be read in that light.

Take the addition of document semantics in HTML5 as a case in point. For example, the article and section tags are welcomed additions, as are, in my humble opinion, the fairly minor redefinitions of the em and strong semantics. And there’s some important work being done in the area of extensible semantics for the web (see, for example, Robin Berjon’s XML Prague paper, Distributed Extensibility: Finally Done Right? and the Web Components web page on best practices), which turned out to be a heated topic at Balisage this year because quite a few of its participants are, like me, grumpy old men defending their own turf.

These are steps in the right direction, because they move away from the presentational horror that is the “old” HTML and to a more semantic web. Semantics is about meaning, and meaning is now being added to the web rather than simply empty but oh-so-cool visuals. I should add that some very cool visuals are being added, too, but in, and please pardon the joke, a meaningful way.

But, and maybe this is just me, it’s when those steps are being heralded as original and unique, never thought of before or at least never done right, when history and past (and working) solutions are ignored or misinterpreted because they are part of a standard (XML or even SGML) that is regarded as failed, when I react. Google’s Dominic Denicola provided a case in point when he held a controversial presentation on the subject called Non-Extensible Markup Language at Balisage; unfortunately, only the abstract seems to be available at their website.

That grumpy old men thing, above, is meant as a joke, of course, but I imagine there to be some truth in it. Part of the HTML5 crowd will certainly see it that way because they are trying to solve a very practical problem using a very pragmatic standard. HTML5 is, in part, about keeping old things working while adding new features, and it seems to do the job well. Having to listen to some older markup geeks argue about what XML was actually designed to do must seem to be as being utterly beside the point.

So, what to do? Well, I think it’s largely about education, both for the newer guys to read up on the historical stuff, and the older guys to try to understand why HTML5 is happening the way it is, and then attempting to meet halfway because I’m pretty sure it will benefit both.

Me, I’m in the midst of the reading up phase, and HTML5 – The Missing Manual is an important part of that.

I Should Probably…

Following this year’s Balisage conference, I should probably do a rewrite and update of the whitepaper I presented. It’s on my list of things to do.

On the other hand, I should do an eXist implementation of the version handling system I suggested in that paper. It’s also on my list of things to do.

But then again, I still have to finish my ProXist implementation, the one I presented at XML Prague. It is (you guessed it) on my list.

I have lots of good (well, I think so) ideas first presented at XML conferences, many of which deserve (well, I think so) to live beyond them. After all, a lot of work goes into the papers and the presentations, so shouldn’t I follow up on them more?

My version handling system, for example, should be easy enough to do in eXist. It doesn’t require a revolution, it doesn’t require me to learn how to code in Java, it should be enough to spend a couple of nights writing XQueries and XSLT to produce a usable first version.

ProXist is both simpler and more difficult to finish, but on the other hand, the basic application is already out there and works. It’s a question of rewriting it to be easier for others to test, which basically means redoing it as a standard eXist app.

Yet, instead of doing something about either of them, here I am, writing this blog post. It’s conference procrastination taken to yet another level.

And the next XML Prague deadline is coming up fast.

Submitted My Final Balisage Edit

I submitted the final edit of my Balisage paper, Multilevel Versioning for XML Documents, the other day. While I did try to shorten it (I seem to be unable to produce a short paper) and, of course, correct problems and mistakes pointed out by reviewers, there were no radical changes, and so I am forced to draw one of two possible conclusions:

I am deluded and simply don’t know what I’m talking about. This is an awful feeling and happens to me a lot after submitting papers.

The paper suggests something that might actually work.

(There is a third conclusion, obviously, one that is a mix of the two, but let’s not go there.)

My paper is about a simple versioning scheme for the eXist XML database, built on top of the versioning extension that ships with it. Its main purpose is to provide granularity to versioning, to provide an author of XML documents with a method to recognise significant new versions as opposed to the long series of saves, each of which comprises a new eXist version.

On the surface of it, my scheme is a typical multilevel versioning system,with integers, decimals, centecimals, etc (1, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.2, …) identifying a level of granularity. The idea is that the lowest level (centecimal, in this case) denotes actual edits while the levels above identify significant new versions. Nothing new or fancy, in other words. What is new (to me, at least; I have not seen this suggested elsewhere) is how the scheme is handled in eXist.

I’m proposing that each level is handled in its separate collection, each using eXist’s versioning extension to keep track of new versions in the respective collections. When a level change occurs (for example, if a new centecimal version such as 1.3.1 is created from 1.3), the new version is created using a simple copy operation from the decimal collection to the centecimal collection. The operation itself (in this case, a check-out from a decimal version to a centecimal version) is kept track of using an XML file logging each such operation and mapping the eXist “standard” version to the new integer, decimal or centecimal revision.

A related task for the XML file is to map the name of the resource to its address; the XML file’s other big purpose is to provide the resources with a naming abstraction so a named resource in a specific significant version can be mapped to an address on the system. I propose using URNs, but most naming conventions should work just as well.

Implementation-wise, the XML version map abstraction is very attractive to me as a non-programmer (or rather, someone whose toolkit of programming languages is mostly restricted to those commonly associated with XML technologies), as I believe most of the operations can be implemented in XSLT and XQuery.

But I’m not there yet. I’ve submitted the final paper and now, I have to produce a sufficiently convincing presentation on the subject.

The presentation is on Tuesday, August 5th, and I’d love to see you there.

Just A Few More Days

The Balisage paper deadline is approaching fast. Deadline is on Good Friday, which is three days from now and close enough to keep me busy until some decidedly ungodly hours come Friday night. Basically I’m interpreting “April 18” as “early morning, April 19, GMT”.

My paper is, so far, a study in procrastination. There is an angle bracket or two, and possibly even a semi-intelligent thought, but most of all my long days so far remind me more of my uni days and approaching exams, when any excuse from sorting spoons (I had two) to counting dust bunnies was enough to keep me away from the books, than a serious commitment to markup theory and practice.

It is coming along, though.

ProXist and My XML Prague Paper

I recently submitted the final version of my XML Prague whitepaper about my eXist implementation of ProX, called ProXist (with apologies for the tacky name). While I’m generally pleased with the paper, the actual demo implementation I am going to present at the conference is not quite finished yet and I wish I had another week to fill in the missing parts.

Most of the ProXist stuff works but there are still some dots to connect. For example, something that currently occupies the philosophical part of my brain has to do with how to run the ProX wrapper process, the one that configures the child process that actually does stuff to the input. ProX, so far, has been very much about automation and about things happening behind the scenes, and so I have aimed for as few end user steps as possible.

My Balisage ProX demo was a simple wrapper pipeline that did what it did in one go. Everything was fitted inside that pipeline: selecting the input, configuring the process that is to be applied to the input in an XForm, postprocessing the configured process and converting it to a script that will run the child process, running the child process, saving the results. Everything.

But the other day, while working on the eXist version and toying with its web application development IDE, it dawned on me that there doesn’t have to be a single unified wrapper process. If its components are presented on a web page and every one of them includes logic to check if the information from a required step is available or not (for example, a simple check to confirm that an input has been chosen before the process can be configured), they don’t have to be explicitly connected.

The web page that presents the components (mainly, selecting input and configuring the process to be applied on the input) then becomes an implicit wrapper. The user reads the page and the presentation order and the input checks are enough. There is no longer a need a unified wrapper process.

Now, you may think this is obvious, and I have to admit that it now seems obvious to me, too. But I sometimes find it to move from one mindset (for example, that automation bit I mentioned, above) to another (such as the situation at hand, the actual environment I implement things in) as easily as I would like. If this is because I’m getting older or if it’s who I am, I don’t know. In this particular case, I was so convinced that the unified wrapper was the way to go that it got in the way of a better solution.

At least I think it’s a better solution. If it isn’t, hopefully I can change my mind again and in time.

See you at XML Prague.