Category Archives: XML Prague

ProXist

I’ve been working on an eXist-based implementation of my XProc abstraction layer, ProX, hoping to have something up and running before XML Prague next month. It turns out that the paper I submitted on the subject has been accepted, so now I guess I just have to.

The ProX implementation should not be terribly complicated to finish, but until recently it risked being rather hackish (is that a word?) because the XMLCalabash eXist module written by Jim Fuller was rather primitive: it only supported pointing out the pipeline to run and a single, hard-coded output port. I foresaw a more or less complete rewrite of the ProX wrapper in XQuery.

Luckily, Jim very graciously agreed to rewrite his module into something more immediately usable. I received the first updated module to test in December and the most recent update just a few days ago. He also found a bug in Calabash’s URI handling and sent a first fix to me along with the updated module. There are still things to do but for me, Christmas came really early this year.
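
To give a flavour of the difference, here is a minimal sketch of the kind of invocation a ProX wrapper needs. The module namespace, function name and parameter conventions are all invented for illustration; they are not Jim's actual API.

    xquery version "1.0";

    (: Hypothetical sketch only: the module URI, function name and
       parameter conventions below are invented for illustration. :)
    import module namespace calabash = "http://example.com/xmlcalabash";

    (: The old module: point out the pipeline, get the one hard-coded
       output port back. Roughly:
       calabash:process("/db/prox/pipelines/publish.xpl") :)

    (: What a ProX wrapper needs: named input ports, pipeline options,
       and access to any output port. :)
    let $outputs :=
        calabash:process(
            "/db/prox/pipelines/publish.xpl",
            <inputs>
                <input port="source">{doc("/db/prox/data/chapter.xml")}</input>
            </inputs>,
            <options>
                <option name="output-format" value="pdf"/>
            </options>)
    return $outputs//output[@port = "result"]/*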

Oh, and I’m calling the implementation, and the paper, ProXist. Sorry about that.

MicroXML and Namespaces

MicroXML is an attempt by James Clark, John Cowan and Uche Ogbuji to simplify XML and get rid of all the extra baggage that currently surrounds it. DOCTYPE and PIs are both removed, UTF-8 is mandatory, draconian error handling is no longer a must, and, perhaps most controversially, namespaces are gone, too.

Uche Ogbuji gave a brilliant talk about MicroXML at the recent XML Prague 2013 conference, so rather than reiterating his arguments, I suggest you watch the presentation once it’s made available on the XML Prague website.

What I do want to comment on is this namespaces business. Of everything proposed in the MicroXML spec, the removal of namespaces is clearly the most controversial, as indicated by the many tweets following Uche’s talk. But should you be upset? I mean, really?

I’ve done a fair bit of XML work involving namespaces lately (yes, I know, there’s no way to avoid it, really). There’s a Relax NG compact schema that I wrote that uses several of them, including a default one. There are conversions from external XSD-based XML to that Relax NG-based XML using XSLT 2.0, and there are conversions from the Relax NG schema to an (obviously not namespace-aware) DTD to satisfy the needs of an editor that does not know what Relax NG is. (And I can’t bring myself to write XSDs; they are the spawn of Satan.) And there are XProc-based pipelines that glue these things together, and they obviously need to be aware of all these namespaces in addition to the ones they use themselves.

Lots of namespaces, in other words. And I’m not exaggerating when I tell you that the vast majority of the problems I had and the weirdness I encountered had to do with namespaces.

Nothing coming out from the transformation? A forgotten implied default namespace in the source XML. Namespace declarations in the target XML messing up validation? That same default namespace. The wrong prefix for the XLink namespace in the target XML? No explicit namespace declaration in the source. An unwanted and disallowed XLink namespace declaration being complained about in the root element of an XML document in the process of being checked out from a repository? A web service helpfully adding a seemingly missing namespace declaration to a root element into content in a SOAP envelope, resulting in a document that could not be opened but that did not show any problems in the repository itself, only on its way out…
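
To make the first of those concrete: the transformations in question were XSLT 2.0, but the same trap is easy to reproduce in XQuery. A minimal sketch, with an invented namespace URI:

    xquery version "1.0";
    declare namespace b = "http://example.com/ns/book";

    let $doc :=
        <book xmlns="http://example.com/ns/book">
            <title>Namespaces in a Nutshell</title>
        </book>

    (: Empty sequence: the unprefixed name "title" means "title in no
       namespace", but the element actually lives in the default
       namespace declared on the book element. :)
    let $nothing := $doc/title

    (: This matches, because the b: prefix is bound to the right
       namespace. :)
    let $title := $doc/b:title

    return ($nothing, $title)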

These are just a few select examples from my plight, and while I may have some of the details slightly wrong here, you probably get the idea. The list goes on.

And why is this all happening? Because someone, at some point, thought: wouldn’t it be nice if you could share your XML with everyone on the globe with no risk of name collisions and clashing semantics? Wouldn’t it be cool if the conflicting schemas could all be identified using a URI? We could have a throwaway name prefix attached to that URI and implement processing that hides the prefix from the end user, simplifying things further…

Of course, that someone’s idea of backwards compatibility was simply that, to a DTD, the revolution would be hidden in an extra attribute and an element type name containing a colon.

The fact is that I have yet to be helped by namespaces when using XML from the other side of the globe. I have yet to encounter a situation where I need to process unknown XML and a potential clash in semantics could do harm without me spotting the problem well in advance and taking care of it. The truth is that I don’t often need to use XML from the other side of the globe, out of the blue. It tends to happen in a context, in a controlled manner.

But when I do process that XML, knowing full well the source semantics and how they can map to my needs, it is always the namespaces that cause me grief.

Namespaces are among the least understood features of modern-day XML and among the most abused. The tools range from helpful to disastrous to completely ignorant or just plain wrong, and there are as many reasons for this as there are XML parser implementations out there. You know right from the start that you will have problems, so you’d better restock the medicine cabinet well in advance or get ready for that headache.

So, MicroXML? Yes, please. Now?

XML Prague 2013

Somewhat surprisingly, the XML Prague 2013 paper I mentioned in an earlier post was accepted. Considering how little time I had to write it (“writing” is probably a bit of a stretch, “drafting” is more to the point), I have to say I’m extremely pleased. I’m very much looking forward to presenting it.

I’m going to talk about the eXist-based publishing solution I’ve been building for a client. It began as a humble PDF-on-demand service but came to include a lot of stuff I find cool, in and slightly outside the world of XML. There’s XProc, XQuery, Relax NG, the process XML abstraction I have been working on, XML authoring, nightly mirroring from SQL databases to eXist, and more. And it all seems to come together quite well. I’ve had fun working with all this, so I’m hoping it might be of interest to others, too.

XML Prague, of course, is worth a visit regardless. Think of it as an XML weekend about cool new things frequently starting with an “X”, interesting people, Czech hospitality (including Czech beer), and one of my favourite cities, Prague.

XML Prague Whitepaper Woes

Why is it that every year, I promise myself to finish my (XML Prague and other) whitepapers early in order to avoid spending the last few nights before a deadline writing furiously, but always end up doing just that, very frequently having to share whatever little time remains with customer projects, family engagements and various Christmas preparations, seeing that yes, Christmas arrives at around the same time this year as it does every other?

The Final (?) Take on Film Markup Language

As some of you may know, I sometimes project films at the Draken cinema when I’m not busy doing XML stuff. Also, as I’ve noted before, film projection is moving from analogue to digital and it’s all happening very, very fast. The commercial cinemas, multiplexes all of them, now run films on hot-swap hard drives in servers coupled with ugly digital projectors, and the one remaining 35mm cinema, an art house, is rumoured to close soon.

So today, after a call from the city council’s school cinema group, I started thinking and realised something. While I did consider the advent of all things digital when I first wrote Film Markup Language, even updating the DTD to include some rudimentary support for 2K and 4K projection for my 2010 presentation on it in Prague, it’s too late to actually modernise the DTD or the spec for what’s going on today.

See, the digital thingies do use XML. It’s inconsistent and looks like some weird kind of committee hack, though, the kind of XML you might find in Java config files, but it’s XML and it seems to be enough. So, Film Markup Language is dead for all practical purposes.

It’s kind of sad.

Balisage 2012

I’ll be presenting a paper at Balisage 2012 in Montréal, Canada, in August. For those of you who have no idea what I’m talking about, Balisage is a conference on markup, a sister conference to XML Prague, and, together with the latter, a markup geek’s wet dream. The conference is not just about XML (although quite naturally, XML takes up a lot of space); there are all kinds of topics related to markup theory and practice, including all those semantics you really can’t formalise using XML.

Balisage, along with XML Prague, is also a conference where the discussions that inevitably follow the presentations are actually on topic and intelligent. It’s a very humbling experience to stand before a crowd of experts that can and will spot any flaws you might have in your slides, suggest improvements you never thought of and generally offer valuable insights. It’s a forum for learning, whether you are a presenter or a part of the audience.

I’m really, really looking forward to August.

Back from XML Prague

I’m back from this year’s edition of XML Prague, my favourite markup geek event. As always, there’s plenty to praise, from Jeni Tennison’s opening keynote to Michael Sperberg-McQueen’s closing one and pretty much everything in between, from the friendly organisers to MarkLogic’s demojam event at the social dinner, the city itself, and, well, everything.

But what really gives me my yearly high is the fact that the event is always so much more than the sum of the above. We get to interact with and learn from fellow markup enthusiasts, we meet some of our favourite tool producers (who are also markup enthusiasts, btw) and other pros in the field, and we are once again refuelled and energised and inspired, and ready to do more when back home. Every year.

Don’t you think that’s amazing?

Digital Shows, FML and XML

Ran my second DCP show at Draken earlier. The film is stored and handled by a Dolby server running a modified Debian Linux with Xfce as the window manager, producing a lightweight interface with only the bare necessities, but very, very functional necessities. There is drag and drop to handle show components, there are ready-made cues, and it’s all reasonably well designed. Every time I use the touchpad/keyboard combo to build or run a show, I’m struck by how similar everything is to my Film Markup Language concepts. I presented my ideas at XML Prague in 2010, but after that I couldn’t make much headway with the hardware, so the project sort of died.

Supposedly, the shows are indeed handled using XML files. I was planning something very much like Dolby’s interface so I’m dying to know if their XML is anything like my DTD. The components are all there so I’m half hoping it is. I bet they don’t use XLink, though.

Semantic Profiles

Following my earlier post on semantic documents, I’ve given the subject some more thought. In fact, I wrote a paper on a related subject and submitted it to XML Prague for next year’s conference. The paper wasn’t accepted (in all fairness, it was off-topic for the event’s themes), but I think the concept is both important and useful.

Briefly, the paper is about profiling XML content. The basics are well known and very frequently used: you profile a node by placing a condition on it. That condition, expressed using an attribute, is then compared to a publishing context defined using a similar condition on the root. If met, the node is included; if not, the node is discarded.
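
A minimal sketch of that mechanism in XQuery, with element and attribute names invented for illustration:

    xquery version "1.0";

    (: The publishing context: this run includes nodes conditioned
       on "product-a". :)
    declare variable $context := "product-a";

    declare variable $doc :=
        <manual condition="product-a">
            <para>Applies to every variant.</para>
            <para condition="product-a">Only in the A variant.</para>
            <para condition="product-b">Only in the B variant.</para>
        </manual>;

    (: Keep unconditioned nodes and nodes whose condition matches the
       context, using a plain string comparison; discard the rest. :)
    <manual>{
        $doc/para[not(@condition) or @condition = $context]
    }</manual>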

The matching is done with a simple string comparison, but the mechanism can be made a lot more advanced by, say, imposing Boolean logic on the condition. You need to match something like A AND B AND NOT(C), or the node is discarded. Etc.
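
Expressed as a predicate, an A AND B AND NOT(C) rule might look like the following sketch, which assumes (as one possible convention) whitespace-separated condition tokens:

    xquery version "1.0";

    (: A node carrying two condition tokens; the context requires
       product-a AND print AND NOT(draft). :)
    let $node := <para condition="product-a print">Printed A-variant text.</para>
    let $conditions := tokenize($node/@condition, "\s+")
    return
        ($conditions = "product-a")      (: A :)
        and ($conditions = "print")      (: AND B :)
        and not($conditions = "draft")   (: AND NOT(C) :)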

The problem is that in the real world, the conditions, the string values, usually represent actual product names or variants, or perhaps an intended reader category. They can be used not only for string matching but also for including content inline, using the condition attribute’s contents as variable text: a product variant, expressed as a string in an attribute on an EMPTY element, can easily be expanded in the resulting publication to provide specific content that personalises the document.
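
For example, a sketch with an invented empty element whose attribute value is expanded into the running text:

    xquery version "1.0";

    declare variable $doc :=
        <para>The <variant product="Frobulator 2000"/> ships in May.</para>;

    (: When publishing, each empty variant element is replaced by its
       attribute value, yielding "The Frobulator 2000 ships in May." :)
    <para>{
        for $n in $doc/node()
        return
            if ($n instance of element(variant))
            then text { $n/@product }
            else $n
    }</para>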

Which is all fine and well, until the product variant label, or the product itself, is changed and the documents need to be updated to reflect this. All kinds of annoyances result, from having to convert values in legacy documents to not being able to do so (because the change is not compatible with the existing documents). Think about it:

If you have a condition “A” and a number of legacy documents using that condition, and need to update the name of the product variant to “B”, you need to update those existing documents accordingly, changing “A” to “B” everywhere. Problem is, someone owning the old product variant “A” now needs to accept documentation for a renamed product “B”. It’s done all the time but still causes confusion.

Or worse, if the change to “B” affects functionality and not just the name itself, you’ll have to add “B” to the list of conditions instead of renaming “A”, which in turn means that even if most of the existing documentation could be reused for both “A” and “B”, it can’t be, because there is no way to know which parts apply. You’ll have to add “B” wherever you need to include a node, old or new.

This, in my considered opinion, happens because of the following:

  • The name, the condition, is used directly, both as a condition and as a value.
  • Conditions are not version handled. If “B” is a new version of “A”, then say so.

My solution? Use an abstraction layer. Define a semantic profile, a basic meaning for the condition, and version handle that profile, updating it when there is a change to the condition. The change could be a simple name change for the corresponding product, but it could just as well be a change to the product’s functionality. Doesn’t really matter. A significant change will always require a new version. Then, represent that semantic profile with a value used when publishing.
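
A sketch of what such a versioned profile definition might look like; the registry format and the product names are invented for illustration:

    (: An invented, versioned profile registry: documents reference the
       profile, and publishing looks up the current value. :)
    <profiles>
        <profile id="variant-a" version="1" label="Frobulator 2000"/>
        <!-- Version 2: the same variant, renamed; old documents keep
             working because they reference the profile, not the name. -->
        <profile id="variant-a" version="2" label="Frobulator 2100"/>
    </profiles>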

Since I like URNs, I think they are a terrific way to go. It’s easy to define a suitable URN scheme that includes versioning, using the URN string as the condition when filtering but the URN’s corresponding value as expanded content. In the paper, I suggest some simple ways to do this, including an out-of-line profiling mechanism that is pretty much what the XLink spec included years ago.
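
And a sketch of the corresponding publishing step, with an invented URN scheme in which the last field is the version:

    xquery version "1.0";

    declare variable $profiles :=
        <profiles>
            <profile urn="urn:x-example:profile:variant-a:2"
                     label="Frobulator 2100"/>
        </profiles>;

    (: The publishing context and the document both use the URN string. :)
    declare variable $context := "urn:x-example:profile:variant-a:2";

    declare variable $para :=
        <para condition="urn:x-example:profile:variant-a:2">The
            <variant value="urn:x-example:profile:variant-a:2"/> ships in May.</para>;

    (: Filtering compares URN strings, while expansion swaps the URN
       for the profile's human-readable value. :)
    if ($para/@condition = $context)
    then
        <para>{
            for $n in $para/node()
            return
                if ($n instance of element(variant))
                then text { $profiles/profile[@urn = $n/@value]/@label }
                else $n
        }</para>
    else ()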

Using abstraction layers in profiling is hardly a new approach, then, but it’s not being used, not to my knowledge, and I think it should be. I fully intend to.