ProXist and My XML Prague Paper

I recently submitted the final version of my XML Prague whitepaper about my eXist implementation of ProX, called ProXist (with apologies for the tacky name). While I’m generally pleased with the paper, the actual demo implementation I am going to present at the conference is not quite finished yet and I wish I had another week to fill in the missing parts.

Most of the ProXist stuff works but there are still some dots to connect. For example, something that currently occupies the philosophical part of my brain has to do with how to run the ProX wrapper process, the one that configures the child process that actually does stuff to the input. ProX, so far, has been very much about automation and about things happening behind the scenes, and so I have aimed for as few end user steps as possible.

My Balisage ProX demo was a simple wrapper pipeline that did what it did in one go. Everything was fitted inside that pipeline: selecting the input, configuring the process that is to be applied to the input in an XForm, postprocessing the configured process and converting it to a script that will run the child process, running the child process, saving the results. Everything.

But the other day, while working on the eXist version and toying with its web application development IDE, it dawned on me that there doesn’t have to be a single unified wrapper process. If its components are presented on a web page and every one of them includes logic to check if the information from a required step is available or not (for example, a simple check to confirm that an input has been chosen before the process can be configured), they don’t have to be explicitly connected.

The web page that presents the components (mainly, selecting input and configuring the process to be applied on the input) then becomes an implicit wrapper. The user reads the page, and the presentation order and the input checks are enough. There is no longer a need for a unified wrapper process.
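To make the idea concrete, here is a minimal sketch in Python rather than in XQuery or XProc. It is not the ProXist code: the `Session` class and the three step functions are hypothetical names invented for illustration. Each step only checks that the information it depends on is already there, so nothing has to connect the steps explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    """Hypothetical shared state that the page components read and write."""
    input_uri: Optional[str] = None
    config: dict = field(default_factory=dict)


def select_input(session: Session, uri: str) -> None:
    # No prerequisite: this is the first thing the page offers.
    session.input_uri = uri


def configure_process(session: Session, **options: str) -> None:
    # Prerequisite check: an input must have been chosen first.
    if session.input_uri is None:
        raise RuntimeError("select an input before configuring the process")
    session.config.update(options)


def run_process(session: Session) -> str:
    # Prerequisite check: both earlier steps must have left their results.
    if session.input_uri is None or not session.config:
        raise RuntimeError("input and configuration are both required")
    return f"running process on {session.input_uri}"
```

Calling `configure_process` on a fresh `Session` raises an error, while calling the three functions in page order succeeds. The ordering lives in the checks and in how the page is laid out, not in a wrapper.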

Now, you may think this is obvious, and I have to admit that it now seems obvious to me, too. But I cannot always move from one mindset (for example, that automation bit I mentioned above) to another (such as the situation at hand, the actual environment I implement things in) as easily as I would like. Whether this is because I’m getting older or because it’s who I am, I don’t know. In this particular case, I was so convinced that the unified wrapper was the way to go that it got in the way of a better solution.

At least I think it’s a better solution. If it isn’t, hopefully I can change my mind again and in time.

See you at XML Prague.

ProXist

I’ve been working on an eXist-based implementation of my XProc abstraction layer, ProX, hoping to have something that runs before XML Prague, next month. It turns out that the paper I submitted on the subject has been accepted, so now I guess I just have to.

The ProX implementation should not be terribly complicated to finish, but until recently it risked being rather hackish (is that a word?) because the XML Calabash eXist module written by Jim Fuller was rather primitive: it only supported pointing out the pipeline to run and a single, hard-coded output port. I foresaw a more or less complete rewrite of the ProX wrapper in XQuery.

Luckily, Jim very graciously agreed to rewrite his module into something more immediately usable. I received the first updated module to test in December and the most recent update just a few days ago. He also found a bug in Calabash’s URI handling and sent a first fix to me along with the updated module. There are still things to do but for me, Christmas came really early this year.

Oh, and I’m calling the implementation, and the paper, ProXist. Sorry about that.

oXygen Users Meetup

Immediately following TIC 2013, I’ll be attending oXygen Users Meetup in Munich, Germany. I’m very much looking forward to this one. I’ve been using oXygen for years and it is now my XML tool of choice. Also, oXygen’s is the most responsive team in the world, frequently solving your problems even before you knew you had them.

It’ll be good to meet George & Co again.

Open-source ProX

I recently got the go-ahead from my boss at Condesign to open-source ProX, my XML processing XML, and its first implementation. It sounds like rather more than it actually is – right now there’s a wrapper pipeline, an XForm, some XSLT and an example DTD – but I happen to think ProX is pretty cool and potentially useful.

I’ll make the stuff available on GitHub as soon as I have the time, of course with a proud announcement here. In the meantime, you can get an idea about what ProX is by reading my Balisage papers ProX: XML for interfacing with XML for processing XML (and an XForm to go with it) and Using XML to Implement XML.

TIC 2013

I co-presented a paper about the oXygen/eXist solution I’ve been involved in building for The Federation of Swedish Farmers (LRF) at the TIC 2013 conference in Stockholm, Sweden. My co-presenter was Anders Johannesson from LRF, who is a brilliant, brilliant presenter. He is knowledgeable, funny and supremely engaging, and I had loads of fun.

Modular XForms?

I just read Eric van der Vlist’s excellent XML London paper, in which he discusses the (lack of) modularity in XForms, caused to no small degree by the XForms MVC architecture, and, more importantly, offers solutions and workarounds. I really should have been there.

Having dabbled with XForms myself lately, I’m now very much looking forward to his talk at Balisage and the International Symposium on Native XML User Interfaces in Montréal, later this year.

Not One But Two Papers Accepted

Both of my papers submitted to Balisage were accepted. I feel honoured and somewhat nervous.

My second paper is a progress report of sorts about ProX, my XML processing XML. I think it’s going to be very cool, especially because I will have an implementation to show. Just the other day I finished the wrapper pipeline that runs everything, and one day very soon that wrapper will do things with a live ProX (my processing XML format) document, including some actual publishing.

As the Balisage blurb says, life is good.

Balisage 2013

My paper was accepted for this year’s edition of Balisage, the markup conference held in Montréal, Canada in August every year. I’m going to talk about profiling XML using an abstraction level to avoid the problems associated with using plain values.

I’m really, really excited and honoured. 

Micro XML and Namespaces

Micro XML is an attempt by James Clark, John Cowan and Uche Ogbuji to simplify XML and get rid of all that extra baggage that currently surrounds it. DOCTYPE and PIs are both removed, UTF-8 is mandatory, draconian error handling is no longer a must, and–perhaps most controversially–namespaces are gone, too.

Uche Ogbuji gave a brilliant talk about Micro XML at the recent XML Prague 2013 conference, so rather than reiterating his arguments, I suggest you watch the presentation once it’s made available at the XML Prague website.

What I did want to comment on is this namespaces business. Of everything proposed in the Micro XML spec, the removal of namespaces is clearly the most controversial, as indicated by the many tweets following Uche’s talk. But should you be upset? I mean, really?

I’ve done a fair bit of XML stuff involving namespaces lately (yes, I know, there’s no way to avoid it, really). There’s a Relax NG compact schema that I wrote that uses several, including a default “”. There are conversions from external XSD-based XML to that Relax NG-based XML using XSLT 2.0, and there are conversions from the Relax NG schema to (an obviously not namespace-aware) DTD to satisfy the needs of an editor that does not know what Relax NG is. (And I can’t bring myself to write XSDs; they are the spawn of Satan.) And there are XProc-based pipelines that glue these things together, and they obviously need to be aware of the namespaces in addition to the ones they use themselves.

Lots of namespaces, in other words. And I’m not exaggerating when I tell you that a vast majority of the problems I had and the weirdness I encountered had to do with namespaces.

Nothing coming out from the transformation? A forgotten implied default namespace in the source XML. Namespace declarations in the target XML messing up validation? That same default namespace. The wrong prefix for the XLink namespace in the target XML? No explicit namespace declaration in the source. An unwanted and disallowed XLink namespace declaration being complained about in the root element of an XML document in the process of being checked out from a repository? A web service helpfully adding a seemingly missing namespace declaration to the root element of content inside a SOAP envelope, resulting in a document that could not be opened but that did not show any problems in the repository itself, only on its way out…

These are just a few select examples from my plight, and while I may have some of the details slightly wrong here, you probably get the idea. The list goes on.
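The first of those symptoms is easy to reproduce outside XSLT. What follows is a minimal sketch using Python’s standard `xml.etree.ElementTree` module instead of the XSLT 2.0 and XProc setup described above. The namespace URI and the element names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up source document with a default namespace on the root element.
SOURCE = '<doc xmlns="http://example.com/ns"><title>Hello</title></doc>'

root = ET.fromstring(SOURCE)

# Looks right, matches nothing: the element's real name includes
# the namespace URI inherited from the default declaration.
unqualified = root.find("title")

# Binding a throwaway prefix to the same URI makes the lookup work.
NS = {"d": "http://example.com/ns"}
qualified = root.find("d:title", NS)

print(unqualified)     # None
print(qualified.text)  # Hello
```

The same mismatch makes an XSLT template with `match="title"` stay silent when the source has a default namespace, which is the "nothing coming out" case.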

And why is all this happening? Because someone at some point thought: wouldn’t it be nice if you could share your XML with everyone on the globe with no risk of name collisions and clashing semantics? Wouldn’t it be cool if the conflicting schemas could all be identified using a URI? We could have a throwaway name prefix attached to that URI and implement processing that could hide the prefix for the end user, simplifying things further…

Of course, that someone’s idea of backwards compatibility was simply that to a DTD, the revolution would be hidden in an extra attribute and an element type name containing a colon.

The fact is that I have yet to be helped by namespaces when using XML from the other side of the globe. In fact, I have yet to encounter a situation where I need to process unknown XML where potential clashes in semantics can do harm without me spotting the problem well in advance and taking care of it. The fact is that I don’t often need to use XML from the other side of the globe, out of the blue. It tends to happen in a context, in a controlled manner.

But when I do process that XML, knowing full well the source semantics and how they can map to my needs, it is always the namespaces that cause me grief.

Namespaces are among the least understood features of modern-day XML and among the most abused. The tools range from helpful to disastrous to completely ignorant or just plain wrong, and there are as many reasons for this as there are XML parser implementations out there. You know right from the start that you will have problems, so you’d better resupply the medicine cabinet well in advance or get ready for that headache.

So, Micro XML? Yes, please. Now?