Author Archives: admin

Mobile Sync

After years of not being able to sync my Nokia mobile(s) with my Debian Linux desktop, syncevolution and the Evolution “groupware suite” have finally made that possible. I’ve had success with both my older Symbian 60-based phone, N95-2, and my (Maemo-based) N900.

See www.syncevolution.org for details on how to do this. My Debian Sid box required the apt sources from that site (it seems that Sid is lagging behind, at least for now; they’ve packaged the last beta but the site includes the released 1.0 version), but otherwise the install and sync both went without a hitch.

VirtualBox

I’ve switched from KVM to VirtualBox for my virtualisation needs. My Debian laptop is hosting and right now there is a Windows 7 guest. Apart from some slowness, especially with shared folders (on extfs3), the whole thing works like a charm. I can run XMetaL in the VirtualBox with no problems.

Finally it looks like I won’t be needing a Windows partition at work.

DITA Lists, Part Two

Today it occurred me to have a look at the DITA Architecture Specification source to see how the people behind the spec would tag a list; as some of you know, this was the subject of my recent blog entry. There are a number of lists in that spec, many with introductory paragraphs, so it’s a pretty obvious way to find out, right? Well, after the examples in that spec, maybe.

Anyway, this is how they do it:

<p>Introductory para:
<ul>
<li>Item</li>
<li>Item</li>
</ul>
</p>

This was one of my guesses, and I have to say that it’s better than any of the alternatives I could come up with. It’s not good markup, though, in my opinion, as it says that semantically, a paragraph is sort of a block-level superclass, a do-it-all and one that you must use if you need that introduction.

But then, why limit yourself to lists? Why aren’t notes tagged like that? Or definition lists, or images, or tables? Think about it. Doesn’t this feel just a little bit like a cop-out to you? It does to me. It feels like the author realised that he needed that wrapper but there was nothing he could cling to, other than this construction.

I’m not saying that my way is the only way (obviously it’s not) but this bothers me because it muddies the semantic waters.

List Modelling

I’ve been reading up on DITA. I’ve looked at the specs and the DTD before, obviously, but more from the perspective of an innocent bystander. The DTDs I implement in authoring systems and elsewhere are usually my own, and whenever I need to deliver content in some other format, I simply convert to it. This time things are a bit different, however, as we are considering doing a “DITA Edition” of the content management system I’m responsible for at work, and I need to know how DITA can fit into our stuff.

DITA’s got lots of things that I like, such as the combining of topic IDs with target IDs in references to avoid ID collisions. The DITA way is a very elegant solution and probably a better one than what I would usually do, which is to (in various ways in the DTD and in the authoring environment) make sure that authors can never end up in situations like it to begin with. There’s other stuff, too, but those are best left to another blog entry at some point.

Here, I want to talk about list modelling and specifically something that not only DITA but so many other DTDs and schemas seem to ignore, and that, in my mind, results in bad markup. Let’s start by discussing list semantics first:

A list is, well, a list of things. There are several types of lists, of which unordered and ordered are the most common, and the semantics are probably clear enough: the former lists stuff without a specific order (say, grocery lists) and the latter items whose order is significant (for example, David Letterman’s top ten lists). There’s also the definition list (which, in my mind, is not a list at all but a special case of a table, namely a two-column one), and probably some other types as well. In DITA, you can find something called “simple list”, which claims to limit what’s listed to one line per item, tops, without bullets or numbers, but to me that’s less about semantics and more about presentation.

So here’s a typical DITA list (HTML, DocBook and quite a few others look exactly like it, too):

<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>

There’s more to list semantics, though, at least in my mind. If you wanted to find a complete list in a document, you’d probably want to include its qualifying introduction (“Here’s the groceries you need to buy:”), and any and all information that goes between list items without being part of them but still belonging to the list as a whole. If your spouse is kind enough to subcategorise the grocery list to vegetables, fruit, dairy products and so on (I know I need the help), we’d have a multi-part list where the participating lists are part of a larger whole.

The introductory paragraph is where it gets tricky in DITA and similar structures. There are a LOT of block-level elements to choose from, but you cannot easily do a list that meets these requirements. This one, the preferred DITA way (at least if we choose to believe the examples in the spec), lacks a wrapper that identifies the list as one unit instead of a loose paragraph that happens to be followed by a list:

<p>The fruit we need for tonight:</p>
<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>
<p>And the vegetables for tomorrow:</p>
<ul>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>

Of course, one could argue that our grocery list is really a section, but I would argue that the introductory paragraph is actually part of the list, but not necessarily a part of the whole section. What if I wanted to include images or perhaps a note to that section? Semantically, I can think of dozens of ways to reasonably expand the structure of such a surrounding section and still keep it on topic (that is, limiting it to subject matters concerning that central grocery list).

Keeping with DITA’s topic-based approach, we could certainly use a number of such sections and wrap the whole thing in a topic, but me, I think that’s overkill. All I want to do is include an introductory paragraph.

This, of course, is where some will argue that the introductory paragraph is really a heading. Definition lists in DITA and some other DTDs actually do have a heading for this very purpose, which to me hints that somebody did touch the subject at hand at some point, but then why do the “ordinary” lists without that heading? And of course, me, I think that introduction is not a heading at all, only a qualifier for the list.

Another option in DITA and others is to use the <p> element as a wrapper:

<p>The fruit we need for tonight:
<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>
And the vegetables for tomorrow:
<ul>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>
</p>

This is perfectly valid, of course, but it ruins the intent of the <p> element and creates a very odd (and ugly) mixed content that would be difficult to process properly.

What I would like to see is more in the lines of this:

<ul>
<p>The fruit we need for tonight:</p>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
<p>And the vegetables for tomorrow:</p>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>

Now we have a single list (our grocery list) that includes the necessary introduction(s). Of course, it’s still somewhat ugly; I, for one, dislike the relative lack of list item structure–I’d much rather see an item modelled more properly, perhaps divided into paragraphs and other block-level content, where the concepts block level and inline remain properly separated.

4G?

Apple unveiled their newest iPhone, the 4G, earlier today. Looks like they are getting a little closer to what my Nokia N900 can do. I have to admit that they know how to market their stuff. If the functionality matched the hype, I might even be interested.

N900 Gets New Firmware

Those of us owning a Nokia N900 have been patiently (well, some of us, in any case) waiting for the new firmware, PR 1.2. It’s been delayed a couple of times but now maemo.org informs us that tomorrow the wait is finally over.

Among the goodies are a new QT library, more apps, a revamped UI, and quite a list of bug fixes.

Permanent URLs, Addresses and Names

I found a link to an article by Taylor Cowan about persistent URLs on the web. It was mostly about what happens to metadata assertions (such as RDF statements) when links break, but there was a little something on persistent links and URNs, too. A comparison with Amazon.com and how books are referenced these days was made. A way to map the ISBN number as a URN was described (URN:ISBN:0-395-36341-1 was mapped to a location by the PURL service, in this case at http://purl.org/urn/isbn/0-395-36341-1), which is quite cool and, in my opinion, both manageable and practical.

The author thought otherwise, however: But on the practical web, we don’t use PURLs or URNs for books, we use the Amazon.com url. I think in practical terms things are going to be represented on the web by the domain that has the best collection with the best open content.

Now, what’s wrong about this? At first, it may seem reasonable that Amazon.com, indeed the domain with the (probably) largest collection of book titles, authors, and so on, should be used. Books are their business and they depend on offering as many titls as possible. In the everyday world, if you want to find a book, you look it up at Amazon.com. I do it and you do it, and the author does it. So what’s wrong about it?

Well, Amazon.com does not provide persistent content per se, they provide a commercial service funded by whatever books they sell. At any time, they may decide to change the availability of a title, relocate its page, offer a later version of the same title, or even some other title altogether. The latter is unlikely, of course, but since we are talking about URLs, addresses, rather than URNs, names, talking about the URL when discussing what essentially is a name is about as relevant as talking about the worn bookshelf in my study when discussing the Chicago Manual of Style.

Yes, I realise that my example is a bit extreme, and I realise that it’s easy enough to make the necessary assertions in RDF to properly reference something described by the address rather than the address itself, but to me, this highlights several key issues:

  • An address, by its very nature, is not persistent. Therefore, a “permanent URL” is to me a bit of an oxymoron. It’s a contradiction in terms.
  • Even if we accept a “permanent URL approach”, should we accept that the addresses are provided and controlled by a commercial entity? One of the reasons to why some of us advocate XML so vigorously is that it is open and owned by no-one. Yes, I know perfectly well that we always rely on commercial vendors for everything from editors to databases, but my point here is that we still own our data, the commercial vendors don’t own it. I can take my data elsewhere.
  • Now, of course, in the world of metadata it’s sensible to give a “see-also” link (indeed that is what Mr Cowan suggests), but the problem is that the “see-also” link is another URL with the same implicit problems as the primary URL.
  • URLs have a hard time addressing (yes, the pun is mostly intentional) the problem with versioning a document. How many times have you looked up a book at Amazon.com and found either the wrong version or a list of several versions, some of which even list the wrong book?

Of course, I’m as guilty as anyone because I do that, too. I point to exciting new books using a link to Amazon.com (actually I order my books from The Book Depository, mostly) because it’s convenient. But if we discuss the principle rather than what we all do, it’s (in my opinion) wrong to suggest that the practice is the best way to solve a problem that stems from addressing rather than naming. It’s not a solution, it merely highlights the problem.