Tuesday, March 01, 2005

The case against XML



If any of you happened to see the article in the Tech Monday section of the San Jose Mercury News about Bubbler, you may have seen me quoted as saying that XML is one of the most overrated phenomena. I've gotten quite a few questions about that :)



Here's why I think XML is overrated.



XML is not really a technology. It's a syntax for tagging bits of data. It's not all that advanced in its evolution, harking back to SGML from some 20 years ago, which was actually better in many ways.



XML is syntax without any semantics. That's a big piece of what's wrong with it. Any program parsing an XML file still needs to know exactly what's in it in order to make sense of it (witness 5 different "flavors" of RSS).
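
To make that concrete, here is a minimal sketch in Python using the standard library's ElementTree. The two feed snippets are made-up examples written to follow the general layout of RSS 2.0 and RSS 1.0, and `item_titles` is a hypothetical helper, not part of any library. Both documents parse fine as XML, but the code still has to know where each flavor keeps its items.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the RSS 2.0 layout: items live inside <channel>, no namespace.
RSS_20 = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title></item></channel></rss>"""

# Made-up sample in the RSS 1.0 layout: RDF root, namespaced elements,
# and items are siblings of <channel> rather than children.
RSS_10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://example.com/"><title>Example</title></channel>
<item rdf:about="http://example.com/1"><title>First post</title></item>
</rdf:RDF>"""

def item_titles(xml_text):
    """Hypothetical helper: return item titles for the two flavors above."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0: rss/channel/item/title
        return [t.text for t in root.findall("./channel/item/title")]
    # Assume RSS 1.0 otherwise: RDF/item/title in the RSS 1.0 namespace
    ns = {"rss": "http://purl.org/rss/1.0/"}
    return [t.text for t in root.findall("./rss:item/rss:title", ns)]

print(item_titles(RSS_20))
print(item_titles(RSS_10))
```

The parser gets you a tree either way; the branch on `root.tag` is the part XML itself does nothing to help with.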



But there are a lot of little things, too. To get an idea of what's wrong with XML, just open up any RSS file and take a look. The first thing you see is that XML does not nest worth a damn. You can't embed XML (or even HTML) within an XML file without escaping every last character, so it looks like &lt;tagname&gt; instead of <tagname> (it was hard enough to type this in and escape it correctly; I can't imagine what it looks like in an RSS feed :)
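
A small Python sketch of that escaping, using `xml.sax.saxutils.escape` from the standard library. The HTML fragment is invented for illustration, and the `description` element name is borrowed from RSS only as an example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Invented HTML fragment we want to carry inside an XML element.
html_fragment = "<p>Hello <b>world</b></p>"

# Every markup character has to be turned into an entity first.
escaped = escape(html_fragment)
print(escaped)

# Wrap it in an element (name chosen for illustration) and parse it back.
item = "<description>" + escaped + "</description>"
parsed = ET.fromstring(item)

# The parser hands back the original fragment as plain text, not as markup.
print(parsed.text)
```

What travels over the wire is the entity-laden version; the readable fragment only reappears after the receiving side unescapes it.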



It doesn't stop there. The open and close delimiters are clunky, though I guess you can't blame XML for that as HTML has it too. The syntactic waste can double the file size easily. And parsing XML files requires reading the whole file before you can start to actually make sense of it. Audio and video started this way, and guess what, it didn't work very well when transmitted over networks, so people worked really hard to come up with streaming audio and video just to get around this shortcoming. Where's the streaming XML format? I rest my case.
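
For a rough feel of the size overhead, here is a toy comparison in Python. The records and the element names are made up, and tab-separated text is just one assumed alternative; real files will give different ratios.

```python
# Made-up records: (name, city)
rows = [("alice", "San Jose"), ("bob", "Palo Alto"), ("carol", "Fremont")]

# The same data as XML, with hypothetical element names.
as_xml = "<people>" + "".join(
    f"<person><name>{name}</name><city>{city}</city></person>"
    for name, city in rows
) + "</people>"

# The same data as tab-separated lines.
as_text = "\n".join(f"{name}\t{city}" for name, city in rows)

# Ratio of XML size to plain-text size, in characters.
ratio = len(as_xml) / len(as_text)
print(len(as_xml), len(as_text), round(ratio, 1))
```

With short values like these, the tags outweigh the data they wrap.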



XML is verbose, inefficient, suffers from whole-file syndrome, is difficult to parse correctly, has nesting/escaping problems to the nightmare degree ... and yet everybody is always raving about how great it is. I think it's overrated at best.
