Wednesday, 7 March 2007

tpegML to Atom

Since last weekend, I've spent my evenings converting tpegML from the BBC's traffic site into an Atom feed with embedded geoRSS. (I've also had some limited success converting it to KML.)

So far, I've found tpegML to be an absolutely horrid data format. One thing that doesn't help is that the specifications for the format are locked away by the ISO (so the format is "open", but you can't just get the specs off a web site - not a great way to encourage web-wide adoption). Whoever designed it went overboard on making the format language-neutral, so you end up with elements like <location_descriptor descriptor_type="&loc3_32;" descriptor="M4; 25">. Fortunately, the BBC provides the entity definitions - but it's still more pain than it's worth. The schema itself is a bit odd too: elements that you'd expect to be children of other elements are siblings instead. BeautifulSoup comes to the rescue - it turns tpegML into something a bit more sane.
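To show what those entity references involve, here is a minimal sketch of expanding them by hand before the markup goes to a parser. It is not the code I actually run: the `ENTITIES` table and the `expand_entities` helper are made up for illustration, and the value given for `loc3_32` is a placeholder, since the real meanings come from the BBC's entity definitions.

```python
import re

# Hypothetical entity table. The real definitions come from the BBC's
# entity files; this value is only a placeholder.
ENTITIES = {"loc3_32": "placeholder meaning for loc3_32"}

# Matches entity references such as &loc3_32;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand_entities(text, table):
    """Replace &name; references with values from table.

    References that are not in the table are left exactly as they were.
    """
    def lookup(match):
        return table.get(match.group(1), match.group(0))

    return ENTITY_REF.sub(lookup, text)


raw = '<location_descriptor descriptor_type="&loc3_32;" descriptor="M4; 25">'
expanded = expand_entities(raw, ENTITIES)
print(expanded)
```

Leaving unknown references untouched means the standard XML entities (such as &amp;) pass through unchanged and are still handled by whatever parses the result.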

So far, I've been able to re-create the BBC's own RSS feed reasonably well (at least for motorway data), so I've turned to embedding the geoRSS data, where I've run into more oddities. There are two formats of geoRSS - geoRSS and geoRSS Simple (plus there's a geo tag from the W3C that is slightly different and has been deprecated but, judging by some blog posts, is still commonly used). Personally, I like the old W3C format, as it has explicit elements for latitude and longitude. I'm somewhat unimpressed by how geoRSS (and KML) dump both values into a single element. But what's worse is that geoRSS doesn't support multiple points in a single item (outside of the concepts of lines and boxes). This is an oversight as far as my use of geoRSS is concerned: I need a single item with more than one point, where those points are related but are not part of a line. For now I think I'll try to use a line, but I'm not really satisfied. I might just extend Atom myself (as I'm probably the only person who wants this traffic data in Atom anyway).
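The line workaround can be sketched roughly as follows. This is an illustration rather than my actual converter: `make_entry` is a hypothetical helper, the title and coordinates are invented, and a real Atom entry would also need the required id and updated elements. It relies on geoRSS Simple writing a point as "lat lon" in a single element and a line as a space-separated run of such pairs.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
GEORSS_NS = "http://www.georss.org/georss"

# Use readable prefixes instead of ns0/ns1 when serialising.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("georss", GEORSS_NS)


def make_entry(title, points):
    """Build a bare Atom entry carrying geoRSS Simple geometry.

    points is a list of (lat, lon) tuples. A single point becomes
    georss:point; several points are written as georss:line, which is
    the workaround for the lack of a multi-point element.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title

    tag = "point" if len(points) == 1 else "line"
    # geoRSS Simple puts latitude and longitude, space separated,
    # into one element.
    coords = " ".join(f"{lat} {lon}" for lat, lon in points)
    ET.SubElement(entry, f"{{{GEORSS_NS}}}{tag}").text = coords
    return entry


# Invented title and coordinates, for illustration only.
entry = make_entry("Example incident", [(51.5, -1.9), (51.52, -1.85)])
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The drawback is the one described above: a consumer will draw these points as a connected path, even though they are only related locations.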

1 comment:

Andrew Turner said...

I doubt you're the only one that wants traffic data in Atom. :)

You should check out the unofficial - but "likely" extension to support MultiPoint and MultiLine et al: