2014-09-03

XML done badly?

We are getting quite a lot of experience with XML these days. Having had to use it to interact with BT and other suppliers, even HMRC, and having used it internally for many things, we kind of think we know what we are doing.

One of the big things with any XML based system is choosing the style of operation. There is a wide range of different approaches to how one structures data in XML. It really is giving people enough rope to hang themselves.

XML has two basic concepts - the object (an "element" in XML terms) and the attribute. An attribute is something which can appear in any order within an object, has a unique name so can appear only once (or not at all), and has a simple string value. An object can appear in another object, and can even interleave with text (though apart from HTML, nobody does that really). Typically an object can have a list of objects contained within it, where the order can matter and specific named objects can appear more than once. These objects can have attributes and can have text content (or other objects as content).

Now, this does lend itself to some types of data structures. For example, if you have a row in an SQL table the columns will have unique names and the fields will be strings, so making an object with attributes matching the column names makes a lot of sense.

Sometimes you want to pass stuff in a generic way and even have data types defined instead of inferred from the string itself, and for that an object is not a bad idea as it can have attributes to say the data type and other metadata, with the text content being the value.
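As a rough illustration of the SQL row case, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The "customer" table and its columns are made up for the example; the point is only that a row maps naturally onto one object with one attribute per column, and that the parser itself enforces "at most once".

```python
import xml.etree.ElementTree as ET

# Hypothetical row from a hypothetical "customer" table: column name -> string value.
row = {"id": "42", "name": "Fred", "postcode": "RG12 1AA"}

# One element per row, one attribute per column.
elem = ET.Element("customer", row)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)

# Reading it back gives the same mapping, whatever order the attributes were written in.
parsed = ET.fromstring('<customer postcode="RG12 1AA" name="Fred" id="42"/>')
assert dict(parsed.attrib) == row

# An attribute can appear at most once: a repeated one is a well-formedness error.
try:
    ET.fromstring('<customer id="1" id="2"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)
```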

But sometimes people go really mental, using objects and text content to do things that are far more logically attributes (as they appear at most once, and order does not matter). A simple example is the XML used to tell the Iridium Go! to do something, sent as an HTTP POST with a SOAPAction and classic SOAP envelope XML.

So, first issue :-

         <userCredentials>
            <userName>guest</userName>
            <password>guest</password>
         </userCredentials>


Given that the userName and password can logically only appear once in that, and the order they are used really has no meaning, why the hell not :-

<userCredentials userName="guest" password="guest"/>

That would be way simpler. Or, perhaps, if all performTask requests must have credentials, then make them attributes of that top level object.
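To make the comparison concrete, here is a small Python sketch (standard library only) that reads both forms. The helper function names are just for illustration. Both end up as the same name/value mapping; the attribute form simply hands it over ready made.

```python
import xml.etree.ElementTree as ET

verbose = """
<userCredentials>
   <userName>guest</userName>
   <password>guest</password>
</userCredentials>
"""
compact = '<userCredentials userName="guest" password="guest"/>'

def credentials_from_children(elem):
    # Child elements: walk the children and take each one's text content.
    return {child.tag: (child.text or "") for child in elem}

def credentials_from_attributes(elem):
    # Attributes: the parser has already built the mapping.
    return dict(elem.attrib)

a = credentials_from_children(ET.fromstring(verbose))
b = credentials_from_attributes(ET.fromstring(compact))
print(a == b)
```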

Then we have a requestList that contains <taskID>2</taskID>. So again, why not taskID="2" ?

Then we go for some generic parameters, and now it gets really verbose :-

               <options>
                   <name>Enable DNS forwarding</name>
                   <value>true</value>
                   <dataType>boolean</dataType>
               </options>


I mean, what can I say, why the hell not just enable-dns="true" somewhere? Or perhaps <option name="Enable DNS forwarding" type="boolean">true</option>
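For what it is worth, the compact form is no harder to consume. A minimal Python sketch, assuming the name and type attributes suggested above; only "boolean" appears in the real XML, so the other type names in the table are invented for the example.

```python
import xml.etree.ElementTree as ET

# Converters keyed by type name. Only "boolean" comes from the XML above;
# "integer" and "string" are assumed for illustration.
CONVERTERS = {
    "boolean": lambda s: s.strip().lower() == "true",
    "integer": int,
    "string": str,
}

def parse_option(elem):
    """Return (name, typed value) for one compact <option> element."""
    name = elem.get("name")
    kind = elem.get("type", "string")  # default to a plain string if no type given
    return name, CONVERTERS[kind](elem.text or "")

opt = ET.fromstring('<option name="Enable DNS forwarding" type="boolean">true</option>')
name, value = parse_option(opt)
print(name, value)
```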

However, this is where things get annoying - I personally prefer a less verbose style, but whatever style you pick, you could at least be consistent. Why the hell have things like that incredibly verbose "options" above, but have things like a taskID that is "2"? What does "2" mean?

I assumed it was some sort of sequence or some such, so I tried 1. Guess what, it seems task "1" is "Set SOS state and start sending GPS updates", whereas task "2" is "Start an Internet connection" (which is what I wanted). Why the hell go for incredibly terse and meaningless task digits rather than something like <taskID>Start Internet Connection</taskID>?

Anyway, needless to say I now have a simple script to start and stop the Internet connection from my laptop, including defining which ports are open.
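This is not that script, but a sketch of how such a request body could be put together in Python, using only the element names quoted above. The nesting (performTask holding userCredentials and a requestList, with options inside the requestList) is a guess, the build_request function is made up for the example, and the device's URL, SOAPAction value and any namespaces on its own elements are not shown here, so nothing is actually sent.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_request(user, password, task_id, options=()):
    """Build a SOAP request body. The nesting below is assumed, not documented."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    task = ET.SubElement(body, "performTask")

    creds = ET.SubElement(task, "userCredentials")
    ET.SubElement(creds, "userName").text = user
    ET.SubElement(creds, "password").text = password

    request_list = ET.SubElement(task, "requestList")
    ET.SubElement(request_list, "taskID").text = str(task_id)

    # Each option is (name, value, dataType), all strings, in the verbose style.
    for name, value, data_type in options:
        opt = ET.SubElement(request_list, "options")
        ET.SubElement(opt, "name").text = name
        ET.SubElement(opt, "value").text = value
        ET.SubElement(opt, "dataType").text = data_type

    return ET.tostring(envelope, encoding="unicode")

# Task 2 is "Start an Internet connection".
request_xml = build_request(
    "guest", "guest", 2, [("Enable DNS forwarding", "true", "boolean")]
)
print(request_xml)
```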

Interestingly TCP copes, even ssh, though at this speed telnet is possibly a tad better.

What was a surprise is that mosh did not cope. It tried, and got started even, but it started re-sending some UDP packets. These just clogged up the tx queue by the look of it, and meant even more latency, and that meant it did even more re-sends, and so on. It got to the stage that it would send several UDP packets from a port in a short period, apparently give up and try another source port, all before getting a reply back to the first port (which it was now ignoring). Seems mosh can't quite cope with the nearly 2 seconds round trip time and incredibly slow transmission rate. Pity.

However, a simple ssh or telnet was just about usable for irc.

Anyone wanting this script for their Iridium Go!, let me know.

9 comments:

  1. The more I learn about XML, the more certain I become that it is vastly over complicated. ASN.1 did the same job much more simply and with much less ambiguity, and no textual encoding just so 2 out of every 8 bits are wasted on the line (roughly).

    The other problem with textual encoding is parsing it, considering how many systems send invalidly encoded stuff. ASN.1 and other binary encoding schemes have canonical encoding methods and mal-encoded stuff in them is much less common, and easier to handle too.

    Replies
    1. Binary encoding is almost always more efficient in terms of processing and space. But textual encoding makes things far far easier to debug - quite often you can see the error in the text right there without having to faff with decoding tools.

      For a lot of my data interchange needs I've settled on gzipped XML, which is surprisingly space efficient whilst still retaining readability.

  2. I've seen some pretty terrible XML myself. It seems to be one of those technologies that *can* be used elegantly by competent engineers, but turns into an awful mess when it's inevitably handed to the nearest intern to design what later turns out to be a critical legacy system.

    Mind you, looking at the average XML schema gives some insight into the management and development process that goes into a lot of tech products. Including, for example, consumer routers.

    Scary.

  3. We integrated with National Rail a while back, XML/SOAP are just so far from the modern web.
    Binary encoding is faster over the wire etc, but debugging is a pain; it's handy for IPC or microservice setups. Google/Square etc use Protobuf.

    More often than not JSON is the way to go. It's lighter weight and with HTTP2 spec around the corner (already implemented in some cases), compression on the wire makes binary formats all but obsolete.

    Just my two cents. SOAP was once described to me as; "The guy who buys a new Porsche when the ash tray is full." It works but is expensive and incredibly impractical.

  4. Hi, could you send me the script if you still have it around? Struggling with the same API.

    Replies
    1. I don’t think I can find it now, sorry.

    2. No worries, figured out the problem with my script and was able to establish a usable SSH connection.

      Pretty incredible how far ~2.6kbit/s gets you.

    3. Yeh, there are some options to say that can make that faster, but the initial negotiation really is a tad time consuming...

    4. Hi, I can't establish a SSH connection, any chance to get your working script ?

