Sep 13, 2011
 

This month the 22nd T-SQL Tuesday comes to us courtesy of Robert Pearl (blog | @PearlKnows), and he’s asking us to write about formatting data for presentation to end users. He describes end users as “boss, supervisor, department head, the analyst, employees, or customers”. I’m going to take the liberty of extending this to include other computers so I can tell you all about what I came to call “Pseudo-XML”.

Fresh out of college I didn’t exactly find the DBA job I was hoping for. Nobody was really looking to trust their databases to a kid with no real work experience and I don’t blame them! Instead I started off as a software developer writing a lot of database access code. A good deal of the time I worked there was spent rewriting the middle layer of their website from an old system which used Tcl to a new one using the .Net Framework. The data I needed to read was stored in SQL Server, and the presentation layer was in ASP (pre- .Net) and most of it was reading XML files. My mission was to use C# to create XML files identical to those being generated by the Tcl code. Sounds easy enough, right?

I should add that this project had a few complicating factors: there was zero documentation, and all the people who wrote the original code had left the company long before. I also had no knowledge of Tcl and was told I shouldn’t waste my time trying to learn it since it was all going away anyway; all I really needed was the queries it was using. But the pièce de résistance was the fact that all the XML that was generated really wasn’t XML at all. That’s when things started to get much more interesting.

One of my favorite high school math teachers, Mr. Blew, would often rattle off the saying “Almost only counts in horseshoes and hand grenades” in class when someone partially answered a question. He’d probably say the same thing about this “XML” that was being generated. Apparently at the time the Tcl system was written the XML standard had not yet been finalized and so the language didn’t support it. Instead the XML was stored as strings and was generated by simply placing tags around data points. This way absolutely anything could become “XML” just by placing some <Tags>around it</Tags>. I started referring to this as “Pseudo-XML” when discussing it with my co-workers.

While it sounds really simple, Pseudo-XML created a ton of issues as it contained many quirks. Since it wasn’t documented anywhere, I had nothing to go on other than the application that generated it and the ASP that read it. I couldn’t use the .Net XML library, as it enforced well-formedness and Pseudo-XML did away with a few basic XML rules such as requiring each document to have a root element or requiring certain characters to be escaped. Instead I got very familiar with the StringBuilder class for building Pseudo-XML strings.
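To make the problem concrete, here is a minimal Python sketch (the original work was in C# with StringBuilder, so this only illustrates the idea). The helper names `pseudo_xml` and `is_well_formed` and the sample field names are hypothetical. It builds a string by wrapping values in tags with no root element and no escaping, then checks whether a conforming XML parser would accept it.

```python
import xml.etree.ElementTree as ET


def pseudo_xml(fields):
    """Wrap each (tag, value) pair in tags: no root element, no escaping."""
    return "".join(f"<{tag}>{value}</{tag}>" for tag, value in fields)


def is_well_formed(text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data: two sibling tags and an unescaped ampersand.
doc = pseudo_xml([("Name", "Ham & Eggs"), ("Price", "4.50")])
rootless = pseudo_xml([("A", "1"), ("B", "2")])
```

Both `doc` and `rootless` are rejected by the parser, the first because of the bare `&` and the second because it has two top-level elements, which is roughly why a strict XML library was no help for producing this kind of output.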

As I started building more complex files a few other challenges came up, such as representing lists of data. In real XML, a list of values is typically fairly easy to represent as a hierarchy:

<Fruits>
<Fruit>Guava</Fruit>
<Fruit>Mango</Fruit>
<Fruit>Kumquat</Fruit>
</Fruits>

But apparently Pseudo-XML didn’t support nested elements of any type. It was more like a list of tags than a hierarchy, and there weren’t any attributes either. I guess the brains behind Pseudo-XML felt there was no need for them, but more likely nobody wanted to write the ASP code necessary to parse them out. Instead, lists were pipe-delimited like this:

<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>
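Reading such a list back out needs little more than a tag match and a split. The following Python sketch is an assumption about how a consumer might do it, not the actual ASP code, and the function name `read_list` is hypothetical.

```python
import re


def read_list(pseudo_xml, tag, delimiter="|"):
    """Return the delimited values inside <tag>...</tag>, or [] if the tag is absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, pseudo_xml, re.DOTALL)
    if match is None:
        return []
    return match.group(1).split(delimiter)


line = "<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>"
bacon = read_list(line, "BaconTypes")
```

Here `bacon` holds the three values as separate strings. Because tags are never nested, a flat search like this is enough. No tree needs to be walked.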

This worked fine, except in cases where the values themselves contained pipes. The authors of Pseudo-XML crafted a solution to that too, though: the pipe-and-tilde-delimited list.

<HighSchools>Fenwick~|~Riverside|Brookfield~|~Paris</HighSchools>
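A short Python sketch shows why the longer delimiter helps. The names `TILDE_PIPE` and `split_values` are hypothetical and stand in for whatever the real code did.

```python
TILDE_PIPE = "~|~"


def split_values(text, delimiter=TILDE_PIPE):
    """Split a delimited Pseudo-XML value into its individual entries."""
    return text.split(delimiter)


raw = "Fenwick~|~Riverside|Brookfield~|~Paris"

# Splitting on the three-character delimiter keeps the embedded pipe intact.
schools = split_values(raw)

# Splitting on a bare pipe breaks the middle value apart and leaves stray tildes.
broken = split_values(raw, "|")
```

With the `~|~` delimiter, `schools` contains three entries and "Riverside|Brookfield" survives as a single value. With a bare pipe, `broken` contains four fragments, each with a tilde stuck to one end.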

Fortunately there was never a case where it was necessary to display a “~|~” value, otherwise I’m sure things would have gotten much more interesting. When I left, there were still a few bits and pieces of Pseudo-XML lurking in the deepest parts of the site. I wouldn’t be surprised at all if they’re still there.

Anyhow, that’s my little tale about the lengths I had to go to in order to please customers through formatting, even if the customers were web servers and not people. Thanks, Robert, for the thought-provoking topic!

  3 Responses to “T-SQL Tuesday #22: Pseudo-XML”

  1. I’m pretty sure the BaconTypes are still in use, and I think the HighSchools list didn’t originate from what you worked on, that pattern came from one specific person who used it for something completely different… but we can talk about those old time details over lunch sometime. In either case, I didn’t realize they exposed that ugliness directly like that. Maybe they were thinking ahead of their time and trying to merge JSON output into XML? (Yeah, right…) I thought the pseudo-xml was limited to their lack of well-formed xml and tendency to put regular html tags (unescaped) in the output.

    Regardless, a post like this is a good reminder (for me at least) to turn up the heat on trying to purge out all that legacy garbage wherever it still may exist.

    • I totally forgot about the unescaped HTML tags they’d put in there! Oh well. You’re not on that team anymore, so you shouldn’t have to worry about it, right? :)

  2. [...] Bob Pusateri, aka SQLBob, of his blog series, The Outer Join (who hosted a T-SQL Tuesday recently himself), relates his story to data presentation and takes the liberty of extending this to include other computers so he can tell all about what he calls “Pseudo-XML”. [...]
