T-SQL Tuesday #22: Pseudo-XML

Sep 13, 2011 · TSQL Tuesday · XML

This month the 22nd T-SQL Tuesday comes to us courtesy of Robert Pearl (blog | @PearlKnows), and he's asking us to write about formatting data for presentation to end users. He describes end users as "boss, supervisor, department head, the analyst, employees, or customers". I'm going to take the liberty of extending this to include **other computers** so I can tell you all about what I came to call "Pseudo-XML".

Fresh out of college, I didn't exactly find the DBA job I was hoping for. Nobody was really looking to trust their databases to a kid with no real work experience, and I don't blame them! Instead I started off as a software developer writing a lot of database access code. I spent a good deal of my time there rewriting the middle layer of their website, moving it from an old system built on Tcl to a new one built on the .Net Framework. The data I needed to read was stored in SQL Server, while the presentation layer was classic ASP (pre-.Net) that mostly read XML files. My mission was to use C# to create XML files identical to those being generated by the Tcl code. Sounds easy enough, right?

I should add that this project had a few complicating factors: there was zero documentation, and all the people who wrote the original code had left the company long before. I also had no knowledge of Tcl and was told I shouldn't waste my time trying to learn it since it was all going away anyway; all I really needed were the queries it was using. But the pièce de résistance was the fact that all the XML being generated really wasn't XML at all. That's when things started to get much more interesting.

One of my favorite high school math teachers, Mr. Blew, would often rattle off the saying "Almost only counts in horseshoes and hand grenades" in class when someone partially answered a question. He'd probably say the same thing about the "XML" that was being generated. Apparently, when the Tcl system was written, the XML standard had not yet been finalized, so the language didn't support it. Instead, the XML was stored as strings and generated by simply placing tags around data points. This way absolutely anything could become "XML" just by placing some tags around it. I started referring to this as "Pseudo-XML" when discussing it with my co-workers.

While it sounds simple, Pseudo-XML caused a ton of issues because of its many quirks. Since it wasn't documented anywhere, I had nothing to go on other than the application that generated it and the ASP that read it. I couldn't use the .Net XML library, since it enforces well-formedness and Pseudo-XML ignored a few basic XML rules, such as requiring each document to have a single root element and requiring certain characters (like & and <) to be escaped. Instead I got very familiar with the StringBuilder class for building Pseudo-XML strings.
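To make the problem concrete, here is a minimal sketch of how Pseudo-XML gets built and why a strict parser refuses it. It is written in Python rather than the original C#: a list joined into a string stands in for `StringBuilder`, and the standard library's `xml.etree.ElementTree` stands in for a well-formedness-enforcing XML library. The helper names `build_pseudo_xml` and `is_well_formed` and the sample data are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET


def build_pseudo_xml(fields):
    """Wrap each value in a tag, Pseudo-XML style (hypothetical helper).

    No root element is added and nothing is escaped, so the output
    is generally not well-formed XML.
    """
    parts = []  # list + join is the usual Python stand-in for StringBuilder
    for tag, value in fields:
        parts.append(f"<{tag}>{value}</{tag}>")
    return "".join(parts)


def is_well_formed(text):
    """Return True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


doc = build_pseudo_xml([("Name", "Fish & Chips"), ("Price", "5")])
print(doc)                  # <Name>Fish & Chips</Name><Price>5</Price>
print(is_well_formed(doc))  # False: two top-level elements and a bare "&"
```

Either broken rule on its own is enough for the parser to reject the document, which is why string building was the only option.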

As I started building more complex files, a few other ~~annoyances~~ challenges came up, such as representing lists of data. In real XML, a list of values is typically fairly easy to represent as a hierarchy:

```xml
<Fruits>
  <Fruit>Guava</Fruit>
  <Fruit>Mango</Fruit>
  <Fruit>Kumquat</Fruit>
</Fruits>
```
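For comparison, a hierarchy like this is trivial for a standard parser to consume: each child element is one list item, and any special characters inside a value would be escaped and unescaped by the library rather than by hand. A minimal Python sketch using the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

FRUITS_XML = """<Fruits>
  <Fruit>Guava</Fruit>
  <Fruit>Mango</Fruit>
  <Fruit>Kumquat</Fruit>
</Fruits>"""

# Each <Fruit> child of the root element is one list item.
root = ET.fromstring(FRUITS_XML)
fruits = [fruit.text for fruit in root.findall("Fruit")]
print(fruits)  # ['Guava', 'Mango', 'Kumquat']
```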

But apparently Pseudo-XML didn't support nested elements of any type. It was more like a list of tags than a hierarchy, and there weren't any attributes either. I guess the brains behind Pseudo-XML felt there was no need for them, though more likely nobody wanted to write the ASP code necessary to parse them out. Instead, lists were pipe-delimited like this:

```xml
<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>
```
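On the consuming side, reading one of these lists comes down to finding the tag by name and splitting its text on the delimiter. The sketch below is in Python, and `read_pseudo_list` is a hypothetical name. It assumes the Pseudo-XML rules described above (no nesting, no attributes, each tag appearing at most once); the real ASP reader's logic isn't documented here.

```python
import re


def read_pseudo_list(doc, tag, delimiter="|"):
    """Return the delimited values inside <tag>...</tag> (hypothetical helper).

    Assumes Pseudo-XML rules: no nesting, no attributes, each tag
    appears at most once. Returns an empty list if the tag is missing.
    """
    pattern = f"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, doc, re.DOTALL)
    if match is None:
        return []
    return match.group(1).split(delimiter)


doc = "<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>"
print(read_pseudo_list(doc, "BaconTypes"))
# ['Canadian', 'Applewood smoked', 'Pancetta']
```

A regular expression is sufficient here only because Pseudo-XML has no nesting; it would not be a safe way to read real XML.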

This worked fine except when values contained pipes. The authors of Pseudo-XML had crafted a solution for that too, though, in the form of the pipe-and-tilde-delimited list:

```xml
<HighSchools>Fenwick~|~Riverside|Brookfield~|~Paris</HighSchools>
```
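Assuming, as the example suggests, that `~|~` separates items and `Riverside|Brookfield` is a single value, the reader has to split on the whole three-character sequence. A bare-pipe split cuts values apart. The format still has no escape mechanism, so a value that itself contains `~|~` cannot survive a join/split round trip. A short Python illustration (the value `Odd~|~Name` is made up):

```python
DELIM = "~|~"
content = "Fenwick~|~Riverside|Brookfield~|~Paris"

# Splitting on the full three-character delimiter keeps the embedded pipe intact.
schools = content.split(DELIM)
print(schools)             # ['Fenwick', 'Riverside|Brookfield', 'Paris']

# Splitting on a bare pipe, as the simpler lists did, mangles the data.
print(content.split("|"))  # ['Fenwick~', '~Riverside', 'Brookfield~', '~Paris']

# With no escaping, a value containing the delimiter is split apart on read.
values = ["Fenwick", "Odd~|~Name"]  # made-up value for illustration
round_trip = DELIM.join(values).split(DELIM)
print(round_trip)            # ['Fenwick', 'Odd', 'Name']
print(round_trip == values)  # False
```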

Fortunately, there was never a case where a value containing ~|~ had to be displayed; otherwise I'm sure things would have gotten much more interesting. When I left, there were still a few bits and pieces of Pseudo-XML lurking in the deepest parts of the site. I wouldn't be surprised at all if they're still there.

Anyhow, that's my little tale about the lengths I had to go to in order to please customers through formatting, even if the customers were web servers and not people. Thanks, Robert, for the thought-provoking topic!