Dec 132011
 

T-SQL Tuesday LogoThis month’s T-SQL Tuesday is hosted by Allen White (blog | @SQLRunr) and is an invitation to show off a nifty T-SQL trick you use to make your job easier. Here’s one of my favorites as of late:

Some of the source systems we copy from shard their data across multiple similar tables. For example, instead of having 1 table named dbo.Sales, they might have 30 of them named dbo.Sales_001 to dbo.Sales_030. If there were a rhyme or reason to this sharding it might be seen as an advantage, but unfortunately neither myself nor others on the team have ever found a method to this madness, and the vendor will not divulge the way they do this. As if that’s not bad enough, additional tables can pop up out of nowhere, and not all these tables are guaranteed to contain the same columns. Table dbo.Sales_031 may pop up tomorrow and have only a subset of the columns from the previous 30 tables, or may contain new columns not present in any of the others.

To keep this all straight, I have a process that compares the columns in all applicable tables and generates a view that combines their contents using UNION ALL. I’ve actually blogged about an aspect of this before, but today I have a function containing a nice little trick that’s proven very helpful for me. Since the tables I’m generating a view of aren’t guaranteed to have the same columns in them, a simple “SELECT *” from each of them won’t work because unioning tables together requires identical datatypes in the same order. Instead I have to generate a select statement that explicitly lists all columns for every table. T-SQL can easily accomplish this with cursors and loops, but then I found FOR XML PATH, which made life a lot simpler.

FOR XML PATH has many features which are outside the scope of this post, but the one behavior we’re going to exploit today is that you can pass a row tag name that’s blank. When you do this, instead of sandwiching each row between tags, it simply concatenates all the row values together into one string. If you add commas, you’ve got a great way to list out all the columns of a table which you can then form into a SELECT statement. Without any further ado, here’s the demo code:

-- create our demo table with a few columns
CREATE TABLE dbo.MyTable (
   ID INT NOT NULL,
   Value VARCHAR(20),
   Notes NVARCHAR(MAX)
);

-- select the table's columns from system tables
SELECT c.name
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.name = 'MyTable'
ORDER BY c.column_id;

-- now let's format it as XML
SELECT c.name
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.name = 'MyTable'
ORDER BY c.column_id
FOR XML PATH;

-- now pass a blank string argument to PATH
-- note how the <row> tags are now removed but column tags remain
SELECT c.name
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.name = 'MyTable'
ORDER BY c.column_id
FOR XML PATH('');

-- casting it to NVARCHAR removes the tags for each column
DECLARE @s NVARCHAR(MAX);

-- now let's add a blank string argument to PATH
SET @s = (
      SELECT ',' + QUOTENAME(c.name) -- comma to separate names
      FROM sys.columns c
      INNER JOIN sys.tables t ON c.object_id = t.object_id
      WHERE t.name = 'MyTable'
      ORDER BY c.column_id
      FOR XML PATH('') -- XML that really isn't
   );

-- remove the leading comma
SELECT SUBSTRING(@s, 2, LEN(@s));

Wrap the final output with a SELECT and a table name and you’re good to go. There’s tons more uses for this than creating lists of columns – basically anytime you want to turn column values into a delimited list you can make it happen using this method. Happy coding!

Sep 132011
 

T-SQL Tuesday LogoThis month the 22nd T-SQL Tuesday comes to us courtesy of Robert Pearl (blog | @PearlKnows), and he’s asking us to write about formatting data for presentation to end users. He describes end-users as “boss, supervisor, department head, the analyst, employees, or customers”. I’m going to take the liberty of extending this to include other computers so I can tell you all about what I came to call “Pseudo-XML”.

Fresh out of college I didn’t exactly find the DBA job I was hoping for. Nobody was really looking to trust their databases to a kid with no real work experience and I don’t blame them! Instead I started off as a software developer writing a lot of database access code. A good deal of the time I worked there was spent rewriting the middle layer of their website from an old system which used Tcl to a new one using the .Net Framework. The data I needed to read was stored in SQL Server, and the presentation layer was in ASP (pre- .Net) and most of it was reading XML files. My mission was to use C# to create XML files identical to those being generated by the Tcl code. Sounds easy enough, right?

I should add that this project had a few complicating factors, like there was zero documentation and all the people who wrote the original code had left the company long before. I also had no knowledge of Tcl and was told I shouldn’t waste my time trying to learn it since it all was going away anyway -all I really needed was the queries it was using. But the pièce de résistance was the fact that all the XML that was generated really wasn’t XML at all. That’s when things started to get much more interesting.

One of my favorite high school math teachers, Mr. Blew, would often rattle off the saying “Almost only counts in horse shoes and hand grenades” in class when someone partially answered a question. He’d probably say the same thing about this “XML” that was being generated. Apparently at the time the Tcl system was written the XML standard had not yet been finalized and so the language didn’t support it. Instead the XML was stored as strings and was generated by simply placing tags around datapoints. This way absolutely anything could become “XML” just by placing some <Tags>around it</Tags>. I started referring to this as “Pseudo-XML” when discussing with my co-workers.

While it sounds really simple, Pseudo-XML created a ton of issues as it contained many quirks. Since it wasn’t documented anywhere, I had nothing to go on other than the application that generated it and the ASP that read it. I couldn’t use the .Net XML library, as it enforced well-formedness and Pseudo-XML did away with a few basic XML rules such as requiring each document to have a root element or requiring certain characters to be escaped. Instead I got very familiar with the StringBuilder class for building Pseudo-XML strings.

As I started building more complex files a few other annoyances challenges came up, such as representing lists of data. In real XML, a list of values is typically fairly easy to represent as a hierarchy:

<Fruits>
<Fruit>Guava</Fruit>
<Fruit>Mango</Fruit>
<Fruit>Kumquat</Fruit>
</Fruits>

But apparently Pseudo-XML didn’t support nested elements of any type. It was more like a list of tags than a hierarchy, and there weren’t any attributes either. I guess the brains behind Pseudo-XML felt there was no need for them, but probably more like nobody wanted to write the ASP code necessary to parse them out. Instead, lists were pipe-delimited like this:

<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>

This worked fine, except in cases where values containing pipes were present. The authors of Pseudo-XML crafted a solution to that too, though, the pipe-and-tilde-delimited list:

<HighSchools>Fenwick~|~Riverside|Brookfield~|~Paris</HighSchools>

Fortunately there was never a case where it was necessary to display a “~|~” value, otherwise I’m sure things would have gotten much more interesting. When I left, there were still a few bits and pieces of Pseudo-XML lurking in the deepest parts of the site. I wouldn’t be surprised at all if they’re still there.

Anyhow, that’s my little tale about the lengths I had to go to to please customers through formatting, even if the customers were web servers and not people. Thanks, Robert, for the thought-provoking topic!