Sep 13, 2011

This month the 22nd T-SQL Tuesday comes to us courtesy of Robert Pearl (blog | @PearlKnows), and he’s asking us to write about formatting data for presentation to end users. He describes end-users as “boss, supervisor, department head, the analyst, employees, or customers”. I’m going to take the liberty of extending this to include other computers so I can tell you all about what I came to call “Pseudo-XML”.

Fresh out of college I didn’t exactly find the DBA job I was hoping for. Nobody was really looking to trust their databases to a kid with no real work experience, and I don’t blame them! Instead I started off as a software developer writing a lot of database access code. A good deal of my time there was spent rewriting the middle layer of the company’s website from an old system written in Tcl to a new one using the .Net Framework. The data I needed to read was stored in SQL Server, and the presentation layer was in ASP (pre-.Net), most of which read XML files. My mission was to use C# to create XML files identical to those being generated by the Tcl code. Sounds easy enough, right?

I should add that this project had a few complicating factors: there was zero documentation, and all the people who wrote the original code had left the company long before. I also had no knowledge of Tcl, and was told I shouldn’t waste my time trying to learn it since it was all going away anyway; all I really needed were the queries it was using. But the pièce de résistance was the fact that all the XML being generated really wasn’t XML at all. That’s when things started to get much more interesting.

One of my favorite high school math teachers, Mr. Blew, would often rattle off the saying “Almost only counts in horseshoes and hand grenades” in class when someone partially answered a question. He’d probably say the same thing about this “XML” that was being generated. Apparently, at the time the Tcl system was written, the XML standard had not yet been finalized, and so the language didn’t support it. Instead the XML was stored as strings and was generated by simply placing tags around data points. This way absolutely anything could become “XML” just by placing some <Tags>around it</Tags>. I started referring to this as “Pseudo-XML” when discussing it with my co-workers.

While it sounds really simple, Pseudo-XML created a ton of issues as it contained many quirks. Since it wasn’t documented anywhere, I had nothing to go on other than the application that generated it and the ASP that read it. I couldn’t use the .Net XML library, as it enforced well-formedness and Pseudo-XML did away with a few basic XML rules such as requiring each document to have a root element or requiring certain characters to be escaped. Instead I got very familiar with the StringBuilder class for building Pseudo-XML strings.
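To make the idea concrete, here’s a minimal sketch in Python (the real code used C#’s StringBuilder, and the tag name here is my own invention) of what “generating” Pseudo-XML amounted to: wrap a value in tags with no escaping, no root element, and no well-formedness checks.

```python
def pseudo_xml(tag, value):
    """Wrap a value in <Tag>...</Tag> markers. No escaping, no root
    element, no well-formedness checks -- just string concatenation."""
    return "<{0}>{1}</{0}>".format(tag, value)

# Characters that real XML would require escaping pass straight through:
print(pseudo_xml("Motto", "horseshoes & hand grenades"))
# -> <Motto>horseshoes & hand grenades</Motto>
```

Absolutely any string becomes “XML” this way, which is exactly why the .Net XML classes refused to read it back.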

As I started building more complex files, a few other challenges came up, such as representing lists of data. In real XML, a list of values is typically fairly easy to represent as a hierarchy:

<BaconTypes>
	<BaconType>Canadian</BaconType>
	<BaconType>Applewood smoked</BaconType>
	<BaconType>Pancetta</BaconType>
</BaconTypes>

But apparently Pseudo-XML didn’t support nested elements of any type. It was more like a list of tags than a hierarchy, and there weren’t any attributes either. I’d like to think the brains behind Pseudo-XML felt there was no need for them, but more likely nobody wanted to write the ASP code necessary to parse them out. Instead, lists were pipe-delimited like this:

<BaconTypes>Canadian|Applewood smoked|Pancetta</BaconTypes>

This worked fine, except in cases where the values themselves contained pipes. The authors of Pseudo-XML crafted a solution to that too, though: the pipe-and-tilde-delimited list, which looked something like this:

<BaconTypes>Peameal|Canadian~|~Applewood smoked~|~Pancetta</BaconTypes>
Fortunately there was never a case where a value itself needed to contain “~|~”, otherwise I’m sure things would have gotten much more interesting. When I left, there were still a few bits and pieces of Pseudo-XML lurking in the deepest parts of the site. I wouldn’t be surprised at all if they’re still there.
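Putting the whole list scheme together, here’s a short sketch in Python (the helper names are my own invention) of the two delimiter forms: plain pipes normally, the “~|~” form whenever a value itself contains a pipe, and the reader left guessing which one was used.

```python
def encode_list(values):
    # Fall back to the tilde form whenever any value contains a pipe.
    delim = "~|~" if any("|" in v for v in values) else "|"
    return delim.join(values)

def decode_list(text):
    # The reader has to guess the delimiter: if "~|~" appears anywhere,
    # assume it was used; otherwise split on plain pipes.
    return text.split("~|~") if "~|~" in text else text.split("|")

print(encode_list(["Canadian", "Applewood smoked", "Pancetta"]))
# -> Canadian|Applewood smoked|Pancetta
print(encode_list(["Peameal|Canadian", "Pancetta"]))
# -> Peameal|Canadian~|~Pancetta
```

A value that literally contained “~|~” would break the round trip, which is exactly the case we were lucky never to hit.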

Anyhow, that’s my little tale about the lengths I had to go to in order to please customers through formatting, even if the customers were web servers and not people. Thanks, Robert, for the thought-provoking topic!

Sep 08, 2011

Everything is good in moderation, and I believe that holds true for the number of filegroups in your database as well. Having multiple filegroups allows you to spread your data across multiple disks, and can also come in handy for features like partitioning. When applied properly and reasonably, multiple filegroups can lead to significant performance improvements.

When applied unreasonably, the result will probably be similar to what I have experienced lately – a large thorn in your side! There’s a maximum limit of 32,767 filegroups per database, but I don’t recommend having anywhere near that number. One of the systems I’ve been working with has over 100 filegroups, most of which serve little to no purpose because they contain very few tables and are all located on the same drive. Some are worth keeping as they contain data that’s accessed in ways that could benefit from a separate filegroup, but it seems the logic used by those who came before me was “If I’m creating 2 or 3 tables about X, I’ll make a filegroup for X as well.” This large number of filegroups is not only a pain to manage, but a waste of disk space too, as most of them contain files sized much larger than they needed to be.

I’ve been spending some time working on paring down the number of filegroups by combining them based on a few factors including size, number of objects they contain, and usage patterns. I’ve come up with a few queries that have been most helpful with these tasks, and am happy to share them.

As always, these scripts come with ABSOLUTELY NO WARRANTY. It is your responsibility to read and understand these queries before running them, and I highly recommend getting familiar with them in a development environment before running in production.

Filegroup Summary
This will list all of the filegroups in the current database and show how many files each contains, how many objects are stored in it, the total size of its files, total space used, total free space, and the percentage of free space. It’s a great starting point for spotting filegroups that might not belong, perhaps because they contain very few objects and/or use a low percentage of their space. Decisions shouldn’t be based solely on the results you see here, though; other factors to consider are file activity (from the sys.dm_io_virtual_file_stats DMV) and your knowledge of the data contained in each filegroup.

WITH fg_sizes AS (
	SELECT fg.data_space_id AS FGID, 
		COUNT(f.file_id) AS FileCount,
		ROUND(CAST(SUM(f.size) AS FLOAT)/128,2) AS Reserved_MB,
		ROUND((CAST(SUM(f.size) AS FLOAT)/128)-(CAST(SUM(FILEPROPERTY(f.[name],'SpaceUsed')) AS FLOAT)/128),2) AS Free_MB
	FROM sys.filegroups fg
	LEFT JOIN sys.database_files f ON f.data_space_id = fg.data_space_id
	GROUP BY fg.data_space_id
),
fg_objs AS (
	SELECT ID AS FGID, SUM(n) AS TotalObjects
	FROM (
		SELECT au.data_space_id AS ID, 
			COUNT(*) AS n
		FROM sys.allocation_units au
		WHERE au.[type] = 1
		GROUP BY au.data_space_id
		UNION ALL
		SELECT t.lob_data_space_id AS ID,
			COUNT(*) AS n
		FROM sys.allocation_units au
			JOIN sys.partitions p ON au.container_id = p.partition_id
			JOIN sys.objects o ON p.object_id = o.object_id
			LEFT JOIN sys.tables t ON o.object_id = t.object_id
		WHERE au.[type] = 1
			AND au.data_space_id <> t.lob_data_space_id
		GROUP BY t.lob_data_space_id
	) q
	GROUP BY ID
)
SELECT fg.data_space_id AS FilegroupID, 
	fg.[name] AS FilegroupName,
	fgs.FileCount,
	ISNULL(fgo.TotalObjects,0) AS ObjectCount,
	MAX(fgs.Reserved_MB) AS Reserved_MB,
	ROUND(MAX(fgs.Reserved_MB)-MAX(fgs.Free_MB),2) AS Used_MB,
	MAX(fgs.Free_MB) AS Free_MB,
	ROUND(MAX(fgs.Free_MB)/MAX(fgs.Reserved_MB)*100,2) AS Percent_Free
FROM sys.filegroups fg
INNER JOIN fg_sizes fgs ON fg.data_space_id = fgs.FGID
LEFT JOIN fg_objs fgo ON fg.data_space_id = fgo.FGID
GROUP BY fg.data_space_id, fg.[name], fgs.FileCount, fgo.TotalObjects
ORDER BY Percent_Free DESC;

What’s In That Filegroup?
Now that we have our list of filegroups, you’ll probably wonder what’s in them. This query gives you some more detail about what the filegroup contains such as object & index names, and their size. It also gives the LOB (large object) filegroup for each table, if applicable. (LOB Filegroups can get pretty interesting and are worthy of their own post, so I won’t discuss them here.) Since this query may return a large number of rows depending on how many objects the filegroup contains, I recommend only running this on one filegroup at a time.

DECLARE @FGID INT;
SET @FGID = 2; -- Filegroup ID: set to the filegroup you want to inspect

SELECT s.[name] AS SchemaName,
	o.[name] AS ObjName,
	o.object_id AS ObjID,
	i.[name] AS IndexName,
	p.index_id AS IndexID, 
	ROUND(CAST(au.data_pages AS FLOAT)/128,2) AS MB_Used,
	QUOTENAME(f.[name]) AS DataFilegroup,
	QUOTENAME(f2.[name]) AS LOBFilegroup
FROM sys.allocation_units au
    INNER JOIN sys.partitions p ON au.container_id = p.partition_id
    INNER JOIN sys.objects o ON p.object_id = o.object_id
    INNER JOIN sys.indexes i ON p.index_id = i.index_id AND i.object_id = p.object_id
    INNER JOIN sys.schemas s ON o.schema_id = s.schema_id
    LEFT JOIN sys.tables t ON o.object_id = t.object_id
    LEFT JOIN sys.filegroups f ON au.data_space_id = f.data_space_id
    LEFT JOIN sys.filegroups f2 ON t.lob_data_space_id = f2.data_space_id
WHERE au.[type] = 1
	AND (au.data_space_id = @FGID OR t.lob_data_space_id = @FGID)
ORDER BY SchemaName, ObjName, p.index_id;

What Files Make Up That Filegroup?
Finally, you might want to know what files make up each filegroup. This will return the name and path of each file in the filegroup as well as the size, used and free space. As with the previous query, I tend to run it on only one filegroup at a time.

DECLARE @FGID INT;
SET @FGID = 16; -- Filegroup ID: set to the filegroup you want to inspect

SELECT a.fileid AS FileID,
	a.groupid AS FileGroupID,
	a.[name] AS LogicalName,
	a.[filename] AS FilePath,
	ROUND(CAST(a.size AS FLOAT)/128,2) AS Reserved_MB,
	ROUND(CAST(a.size AS FLOAT)/128 - CAST(FILEPROPERTY(a.[name],'SpaceUsed') AS FLOAT)/128,2) AS Free_MB
FROM dbo.sysfiles a
WHERE a.groupid = @FGID;

There you have it, a few queries that might come in handy when doing a filegroup audit. If you’ve never done a filegroup audit before, these should help get you started. Happy querying!

Sep 01, 2011

I’m back! I never really left, but between all the time spent putting the final pieces together for my wedding, things got a bit stale here. I can’t really say I’m sorry though, as the whole experience created some great blog topics! As you now know if you didn’t already, I got married recently. My fiancée (now wife!) and I had a blast planning it all ourselves, and early on we decided that my DBA skills could come in handy, as we would be generating a decent amount of data throughout the process. The guest list, invitation information, replies, meal choices, seating charts, gifts received, and thank-you notes sent could all be stored in a database, and schema designs started popping into my head shortly after we got engaged. As we had a relatively long engagement, there was plenty of time to think everything through before the coding began.

Weighing The Options
My plan from the beginning was to utilize my skills and create a database in SQL Server for keeping everything neat, tidy, and in proper relational form. This sounded like a great idea at first, but the more I thought about it the less appealing it became. Designing the database was a non-issue, but designing a front-end was. Since this would be a shared database that both my wife and I would be utilizing, I really wanted a web interface. Despite years of .Net application development experience, I was never all that good at designing front ends. It’s a skill I really wish I had – maybe someday I’ll get good at it.

Another option was just to skip the front end altogether and do everything in SSMS with straight SQL. This would have been a big pain even for me, and I have lots of database experience! I knew it would be very difficult for Michelle to use a system in this way, so it was quickly ruled out.

In the end I decided to go with something much simpler: Google Docs. Its sharing functionality was excellent, as we were able to keep pretty much everything stored in the cloud and synchronized between the two of us. Most of our information was stored in spreadsheets. (“Excel normal form” counts as a database, right? Right? Bueller?…) We also used the word processor to store notes, such as whenever either of us made a phone call, so we could keep a log of all activities. The ability for both of us to edit a document at the same time also came in handy a number of times. Contracts and other correspondence were scanned (if not in electronic form already) and stored as PDFs. We even made good use of Google’s versioning functionality, as several of our scanned documents evolved over time. We still had a gigantic wedding binder with paper copies of most things, but using the cloud made it a lot less necessary. Having everything stored online also made it very easy to do things when away from home – like at work, since many vendors are only open during regular business hours.

But wait, this *is* a SQL Server blog, isn’t it? Don’t worry kids – I’m not going anywhere, but the moral of this story is that even though I really love SQL Server it just wasn’t the right tool for me in this particular case. You’ll be happy to know that I did end up importing a bunch of my spreadsheet data into a database anyway for some reporting and BI purposes, which I’ll cover in another post.

A Better Way To Reply
Aside from the method in which all this wedding data was being stored, another important factor is how it was retrieved. As awesome as online invitations would be, I was not about to send out e-vites for my wedding. That being said, I’ve never been a big fan of wedding reply cards either. I have nothing against the cards themselves, but the traditional design of them has never sat right with me. Let’s look at a standard reply card. I don’t have any handy, but they all look pretty much like this:

RSVP Card 1

First off, I’ve always thought the “M” was stupid. I know I’m supposed to write my name on the line, and I don’t really need any help in writing “Mr.”, “Mrs.”, or “Ms.” If your name happens to start with an “M”, you’re even luckier! This line is fine for writing 1 or 2 names, but we had several instances where families of 4 or 5 were invited. Cramming all those names into that space would be a pain at best.

Another thing I hate about reply cards is the method of selecting a meal. Let’s say 2 people will be attending the wedding using the card above. One person is having the chicken, the other the fish. It’s easy to specify that the couple will be eating 1 chicken and 1 fish, but who’s having what? There’s really no way to indicate that on the above card without writing really small in a margin somewhere, and that’s just sloppy. There had to be a better way. Here’s what we came up with to address these issues:

RSVP Card 2

As you can see, my spreadsheet database roots came into play in the design of my reply cards. You’re actually looking at the back side – the front gave a reply date and explained the meal choices in more detail. This design allows ample space for names and a rather intuitive method for choosing a meal for each guest. (Yes, I’m aware that “child” is a meal choice and that it looks really bad, but it fit much better into the column than “Chicken Tenders”.)

The final feature we added was the number you can see in the bottom right. On the advice of several friends who had issues at their weddings, we gave each card a primary key ID number. This way if someone mailed a card back without writing their name on it we would still be able to trace it to its owner. We were fortunate in that we didn’t really need this. One couple forgot to write their last name, but we knew who they were anyway.

Not only did Michelle and I have a great time planning our wedding, but we had a great time dealing with the data that was created as a result! Stay tuned for more posts to come on wedding data and what we did with it.