I’m very proud to have contributed a chapter to Tribal SQL, which is now available for purchase at Amazon and finer bookstores everywhere. This is the first book I’ve had any part in writing, something I never thought I would do. I always hated English and literature classes in school – if it wasn’t math, hard science, or architectural history, I wanted no part of it. I credit the SQL community and blogging for slowly getting me excited about the written word.
Getting this book from idea to print was not a sprint but a marathon. While I’m sure there was plenty going on behind the scenes that I wasn’t aware of, here’s how everything came together from my point of view.
How I got Involved
The first I heard about “SQL Tribal Knowledge” was from Matt Velic’s (@mvelic) blog post almost two years ago. Despite being fans of MidnightDBAs (@MidnightDBA) Jen and Sean McCown, I missed their blog post announcing the project. Basically they wanted to get a whole bunch of unpublished people together, have everyone write a chapter, and then assemble them into a book. Seemed like a great idea to me, so I submitted a few topic ideas, and ended up writing about data compression – a feature I’ve done lots of work with and know rather well.
Actually writing the chapter was both the hardest and easiest parts for me. Getting started was difficult because I wasn’t sure about what style of writing to use. I ended up reading through about a dozen other books not for content, but to see what tone the authors took when putting their words to paper. I ended up going with something not much different from a blog post – perhaps slightly more formal. With a style established, everything flowed easily. After about 2 months, my initial draft was complete and to my liking.
After the first drafts came peer editing. All the authors picked two other chapters to read and comment on. It was a great gut-check on things like grammar, effectiveness of explanations, etc. With both reviews in-hand it was back to the drawingkeyboard for revisions.
After peer editing there was a second round of reviews, this time from outside the author pool. Over 20 bloggers and speakers from the SQL Server community volunteered their time to read and comment on our work. Afterwards, this was followed by another round of revisions.
Throughout the process, Jen and Sean were trying to find a publisher. There was some brief talk about self-publishing, but then Red Gate got involved and agreed to publish the book, now named “Tribal SQL”. This involved – you guessed it – another round of editing, this time by Red Gate’s Tony Davis (@TonyTheEditor). Tony was delightful to work with and I have tremendous respect for what he does. After a few rounds of back-and-forth with him, my chapter was done.
From this point on I’m sure there were a lot of things taking place that I didn’t know about, but the end result is that the book was published, and Red Gate was kind enough to hold a launch event at the PASS Summit and distribute copies. They also held a book signing for all the authors that were in attendance.
Would I do it again?
I would. Writing a chapter is a lot of work – writing an entire book is obviously much more, but I found it enjoyable and rewarding. Given the right topic and/or group of co-authors, I’m sure I could have just as much fun working on another one.
Write Book, Get Published, ?????, Profit?
HA! I should mention that all royalties from the sale of this book are being donated toComputers4Africa. Even if that weren’t the case, any money we might get would work out to a paltry rate when divided by the number of hours spent working on this project. If you ever decide to write a book, make sure you’re doing it for the right reasons. Money is not one of them.
Finally, some thanks are in order. Thanks to Jen and Sean McCown for putting this all together, and to Matt Velic for blogging and making me aware! Thanks to all the other co-authors and editors for their time and hard work. Finally, thanks to the wonderful people at Red Gate, not only for publishing this book, but for the tremendous support they give to the SQL Server community as a whole.
Tomorrow I am making the trek up to Madison to speak at MADPASS! I had a great time there last year and am looking forward to heading back.
I’ll be giving my talk on SQL Server’s data compression feature entitled “The Skinny on Data Compression.” I’ll be explaining how data compression works and sharing some of the things I’ve learned about when and where to deploy data compression that I’ve accumulated over the past 2 years of using it.
Greg Kramer will also be presenting tips and tricks regarding DAX that both Excel and SQL professionals should find useful.
It’s once again the second Tuesday of the month, meaning it’s T-SQL Tuesday! This month’s topic comes to us from Jen McCown (blog | @JenniferMcCown). Jen is asking us to share our knowledge about filegroups, and it just so happens that I have a story about migrating to new storage that involved a large number of file and filegroup operations.
At work we have been very fortunate to get a new SAN, and I was charged with moving data from the old SAN to the new one. This data falls under 2 main groups: large numbers of small files which can be moved at any time without consequence, and data files from the production SQL Server database. I covered how I moved the non-database files in a previous post on the Microsoft FCIV. This time I’ll cover the larger task of moving the databases. To make things much more difficult interesting, I did not have the luxury of being able to take any downtime to accomplish this move. Making use of pre-scheduled maintenance windows was fine, but those are typically 2 hours every 2-3 months, not nearly enough time to move the 22TB of data that was in need of migration.
Just to be clear, the instance of SQL Server was not moving or changing at all. The server was connected to both the old and new SANs, and the objective was to move a particular database from old to new. Yes, there are other databases on the server, but they are located on other storage and did not need to be moved. We use SQL Server 2008 R2 Enterprise Edition, so at least I had the full arsenal of functionality available to me.
I began listing out all the ways to move data that I could think of, and started disqualifying from there:
Copy the files. This would be easy and relatively fast, but requires taking the database offline to move the files. Since uptime is paramount, this is a no-go.
Online index rebuilds. Using enterprise edition means it’s possible to rebuild tables and indexes online without interrupting users. It is also possible to move objects to a new filegroup online as well. This is very useful, but comes with some limitations. Prior to SQL Server 2012, indexes containing LOB (large object) information such as columns of type image, ntext, text, varchar(max), nvarchar(max), varbinary(max) and xml can’t be rebuilt online. SQL Server 2012 has improved this feature so this limitation only applies to image, ntext, and text columns, which are deprecated anyway. But even though SQL Server 2012 lets you perform online rebuilds of tables with LOB data, it still won’t let you move the LOB data to a different filegroup. This solution gives us the uptime we require, but unfortunately can’t move everything.
Mirroring. Much like a rolling upgrade, mirroring can be used to migrate a database between instances. You can set up a mirroring session with the mirror server on the instance you want to migrate to, manually fail over, and you’re now on the new instance. Unfortunately it won’t work in this case because the primary and mirror databases can’t reside on the same instance.
Backup and restore onto the new SAN. This is a very strong candidate, especially since I can backup the database in about 3 hours and restore it almost as quickly. Restoring a backup on the new storage (with a different database name) can take place while the current database remains online. Cutting over would involve backing up the tail of the transaction log on the old database, restoring it on the new one, and then swapping names. The cutover would be quick enough to fit in a maintenance window, making this a very attractive option. I ended up choosing against it because the new storage would be partitioned differently than the old storage and adjusting the files and filegroups and then rebuilding objects across them would make the process more complicated than the solution I ended up choosing.
Add and remove files from the filegroup. This approach can be very useful, but rarely gets mentioned. SQL Server allows you to add and remove files from filegroups at almost any time, and this can be leveraged to move data between different storage systems. What makes it awesome is that the operation is completely online with no interruptions to the user, all types of data (including LOB data) is moved, and it works in any edition of SQL Server. As always, there are trade-offs. Shrinking data files is generally frowned upon as it causes high levels of fragmentation. It’s also very slow, and LOB data makes it even slower. Still, fragmentation can be solved using REBUILD or REORGANIZE, and if speed is not important, it could be the way to go.
The add/remove file method is best for preserving uptime, but due to its slow speed I decided to use a hybrid solution with 4 parts:
1. All tables & indexes that can be rebuilt online on a new filegroup are moved to a temporary filegroup on the new storage.
2. The remaining tables & indexes are moved via adding & removing files to the original filegroup.
3. The contents of the temporary filegroup are moved back into the original filegroup via online rebuilds.
4. The transaction log and primary filegroup (which cannot be moved by the add/remove method) are moved during a scheduled downtime.
The image below shows our starting state: the new SAN is empty, while the old SAN has a (non-primary) filegroup [FG1] with files A, B, and C in it, each 1GB in size. Most systems will contain more filegroups, but for simplicity I only show one here.
The first step was online index rebuilds. For each filegroup on the old SAN I created a temporary one on the new SAN. For example, in this case of having [FG1] on the old storage, I’d create filegroup [FG1_Temp] on the new storage. Then I did online rebuilds onto the temp filegroup for each table and index that would allow it. This can be an arduous task for filegroups with large numbers of objects, so I created a powershell script to automate this which I’m happy to share. Creating the new filegroup and files can be done via the following T-SQL:
The powershell script will move objects online from one filegroup to another. Objects which cannot be moved online will be ignored. When you open the script file you will see variables that need to be filled in for the server name, database name, and source and destination filegroups, and a folder to log to. Optionally you can filter by schema name, object name, and size.
I am happy to share this with the community, but please be advised of the following: This script needs to run in sqlps, the SQL Server powershell extension. It was written for and tested on SQL Server 2008 R2. I have not tested it on SQL Server 2012. I do not consider this to be perfect or in any sort of a finished state. And as always it is your responsibility to read and understand this code before running it on your own systems. Always test in a development environment, no warranty is expressed or implied, etc. etc., use at your own risk!
After the online filegroup migration, the storage layout will appear as follows, with a temporary filegroup on the New SAN:
Once the migration script has completed, the old storage will contain the original filegroup with all the objects which could not be moved online. At this point all those objects still need to be moved onto the new storage, and the best (and perhaps the only) way to do this in an online fashion is by adding and removing files from the filegroup.
Here’s how it works: SQL Server allows the addition and removal of files from filegroups at almost any time via the ALTER DATABASE command. (You can’t add or remove files while a backup is running.) A filegroup can be extended onto new storage by adding files to it which are located on the new storage. The script below shows how to do this by adding files D, E, and F, located on SAN 2.
After executing that, filegroup [FG1] is now spread across the old and new storage with 3 files on each SAN, as well as the 3 files containing the temporary filegroup [FG1_temp] on the new SAN:
At this point, files D, E, and F contain very little data, if any at all. Merely creating a file doesn’t mean it will be populated with existing data, though SQL Server will start to make use of it for operations that allocate space on disk from here on out. The objective now is to remove files A, B, and C, so the filegroup has no presence on the Old SAN. This is done using the DBCC SHRINKFILE command with the EMPTYFILE argument, which will remove all of the data from the file by distributing it across the other files in the filegroup. Depending on the size of the file and the amount of data it contains, this operation can take a long time. Fortunately if you have to stop it, the work that has already been performed will not be lost. Once a file has been emptied in this manner it can be removed by using ALTER DATABASE. The code to accomplish this is as follows:
DBCC SHRINKFILE (A, EMPTYFILE);
ALTER DATABASE DB_NAME REMOVE FILE A;
DBCC SHRINKFILE (B, EMPTYFILE);
ALTER DATABASE DB_NAME REMOVE FILE B;
DBCC SHRINKFILE (C, EMPTYFILE);
ALTER DATABASE DB_NAME REMOVE FILE C;
When this is complete, filegroup [FG1] will only exist on the new SAN, however there is now 2 filegroups and we started with 1. Filegroup [FG1_Temp] can be removed once all objects in it have been moved back into filegroup [FG1], again using the online migration script. I realize this method involves rebuilding some objects more than once, however in my testing it was still faster than moving everything by adding/removing files from the filegroup. I would imagine your mileage will vary depending on your environment and the number of tables containing LOB data, but this is what worked best for me. On my system about 80% of objects could be moved online, leaving only 20% that needed to be moved via the slower method.
Once the online rebuild script completes, things will be nearly complete. All data will be in its original filegroup on the new SAN. Filegroup [FG1_Temp] and its files can now be removed via the following script:
ALTER DATABASE DB_Name REMOVE FILE X;
ALTER DATABASE DB_Name REMOVE FILE Y;
ALTER DATABASE DB_Name REMOVE FILE Z;
ALTER DATABASE DB_Name REMOVE FILEGROUP FG1_Temp;
Once this is repeated for each filegroup in the database, the job will be almost complete. The primary filegroup and transaction log cannot be moved using either method just discussed, so a downtime will be required to move them to the new storage. I was able to make use of existing maintenance windows to detach the database, copy the files to the new SAN,and reattach it. Everything came back up without incident.
At this point the only issue left to tackle is fragmentation. The data that was moved via online rebuilds is not affected, but tables and indexes moved via the add/remove files method will be very highly fragmented. The quickest way to resolve high levels of fragmentation is to rebuild an index, but these are the indexes which cannot be rebuilt online! If an offline rebuild is not possible (and since uptime was very important in my case it wasn’t possible) the next best method is to use the REORGANIZE argument of the ALTER INDEX statement. REORGANIZE is typically slower than REBUILD, but it will allow the index to remain online for users while it runs.
I don’t consider this method to be perfect by any means, but it worked rather well for me given the uptime requirements I had to meet. Thank you very much Jen for the excellent topic, and thank you to all of you who kept reading to the end! I hope this proves helpful!
I’m fortunate to be able to work with some rather large SQL Server instances on a daily basis, including databases over 25TB in size. Due to their bulk, “VLDBs” (Very Large Databases) present some challenges for DBAs: following best practices for small transactional systems might not be feasible on a VLDB. Regular maintenance processes such as backups and integrity checks take longer when there’s more data to scan. You also really need to think twice before doing something like adding an index to a 2TB table! I’ve learned a ton about working with VLDBs in the time I’ve been at my current job, and I hope to share a lot of that knowledge here in the coming months.
Let’s say you have a database that receives daily imports from your company’s accounting system so users can query it and create reports. This setup becomes popular, and management asks you to add weekly loads from the payroll software. You begin importing payroll into the same database and all is good until you get a request to add hourly updates from the inventory application. These requests continue, and after a few additional systems you now have a really large database on your hands. VLDBs are great when they have a purpose and fulfill a need. If a database is huge only because more and more unrelated things kept being added to it, however, that purpose starts going the way of the dodo.
Bigger than necessary
So we now have a database that contains data from multiple systems that are being updated at different frequencies. This is where things have the potential to get hairy. Records in the database’s copy of the accounting system, which is loaded daily, may not have corresponding entries in its copy of the payroll system, which is only loaded weekly. Even if all the systems were loaded into the database at the exact same rate there is no guarantee that all the corresponding records would be available at the same time across systems. Backups are also starting to take a considerable amount of time due to the volume of data they contain. At this point it’s probably worth taking a step back and looking at how the system is designed. A single database may be meeting our needs, but when manageability starts to become difficult we might be better served by making some changes.
SQL Server implements many features at the database level, however the core functionalities of a database in my opinion are logical consistency and recovery. Logical consistency (the “C” in ACID) guarantees that the database will always be in a consistent state. This state is enforced through transactions made possible by the database’s transaction log. In our scenario the database is doing all this work to ensure everything is always in a consistent state, however there’s no consistency between the different systems being stored within the database.
Backups, restores and recovery also take place in terms of the entire database. You can recover to any point in time (assuming the database is using the full recovery model), but you’re recovering the entire database to that point in time. Restoring our example database may bring us to a point in time where the payroll system was fully loaded, however the accounting system’s load was only halfway complete. There’s no way to recover only part of the database further ahead in time than the rest.
In this situation since the separate systems within the database will never be consistent with each other I would lean towards splitting this database up into multiple smaller databases, one for each system. There are many pros and cons to this – here are some of the larger points:
Individual databases are smaller and more manageable than a single large database. Maintenance tasks such as integrity checks will take less time due to the smaller size.
Backups and restores will be smaller and will take less time.
Databases can be recovered individually. Should one database need to be taken offline the others will be unaffected. All downtimes are bad, but a partial downtime is often better than having everything unavailable!
The systems won’t have to share a transaction log when in separate databases. This can result in increased performance when the transaction logs are placed on separate disks.
You now have more databases to maintain.
It’s more difficult (but not impossible) to figure out the state of all the different systems at a particular point in time when they’re in separate databases.
Users will most likely be required to connect to multiple databases.
Setting up security will involve a bit more work, especially in the likely situation of having to accomodate joins across databases.
In many of the cases I’ve seen, splitting things up into multiple databases has been worth the trouble. VLDBs that were getting too large for their own good were transformed into several smaller databases that were much easier to manage. Having to connect to multiple databases proves to be trivial for most users, and things can be further simplified by creating views where appropriate.
VLDBs aren’t a bad thing. As time progresses it will only become more common to have a lot of closely-related data that warrants a very large database. But when a database is huge for no good reason it may be better off being split up. This is especially true in cases where the database contains data from multiple disparate sources. When faced with bringing a new dataset into your environment it’s an excellent time to think about whether this data belongs in its own database or not. While it’s easy to keep adding to a giant “one stop shop” database, bigger isn’t always going to be better.
Where can you find deep dish pizza, questionable politics, amazing architecture and a great SQL Saturday all in one stop? Sweet Home Chicago! The organizing committee in the windy city has been hard at work, and SQL Saturday #211: Chicago 2013 is open for registration.
This year the event will be taking place on Saturday April 13, 2013. DeVry University’s Addison Campus is once again graciously allowing us to use their awesome facility. We’re all looking forward to a great day of free SQL Server training, networking, and #SQLFamily in Chicagoland. Please join us!
As in past years, the event itself is free, but there is a $10.00 charge to cover the cost of lunch. This year’s meal offering has yet to be determined, but Portillo’s was a hit last year and leaves us with big shoes to fill!
The call for speakers is open!Submit an abstract or two! Have you always wanted to present at a SQL Saturday but have been waiting for a sign to do so? Consider this your sign.
B.A. Lovers Unite!
If your first thought was B. A. Baracus of the A-Team, you’re not alone. I’m talking about a different B.A., but I’m sure Mr. T. pities the fool that doesn’t come to his hometown of Chicago a few days early and attend the first-ever PASS Business Analytics Conference! It’s taking place April 10-12 2013, immediately before our SQL Saturday.
SQL in the land of beer and cheese
The weekend before Chicago’s SQL Saturday our neighbors to the North will be hosting one of their own. Consider heading to Madison, WI on Saturday April 6 for SQL Saturday #206. I had a blast there last year and am looking forward to it again!
As you can see, we have plenty going on in the upper midwest April 6-13. Please consider joining us for one or more of these great opportunities to get your learn on. I hope to see you there!