Jul 12 2018
 

As a former DBA, it should be no surprise that I am a big fan of keeping data safe, and not just corporate data – this extends to my personal data as well. Early on I realized that the cloud was a great way to protect all my photos, videos, and other digital keepsakes that I absolutely could not stand to lose. As cloud offerings matured, products specific to archival were introduced that allowed for long-term storage at very low price points. The tradeoff for this lower storage cost is that should you need to retrieve the data, it is not immediately available and you may have to wait a few hours. The ideal use case for these products, however, is that the data never needs to be retrieved at all; it is simply an additional copy being stored for safekeeping.

Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.

A noticeable difference from the hot and cool storage tiers is that the archive storage tier is only available on a per-blob basis. While a storage account can be configured to place all blobs in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must explicitly be moved into the archive tier. If you’re using the Azure Portal to do this, there are several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third-party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using PowerShell, which I am happy to share below.

If you’re already an Azure pro, feel free to skip ahead. But if you’re new to Azure or don’t already have a storage account, follow along with me to get one set up.

Creating a Storage Account

First, log in to the Azure Portal. If you don’t already have an Azure account, you’ll need to create one.

Once you’re into Azure, you’ll need a storage account. If you don’t already have one, these are simple to create. In the Azure Portal, select “Storage accounts”.

 

On the storage account screen, click “Add”.

 

Next, enter the details for your storage account.

You’ll need to give it a name that is globally unique, so it may take you a few tries. For all new accounts, Microsoft recommends the “Resource manager” deployment model. The account kind is “Blob storage”. Choose whichever location you like, but realize that one closer to you will probably be faster. Since I’m doing archiving, I opt for standard performance and set the access tier to Cool by default. I like security, so I require secure transfers. Select your subscription or create one if you need to, and do the same for your resource group. Then click “Create”.
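If you prefer scripting to clicking, roughly the same account can be created with the AzureRM module of that era. This is only a sketch of the equivalent of the portal steps above; the resource group name and location are placeholders, and “teststore700” is just the demo account name used later in this post.

# Rough PowerShell equivalent of the portal steps above (AzureRM module)
# Resource group name and location are placeholders for this demo
Login-AzureRmAccount

New-AzureRmResourceGroup -Name 'archive-demo-rg' -Location 'centralus'

New-AzureRmStorageAccount -ResourceGroupName 'archive-demo-rg' `
                          -Name 'teststore700' `
                          -Location 'centralus' `
                          -SkuName 'Standard_LRS' `
                          -Kind 'BlobStorage' `
                          -AccessTier 'Cool' `
                          -EnableHttpsTrafficOnly $true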

 

Once created, your storage account will appear in the list. Now we need to create a container within that account. Click on the storage account, and then choose “Containers” from the menu on the left. Then click the “+” at the top to add a new container.


 

Give your container a name, and select an access level. Since I share my archives with no one, I make my containers Private. Click OK.
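The container can also be created from PowerShell. A quick sketch, assuming the Azure.Storage cmdlets; the container name is a placeholder, and the access key is covered a couple of steps below.

# Sketch: create a private container from PowerShell (Azure.Storage cmdlets)
$context = New-AzureStorageContext -StorageAccountName 'teststore700' `
                                   -StorageAccountKey '<access key, see below>'

# -Permission Off corresponds to the "Private" access level in the portal
New-AzureStorageContainer -Name 'archive' -Context $context -Permission Off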

 

Your container will now appear in the list. Clicking on the container will show you that it is empty, and that you can upload files to it right through the portal interface.


 

Finally, for the PowerShell script to work, you will need a key to access the storage account. Go back to the storage account menu and select “Access keys”. You will see two access keys provided for your account. Copy one of them for pasting into the script below.
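The keys can also be pulled without the portal; a one-line sketch, assuming the placeholder resource group and account names from earlier:

# Sketch: list the storage account access keys with the AzureRM module
Get-AzureRmStorageAccountKey -ResourceGroupName 'archive-demo-rg' -Name 'teststore700'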


The Script

Once you have a storage account set up, PowerShell makes the uploading part simple. This script requires the Azure PowerShell module; you can find installation instructions for it here.

Don’t forget to copy in the name of your storage account (in the case of this demo I would paste “teststore700”) and the access key!

You can also find this script on my GitHub.

This script has just a few parameters, which are as follows:

File: Name of the file or files to be uploaded. These can also be piped in.
ContainerName: Name of the container to upload to.
DestinationFolder: Name of the (virtual) destination folder inside the container. I like everything to be in a folder, so I kept that in mind when scripting this out.
ConcurrentTasks (optional): How many threads should upload the file. Defaults to 1. If you have lots of bandwidth, feel free to increase it, up to a maximum of 20.
BlobTier (optional): Which tier of blob storage this file should be set to. Defaults to “Cool”. Possible values: “Hot”, “Cool”, “Archive”.
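The full script is on my GitHub as mentioned above; what follows is only a minimal sketch to give you an idea of its shape, not the exact script. The function name (Add-ArchiveBlob) is made up for this illustration, the account name and key are placeholders, and it assumes the classic Azure.Storage cmdlets (New-AzureStorageContext and Set-AzureStorageBlobContent), with the tier set per blob afterwards via ICloudBlob.SetStandardBlobTier.

# Minimal sketch of an upload function like the one described above (not the exact script)
function Add-ArchiveBlob {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$File,
        [Parameter(Mandatory)][string]$ContainerName,
        [Parameter(Mandatory)][string]$DestinationFolder,
        [ValidateRange(1, 20)][int]$ConcurrentTasks = 1,
        [ValidateSet('Hot', 'Cool', 'Archive')][string]$BlobTier = 'Cool'
    )

    begin {
        # Paste in your storage account name and access key here
        $storageAccountName = '<storage account name>'
        $storageAccountKey  = '<storage account key>'
        $context = New-AzureStorageContext -StorageAccountName $storageAccountName `
                                           -StorageAccountKey $storageAccountKey
    }

    process {
        foreach ($f in $File) {
            $item     = Get-Item -Path $f
            $blobName = "$DestinationFolder/$($item.Name)"

            # Upload the file into the (virtual) destination folder within the container
            $blob = Set-AzureStorageBlobContent -File $item.FullName `
                                                -Container $ContainerName `
                                                -Blob $blobName `
                                                -Context $context `
                                                -ConcurrentTaskCount $ConcurrentTasks `
                                                -Force

            # The archive tier can only be set per blob, after the upload completes
            $blob.ICloudBlob.SetStandardBlobTier($BlobTier)
        }
    }
}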

 

And finally, some usage examples:
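These examples are illustrative only and use the hypothetical Add-ArchiveBlob function sketched above; adjust the names to match the actual script from GitHub.

# Upload a single file straight into the archive tier
Add-ArchiveBlob -File 'D:\Keepsakes\Photos2017.zip' -ContainerName 'archive' `
                -DestinationFolder 'Photos' -BlobTier 'Archive'

# Pipe in a whole folder of files, using more upload threads
(Get-ChildItem 'D:\Keepsakes\*.zip').FullName |
    Add-ArchiveBlob -ContainerName 'archive' -DestinationFolder 'Photos' `
                    -ConcurrentTasks 5 -BlobTier 'Archive'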

 

Happy archiving!

Apr 23 2018
 

I’ve found myself working with PowerShell more frequently as of late, in no small part due to the amazing dbatools module. This has led me to look at how else I can use it for some of my personal internal processes.

I like to use public datasets for experimentation and presentation demos, especially data that people can easily understand and relate to. For some of these datasets, keeping them up to date was a manual process of downloading files, loading tables, and merging. There are of course many better ways to do this, some more automated than others. I could have simply used PowerShell to call bcp, or even just implemented an insert statement and some loops. Then I found dbatools, which has commands that enable me to do an even better job with far less work – just the way I like it! Here’s how I now keep my datasets current:

Getting The Data

I’ll be using data from the City of Chicago’s Data Portal. They have a tremendous online resource with lots of public datasets available. One that I really like is their listing of towed vehicles. Any time the city tows or impounds a vehicle, a record gets added here and remains for 90 days. It’s very manageable, with only 10 columns and a few thousand rows. (As an added bonus, you can search for license plates you know and then ask your friends about their experience at the impound lot!)

Chicago’s data portal uses Socrata, which is a very well-documented and easy-to-use tool for exposing data. It has a wonderful API for querying and accessing data, but to keep things simple for this post we’re just going to download a CSV file.

If you’re on the page for a dataset, you can download it by clicking on “Export” on the top right and then selecting “CSV”. To avoid all that, the direct link to download a CSV of this dataset is here. Download it and take a look at what we’ve got using your spreadsheet or text editor of choice (mine is Notepad++).
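If you’d rather skip the browser entirely, the download is a one-liner in PowerShell. The URL below is a placeholder for the dataset’s export link (Socrata exposes CSV exports at an /api/views/<dataset-id>/rows.csv endpoint); substitute the real link from the dataset’s page.

# Sketch: download the CSV export directly (replace <dataset-id> with the real one)
$url = 'https://data.cityofchicago.org/api/views/<dataset-id>/rows.csv?accessType=DOWNLOAD'
Invoke-WebRequest -Uri $url -OutFile 'C:\Temp\TowedVehicles.csv'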

Loading The Data

We’ve got our data, so now let’s load it. I like to load the entire downloaded dataset into a stage table, and then copy new rows I haven’t previously seen into my production table that I query from. Here’s the script to create these tables:
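The exact T-SQL is in the GitHub repo linked at the end of this post. As a rough sketch only, with column names and datatypes that approximate the dataset rather than match my real script, the two tables look something like this:

# Illustrative table definitions; the real DDL is the T-SQL script on GitHub
$createTables = @"
CREATE TABLE dbo.TowedVehiclesStage (
    TowDate          date,
    Make             varchar(10),
    Style            varchar(10),
    Model            varchar(10),
    Color            varchar(20),
    Plate            varchar(15),
    State            char(2),
    TowedToAddress   varchar(100),
    TowFacilityPhone varchar(20),
    ID               int
);

CREATE TABLE dbo.TowedVehicles (
    ID               int PRIMARY KEY,
    TowDate          date,
    Make             varchar(10),
    Style            varchar(10),
    Model            varchar(10),
    Color            varchar(20),
    Plate            varchar(15),
    State            char(2),
    TowedToAddress   varchar(100),
    TowFacilityPhone varchar(20)
);
"@

Invoke-Sqlcmd2 -ServerInstance 'localhost' -Database 'Demo' -Query $createTables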

Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is called Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import is wrapped in a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you want to load into a table that doesn’t exist yet, the command will create one and do its best to guess the appropriate datatypes. To use it, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table, and it will take care of the rest.
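The call itself is a single command. This sketch assumes the parameter names from the dbatools version that was current at the time (newer releases have since replaced this command with Import-DbaCsv), along with the placeholder instance, database, and table names from above:

# Sketch: load the downloaded CSV into the stage table
Import-DbaCsvToSql -Csv 'C:\Temp\TowedVehicles.csv' -SqlInstance 'localhost' -Database 'Demo' -Table 'TowedVehiclesStage' `
    -Truncate -FirstRowColumns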

The two parameters on the second line tell the command to truncate the table before loading, and that the first line of the CSV file contains column names.

Now the data has been staged, but since this dataset contains all cars towed over the past 90 days, chances are very good that I already have some of these tows in my production table from a previous download. A simple query to insert all rows from staging into production that aren’t already there will do the trick. This query is run using another dbatools command, Invoke-Sqlcmd2.

The ID column uniquely identifies each tow event, and the production table uses it as a primary key. However, I have found that the dataset occasionally contains duplicated rows. The ROW_NUMBER() window function addresses this by ensuring only one row per ID gets inserted.
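Sketched out with the illustrative table and column names from earlier (not necessarily the exact query I run), the staging-to-production load looks roughly like this:

# Insert only new, de-duplicated rows from staging into production
$loadQuery = @"
INSERT INTO dbo.TowedVehicles
    (ID, TowDate, Make, Style, Model, Color, Plate, State, TowedToAddress, TowFacilityPhone)
SELECT ID, TowDate, Make, Style, Model, Color, Plate, State, TowedToAddress, TowFacilityPhone
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS rn
    FROM dbo.TowedVehiclesStage
) AS s
WHERE s.rn = 1
  AND NOT EXISTS (SELECT 1 FROM dbo.TowedVehicles AS p WHERE p.ID = s.ID);
"@

Invoke-Sqlcmd2 -ServerInstance 'localhost' -Database 'Demo' -Query $loadQuery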

Putting it all together

I’ve shown you how simple dbatools makes it to load a CSV file into a table and then run a query to load from staging into production, but the beauty of PowerShell is that it’s easy to do far more than that. I actually scripted this entire process, including downloading the data! You can download the full PowerShell script, along with a T-SQL script for creating the tables, from my GitHub here.

Happy Data Loading!

Feb 08 2011
 

This month’s T-SQL Tuesday is being hosted by Pat Wright (blog | twitter), and he is asking about tips and tricks for making life easier through automation using either T-SQL or PowerShell.

Recently I have been doing more and more work in PowerShell in order to get familiar with it. So far I’ve been quite happy. Batch scripting always seemed rather clumsy to me, and PowerShell represents a much-needed improvement, especially for someone coming from a .NET programming background as I do.

One of the processes I’ve found particularly helpful to automate in PowerShell is the testing of database backups. A backup is useless if you can’t restore from it, so it’s an excellent idea to make sure you can successfully restore from your backups on a regular basis. Larger databases can take quite a while to restore, and doing it manually can get to be a real pain. Automating the process means it can happen at any time without human intervention, and all I need to do is check my email to determine if the restore succeeded or failed.

How It Works

While the code behind all this (which I’ll get to in a minute) may appear complex, what’s actually happening is pretty simple. The process is built with a few assumptions:
– You are testing a full backup
– Backups are located on a network share that’s easily accessible

The first step is to locate the appropriate backup on the network share and copy it to a temporary folder on the local machine. From there, the database is restored to the local machine with the name specified in the config file. The locations where the data and transaction log files are restored to can also be configured. Finally, a DBCC CHECKDB is optionally run once the database is restored, and an email is sent notifying the operator(s) of the outcome, including a log of how the job ran.
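The downloadable script drives all of this from the config file described below, but here is a condensed sketch of the flow. The server, paths, and logical file names are placeholders, error handling is omitted, and it assumes the SQL Server snap-in providing Invoke-Sqlcmd is loaded; the real script is more thorough.

# Condensed sketch of the restore test flow; values below are placeholders
$backupShare = '\\backupserver\SQLBackups\ProdDB'
$localFolder = 'C:\RestoreTest'
$restoreName = 'ProdDB_RestoreTest'

# 1. Find the most recent full backup on the network share and copy it locally
$latestBackup = Get-ChildItem -Path $backupShare -Filter '*.bak' |
                Sort-Object LastWriteTime -Descending |
                Select-Object -First 1
Copy-Item -Path $latestBackup.FullName -Destination $localFolder

# 2. Restore it under the configured name, moving the files to the configured locations
#    (the logical file names here are placeholders)
$restoreSql = @"
RESTORE DATABASE [$restoreName]
FROM DISK = N'$localFolder\$($latestBackup.Name)'
WITH MOVE 'ProdDB_Data' TO N'$localFolder\$restoreName.mdf',
     MOVE 'ProdDB_Log'  TO N'$localFolder\$restoreName.ldf',
     REPLACE, STATS = 10;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query $restoreSql

# 3. Optionally verify the restored database
Invoke-Sqlcmd -ServerInstance 'localhost' -Query "DBCC CHECKDB ([$restoreName]) WITH NO_INFOMSGS;"

# 4. Notify the operators of the outcome (the real script also reports failures and attaches a log)
Send-MailMessage -SmtpServer 'smtp.example.com' -From 'sql@example.com' -To 'dba@example.com' `
                 -Subject "Restore test of $restoreName succeeded" -Body 'See the job log for details.'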

The Files

This automated process has two parts: the PowerShell script and an XML config file. I’m a big fan of config files, as they allow me to use the same script on many different systems with different settings. I find this to be much better than the alternative of having lots of slightly different versions of the same script floating around.

The configuration file contains one or more “jobs”, each of which is geared toward restoring a specific database. With all of the configuration settings for a specific database packaged in a job, the number of arguments needed to execute the script is dramatically reduced. There is also a list of email addresses to be alerted upon the success/failure of the script. The configuration file I have included for download is heavily commented to explain the meaning of each setting. I always take the time to write XSDs for my configuration files in hopes of minimizing confusion about their structure – this is included as well.
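To make that a little more concrete, a job in the config file might be shaped something like this. This is purely a hypothetical illustration; the element names in the real config file and its XSD may differ.

<!-- Hypothetical example job; element names are illustrative only -->
<RestoreJobs>
  <Job name="ProdDB">
    <BackupShare>\\backupserver\SQLBackups\ProdDB</BackupShare>
    <LocalFolder>C:\RestoreTest</LocalFolder>
    <RestoreAs>ProdDB_RestoreTest</RestoreAs>
    <DataFilePath>C:\RestoreTest\Data</DataFilePath>
    <LogFilePath>C:\RestoreTest\Log</LogFilePath>
    <RunCheckDb>true</RunCheckDb>
    <NotifyEmails>
      <Email>dba@example.com</Email>
    </NotifyEmails>
  </Job>
</RestoreJobs>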

The PowerShell script itself takes two arguments: the name of the job, and the location of the configuration file containing that job. From there, the script reads the configuration file and carries out all the appropriate actions. It requires two modifications before running on your system: the location of the SMTP server that will send the notification emails, and the email address it will list as the sender of those emails.
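Kicking it off then looks something like this (the script file name and parameter names are illustrative):

# Illustrative invocation; names in the downloadable script may differ
.\Test-DatabaseRestore.ps1 -JobName 'ProdDB' -ConfigFile 'C:\Scripts\RestoreConfig.xml'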

As with all the scripts you’ll find on my site, these files are provided as-is and with no warranty. It is highly recommended that you read and understand what they are doing before using them at your own risk. Never deploy them to a production system without extensive testing in a development environment.

You can download all the files here. I’ve also posted the PowerShell script below for your viewing pleasure.

Good luck and happy restores!