Bob Pusateri is a Microsoft Certified Master of SQL Server, speaker, blogger, volunteer, and author who loves finding new and exciting ways to break and fix things. He works as a consultant and architect, specializing in data, virtualization, and cloud technologies.
I’m very happy to be presenting at next month’s meeting of the Chicago SQL Server User Group! This meeting will take place on Thursday August 9 at the Microsoft Technology Center in Chicago. For a little something different, I won’t be speaking about SQL Server. Instead, I will be giving an introduction to Azure Cosmos DB from a SQL Server DBA perspective.
I will be presenting “Select Stars: A SQL DBA’s Introduction to Azure Cosmos DB”.
Azure Cosmos DB has quickly become a buzzword in database circles over the past year, but what exactly is it, and why does it matter? This session will cover the basics of Azure Cosmos DB, how it works, and what it can do for your organization. You will learn how it differs from SQL Server and Azure SQL Database, what its strengths are, and how to leverage them.
We will also discuss Azure Cosmos DB’s partitioning, distribution, and consistency methods to gain an understanding of how they contribute to its unprecedented scalability. Finally, we will demonstrate how to provision, connect to, and query Azure Cosmos DB. If you’re wondering what Azure Cosmos DB is and why you should care, attend this session and learn why Azure Cosmos DB is an out-of-this-world tool you’ll want in your data toolbox!
I’m really looking forward to giving this talk, as I fully believe Azure Cosmos DB is a technology that will only become more vital with time. If you’d like to attend and get a glimpse of what Azure Cosmos DB is all about, registration for this meeting is open! I hope to see you there!
As a former DBA, it should be no surprise that I am a big fan of keeping data safe, and not just corporate data – this extends to my personal data as well. Early on I realized that the cloud was a great way to protect all my photos, videos, and other digital keepsakes that I absolutely could not stand to lose. As cloud offerings matured, products specific to archival were introduced that allowed for long-term storage at very low price points. The tradeoff for this lower storage cost is that should you need to retrieve the data, it is not immediately available and you may have to wait a few hours. The ideal use case for these products, however, is that the data never needs to be retrieved at all; it is simply an additional copy being stored for safekeeping.
Two of the products I use extensively for this purpose are Amazon Glacier and, more recently, Microsoft Azure Blob Storage Archive Tier. As happy as I’ve been with Amazon Glacier since its introduction in 2012, I always hoped Microsoft would offer a similar service. My wish came true in Fall of 2017 when an archive tier of Azure Blob Storage was announced. Rather than branding this capability as a new product, Microsoft decided to present it as a new tier of Azure Blob Storage, alongside the existing hot and cool storage tiers.
A noticeable difference from the hot and cool storage tiers is that the archive tier is only available on a per-blob basis. While a storage account can be configured to place all blobs in either the hot or cool tier by default once they are uploaded, the archive tier is not an option. Once a blob is uploaded, it must be explicitly moved into the archive tier. If one is using the Azure Portal to do this, there are several clicks involved per blob. The free Azure Storage Explorer client is no better. While I found several third-party tools that can upload files to the archive tier, none were free. At this point, I decided to write my own method using PowerShell, which I am happy to share below.
If you’re already an Azure pro, feel free to skip ahead. But if you’re new to Azure or don’t already have a storage account, follow along with me to get one set up.
Once you’re into Azure, you’ll need a storage account. If you don’t already have one, these are simple to create. In the Azure Portal, select “Storage accounts”.
On the storage account screen, click “Add”.
Next, enter the details for your storage account.
You’ll need to give it a name which is globally unique, so it may take you a few tries. For all new accounts Microsoft recommends the “Resource manager” model. The account kind is “Blob storage”. Choose whichever location you like, but realize that one which is closer to you will probably be faster. Since I’m doing archiving, I opt for standard performance, and set the access tier to Cool by default. I like security, so I require secure transfers. Select your subscription or create one if you need to, and do the same for your resource group. Then click “Create”.
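If you’d rather script this step than click through the portal, the same account can be created with the Azure PowerShell module. Here’s a sketch using the current Az cmdlets; the resource group name, account name, and location below are demo placeholders, not values from the original post:

```powershell
# Sign in first with Connect-AzAccount, then create a resource group
# (all names below are demo placeholders)
New-AzResourceGroup -Name "test-rg" -Location "East US"

# Blob storage account: standard performance, cool access tier by default,
# and secure (HTTPS-only) transfers required
New-AzStorageAccount -ResourceGroupName "test-rg" `
    -Name "teststore700" `
    -Location "East US" `
    -SkuName "Standard_LRS" `
    -Kind "BlobStorage" `
    -AccessTier "Cool" `
    -EnableHttpsTrafficOnly $true
```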
Once created, your storage account will appear in the list. Now we need to create a container within that account. Click on the storage account, and then choose “Containers” from the menu on the left. Then click the “+” at the top to add a new container.
Give your container a name, and select an access level. Since I share my archives with no one, I make my containers Private. Click OK.
Your container will now appear in the list. Clicking on the container will show you the container is empty, and you will see that you can upload files to it right through the portal interface.
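The container can be created from PowerShell as well. A sketch with the Az module; the resource group, account, and container names are placeholders:

```powershell
# Grab the storage account and use its context to create the container
# (resource group and account names are placeholders)
$sa = Get-AzStorageAccount -ResourceGroupName "test-rg" -Name "teststore700"

# -Permission Off is what the portal calls a "Private" access level
New-AzStorageContainer -Name "archive" -Context $sa.Context -Permission Off
```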
Finally, for the PowerShell script to work, you will need a key to access the storage account. Go back to the storage account menu and select “Access keys”. You will see two access keys provided for your account. Copy one of them for pasting into the script below.
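If you’re scripting everything, the key can also be fetched with PowerShell (resource group and account names are placeholders):

```powershell
# Get-AzStorageAccountKey returns both keys; either one will work
$keys = Get-AzStorageAccountKey -ResourceGroupName "test-rg" -Name "teststore700"
$keys[0].Value
```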
Once you have a storage account set up, PowerShell makes the uploading part simple. This script requires the Azure PowerShell module; instructions for installing it can be found here.
Don’t forget to copy in the name of your storage account (in the case of this demo I would paste “teststore700”) and the access key!
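A minimal sketch of the approach, using the current Az module’s cmdlets. The account name, key, container, and file path here are all placeholders you would replace with your own values:

```powershell
# Placeholders - paste in your own storage account name and access key
$storageAccountName = "teststore700"
$storageAccountKey  = "<paste your access key here>"
$containerName      = "archive"
$filePath           = "C:\Archive\photos2018.zip"

# Build a storage context from the account name and key
$ctx = New-AzStorageContext -StorageAccountName $storageAccountName `
                            -StorageAccountKey $storageAccountKey

# Upload the file and move it straight into the archive tier.
# -StandardBlobTier requires a recent Az.Storage release; on older module
# versions, upload first and then call
# $blob.ICloudBlob.SetStandardBlobTier("Archive")
Set-AzStorageBlobContent -File $filePath `
                         -Container $containerName `
                         -Blob (Split-Path $filePath -Leaf) `
                         -Context $ctx `
                         -StandardBlobTier Archive
```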
I always feel honored when chosen to present at a SQL Saturday event. Being selected is never a guarantee, especially these days when organizers seemingly have more speakers and abstracts to pick from than ever before. But I am just over-the-moon happy to share that I was picked to speak at SQL Saturday in Oslo, Norway coming up on September 1. Norway has been on my list of places to visit for years, and I really can’t wait. Thank you so much to the SQL Saturday Oslo organizing team for putting together an amazing schedule and for allowing me to be a part of it – this is going to be a fantastic event!
My presentation at this event is entitled “Select Stars: A SQL DBA’s Introduction to Azure Cosmos DB”. I’ve been working with Azure Cosmos DB for a while now, and it’s really an incredible product. It’s been generating lots of buzz as of late, but there are still plenty of DBAs out there who have yet to use it. This is an introductory-level session that focuses on explaining what Azure Cosmos DB is, how it works, what its strengths are, and how they can be leveraged. To anyone curious about Azure Cosmos DB, this is the session for you!
Registration for SQL Saturday Oslo is now open – register today before it fills up! If you would like to extend your SQL Saturday experience even further, they are also offering four pre-conference sessions on Friday August 31.
I am incredibly stoked to visit Norway and I hope I’ll see you there in just three short months!
If you’ve been a SQL Saturday organizer for a few years, you might remember that prior to March of 2014, PASS handled much of the finances. All money collected from sponsors and meal fees went into a trust account managed by PASS which was earmarked for that specific event. Organizers then requested their funds from PASS.
I completely understand why PASS would want to get out of doing this. PASS handles a lot of things, and to be honest, playing the role of banker for events organized and managed by volunteers doesn’t need to be a part of that.
While I believe this was the right thing to do, it did create some additional headaches for organizers, particularly in terms of taxes depending on local laws. PASS is a not-for-profit organization and could collect this money without any tax liability. With PASS removed from the equation, event organizers now had to establish a PayPal account in their own name to collect funds for SQL Saturday, and they were also personally liable for any taxes on it.
The user group leadership in Chicago didn’t really like that idea, so we decided to incorporate as a not-for-profit ourselves, creating the Chicago SQL Association. Over the years, I’ve been contacted by several other user groups asking how we did this. I’m happy to help, and thought I would share here as well.
Choose A Mission and Scope
Any good project needs a mission and a scope, and the same goes for an organization. We decided our scope would be to support SQL Saturday Chicago, as well as the Chicago SQL Server User Group and the Chicago Suburban SQL Server User Group. Our mission, simply put, is education. We offer free knowledge about the Microsoft data platform to the public through user group meetings and our SQL Saturday event.
Seek Professional Assistance
Creating a corporation isn’t all that difficult. I have no doubt that we could have fumbled our way through it and ended up with a not-for-profit corporation on our own. What I was not so sure of was our ability to get this right the first time without making mistakes that might penalize us in the future. We looked into hiring an attorney to help us navigate this process, but it was going to end up costing thousands of dollars that the organization didn’t have.
One day it dawned on me to look into legal clinics at local law schools. These clinics, often free or available at little cost, offer law students the ability to gain practical experience while helping those who otherwise could not afford legal services. I reached out to the Bluhm Legal Clinic at Northwestern University Pritzker School of Law, and they got back to me very quickly. We were assigned three law students and a professor (also a lawyer) who helped us through the process from beginning to end.
After meeting with our legal team and discussing our goals, mission, and scope, they came to the conclusion that we should seek 501(c)(3) not-for-profit status. They drew up all the paperwork, and all we had to do was sign on the dotted lines and pay the required government filing fees. Within a matter of weeks we had a legitimate not-for-profit corporation, with zero sleep lost worrying that we made any mistakes.
Now for the real work. As I said, creating a corporation is pretty easy. Maintaining it and keeping things legal is another matter entirely. It goes without saying, but, depending on your locale, you’ll probably have multiple required filings to keep your not-for-profit in compliance. Speaking for ourselves, each year we must file simplified taxes with the US Internal Revenue Service, register with the Illinois Secretary of State, and report our finances to the Illinois Attorney General.
We had done a pretty good job of bookkeeping before we incorporated, but now that we were official, it was required by law. We originally kept track of all our finances via shared spreadsheets. This worked decently well and cost us nothing, but come tax time reporting was a nightmare. Of course, there was plenty of commercial software out there for managing business finances, but none of it was cheap. This is where finding organizations that help not-for-profits is invaluable.
I highly recommend applying to TechSoup. TechSoup offers a variety of discounted (and free) technology products to not-for-profits. One of the many things they offer is a significant discount on QuickBooks small business accounting software. It makes the financial part of running a business incredibly simple, and it honestly saved us about 20 hours of work when it came time to do taxes and financial reporting this year.
You’ll also want to register with PayPal as a not-for-profit to get a discounted rate. This means that more of the money collected from sponsorships and meal fees will go into your bank account. Every little bit helps!
A Web Presence
You may want to look into setting up a domain, website, and/or email for your organization. For us, this was probably the most difficult part of the process. I was unable to find any way of obtaining a discounted or free web domain, but that wasn’t a huge problem as domains are rather inexpensive. Hosting a website and email, however, is another story.
We applied to Microsoft’s nonprofit program with the thought that we would meet their criteria in that we are offering free educational opportunities to the public, and furthermore, those opportunities center around Microsoft products. Sounds like a shoo-in, right? Yeah, not so much. Microsoft did not agree, and we were rejected.
We then tried Google’s nonprofit program, which accepted us. We’re grateful they did, as we can now host association-related email at our own domain, and are working on a website. The irony is not lost on me that Google is helping us educate people about the Microsoft data platform while Microsoft had no interest in doing so.
Your organization will need some sort of bank account. Be sure to shop around at multiple banks to see what they offer not-for-profits, and what their fees are. If you have a local credit union, don’t forget to check them out as well. We ended up going with a credit union and pay no fees at all for our banking.
Get liability insurance. If your organization is running a SQL Saturday, your venue may already require insurance, but even if it doesn’t, a policy is never a bad idea to have. We found a company that specializes in insuring activities like user groups, and it costs much less than we thought it would.
I’m very grateful we were able to incorporate the Chicago SQL Association as a not-for-profit. Not only have we seen financial benefits, but we have also become more organized as a result of having a corporation to maintain. PASS no longer handling the financial end of SQL Saturday events may have been a short-term annoyance, but it was just the push we needed to get this ball rolling and I’m very glad we did!
I’ve found myself working with PowerShell more frequently as of late, in no small part due to the amazing dbatools module. This has led to me trying to figure out how else I can utilize it for some of my personal internal processes.
I like to use public datasets for experimentation and presentation demos, especially data that people can easily understand and relate to. For some of these datasets, keeping them up-to-date was a manual process of downloading files, loading tables, and merging. There are of course many better ways to do this, some more automated than others. I could have simply used PowerShell to call bcp, or even just implemented an insert statement and some loops. Then I found dbatools, which has commands that enable me to do an even better job with far less work – just the way I like it! Here’s how I now keep my datasets current:
Getting The Data
I’ll be using data from the City of Chicago’s Data Portal. They have a tremendous online resource with lots of public datasets available. One that I really like is their listing of towed vehicles. Any time the city tows or impounds a vehicle, a record gets added here and remains for 90 days. It’s very manageable, with only 10 columns and a few thousand rows. (As an added bonus, you can search for license plates you know and then ask your friends about their experience at the impound lot!)
Chicago’s data portal uses Socrata, which is a very well-documented and easy-to-use tool for exposing data. It has a wonderful API for querying and accessing data, but to keep things simple for this post we’re just going to download a CSV file.
If you’re on the page for a dataset, you can download it by clicking on “Export” on the top right and then selecting “CSV”. To avoid all that, the direct link to download a CSV of this dataset is here. Download it and take a look at what we’ve got using your spreadsheet or text editor of choice (mine is Notepad++).
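If you’d rather script the download too, PowerShell can fetch the file directly. The URL below is a placeholder for that direct CSV link:

```powershell
# Download the CSV to a local path (URL is a placeholder for the
# portal's direct export link)
Invoke-WebRequest -Uri "<direct CSV link from the data portal>" `
                  -OutFile "C:\Temp\Towed_Vehicles.csv"
```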
Loading The Data
We’ve got our data, now let’s load it. I like to load the entire downloaded dataset into a stage table, and then copy new rows I haven’t previously seen into my production table that I query from. Here’s the script to create these tables:
-- CREATE STAGE TABLE
-- CREATE FINAL TABLE
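A sketch of what those two tables might look like. The column names and sizes are my guesses based on the dataset’s CSV headers, not necessarily the original definitions:

```sql
-- A guess at the stage table: all ten columns from the CSV, no constraints
CREATE TABLE dbo.TowedVehiclesSTG (
    TowDate        DATE,
    Make           NVARCHAR(10),
    Style          NVARCHAR(10),
    Model          NVARCHAR(10),
    Color          NVARCHAR(10),
    Plate          NVARCHAR(15),
    State          CHAR(2),
    TowedToAddress NVARCHAR(100),
    Phone          NVARCHAR(20),
    ID             INT
);

-- A guess at the production table: same columns, with ID as the primary key
CREATE TABLE dbo.TowedVehicles (
    ID             INT PRIMARY KEY,
    TowDate        DATE,
    Make           NVARCHAR(10),
    Style          NVARCHAR(10),
    Model          NVARCHAR(10),
    Color          NVARCHAR(10),
    Plate          NVARCHAR(15),
    State          CHAR(2),
    TowedToAddress NVARCHAR(100),
    Phone          NVARCHAR(20)
);
```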
Now for the magic – let’s load some data! The dbatools command that does all the heavy lifting here is called Import-DbaCsvToSql. It loads CSV files into a SQL Server table quickly and easily. As an added bonus, the entire import runs within a transaction, so if an error occurs everything gets rolled back. I like to specify my tables and datatypes ahead of time, but if you want to load into a table that doesn’t exist yet, the command will create one and do its best to guess the appropriate datatypes. To use it, simply point it at a CSV file and a SQL Server instance, database, and (optionally) a table, and it will take care of the rest.
The two parameters on the second line tell the command to truncate the table before loading, and that the first line of the CSV file contains column names.
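The invocation might look something like this. The instance, database, and file path are placeholders, and note that newer dbatools releases have renamed the command to Import-DbaCsv:

```powershell
# Load the whole CSV into the stage table (names and paths are placeholders)
Import-DbaCsvToSql -Csv "C:\Temp\Towed_Vehicles.csv" -SqlInstance "localhost" -Database "TestDB" -Table "TowedVehiclesSTG" `
    -Truncate -FirstRowColumns
```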
Now the data has been staged, but since this dataset contains all cars towed over the past 90 days, chances are very good that I already have some of these tows in my production table from a previous download. A simple query to insert all rows from staging into production that aren’t already there will do the trick. This query is run using another dbatools command, Invoke-Sqlcmd2.
The ID column uniquely identifies each tow event, and the production table uses it as a primary key. However, I have found that the dataset occasionally contains duplicated rows. The ROW_NUMBER() window function addresses this issue by ensuring that only one row per ID is inserted.
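Here’s a sketch of that query wrapped in Invoke-Sqlcmd2. The table and column names are my guesses from the dataset, and parameter names can vary between versions of the function (older copies call the first parameter -ServerInstance):

```powershell
# One row per ID from staging (ROW_NUMBER handles duplicates), inserted
# only if that ID isn't already in the production table
$query = @"
WITH Deduped AS (
    SELECT TowDate, Make, Style, Model, Color, Plate, State,
           TowedToAddress, Phone, ID,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY TowDate) AS RowNum
    FROM dbo.TowedVehiclesSTG
)
INSERT INTO dbo.TowedVehicles (ID, TowDate, Make, Style, Model, Color,
                               Plate, State, TowedToAddress, Phone)
SELECT ID, TowDate, Make, Style, Model, Color, Plate, State,
       TowedToAddress, Phone
FROM Deduped
WHERE RowNum = 1
  AND NOT EXISTS (SELECT 1 FROM dbo.TowedVehicles p WHERE p.ID = Deduped.ID);
"@

Invoke-Sqlcmd2 -SqlInstance "localhost" -Database "TestDB" -Query $query
```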
Putting It All Together
I’ve showed you how simple dbatools makes it to load a CSV file into a table and then run a query to load from staging into production, but the beauty of PowerShell is that it’s easy to do way more than that. I actually scripted this entire process, including downloading the data! You can download the full PowerShell script, along with a T-SQL Script for creating the tables, from my GitHub here.