Feb 262015

Whether you are working in T-SQL, Oracle, MySQL, C#, or Java, the range of possible values for a signed (positive or negative) 32-bit integer is from -2^{31} or (-2,147,483,648) to (2^{31})-1 or (2,147,483,647). The fact that it’s so consistent across so many different platforms (and also against plenty of others I didn’t list) means there has to be more to it than just the preference of some developers somewhere, right? Exactly right.

Down to the bits

Any data your computer deals with, be it numbers, text, music or videos, all end up in binary at one point or another. Binary means values are in “base 2”, where each digit represents a power of 2, and the possible values for that digit are 0 or 1. A digit capable of storing only the values 0 and 1 is typically referred to as a bit, which is short for “binary digit”. This is in stark contrast to the decimal or “base 10” number system commonly used, where each digit represents a power of 10 and the possible values for that digit are 0 to 9. To show how values are calculated in different number systems, here is the value 37 in both binary and decimal:

(click to enlarge)

I realize that’s an extremely brief example, but if you want to learn more about binary numbers, there are tons of great resources online.

But what about negative values? Negative numbers can be represented a bunch of different ways. Sometimes it’s done with the minus sign


or with parentheses


but those are all formats, or different ways of expressing a value. A format can be changed without altering the value itself.

How do you really represent that a value is negative? Since binary consists only of bits storing the values 0 and 1, what if the first bit of a binary number served not as a value, but to show whether it was positive or negative? A “1” in the left-most bit would effectively mean the value is negative, and a “0” would mean it is positive. Losing a bit to storing positive or negative would effectively cut the number of values you could store in half (since you are losing a power of two), but it would be awfully handy. There’s a little bit more to this method than that, but it’s known as…

Two’s Complement

Two’s Complement is a great way to express signed numbers in binary, for reasons we will see shortly.

Expressing a negative value via two’s complement is a rather simple process. Start with the binary expression of the positive value you want to negate. Next, flip all the bits so all the 0’s become 1’s, and all the 1’s become 0’s. If we were to stop here, this would be the one’s complement of the value. To get the two’s complement, we add 1 to the one’s complement. Here’s an example for the number 19:

010011  (19, unsigned, with leading 0 added)
101100  (bits flipped)
101101  (add 1. This is -19 in two's complement)

We now have the two’s complement of 19, which has the value -19.

Why did we have to add one? Let’s say we’re in a five-bit environment and want to find the two’s complement of zero (which would be negative zero). The number zero is represented as 00000 in this case. If we flip each bit, we now have 11111, and by adding one to that we now need a sixth bit so we can arrive at 100000. Since we’re still working in a five-bit environment, we ignore the leftmost bit, bringing us back to 00000, which represents both zero and negative zero. Having a single value for positive and negative zero is a key advantage that two’s complement has over ones’ complement.

An even larger advantage of two’s complement is that addition, subtraction, and multiplication work exactly the same as if all values were positive. You can even use the carry method to do this (like you were probably taught in school, but your kids most likely won’t learn thanks to common core in the US.) As a demonstration, let’s add 24 to the -19 value we just computed. The result should be 5.

(click to enlarge)

And indeed it is! If you’re paying close attention you will notice that I truncated the result to 6 bits and ignored the left-most carry. Restricting the result to the same number of significant bits you are adding is a requirement to arrive at a correct answer.

This is really neat, but it still doesn’t answer our original question about the possible range of values. Let’s look at the values you can create with 3 bits.

Three bits allow 8 or 2^3 values to be stored, and in the case of unsigned numbers, those values are from 0 to 7.  The two’s complement values of those same bit arrangements gives a range of -4 to 3.

Thinking of it in powers of 2, three bits in two’s complement allows us to store values from -(2^2) to (2^2)-1. Both the exponents are 2 instead of 3 because a bit is being used to determine whether the value is positive or negative. If we showed 2 as one less than three, the value range would look like:

-(2^{(3-1)}) to 2^{(3-1)}-1

So for n bits, two’s complement lets you express values ranging from -(2^{(n-1)}) to 2^{(n-1)}-1. This is exactly why for a 32-bit integer, the range is -(2^{31}) to 2^{31}-1.

And there you have it. Two’s complement is the reason why basically any signed data types have the range that they do.


Feb 202013

apple & orangeBeing a DBA and data professional doesn’t mean I always work with SQL Server – sometimes I’m not working with databases at all. We’ve recently acquired some new storage at work (aka Daddy Warbucks bought us a new SAN) and I’ve been charged with moving things to it. Some aspects of this are easier than others and there will be a few more posts coming about that in the future.

For now though, let’s talk about copying files. Lots of files. Server logs, audit logs, things like that. Mostly they were small in size but large in quantity. I ended up with a handful of directories with several thousand files in them that needed to move from SAN A to SAN B. Windows gives us several ways to do this:

  • Copy in Windows Explorer (ewww)
  • The good ol’ DOS Copy command
  • Xcopy, which offers a few more features
  • Robocopy, the most advanced of the MS copy utilities (though I believe it prefers to be called Murphy)

All of these will do a fine job of copying your files, though Robocopy will probably be the fastest due to its multithreading capabilities. But how do you know they all reached their destination intact? Copy and Xcopy offer the option of verification (both using the /v parameter) but sadly Robocopy does not. I’m not sure if verification is just built-in to Robocopy and can’t be disabled, or if it doesn’t exist at all. Either way I didn’t want to risk errors in moving all this data, so I decided to go the extra mile and use another tool to make sure. It didn’t take me long to find the Microsoft File Checksum Integrity Verifier (“FCIV” for short), a nifty little unsupported command-line utility that does exactly what I was looking for.

FCIV In A Nutshell

Basically, FCIV calculates MD5 or SHA-1 hash values for files and outputs them either to the screen or to an XML file. It can also compare files to those checksums saved in XML and tell you if anything differs or is missing. A demo is worth a lot of words, so let’s see it in action!

  • Download Microsoft FCIV and extract the executable wherever you like – for this demo I put it in G:\
  • Download the demo files and extract them. I put mine in G:\demofiles
  • Use FCIV to generate checksums of all files in the folder and save to an XML file with the following syntax:

fciv.exe -add G:\demofiles -wp -sha1 -xml G:\hashdb.xml

-wp means we’re saving only the file names in the XML file, not their full path
-sha1 specifies to calculate a SHA-1 hash on each file. The default is MD5.
-xml means output the checksums to an XML file, in this case the G:\hashdb.xml that follows it.

fciv create screenshot

Let’s open up that XML file and see what it contains:

fciv xml file

As you can see it’s very simple, just the file names and a checksum for each. Now let’s make a few changes.

  • Change the name of the directory the files are in. I changed mine from “demofiles” to “demofiles2”.
  • Delete fileE.txt
  • In fileD.txt, delete the line that says “***DELETE THIS LINE***”

Now let’s use FCIV to verify our files against the checksums we captured in the XML file. Change the current directory to demofiles2 (it won’t work unless you do this) and then run

G:\fciv.exe -v -sha1 -xml G:\hashdb.xml

-v means we’re now in verification mode, so it will verify checksums in the current directory against those in the XML file
-sha1 again specifies we’re using the SHA-1 hash
-xml is the file we’re comparing our calculated checksums against

Here’s the output it produces:

fciv file verify

As you can see, FCIV is telling us that the contents of fileD have changed and fileE is missing. It’s really that easy!

Final Thoughts

I think FCIV is a great utility to keep in your toolbox. Some people may argue that checksum verification isn’t necessary – that Windows does it for you behind the scenes. That may be entirely true, but I wasn’t able to find any concrete documentation proving that it does. Then 10 minutes I spent finding this program online and figuring it out is a very small price to pay for some extra peace of mind in knowing that thousands of files made it to their destination intact.

Others may raise the point that both the MD5 and SHA-1 checksums both suffer from collision vulnerabilities and there are better alternatives out there that this application doesn’t support. They’re totally correct, but it’s also important to remember that we’re using these checksums to detect changes, not for cryptography or protecting secrets. Any form of verification is better than none, and for my purposes FCIV has proven to be very helpful.

Sep 012011

I’m back! I really never left, but due to the time spent putting the final pieces together for my wedding things got a bit stale here. I can’t really say I’m sorry though, as the whole experience created some great blog topics! As you now know if you didn’t already, I got married recently. My fianceé wife and I had a blast planning it all ourselves, and early on we decided that my DBA skills could come in handy as we would be generating a decent amount of data throughout the process. The guest list, invitation information, replies, meal choices, seating charts, gifts received and thank you notes sent could all be stored in a database, and schema designs started popping into my head shortly after we got engaged. As we had a relatively long engagement there was plenty of time to think everything through before the coding began.

Weighing The Options
My plan from the beginning was to utilize my skills and create a database in SQL Server for keeping everything neat, tidy, and in proper relational form. This sounded like a great idea at first, but the more I thought about it the less appealing it became. Designing the database was a non-issue, but designing a front-end was. Since this would be a shared database that both my wife and I would be utilizing, I really wanted a web interface. Despite years of .Net application development experience, I was never all that good at designing front ends. It’s a skill I really wish I had – maybe someday I’ll get good at it.

Another option was just to skip the front end altogether and do everything in SSMS with straight SQL. This would be a big pain, and I have lots of database experience! I knew it would be very difficult for Michelle to use a system in this way, so it was quickly ruled out.

In the end I decided to go with something much simpler, Google Docs. Its sharing functionality was excellent, as we were able to keep pretty much everything stored in the cloud and synchronized between the both of us. Most of our information was stored in spreadsheets (“Excel normal form” counts as a database, right? Right? Bueller?…) We also used the word processor to store notes, such as when either of us would make a phone call so we could keep a log of all activities. The ability for both of us to be editing a document at the same time also came in handy a number of times. Contracts and other correspondence were scanned (if not in electronic form already) and stored as PDFs. We even made good use of Google’s versioning functionality, as several of our scanned documents evolved over time. We still had a gigantic wedding binder with paper copies of most things, but using the cloud made it a lot less necessary. Having everything stored online also made it very easy to do things when away from home – like at work, because many vendors are only open during regular business hours.

But wait, this *is* a SQL Server blog, isn’t it? Don’t worry kids – I’m not going anywhere, but the moral of this story is that even though I really love SQL Server it just wasn’t the right tool for me in this particular case. You’ll be happy to know that I did end up importing a bunch of my spreadsheet data into a database anyway for some reporting and BI purposes, which I’ll cover in another post.

A Better Way To Reply
Aside from the method in which all this wedding data was being stored, another important factor is how it was retrieved. As awesome as online invitations would be, I was not about to send out e-vites for my wedding. That being said, I’ve never been a big fan of wedding reply cards either. I have nothing against the cards themselves, but the traditional design of them has never sat right with me. Let’s look at a standard reply card. I don’t have any handy, but they all look pretty much like this:

RSVP Card 1

First off, I’ve always thought the “M” was stupid. I know I’m supposed to write my name on the line, and I don’t really need any help in writing “Mr.”, “Mrs.”, or “Ms.” If your name happens to start with an “M”, you’re even luckier! This line is fine for writing 1 or 2 names, but we had several instances where families of 4 or 5 were invited. Cramming all those names into that space would be a pain at best.

Another thing I hate about reply cards is the method of selecting a meal. Let’s say 2 people will be attending the wedding using the card above. One person is having the chicken, the other the fish. It’s easily to specify the couple will be eating 1 chicken and 1 fish, but who’s having what? There’s really no way to specify that on the above card without writing really small in a margin somewhere, and that’s just sloppy. There had to be a better way. Here’s what we came up with to address these issues:

RSVP Card 2

As you can see, my spreadsheet database roots came into play in the design of my reply cards. You’re actually looking at the back side – the front gave a reply date and explained the meal choices in more detail. This design allows ample space for names and a rather intuitive method for choosing a meal for each guest. (Yes, I’m aware that “child” is a meal choice and that it looks really bad, but it fit much better into the column than “Chicken Tenders”.)

The final feature we added was the number you can see in the bottom right. On the advice of several friends who had issues at their weddings, we gave each card a primary key ID number. This way if someone mailed a card back without writing their name on it we would still be able to trace it to its owner. We were fortunate in that we didn’t really need this. One couple forgot to write their last name, but we knew who they were anyway.

Not only did Michelle and I have a great time planning our wedding, but we had a great time dealing with the data that was created as a result! Stay tuned for more posts to come on wedding data and what we did with it.

Jul 152010

Images make just about everything more interesting, which is why I do my best to include at least one with each blog post even if for nothing more than comic relief.

I run a few different websites, and a while ago I decided to host the images for all of them in the cloud using Amazon S3 (Simple Storage Service).  I’ve been a very satisfied customer since then.  Not only is it incredibly easy to use, it’s also rather simple to “mask” it so your images look like they’re coming from somewhere else.  If you look at the properties of any image on this page, you’ll see that it’s coming from “img.bobpusateri.com”.  In reality, it’s coming from S3 thanks to the magic of DNS aliasing.  In this post I’ll show you how to do that.

Why’d I Do This?

My original website was hosted out of my house on a single Linux box residing under my desk.  This machine was a real powerhouse at 300Mhz and 128MB RAM (and I was using it until the end of 2009!), so I wanted to move the load of hosting images off that box and into the cloud.    After considering a few different options I decided to use S3 based on both their 99.9% uptime guarantee and what I consider to be very reasonable pricing.  The fact that I’m not paying for anything I don’t use was also rather attractive.

The downsides?  It’s not completely free, but I’ve always believed that you get what you pay for.  You’re charged based on how much data you’re storing and how many times it’s accessed.  Hopefully nobody decides to keep clicking “refresh” forever with the hopes of bankrupting me (don’t worry, it won’t take you too long!)   At the time I did the switch it also required some coding changes to my sites, but that was a one-time expense.

Accessing S3

Once you create an S3 account, you’ll want to get it set up so you can start uploading your awesome images.  Since S3 lives in the cloud, there are a wide variety of clients available for uploading & managing your data.  My personal favorite is Amazon’s recently-released web console.  Actually the console has been around for a while as it supports several Amazon cloud products but it only recently started supporting S3.  Another client I like is S3Fox, an add-on for Mozilla Firefox.  There are many others out there as well depending on your needs.

The Bucket

BucketsThe basis of all thing in S3 is the “bucket” and very object you store in S3 will be in a bucket.  Buckets store key/object pairs, the keys being a string and the object being whatever type you like (in this case, an image file).  Keys are unique within a bucket.  Buckets cost nothing by themself (you’re only billed by what’s stored in them) and each account can contain up to 100 buckets, though you probably won’t need anywhere near that many.  Of particular importance is the bucket’s name, as it determines how you access the objects contained inside.  Bucket names must be unique across all buckets stored in S3, so if you try to create a bucket named “images” it will probably fail because someone else has likely already thought of that.

The other choice you have when creating a bucket is the region its data is stored in.  Amazon currently has 4 different regions to choose from, two in the U.S., Ireland, and Singapore.  In general you’ll probably want to pick the reason that’s closest to your target audience, but you may have reasons for storing it elsewhere (legal compliance, etc.)  The price you pay depends on the region you store the data in.

If you want to “mask” S3 so that it appears as another domain, you’ll need to give your bucket the name of whatever domain you want to mask it as.  This means that for my “img.bobpusateri.com” domain I have a bucket named “img.bobpusateri.com” in my S3 account.

Objects & Permissions

Once you’ve created your bucket, you’ll want to put stuff in there.  Uploading instructions will vary depending on your client, but most of them utilize a standard FTP-type manager allowing you to create folders and copy local files to S3.

By default a bucket’s contents are shared with nobody except its creator.  To allow the world to see an object, you’ll need to alter its permissions so that “everyone” can read it.  This again varies by client, but generally you can right-click on an object, select “Edit Permissions” or “Edit ACLs” (Access Control List) and grant read/view rights to everyone.  This is possible on a per-object basis through S3 (though some clients will recurse through buckets and/or folders) or a “canned access” policy may be applied to an entire bucket.

Accessing Objects

To access an object, you can formulate a URL from its key and the name of the bucket it’s stored in.  For a bucket named “bucket” containing an object with a key of “key”, the URL would be as follows:


In the case of the image in this post, the bucket is named “img.bobpusateri.com” and the key is “bc/2010/07/Buckets.jpg”, which means you can access it from URLs:


But neither of those are all that good looking, are they?  The purpose of this post is to be able to mask the Amazon part of the URL so that it will also work like this:


This is done with a DNS command called CNAME, which more or less creates an alias for a subdomain.  This needs to be done at your hosting provider and will probably be under an advanced options menu somewhere.  You’ll want to set your desired subdomain (“img” in my case) to point to “s3.amazonaws.com”.  Once that’s set up you should be good to go.