I’m always looking for new and interesting data sets for demonstration purposes. I enjoy using data that people can relate to – not only is it easily understood, but often it can be entertaining as well. This data set of real-world parking tickets does all that.
A few weeks ago I came across this blog post by Matt Chapman. Matt filed FOIA requests with the City of Chicago and, after multiple attempts, was able to get access to over 36 million parking tickets written between 2003 and 2016. Matt goes on to explain Chicago’s parking ticket database, how he processed and analyzed the data, and how, at one location, he got Chicago to put up additional “No Parking” signs that reduced parking tickets in that spot by 50%. That is most definitely using analytics for a great cause!
But let’s get back to that data for a second, because that’s what really interests me. Matt shared his raw data for others to analyze, but it was formatted as a PostgreSQL dump. Now PostgreSQL is a great tool with an even greater price, but it’s not always the easiest to use. After spinning up a Linux VM and spending hours setting everything up as best I could, I still couldn’t get the dump to restore properly. Apparently I didn’t have the exact versions of certain extensions installed, and because of that the tables couldn’t be loaded. Grrrr.
After several more hours of manually editing a 13GB text file, I was able to load all the data into SQL Server. With some normalization and proper data typing, the size dropped to about 3GB, and to just about 500MB with compression! It contains dates, locations, license plates, violations, fines, and several other data points.
If you’d like this data set for yourself, download it here (500MB). Inside the archive you’ll find both a SQL Server backup and a script file. The script file contains an additional index you may want to create to enforce uniqueness, as well as a view that joins all the tables together.
I hope you find this data as interesting as I did. Enjoy!