The 1940 Census: A True NoSQL Database!

On April 2 of this year, the National Archives released the complete population schedule of the 1940 census. These records were highly anticipated not only for their genealogical value but also because of their detailed information about an incredibly interesting period of U.S. history. This census captured the moment when the country was finally starting to come out of the Great Depression but had not yet entered World War II. Many of its questions were new and designed to gauge the effects of the Depression, with topics including income, education, unemployment, and migration. In 1940, millions were employed by the WPA, PWA, and other New Deal agencies, and the Farm Security Administration's photography program had a small group of photographers traversing the nation capturing images of everyday American life. Some of my favorite photos come from this collection.

1940 Census Population Schedule Form

As interesting as the census is for all its historical and social reasons, there's an equally awesome tale to be told of all its data and the technologies behind it. Setting up a table in SQL Server to store a row for each of the roughly 132 million people counted in 1940 and aggregate results from them would be pretty easy today; many DBAs deal with tables that are orders of magnitude larger than that. In 1940, though, it was a major undertaking involving thousands of workers. Today the census is still a non-trivial task; however, I'd imagine most of that work goes into getting data from the population into a database, while calculating the results from that data is relatively simple.
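
To give a feel for how little code the "store and aggregate" part takes today, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere, and the table name, columns, enumeration district numbers, and sample rows are all made up for illustration. The real population schedule had far more columns than this.

```python
import sqlite3

# In-memory database standing in for a real server (an assumption for the demo).
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema: one row per person enumerated.
conn.execute(
    """
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL,
        enumeration_district TEXT NOT NULL,
        occupation TEXT,
        annual_income INTEGER
    )
    """
)

# Invented sample rows, not real census data.
sample_rows = [
    ("Illinois", "16-101", "telephone manufacturing", 1400),
    ("Illinois", "16-101", "telephone manufacturing", 1250),
    ("Illinois", "60-12", "farm laborer", 400),
    ("Wisconsin", "13-7", "teacher", 1100),
]
conn.executemany(
    "INSERT INTO person (state, enumeration_district, occupation, annual_income) "
    "VALUES (?, ?, ?, ?)",
    sample_rows,
)


def population_by_state(connection):
    """Count people per state, the kind of total used to apportion seats."""
    rows = connection.execute(
        "SELECT state, COUNT(*) FROM person GROUP BY state ORDER BY state"
    ).fetchall()
    return dict(rows)


counts = population_by_state(conn)
print(counts)
```

The whole "tabulation" is a single GROUP BY. In 1940 the same count meant running stacks of punch cards through a machine.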

ETL: Enumerate, Tabulate, Lock Up

Prior to 1960, censuses weren't mailed to your house like they are now. Instead, every household was visited in person by an enumerator, a single person responsible for an "Enumeration District", or "ED". EDs varied wildly in size, and could consist of a single block in a large city or an entire township in a more rural area. The enumerator would stop by and ask questions about each member of your household while writing the answers onto a population schedule form that measured 23.75″ wide by 18.5″ high. Yep, the entire database was on paper. Torn page detection must have been a very serious issue! When the enumerator had information about every last person in their district, they would send their data to the Census Bureau via log shipping, er, Registered Mail.

Once in Washington, an army of operators transferred each record from the population schedule forms to a punch card. Punch cards had been used to tabulate the census since 1890 and were still the best technology available fifty years later. The 1880 census was tabulated by hand, which took 7 years to complete. Knowing there had to be a better way to calculate results, former census employee Herman Hollerith set out to create a machine that could count results from data stored on punched cards. He won a contract to tabulate the 1890 census, which was completed in only 1 year. By 1900, he had formed the Tabulating Machine Company and greatly increased his fees, knowing he had a monopoly and the Census Office would have no other option than to pay him. By 1910, census employees had developed and patented their own counting machine to avoid using Hollerith's. The Tabulating Machine Company, which by then had merged and changed names to the Computing-Tabulating-Recording Company, was nearly bankrupted by the loss of business. The company eventually got its act together and was able to turn a profit. In 1924, the Computing-Tabulating-Recording Company changed its name to International Business Machines Corporation.

Women in (1940) Technology: Tabulating By Machine

After the records were copied to punch cards and tabulated by machine, the aggregated results were released immediately for uses like determining congressional seats and allocating public funds. Since the population schedules contain information on individuals, they are held for 72 years before being released for research purposes. Rather than keep all 3.9 million pages of records on paper, the Census Bureau used the most compressed format available at the time: microfilm. Apparently they had not yet discovered the rather obscure and undocumented BACKUP CENSUS TO TAPE='MICROFILM' WITH COMPRESSION; command. The records released this year are images scanned from that microfilm.

Indexing

Since all the data consists of images, there's no easy way to index it. Optical character recognition software is pretty good these days, but probably not good enough to pick out the handwriting in these images, most of which is in cursive. Instead, everything was indexed by enumeration district, meaning you need to know where someone was living during April of 1940 before you can search for them. Many genealogy websites are now working on indexing this data by name, but that work is not expected to be completed for a few months.
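
For the database-minded, the difference between the two indexes can be sketched in a few lines of Python. Everything here is hypothetical: the records, names, and ED numbers are invented, and the real name index only exists once someone has transcribed the handwritten pages.

```python
from collections import defaultdict

# Invented records: (enumeration_district, sheet, name).
records = [
    ("16-101", "1A", "Ada Example"),
    ("16-101", "1B", "Ben Sample"),
    ("16-102", "1A", "Cy Placeholder"),
]


def build_index(rows, key_position):
    """Group rows by the field at key_position, like a simple non-unique index."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_position]].append(row)
    return dict(index)


# The index available at release: enumeration district -> sheets.
ed_index = build_index(records, 0)

# The index genealogy sites are building by transcribing names.
name_index = build_index(records, 2)


def find_by_name_without_name_index(index_by_ed, name):
    """Without a name index, every sheet in every district has to be scanned."""
    return [
        row
        for sheets in index_by_ed.values()
        for row in sheets
        if row[2] == name
    ]


print(ed_index["16-101"])
print(find_by_name_without_name_index(ed_index, "Cy Placeholder"))
print(name_index["Cy Placeholder"])
```

Knowing the ED is a seek into a handful of pages. Knowing only the name is a full scan until the name index exists.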

Finding Your Family

If you had relatives in the US in 1940 and know where they lived at that time, I highly recommend looking for them. Everything can be found for free at http://1940census.archives.gov. The first thing you’ll need to do is find which enumeration district they lived in. If you have an address, you are very much in luck. If you only have a general idea of where they were, then you’ll probably have to do a bit more work to find them. The census site lets you drill down by state, county, and city, and provides a list of EDs that apply. If you’re searching in an urban area you might need to use maps and/or descriptions returned by the search to narrow down exactly which ED they were in. If the official site isn’t finding anything for you, I’ve also had luck using Steve Morse’s 1940 ED Finder. Once armed with a list of relevant enumeration districts, you can view or download the population schedules from each district and look for people you recognize. You’ll probably end up looking through all the sheets because the entries on the forms aren’t always in order. My guess is that enumerators would start going down a street, skip houses where nobody was at home and then come back to them later.

I was fortunate enough to find all of my family, and it's really neat to be able to see a snapshot of their lives at a time when my grandparents were close to my age. It also gave me great appreciation for what a chore recordkeeping was in that era! Even if you have no relatives in this census, I think it's still worth taking a look. It was very interesting to see what kinds of jobs people had, their education level, and how much they were paid. My family was in the suburbs of Chicago at that time, and probably 7 out of 10 people in their area worked in "telephone manufacturing", which would have been at Western Electric's Hawthorne Works. My wife's family was in a rural area downstate, where practically everyone worked on a farm and the few who didn't were employed by the WPA. The best job title I saw when searching for her family was "chicken picker".

Best of luck if you end up searching for your ancestors. If you find any who were employed as a chicken picker, let me know!

Fun Videos