Final Proposal

1940 Census

Identify Dataset

Compared with some other data sets, the 1940 data set includes a wide range of demographic information: each person’s name, age, race, address, marital status, education level and similar details. The data set includes numeric, text and geographic information. The numeric fields include age, estimated birth year, income and the value of the person’s home. The text fields include whether the person rented or owned their home, their relationship to other people in the home, gender, race, marital status, whether they attended high school or college, the highest grade they completed, their employment status, their parents’ birthplace and their native language. There is less geographic information than numeric or text information; it includes the individual’s birthplace, residence and street name.

The values in each numeric column vary. For example, ages range from 1 to 87, home values range from 20 to 10,000, and incomes range from no income to as much as 7,500 dollars. The geographic range of the data shows that many people lived in the same areas; although some of them belong to the same family, a significant number appear to be unrelated but live near one another. Most of the people lived in the downtown area of Albany, on Hamilton Avenue, Stanwix Street, Delaware Avenue, Barrow Street, Second Avenue and other neighboring streets. The rows in the data set present a variety of information; their headings include race, address, age, employment status and more, and each heading describes either a person (the individual’s age, race, marital status, etc.), a place (address) or a thing.


Some information in the data set does not need to be searched for because it is given directly; other information is not given, and conclusions must be drawn from what is. One comparison that can be made is the relationship between gender and employment. The initial assumption might be that most of the women in the data set did not work, and that proves to be true. Although most did not work, some did: some women were servants, which counts as employment, and others had received a high level of education that gave them the knowledge and skills for an occupation. Another comparison is whether a person’s level of education plays a role in their occupation. Based on the data, those with less than a high school education are mostly unemployed, those with a high school education have jobs such as traveling salesman and electrician, and the few with a college education work as lawyers or librarians. A comparison can also be made between home ownership and the gender of the owner. Based on the data, most, if not all, of the people who owned homes were male heads of household; their wives were often unemployed, and so were most of their daughters. One relationship cannot be determined from the information provided alone but is worth looking into: whether the area a person lived in affected their ability to own a home. A correlation between area and home ownership might help explain why many people, including those unrelated to one another, lived in similar areas.
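The gender and employment comparison could be checked by counting how many people fall into each gender and employment-status pair. The following is only a minimal sketch in Python, not the method used for this project. The column names ("Gender", "Age", "Employment Status"), the working-age range of 16 to 62, and the sample rows are all assumptions made for illustration; the real export may label its columns differently.

```python
from collections import Counter

# Hypothetical column names; the real census export may label them differently.
GENDER, AGE, EMPLOYMENT = "Gender", "Age", "Employment Status"


def employment_by_gender(rows, min_age=16, max_age=62):
    """Count (gender, employment status) pairs for people within an age range.

    The default age range is an assumption about working age, not a fact
    taken from the data set.
    """
    counts = Counter()
    for row in rows:
        try:
            age = int(row[AGE])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with a missing or non-numeric age
        if min_age <= age <= max_age:
            counts[(row.get(GENDER), row.get(EMPLOYMENT))] += 1
    return counts


# Made-up rows for illustration only; these are not taken from the census data.
sample = [
    {"Gender": "Female", "Age": "34", "Employment Status": "Employed"},
    {"Gender": "Female", "Age": "41", "Employment Status": "Not employed"},
    {"Gender": "Male", "Age": "45", "Employment Status": "Employed"},
    {"Gender": "Female", "Age": "7", "Employment Status": "Not employed"},
]

for (gender, status), n in sorted(employment_by_gender(sample).items()):
    print(gender, status, n)
```

To run this on a real file, the rows could come from Python’s csv.DictReader instead of the sample list. Restricting the count to an age range keeps children from being counted as unemployed adults.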

One thought on “Final Proposal”

  • April 11, 2016 at 10:24 PM

    Are your no-occupation people children or adults? If you plan to do any kind of argument about # employed or occupations of adults with no education, make sure to filter out children first (you can drag age over to filter as a continuous or discrete measure and set the range to 16-62 or 18-50, whatever you’d argue is working age for the time).

    Camille had a good observation re: attended school vs highest grade–attended school just means “did they attend school during that census year.” So you’d be marked as attending school, highest grade college, while I’d be marked with not attending school, highest grade college. If this is something you plan on looking at more, make sure you take age into consideration–I believe high school through age 18 is mandatory in NY by 1940, but that would be worth checking on.

    It sounds like you’re interested in women’s employment (which would also fit nicely with your capstone project). One way of framing your project might be to just look at the women–are there differences in employment, education, homeownership, etc, by race, marriage status, etc? You can filter to just women by dragging gender over to the filter pane or right click>filter.
