Final Proposal

For my final project I have chosen option 2, the Data Analysis. The dataset that I have selected is titled “slave-sales-1775-1865”, and it includes both numeric and text data. The numeric data includes the year of the sale, the age of the slave being sold in years, the age of the slave being sold in months, and the appraised value of the slave being sold. The textual data that is included describes the state where the sale took place, the county within the state, the gender of the slave being sold, any skills that the slave may possess, and any defects that the slave may have. The minimum and maximum values for the numeric columns are as follows:

Date of the sale: 1742 – 1865

Age of the slave being sold (in years): 0 – 99

Age of the slave being sold (in months): 0 – 11

Appraised value of the slave: -800 – 525000

The ranges for the descriptive data are as follows:

State of sale (sorted alphabetically): Georgia – Virginia

County of sale (sorted alphabetically): Adams – Williamson

Gender of the slave: Male or Female

Skills the slave may possess (sorted alphabetically): Apprentice – Woodcutter

Defects the slave may possess (sorted alphabetically): Asthmatic – Without Fingers

Each row in the dataset describes the characteristics of a slave who was sold between the years 1775 and 1865. The amount of information given about each slave appears to vary depending on whether the slave in question possessed any skills that would make them more desirable, or any defects that could make them less valuable to a potential buyer.


There are several comparisons that can be drawn from the information in the dataset. The first relationship, which I noticed almost immediately, is how the appraised value of the slaves changed depending on whether they were male or female. It is not true in every case, but in most cases female slaves were valued much lower than male slaves. The next relationship that I noticed is that slaves listed as having a defect were valued much lower than those without one. Conversely, slaves listed as possessing some type of skill were valued much higher than slaves who did not. Further, slaves with skills in tasks that were considered skilled labor were valued higher than slaves with skills in domestic labor. Another comparison is that the value of slaves plotted against age would resemble a bell curve. Newborn and very young slaves were valued very little, slaves in their 20s and 30s were valued the most, and past that age their value declined again as they entered old age. One more comparison that I was able to draw from the data is that the slaves with skilled labor skills tended to be male, while the slaves with domestic labor skills tended to be female.
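The comparisons above (value by gender, by defect, and by age) could be checked with a small grouping routine. The following is only a sketch: the rows are hypothetical examples made up for illustration, not records from the dataset, and the field names (`gender`, `age_years`, `defect`, `value`) are assumed. Dropping non-positive values is also an assumption, prompted by the negative minimum in the reported range.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; NOT taken from the dataset.
rows = [
    {"gender": "Male", "age_years": 25, "defect": "", "value": 900},
    {"gender": "Female", "age_years": 24, "defect": "", "value": 700},
    {"gender": "Male", "age_years": 2, "defect": "", "value": 150},
    {"gender": "Female", "age_years": 60, "defect": "Asthmatic", "value": 100},
    {"gender": "Male", "age_years": 35, "defect": "", "value": -800},  # implausible, skipped
]


def mean_value_by(rows, key_func):
    """Average appraised value per group, skipping non-positive values."""
    groups = defaultdict(list)
    for row in rows:
        if row["value"] > 0:  # assumption: zero or negative values are entry errors
            groups[key_func(row)].append(row["value"])
    return {group: mean(values) for group, values in groups.items()}


def age_band(row):
    """Start of the ten-year age band, e.g. 20 for ages 20 to 29."""
    return (row["age_years"] // 10) * 10


by_gender = mean_value_by(rows, lambda r: r["gender"])
by_defect = mean_value_by(rows, lambda r: "defect" if r["defect"] else "no defect")
by_age = mean_value_by(rows, age_band)
```

If the bell-curve idea holds, the averages in `by_age` should rise toward the 20 and 30 bands and fall again afterward.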

Final Proposal

Slave Sale 1775-1865

The data presented in the Slave Sale dataset is the basic information recorded when slaves were sold. The data consists of numeric, textual, and geographic information. Within the spreadsheet there are nine columns, covering state, county, date entry, gender, age, appraisal value, skills, and defects. Unfortunately, the data is specific to only seven southern states: Georgia, Louisiana, Virginia, North and South Carolina, Mississippi, and Maryland. Aside from the states, the counties listed can provide insight into how the selling of slaves might have differed within a given state. The date entries, age, and value are all numerical information. The date entries begin at 1775 and end in 1865. There were many slaves whose age was unknown; from the information given, the oldest age was approximately 60 and the youngest was about 3. The lack of age information is due to many slaves and slave masters not knowing or keeping track of when each individual slave was born, and some of the ages given may be only an estimate based on the slave’s appearance. The appraisal value given to each slave was the amount the slave master expected to receive after he sold his slave(s). The last categories, gender, skill, and defect, are all textual information. The slaves were either male or female, and both genders possessed skills. For instance, the men had skills such as cabinet making or gardening, and the women would be cooks or midwives. As for defects, being old or too young fell under this category. Deformities and disabilities were also considered defects.

There are many connections between the columns in the data. However, the data still lacks information such as the names of the slaves and a better record of age. Despite the missing information, the data shows that slaves labeled with defects were appraised at a significantly lower value than those who were not. Another comparison is that men were sometimes sold at a higher value than women, but once age plays a role and not only gender, you see that slaves in their 20s and 30s were sold at the highest value. The data can answer questions like: What types of skills did slaves possess? What were considered defects? How do gender and age link to a slave’s appraisal value? However, the data cannot tell us how slaves were appraised. Were these slave masters looking for women who could bear children, or for men who were physically well?

A proposal? Oh this is so unexpected

The 1915 census sure was diverse, with lots and lots of different shades of white. With numeric, textual, and geographic data, it sure was a riot to look through! It includes the names of the citizens of Albany in 1915, their birth year, their birth place, age, sex, relationship to the rest of the house, their skin color, and their address. 1,216 rows of this, to be exact! A hoot! A holler! Fun for the whole family! The names and genders include men and women, because that was all that existed back then. Ages ranged from 0 years to 4 years to 20 years to 70 years, showing that it wasn’t just old white people in Albany at the time; there were plenty of young white people, too! Relationships range from head of household to the wife to the children, to apprentices and patients laid up (presumably, hopefully) in a doctor’s house. Race ranged anywhere from white to white to white, with a good handful of black or brown people thrown in there, because that means it isn’t racist. Citizens came from a wide range of European places, including Germany, Holland, and Finland. I guess these were the immigrants that were juuuust white enough to be allowed into the city. (Though, this was also during the Great Migration, which we did cover in class, so if there were any new black citizens moving to Albany, they were listed as being born in the United States.)

Although occupation is included in the descriptions, not many people seem to have a listed occupation. This might be because there was a lot of unemployment at the time, or it might be because we simply don’t have the data to put into the tables. About 20% of the people in the census have an occupation listed (although quite a few occupational inputs also list “no occupation”), but I have a hard time believing that Albany had an over-80% unemployment rate at any point in its history, much less when other cities were likely starting to experience some sort of economic boom related to preparing for World War I. Everyone who does have an occupation listed, however, is listed as living in only a handful of places, namely McCarthy Avenue, South Pearl Street, and Kenwood Road. (This is especially interesting in regards to whoever did their midterm walking tour on Pearl Street, and maybe they could better answer why this was one of the few designated places where employed people lived. Is it part of the push towards the suburbs? Moving the employed, usually white people into their own “safe” areas?) Surprisingly, every person who is listed with the “relationship” of student is female. I’d assume this is because if a male was studying a certain subject, he was listed as an apprentice of that subject. So though a woman and a man might both be studying medicine, the woman would likely be listed as just a student (or maybe, just maybe, a nurse), while the man was likely listed as either a doctor or an apprentice in a house whose head of household was a doctor. Similarly, everyone listed as a servant is female, i.e., there were no male maids, only female. There was also a surprising number of older people living in Albany, and a lot of older people living in a lodging house. Maybe this term was used differently and meant something more like what we’d think of as a nursing home or an assisted living home? It’s just hard to imagine a bunch of 70- and 80-year-olds living two or three in a room in any other situation. They’re all listed as being patients, so I guess that explanation makes sense.

Final Proposal

For my final project, I chose to do option 2, using the 1756 dataset. This dataset contains the quartering information of soldiers during the year 1756. There are 12 columns and 110 rows. Each of the 12 columns is described as follows:

  1. Number: The first column is the number of the home described in that row, within the list of homes that quarter soldiers. This column ranges from 1 to 327.
  2. Name: The second column is the name of the head of the household that is quartering soldiers in their home.
  3. Trade: The third column is the job that the head of the household has.
  4. Gender: The fourth column is the sex of the head of the household, being either male or female.
  5. Conv for officers: The fifth column is how many officers can fit in the house described in that row. This column only ranges from 1 officer to 2 officers.
  6. Officers upon a pinch: The sixth column represents how many officers can fit in the household, in a pinch. This number, for the most part, is higher than the conv for officers column, but not by much (usually just 1 higher).
  7. Conv for men: The seventh column is how many men can fit into the quartering home being described. The numbers in this column range from 2 men to 6 men.
  8. Men upon a pinch: This column is how many men can fit into the home being described, if absolutely necessary. This number ranges from 4 men to 10 men. This number, in every case, is higher than the previous column, conv for men.
  9. Number of fireplaces: This column is how many fireplaces there are per household. The numbers range from 0 to 6.
  10. Rooms without fire: This column tells us how many rooms that there are in the home that don’t include fireplaces. The numbers range from 0 to 4.
  11. Rooms the family occupied: This column represents how many rooms in the house that are being occupied by family members.
  12. (no label): The last column is what the condition of the house is like. The options are none, good house, a very good house and rich, and rich.

The first relationship that I have found in the dataset is that most of the heads of households that are willing to quarter soldiers are men, and when women did quarter soldiers, they almost always took in only men rather than officers. Only 1 woman in the entire dataset was willing to quarter officers. Because of this, I’m going to research whether there were any risks for women quartering soldiers vs. risks for women quartering officers. I’m also interested in finding the percent of households that had female heads of household, versus the percent of households that were headed by men. I also want to find out if there were any benefits for quartering soldiers during this time period.

The second relationship that I found in the dataset is that every single house that is listed as a good house, a very good house and rich, or rich is headed by a male. Due to this, I wonder what the average income for women was in that year, versus the average income for men for that same year. The only jobs that women are listed as having in this dataset are Indian trader, mantua maker, mead house worker, and merchant. The jobs held by men during this time period include weapon maker, tavern keeper, tailor, silversmith, selling liquor, shop keeper, inn keeper, Indian trader, dram shop, brewer, britches maker, and blacksmith. Due to the short list of jobs that women had, I am also interested in finding out the types of jobs that men and women both had during that time period.

The third relationship that I found in the dataset is that many men were willing to quarter men, but not willing to quarter officers. Again, I’m interested in finding out why this is.
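The percentages mentioned above (share of female-headed households, and share of each group willing to quarter officers) could be computed along these lines. This is a sketch under assumptions: the rows are hypothetical and not copied from the 1756 list, the field names are invented for illustration, and a blank "conv for officers" entry is assumed to be recorded as 0.

```python
# Hypothetical rows for illustration only; NOT taken from the 1756 list.
households = [
    {"gender": "male", "conv_officers": 1, "conv_men": 4},
    {"gender": "male", "conv_officers": 0, "conv_men": 2},
    {"gender": "female", "conv_officers": 0, "conv_men": 2},
    {"gender": "male", "conv_officers": 2, "conv_men": 6},
]


def percent(part, whole):
    """Percentage of `part` in `whole`, or 0.0 when `whole` is empty."""
    return 100.0 * part / whole if whole else 0.0


def share_by_gender(households):
    """Percent of households headed by each gender."""
    counts = {}
    for household in households:
        gender = household["gender"]
        counts[gender] = counts.get(gender, 0) + 1
    total = len(households)
    return {gender: percent(n, total) for gender, n in counts.items()}


def share_quartering_officers(households, gender):
    """Percent of households with the given head gender that list room for officers."""
    group = [h for h in households if h["gender"] == gender]
    willing = [h for h in group if h["conv_officers"] > 0]  # assumption: 0 means none listed
    return percent(len(willing), len(group))


shares = share_by_gender(households)
female_officer_share = share_quartering_officers(households, "female")
male_officer_share = share_quartering_officers(households, "male")
```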

Final Proposal

For our final project, I have chosen the 1800 Census for the City of Albany. I selected this dataset because it provides a relatively detailed breakdown of the population as opposed to other available censuses. The dataset provides a count of each person residing within city limits, and offers both qualitative (text) and quantitative (numeric) data regarding the other members of their households.

Each row of data in the census begins with the person’s full name before providing counts of who else resides in their home. These quantitative counts are divided into twelve categories, in order:

  1. Free White Males < 10 Years (Min: 0 | Max: 5)
  2. Free White Males 10-16 Years (Min: 0 | Max: 5)
  3. Free White Males 16-26 Years (Min: 0 | Max: 10)
  4. Free White Males 26-45 Years (Min: 0 | Max: 14)
  5. Free White Males 45+ Years (Min: 0 | Max: 10)
  6. Free White Females < 10 Years (Min: 0 | Max: 14)
  7. Free White Females 10-16 Years (Min: 0 | Max: 4)
  8. Free White Females 16-26 Years (Min: 0 | Max: 4)
  9. Free White Females 26-45 Years (Min: 0 | Max: 10)
  10. Free White Females 45+ Years (Min: 0 | Max: 10)
  11. Other Free Persons [Except Untaxed Indians] (Min: 0 | Max: 9)
  12. Slaves (Min: 0 | Max: 11)

In addition to these counts, there are two qualitative columns of data: Head of Household Race and Head of Household Gender. For Head of Household Race, only two possibilities occur: White and Black. Similarly, there are only two possibilities for Head of Household Gender: either f (female) or m (male).

I plan to analyze three primary relationships between two or several data columns. First, I am going to examine the relationship (and any related trends) between the Total Number of Free White Males and the Total Number of Slaves in a household. I am seeking to answer whether or not the volume of the former affects the latter in any way. I will make the comparison using averages for Free White Men and Slaves in each household.

Second, I am going to investigate the relationship between Households with a Female Head of Household and the Distribution of Other Members of the Household. I am looking to see whether there are identifiable trends in occupant distribution for homes run by a dominant female. I will find the average counts of all members in households that fall in this category.

Third, in the same vein as the last, I will study the relationship between Households with a Black Head of Household and the Distribution of Other Members of the Household. Again, I will find the total counts for each category of occupant, only for homes where the head of household is African American (Black). The purpose of this examination is to spot trends relating to the race of the head of household.
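The three comparisons described above all reduce to filtering households on one qualitative column and averaging the count columns. A minimal sketch of that step follows, assuming hypothetical field names and made-up rows that are not from the census; `free_white_males` stands in for the sum of the five Free White Males age categories.

```python
from statistics import mean

# Hypothetical households for illustration only; NOT taken from the 1800 Census.
households = [
    {"head_gender": "m", "head_race": "White", "free_white_males": 3, "slaves": 1},
    {"head_gender": "m", "head_race": "White", "free_white_males": 1, "slaves": 0},
    {"head_gender": "f", "head_race": "White", "free_white_males": 0, "slaves": 2},
    {"head_gender": "m", "head_race": "Black", "free_white_males": 0, "slaves": 0},
]


def average_counts(households, column, value, count_columns):
    """Average each count column over households where `column` equals `value`."""
    subset = [h for h in households if h[column] == value]
    if not subset:
        return {}
    return {c: mean(h[c] for h in subset) for c in count_columns}


counts = ["free_white_males", "slaves"]
female_headed = average_counts(households, "head_gender", "f", counts)
male_headed = average_counts(households, "head_gender", "m", counts)
black_headed = average_counts(households, "head_race", "Black", counts)
```

Comparing `female_headed` with `male_headed` gives the kind of side-by-side view the second relationship calls for.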

Final Proposal

1880 Census

Identify Dataset:

The dataset for the 1880 Census in Albany, New York consists of a lot of basic information as well as a few more detailed pieces of information. In a row, the information you get about someone is his or her first and last name, age, race, gender, estimated birth year, relationship in terms of family, marital status, birthplace, and the birthplace of both their father and mother. The data is numeric, textual, and geographic. The columns about birthplace are geographic, the data about age and birth year estimates are numeric, and the remaining columns are textual. The dataset is mostly textual, but the other columns are just as important as the text-only columns, if not more so. The ranges of the two numeric columns correlate because they consist of birth year estimates and age. The minimum age in the dataset is one month old and the maximum age is 98 years old. The minimum year of birth is 1782 and the maximum is 1880. The relationship in terms of family status ranges from son or daughter to self. There are a lot of different terms used in this set. Any sort of relationship that can be had, including in-laws, is listed. Marital status contains the options of single, married, widowed, or divorced. Birthplace lists either a country, or a state if they were born within the United States. This is true for all three of the birthplace columns in the dataset. Gender consists of either male or female. The most interesting one is the race column. White is the only listed option. Each row describes a person who was living in Albany during 1880.


Comparisons within the dataset:

I think that a lot of different connections can be made within the dataset and how different columns correlate to each other. One I would start off with is the correlation, or possibly lack thereof, between a person’s birthplace and the father’s and mother’s birthplaces. There is obviously a connection through your relationship to them, but within an Albany census it is interesting to take a look at. Obviously the person in question ended up in Albany by 1880, so how did that happen? There are a plethora of answers to this question, but it is interesting to look at the data because you can kind of piece an understanding together. If they were born in Albany and both parents are from Europe, then you can gather that the parents came over to America and then had children. If the child was also born in Europe, you can gather that the children came over to America by themselves, and so on. Another connection can be made between age, marital status, and gender. There is bound to be a correlation between age and marital status because as you get older you may get married, but when you add in the gender factor it becomes more intriguing. Back during this time it was not unlikely for a woman to get married to an older man at a very young age. This dataset does not directly tell us when anyone was married, but you can start to gather information based on the current marital status versus his or her age. When taking gender into account, you will most likely see a younger age for women when compared to men with the same marital status. The last connection between two columns I would make on this dataset is age and relationship within the family. Some people in this dataset are listed as self, which presumably means they have no family within the city of Albany or are not married past a certain age. Some are listed as a son or daughter. I would be curious to find out when you go from son or daughter to self, or if self is only someone without family. It makes sense that you become a husband or wife once married, but self is an interesting role.
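The age, marital status, and gender comparison could be approached by grouping ages under each (gender, marital status) pair and looking at the youngest and the median age in each group. The sketch below uses made-up people rather than census rows, and the field names are assumptions.

```python
from collections import defaultdict
from statistics import median

# Hypothetical people for illustration only; NOT taken from the 1880 Census.
people = [
    {"gender": "Female", "marital_status": "Married", "age": 22},
    {"gender": "Female", "marital_status": "Married", "age": 30},
    {"gender": "Male", "marital_status": "Married", "age": 28},
    {"gender": "Male", "marital_status": "Married", "age": 40},
    {"gender": "Male", "marital_status": "Single", "age": 19},
]


def youngest_and_median_age(people):
    """Map (gender, marital status) to a (youngest age, median age) pair."""
    ages = defaultdict(list)
    for person in people:
        ages[(person["gender"], person["marital_status"])].append(person["age"])
    return {group: (min(values), median(values)) for group, values in ages.items()}


summary = youngest_and_median_age(people)
```

A lower median for married women than for married men would be consistent with the expectation stated above, though it would not show when anyone actually married.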

Final Proposal

1940 Census

Identify Dataset

The 1940 data set includes a lot of demographic information in comparison to some other data sets; when looking at the data, the person’s name, age, race, address, marital status, education level, and things of the sort can be found. The data set includes numeric, text, and geographic information. The numeric data includes age, estimated birth year, income, and the value of the person’s home; the text includes whether the person rented or owned their home, their relationship to other people in their home, gender, race, marital status, whether they attended high school or college, the highest grade they completed, their employment status, the birth place of their parents, and their native language. In comparison to the numeric and text information given, the geographic information is limited; it includes the individual’s birth place, residence, and street name. In each column that has numeric data, the data varies. For example, the ages of those listed in the data set range from 1 to 87, the values of the homes range from 20 to 10,000, and the income of each person varies between having no income and making as much as 7,500 dollars. The geographic range of the data shows that many people lived around the same areas; although some of these people are members of a family, a significant number seem to have no relation to one another but live nearby. Most of the people lived in the downtown area of Albany; the locations included Hamilton Avenue, Stanwix Street, Delaware Avenue, Barrow Street, Second Avenue, and other neighboring places. The columns in the data set present us with a variety of information; the headings include race, address, age, employment status, and other information, and all of these headings describe either a person (the individual’s age, race, marital status, etc.), a place (address), or a thing.


When the data set is looked at, there is information that does not need to be searched for because it is already given; some information, however, is not given, and conclusions must be made based on the information that is. One comparison that can be made from the information provided is the relationship between gender and employment; the initial assumption may be that most of the women in the data set did not work, and that proves to be true. Although most did not work, some did; some of the women in the data set were servants, which would be considered employment, and there were other women who received a high level of education, which provided them with the knowledge and skills for an occupation. Another comparison that can be drawn from this data is whether the level of education that a person received plays a role in their occupation. Based on the data, those who received less than a high school education are mostly unemployed, those who have a high school education have jobs such as traveling salesman and electrician, and the few who received a college education have an occupation such as lawyer or librarian. A comparison can also be made between whether a home was owned and the gender of the person who owned it. Based on the data, most, if not all, of the people who owned homes were male and were the head of their household; their wives were often unemployed, and so were most of the daughters. One comparison cannot be determined simply from the information provided, but is worth looking into: whether the area a person lived in affected their ability to own their home. If there is a correlation between area and home ownership, that may help to explain why many people lived in similar areas, even those who were not related to one another.

Final proposal

The Title:
The Albany Muster Rolls of the 8th Militia.

The Data Set:
The data set of the Albany Muster Roll of the 8th Militia includes the name of each enlisting soldier, when the soldier enlisted, the soldier’s age, where the soldier was born, the soldier’s trade prior to enlisting, the company the soldier belonged to, the soldier’s rank, the soldier’s stature, the soldier’s descriptive qualities (complexion, eye color, and hair color), and lastly the volume/page the soldier’s name was found on. The Albany Muster Roll of the 8th Militia has both descriptive and numeric data within its data set. The data that can be considered descriptive includes the following: the occupation or trade of the soldier enlisting in the Militia, where the enlisting soldier originally came from (ranging from places such as Germany to Connecticut), the soldier’s rank in the Militia (Lieutenant, Captain, Private, etc.), as well as the soldier’s physical attributes such as eye color (blue, brown, etc.), complexion (brown, fair, swarthy), and hair color (brown, black, fair). The Albany Muster Rolls of the 8th Militia also contain numerical data. This includes the age of every soldier entering the Militia, as well as the date that each soldier enlisted. The ages range from a minimum of 16 to an apparent maximum of 58. The enlistment dates range from the beginning of April 1760 to the end of June 1762, thirteen years prior to the outbreak of the American Revolution.

Three Relationships or Comparisons:
The first comparison in the data set of the Albany Muster Rolls of the 8th Militia is whether the data set can show a connection between the trades of the enlisting soldiers and the actual rank they received in the Militia. From my brief inspection of the data set, it seems plausible that soldiers who worked as labourers, carpenters, or in another blue collar type of trade before they enlisted became Captains, but there are outliers where, for example, a baker also became a Captain.
The second relationship in the data set of the Albany Muster Rolls of the 8th Militia is which homeland or state supplied the most male enlistments in Albany during the two years from 1760 through 1762. More specifically, do men from certain homelands outnumber others because of migration patterns prior to 1760 through 1762?
The third relationship in the data set of the Albany Muster Rolls of the 8th Militia is whether the race, ethnicity, or listed complexion (dark, swarthy, fair, negro, pale, ruddy, etc.) of the soldier also affected the rank he was able to achieve. I would also like to look at how the age of the enlisting soldiers relates to their rank in the 8th Militia.
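Each of these relationships pairs one descriptive column with rank, so a simple cross-tabulation (counting how often each pair of values occurs together) is one way to begin. The sketch below uses hypothetical soldiers and assumed field names; none of the rows come from the muster rolls.

```python
from collections import Counter

# Hypothetical soldiers for illustration only; NOT taken from the muster rolls.
soldiers = [
    {"trade": "labourer", "rank": "Private", "complexion": "fair"},
    {"trade": "labourer", "rank": "Private", "complexion": "swarthy"},
    {"trade": "carpenter", "rank": "Captain", "complexion": "fair"},
    {"trade": "baker", "rank": "Captain", "complexion": "ruddy"},
]


def cross_tabulate(records, row_key, col_key):
    """Count how many records share each (row value, column value) pair."""
    return Counter((record[row_key], record[col_key]) for record in records)


trade_by_rank = cross_tabulate(soldiers, "trade", "rank")
complexion_by_rank = cross_tabulate(soldiers, "complexion", "rank")
```

Pairs that never occur simply count as 0, so a lookup such as `trade_by_rank[("baker", "Private")]` is safe.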