"The Evolution of City Population Density in the United States" 
Brian Minton, Pierre-Daniel Sarte, Kevin Bryan
README.txt
January 2008

All data and programs used in this paper can be found at:
	http://www.richmondfed.org/research/research_economists/pierre-daniel_sarte.cfm




1. Files

Note: All data are in .xls format; all code was tested in Gauss 7.0.
city_data.xls			| Legal Cities, 1940-2000
cityMSA_data.xls		| MSAs, 1950-1980
cityurb_data.xls		| Urbanized Areas, 1950-2000
CITYload.gss			| Load Legal Cities
CITYload_date.gss		| Load Old/New City binaries
CITYload_reg.gss		| Load Region Codes
CITYloadmsa.gss			| Load MSAs
CITYloadurban.gss		| Load Urbanized Areas
RosPar.gss			| Estimate Density for Legal Cities
RosPar_date.gss			| Estimate Density of New/Old
RosPar_reg.gss			| Estimate Density in each Region
RosParMSA.gss			| Estimate Density of MSAs
RosParUrb.gss			| Estimate Density of Urbanized Areas
UnivariateSummary.gss		| Construct Table of Summary Stats



2. Data Sources

2.1 Legal Cities
Legal Cities include all cities and Census-designated places (CDPs, previously called unincorporated places) in the US with a population greater than 25000 in a given decade from 1940 to 2000.  Data come from the "Number of Inhabitants" publication issued by the US Census a couple of years after each decennial census.  Data were crosschecked, when available, with data in the Univ. of Virginia City and County Data Books.  Area is in square miles.  The border of a legal city is defined as the actual legal border of that city, or, in the case of a CDP, the border determined by the Census Bureau in consultation with local and state experts.

2.2 MSAs
MSAs include all Census-defined MSAs from 1950 to 1980.  Before 1950, the Census did not report MSAs, and after 1980, a major change in the definition of MSA made data from 1990 and 2000 not comparable with data from 1980 and before.  Areas and populations are from the Number of Inhabitants publication, crosschecked with an MSA database used by Dobkins and Ioannides (2000).  An MSA is defined as a central city meeting a population threshold, the county containing that city, and any counties outside that city meeting both a density threshold and a threshold for the percentage of workers who travel to the central city.  See the Census Bureau Geographic Areas Reference Manual, Chapter 13.

2.3 Urbanized Areas
Urbanized Areas include all Census-defined Urbanized Areas from 1950 to 2000, also as reported in the Number of Inhabitants publication.  An Urbanized Area is a densely settled area with at least 50,000 inhabitants and 1000 residents per square mile.  In general, this is a central city plus the "urban fringe", a collection of census blocks surrounding that city that meet density requirements.  The definition of Urbanized Area has been fairly consistent from 1950 to 2000; see the Census Bureau Geographic Areas Reference Manual, Chapter 12, for a more complete description.



3. Data Format

Areas are reported in square miles.  Density is population per square mile.  An entry of "-1" indicates that no data are available for that city in that decade.

Regional definitions are based on the Census-defined regions of the United States.  "East" is defined as New England and Middle Atlantic.  "South" is South Atlantic and East South Central.  "Midwest" is West South Central, East North Central and West North Central.  "West" is Mountain and Pacific.

The binary dummy for old cities is equal to 1 for every city with a population above 25000 in 1940 that still has a population above 25000 in 2000.  The binary dummy for new cities is equal to 1 for every city with a population above 25000 in 2000 that had a population below 25000, or did not exist, in 1940.
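
As an illustration of the conventions above, here is a minimal Python sketch (not the authors' Gauss code) of how density and the old/new dummies could be computed from raw values.  The function names are hypothetical, and two choices are assumptions not stated in this README: a 1940 value of -1 is treated as "did not exist", and a 1940 population of exactly 25000 is treated as not above the threshold.

```python
MISSING = -1  # sentinel used in the data files for "no data in this decade"


def density(population, area_sq_miles):
    """Population per square mile, or None if either input is missing."""
    if population == MISSING or area_sq_miles == MISSING:
        return None
    return population / area_sq_miles


def old_new_dummies(pop_1940, pop_2000, threshold=25000):
    """Return the (old, new) binary dummies for one city.

    Assumption: a MISSING 1940 value is handled like a city that did not
    exist in 1940, so it counts as not above the threshold.
    """
    above_1940 = pop_1940 != MISSING and pop_1940 > threshold
    above_2000 = pop_2000 != MISSING and pop_2000 > threshold
    old = int(above_1940 and above_2000)      # large in both 1940 and 2000
    new = int(above_2000 and not above_1940)  # large in 2000 only
    return old, new
```

For example, density(50000, 20.0) returns 2500.0, and a city with no 1940 entry but 40000 residents in 2000 gets the dummies (0, 1).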

All population data is from the US Census Number of Inhabitants surveys, which are published soon after each decennial census.  Area data after 1970 comes from the same publication.  Before 1970, area data for individual cities is not readily available; we have used data from the City and County Database prepared by a researcher in the 1970s at the University of Virginia (http://fisher.lib.virginia.edu/collections/stats/ccdb/).  As this database does not include CDPs, there are missing entries for CDPs in our database for 1940, 1950 and 1960.  In total, 16 cities in 1940, 29 in 1950 and 90 in 1960 are not included in this database even though their populations are above 25000.  This problem only exists for legal cities; the urbanized area and MSA datasets are complete.



4. Programs

The programs CITYload*.gss load the data for overall legal cities, cities by region, cities with old/new dummies, urbanized areas and MSAs.  Before loading the data into Gauss, each program takes the log of the population, area and density series.
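
The log transformation described above can be sketched in Python as follows.  This is not the CITYload*.gss code; the function names are hypothetical, and both the use of the natural log and the handling of "-1" entries (passed through as None) are assumptions, since this README does not specify either.

```python
import math

MISSING = -1  # sentinel used in the data files for "no data in this decade"


def log_or_none(value):
    """Natural log of a value, or None if the value is the missing sentinel."""
    return None if value == MISSING else math.log(value)


def log_series(populations, areas):
    """Return one (log_pop, log_area, log_density) tuple per city.

    Log density is computed as log(pop) - log(area), which equals
    log(pop / area).  It is None whenever either input is missing.
    """
    rows = []
    for pop, area in zip(populations, areas):
        log_pop = log_or_none(pop)
        log_area = log_or_none(area)
        if log_pop is None or log_area is None:
            rows.append((log_pop, log_area, None))
        else:
            rows.append((log_pop, log_area, log_pop - log_area))
    return rows
```

Keeping missing entries as None, rather than dropping them, preserves the row alignment between cities and decades.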

The programs RosPar*.gss nonparametrically estimate the distribution of a given dataset and construct a plot of this distribution.  Details can be found in Appendices 1 and 2 of the paper.
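
For readers unfamiliar with nonparametric density estimation, the following Python sketch shows a generic kernel density estimator.  It is not the RosPar*.gss code: the Gaussian kernel, the normal-reference rule-of-thumb bandwidth and the function names are all assumptions made for illustration, and the paper's appendices remain the reference for the estimator actually used.

```python
import math


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth: 1.06 * sample std. dev. * n^(-1/5)."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return 1.06 * math.sqrt(variance) * n ** (-1 / 5)


def kernel_density(sample, grid, bandwidth=None):
    """Gaussian kernel density estimate of `sample` at each point in `grid`.

    f(g) = 1 / (n * h) * sum_i K((g - x_i) / h), with K the standard
    normal density and h the bandwidth.
    """
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(sample)
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in sample)
        for g in grid
    ]
```

In this setting, sample would hold the non-missing log densities for one decade, and grid the points at which the estimated distribution is plotted.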

