doi:10.3808/jei.201700381
Copyright © 2017 ISEIS. All rights reserved

Mining Spatio-temporal Data on Industrialization from Historical Registries

D. Berenbaum1, D. Deighan1, T. Marlow2, A. Lee1, S. Frickel2 and M. Howison1*

  1. Data Science Practice, Computing & Information Services, Brown University, 3 Davol Square, Providence, RI 02912, USA
  2. Institute at Brown for Environment and Society, Brown University, 80 Waterman Street, Providence, RI 02912, USA

*Corresponding author. Tel.: +1 401-863-6743; fax: +1 401-863-7216. E-mail address: mhowison@brown.edu (M. Howison).

Abstract


Despite the growing availability of big data in many fields, historical data on socio-environmental phenomena are often unavailable because automated, scalable approaches for collecting, digitizing, and assembling them are lacking. We have developed a data-mining method for extracting tabulated, geocoded data from printed directories. While scanning and optical character recognition (OCR) can digitize printed text, these methods alone do not capture the structure of the underlying data. Our pipeline integrates both page layout analysis and OCR to extract tabular, geocoded data from structured text. We demonstrate the utility of this method by applying it to scanned manufacturing registries from Rhode Island that record 41 years of industrial land use. The resulting spatio-temporal data can be used for socio-environmental analyses of industrialization at a resolution that was not previously possible. In particular, we find strong evidence for the dispersion of manufacturing from the urban core of Providence, the state’s capital, along the Interstate 95 corridor to the north and south.

Keywords: structured text, historical data, geocoding, page layout analysis, socio-environmental analysis

