Source materials

Here is a list of links to the Jupyter notebook and the original dataset used to generate the findings on this page:

Summary and challenges

There are 74,065 streetlights in this dataset, each listed by its latitude and longitude. All records are of type LIGHT, so the only useful information in the dataset is the location of each light. Given this, we first plotted every light on a map of the city of Boston. The map shows a sparse distribution of lights in the lower-left corner of the map.

street-lights_overall
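The check that every record is of type LIGHT, and the extraction of coordinates for plotting, can be sketched with the standard library alone. This is a minimal sketch, not the notebook's code: the column names TYPE, Lat and Long are assumptions about the CSV header, and the two-row sample is made up for illustration.

```python
import csv
import io

def load_light_locations(csv_file):
    """Return the (lat, lon) pairs and the set of distinct TYPE values.

    Assumes hypothetical column names TYPE, Lat and Long; adjust them
    to match the real CSV header.
    """
    reader = csv.DictReader(csv_file)
    types = set()
    points = []
    for row in reader:
        types.add(row["TYPE"])
        points.append((float(row["Lat"]), float(row["Long"])))
    return points, types

# Made-up two-row sample standing in for the real file.
sample = io.StringIO(
    "TYPE,Lat,Long\n"
    "LIGHT,42.3601,-71.0589\n"
    "LIGHT,42.2834,-71.1570\n"
)
points, types = load_light_locations(sample)
print(len(points), types)  # 2 {'LIGHT'}
```

If `types` ever contains a value other than LIGHT, the location-only reading of the dataset no longer holds; otherwise `points` can be handed to any mapping library for the city-wide plot.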

Missing Data

This is shown more clearly by comparing the two figures below. The bottom figure shows multiple streets in zip code 02132 (West Roxbury) with no recorded lights.

street-lights_dense

street-lights_missing
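One possible way to quantify this sparsity, rather than judging it by eye, is to bin the lights into a lat/lon grid and flag cells with few or no lights. This is a hypothetical sketch, not the method used on this page: the 0.005-degree cell size (roughly 0.55 km north-south by 0.4 km east-west at Boston's latitude) and the threshold are illustrative choices, and the sample points are made up.

```python
import math
from collections import Counter

def cell_of(lat, lon, cell_deg):
    """Map a (lat, lon) point in degrees to an integer grid-cell index."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def flag_sparse_cells(points, cell_deg=0.005, threshold=1):
    """Return (cell, count) for every cell in the points' bounding box
    that holds fewer than `threshold` lights, including empty cells."""
    counts = Counter(cell_of(lat, lon, cell_deg) for lat, lon in points)
    rows = [r for r, _ in counts]
    cols = [c for _, c in counts]
    flagged = []
    for r in range(min(rows), max(rows) + 1):
        for c in range(min(cols), max(cols) + 1):
            n = counts.get((r, c), 0)  # cells with no lights are absent from counts
            if n < threshold:
                flagged.append(((r, c), n))
    return flagged

# Made-up example: three lights in one cell, one light two cells further north.
pts = [(42.0001, -71.0001)] * 3 + [(42.0101, -71.0001)]
print(flag_sparse_cells(pts, threshold=2))
# [((8401, -14201), 0), ((8402, -14201), 1)]
```

A flagged cell is only a candidate: water, parks and other unlit land also produce empty cells, so each one still needs a manual check such as the Street View inspection described next.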

To check whether this sparsity reflected missing data, we used Google Street View to inspect several of these West Roxbury streets. These streets do have streetlights (as shown in the photograph below), confirming that this dataset is not a complete record of Boston's streetlights. This missing data could skew any model that uses streetlight information as a predictor.

street-lights_street-view

Feature Engineering

Despite this gap, to further explore the usefulness of this data and the effect of the missing records, we have begun engineering a model feature that counts the streetlights within a configurable radius (100 meters here) of each crime in the crime dataset. For example, these are the streetlight counts within 100 meters of the first twenty observations in the crime dataset:

[0, 4, 8, 55, 43, 37, 53, 33, 33, 27, 28, 54, 54, 28, 33, 2, 26, 19, 22, 31].
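The proximity feature described above can be sketched as a brute-force count using the haversine great-circle distance. This is a minimal sketch under stated assumptions, not the notebook's implementation: the function names, the (lat, lon) tuple inputs and the mean Earth radius of 6,371 km are illustrative choices, and the sample coordinates are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error

def lights_within(crimes, lights, radius_m=100.0):
    """For each crime (lat, lon), count the streetlights within radius_m metres."""
    return [
        sum(1 for lt, ln in lights if haversine_m(clat, clon, lt, ln) <= radius_m)
        for clat, clon in crimes
    ]

# Made-up coordinates: two lights about 44 m apart, one about 1.1 km away.
lights = [(42.3601, -71.0589), (42.3605, -71.0589), (42.3700, -71.0589)]
crimes = [(42.3601, -71.0589), (42.2000, -71.0000)]
print(lights_within(crimes, lights))  # [2, 0]
```

This loop compares every crime with every light, so with 74,065 lights and a large crime table a spatial index would likely be needed; for example, scikit-learn's `BallTree` with `metric="haversine"` (inputs in radians) supports `query_radius(..., count_only=True)`, with the radius expressed as metres divided by the Earth's radius.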