4. Historical Records

Each dataset has a time range or a specific start and end date. These dates signify the day that data collection began and the last day that it was updated.

The Reimagine 911 team found that about 50% of data providers made at least six years of data available. However, around 30% of providers shared less than 12 months of data, which is likely insufficient for most research purposes.

Horizontal bar graph of dataset time ranges where 50% of data providers made at least six years of data available.
Fig. 4.1 While nearly half of all cities offer six or more years of data, another third only provides ≤ 12 months of data.

Time Range Elements

There are three main elements of a time range: start date, end date, and date type. While start date is the earliest date in the dataset and end date is the most recent, the meaning of these terms can vary between (and sometimes within) datasets. In order to fully understand a dataset's time range, you need to understand its date type.

Range Types

We use the term “Date Type” to explain when and how often a city updates its datasets. In Reimagine 911’s Open Data Review of open 911 datasets, we identified three different ways that time series data is shared:

Fixed: Datasets with this date type are historical. The data covers a time range in the past (for example, a dataset covering 2010-2017) and is no longer updated.

Present: The dataset’s time range has a fixed start date in the past and is still updated through the present day (for example, a dataset with a start date in 2010 that is updated on a weekly basis).

Recent_x: Data that is updated regularly, with the most current data replacing the earlier data. For example, a dataset annotated with “recent_7” contains data for the past seven days only. Treat these datasets with caution, since such a short time frame can be extremely limiting for analysis.
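The three date types above can be told apart by checking a dataset's start and end dates on two separate visits. The following is a minimal sketch of that idea, not the method used in the review: the function name classify_date_type is hypothetical, and it assumes the two visits are far enough apart that a regularly updated dataset would have gained new records. It also assumes the window length in recent_x is counted in inclusive days.

```python
from datetime import date

def classify_date_type(earlier, later):
    """Guess a date type from two (start_date, end_date) snapshots of one dataset.

    `earlier` and `later` are the time ranges observed on two different visits.
    """
    (start_a, end_a), (start_b, end_b) = earlier, later
    if end_b <= end_a:
        # No newer records appeared between visits: treat as historical.
        return "fixed"
    if start_b > start_a:
        # Older records were dropped as new ones arrived: a rolling window.
        window_days = (end_b - start_b).days + 1  # inclusive day count (assumption)
        return f"recent_{window_days}"
    # The start date held steady while the end date moved forward.
    return "present"

# Illustrative dates only, not taken from any reviewed dataset.
first_visit = (date(2023, 3, 1), date(2023, 3, 7))
second_visit = (date(2023, 3, 8), date(2023, 3, 14))
print(classify_date_type(first_visit, second_visit))
```

A dataset that is updated less often than the gap between visits would be mislabeled as fixed, so the result is best treated as a first guess to confirm against the provider's documentation.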

Time Range Inconsistencies

It is not unusual for cities to change how they collect date range data. This creates inconsistencies in time series data. There are two main ways these changes manifest:

Date format: The actual formats of the dates themselves may change. For example, the format could change from a numerical mm/dd/yyyy format to a written Month, Day, Year format.

Dataset schema: The way that a city structures data may change over time. While not specifically related to start and end dates or the date range, it is an important consideration when working with data from different time ranges. For example, a city could suddenly change the structure of its call record types, dividing the category “vehicle accidents” into three separate subcategories. Comparing different time ranges may not capture (or may introduce incorrect) trends in the data you wish to inspect. While it is possible to compare datasets with different schemas, it adds a layer of complexity. It is important to know about differences in schema as early as possible to ensure proper analysis.
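For the date format case described above, one way to handle a mid-dataset format change is to try each known format in turn. The sketch below assumes only the two formats named in the example (numerical mm/dd/yyyy and written Month Day, Year); the function name parse_mixed_date and the format list are illustrative, and a real dataset may need more entries.

```python
from datetime import datetime

# Formats assumed for illustration; extend with whatever a given city uses.
# Note: %B matches full month names in the current locale (English by default).
CANDIDATE_FORMATS = ("%m/%d/%Y", "%B %d, %Y")

def parse_mixed_date(text):
    """Return a date for `text`, trying each candidate format in order."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    # Fail loudly so an unexpected third format is noticed, not silently dropped.
    raise ValueError(f"unrecognized date format: {text!r}")

# Both spellings of the same day normalize to one value.
print(parse_mixed_date("03/05/2021") == parse_mixed_date("March 5, 2021"))
```

Raising an error on unknown formats is a deliberate choice here: silently skipping unparseable rows could shorten the apparent time range without any warning.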

It’s necessary to look at the data itself in order to detect schema changes. Going back to the call type example, the changes would not appear in the dataset’s column headers, because all of the data is still in the call type column. The changes would be found only in the individual records.
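Because such changes live in the records rather than the headers, one way to surface them is to compare the set of values in a column from one year to the next. The sketch below is one possible approach under stated assumptions: records are simplified to (year, call type) pairs, the function names are hypothetical, and the sample call types are invented for illustration.

```python
from collections import defaultdict

def call_types_by_year(records):
    """Map each year to the set of call type values seen in that year."""
    seen = defaultdict(set)
    for year, call_type in records:
        seen[year].add(call_type)
    return dict(seen)

def category_changes(seen_by_year):
    """List (year, added, removed) wherever the value set differs from the prior year."""
    changes = []
    years = sorted(seen_by_year)
    for prev, curr in zip(years, years[1:]):
        added = seen_by_year[curr] - seen_by_year[prev]
        removed = seen_by_year[prev] - seen_by_year[curr]
        if added or removed:
            changes.append((curr, added, removed))
    return changes

# Invented sample: one category is split into three subcategories in 2021.
sample_records = [
    (2020, "vehicle accidents"),
    (2020, "noise complaint"),
    (2021, "vehicle accident - injury"),
    (2021, "vehicle accident - no injury"),
    (2021, "vehicle accident - hit and run"),
    (2021, "noise complaint"),
]
for year, added, removed in category_changes(call_types_by_year(sample_records)):
    print(year, "added:", sorted(added), "removed:", sorted(removed))
```

A year in which one value disappears and several related values appear is a hint that a category was split, though rare call types that simply did not occur in a given year will also show up and need a manual look.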

Time Range Distribution

About one third of the datasets reviewed have a start date in either 2021 or 2022 (see Fig. 4.2). A significant number of datasets contain six or more years of data. These datasets will be best for uncovering trends and changes over time.

Dual axis chart of dataset start dates where one-third of datasets have start dates in 2021 or 2022.
Fig. 4.2 One third of datasets reviewed have start dates in 2021 or 2022.
