2.3.3 Big Data

“Big data” in general refers to a large volume of information, the different types of data, the real-time nature of the data, and the tools and techniques to manage and analyze it. For transportation and traffic engineering, FHWA defines big data as having at least seven dimensions:
  • Volume – the amount of data available;
  • Velocity – how quickly the data is generated or gathered;
  • Variety – how the data is structured;
  • Veracity – how trustworthy the data is;
  • Value – how meaningful the data is;
  • People – those who process and analyze the data; and
  • Governance – how the data is gathered and processed.
Big data has many advantages over traditional methods of collection. The data is readily available through third-party providers and does not involve infrastructure investment. Because the data is collected and stored by the provider, there is no need for local storage. Collecting the data does not involve manual and in-field methods. Data can be analyzed for an entire day, month, or year, not just during peak periods. The flexibility of data collection may enable a deeper understanding of travel patterns and other traffic operations-related issues.
Big data also has challenges and limitations. Because data collection is automated, there is no information on field conditions such as construction, detours, weather, or malfunctioning traffic signals. Typically, the data provider does not share how the data is aggregated, processed, and analyzed, so the user must rely on and trust the provider’s methods. Other challenges may arise when the defined study area does not correspond one-to-one with the limits defined in the provider’s platform.
Additional detail about providers of big data (e.g., INRIX, Replica, and StreetLight) related to transportation planning and traffic engineering is provided in the sections below. As of 2024, TxDOT has subscriptions to INRIX and Replica, and the subscriptions are available to TxDOT employees and contractors working on TxDOT projects or initiatives. To access this data, consultants and external entities need a TxDOT sponsor and must complete forms agreeing to the terms of use before access is granted. Sample forms are provided in Appendix C, Sections 2-5. Information regarding Replica data access can be found in Appendix C, Section 6 (References 5 and 6). Information regarding INRIX data access can be found in Appendix C, Section 6 (Reference 7).

2.3.3.1 INRIX

INRIX aggregates probe-based data from numerous sources, including crowd-sourced, public, and proprietary data. The data sources include consumer devices (e.g., connected cars, mobile phones), local fleets (e.g., service, delivery), and long-haul trucks. All of the data collected is GPS-based. Historical data availability goes back three years. MOEs reported in the platform include travel times, speeds, congestion, and bottlenecks. INRIX data does not capture stops; therefore, that information is not available. The following data analysis tools are available to users:
  • Real-Time Traffic Flow – provides speed and travel time data by roadway segment and is updated every minute for most roadways.
  • Roadway Analytics – provides average speed and other data by roadway segment. The data can be organized into 15-minute bins and is available for dates from January 2018 onward.
  • Historical Speed Profile – analytical tool that helps the user visualize, monitor, measure, and manage the performance of roadway networks.
  • Trip Analytics – a tool that provides detailed data on trips, such as origin-destination information.
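The 15-minute binning mentioned for Roadway Analytics can be illustrated with a minimal sketch. It assumes the analyst has a set of individual speed observations in a hypothetical layout of (segment ID, ISO timestamp, speed in mph); this layout and the function name are illustrative assumptions and do not represent the INRIX export format or API.

```python
from collections import defaultdict
from datetime import datetime


def bin_speeds_15min(records):
    """Return the average speed for each (segment, 15-minute bin start).

    records: iterable of (segment_id, iso_timestamp, speed_mph) tuples.
    This record layout is hypothetical, chosen only for illustration.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [speed total, count]
    for segment_id, timestamp, speed_mph in records:
        t = datetime.fromisoformat(timestamp)
        # Round the timestamp down to the start of its 15-minute bin.
        bin_start = t.replace(minute=(t.minute // 15) * 15,
                              second=0, microsecond=0)
        acc = sums[(segment_id, bin_start)]
        acc[0] += speed_mph
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    sample = [
        ("A", "2024-03-05T07:02:00", 60.0),
        ("A", "2024-03-05T07:14:00", 50.0),
        ("A", "2024-03-05T07:15:00", 40.0),
    ]
    for (segment, start), avg in sorted(bin_speeds_15min(sample).items()):
        print(segment, start.isoformat(), avg)
```

Binning this way supports comparison across the full day rather than only the peak periods.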

2.3.3.2 StreetLight Data

StreetLight Data processes anonymized location records from various sources such as smartphones, navigation devices, and trucks. Additional context is added to this data by layering information from other sources such as parcel data and the roadway network. This data is then analyzed and aggregated into normalized travel patterns. Users can access this data via the StreetLight InSight platform.
StreetLight data is primarily used to analyze origin-destination patterns, annual average daily traffic (AADT) counts, segments, trip lengths, and route choice. StreetLight AADT counts are typically used only if no other data source is available. It is recommended that any traffic counts obtained from the StreetLight platform be validated with permanent count stations or ADT where available. If there are no permanent count locations available, refer to temporary count locations or historical counts. Note any major differences between StreetLight counts and those obtained from historical data, STARS II, or other resources.
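One simple way to carry out the validation described above is to compute the percent difference between each platform estimate and the corresponding reference count, and to flag locations where the difference is large. The sketch below is illustrative only: the function names, the input layout, and the 10 percent threshold are assumptions made for the example, not values set by TxDOT or StreetLight.

```python
def percent_difference(estimate, reference):
    """Signed percent difference of an estimate relative to a reference count."""
    return (estimate - reference) * 100.0 / reference


def flag_major_differences(pairs, threshold_pct=10.0):
    """Return (location, percent difference) for locations beyond the threshold.

    pairs: iterable of (location, platform_estimate, reference_count).
    threshold_pct: hypothetical screening threshold, not an agency standard.
    """
    flagged = []
    for location, estimate, reference in pairs:
        if reference <= 0:
            continue  # skip locations with no usable reference count
        diff = percent_difference(estimate, reference)
        if abs(diff) > threshold_pct:
            flagged.append((location, round(diff, 1)))
    return flagged


if __name__ == "__main__":
    sample = [
        ("Site 1", 11500, 10000),  # estimate is 15 percent above the reference
        ("Site 2", 10400, 10000),  # estimate is 4 percent above the reference
    ]
    print(flag_major_differences(sample))
```

Flagged locations are candidates for a closer look at the reference count, the platform estimate, or both.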

2.3.3.3 Replica Data

Replica combines anonymized and aggregated data from a variety of sources, including census data, surveys, and location-based services, to create a synthetic vehicular dataset. From this synthetic vehicular dataset, users can derive OD flows, study freight patterns, analyze transit ridership, and perform link analyses, among other features.
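As a rough illustration of how OD flows can be derived from a trip-level dataset, the sketch below counts trips by origin-destination zone pair. The field names origin_zone and destination_zone are hypothetical and do not reflect Replica's actual export schema.

```python
from collections import Counter


def od_flows(trips):
    """Count trips for each (origin zone, destination zone) pair.

    trips: iterable of dicts with hypothetical keys 'origin_zone'
    and 'destination_zone'.
    """
    return Counter((t["origin_zone"], t["destination_zone"]) for t in trips)


if __name__ == "__main__":
    sample = [
        {"origin_zone": "Z1", "destination_zone": "Z2"},
        {"origin_zone": "Z1", "destination_zone": "Z2"},
        {"origin_zone": "Z2", "destination_zone": "Z3"},
    ]
    for (origin, destination), count in od_flows(sample).most_common():
        print(origin, destination, count)
```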