Data Requirements for Statistical Analysis
The greatest challenge in applying statistical analysis to stream gauge data is obtaining a sample of streamflow measurements (or estimates) large enough to be representative of the entire population of flows.
Three types of data may be considered (USGS 2018): systematic data, historical data, and paleoflood and botanical information.
Systematic data are flow records generated from a defined set of rules and recorded on a regular basis. For example, the U.S. Geological Survey (USGS) annual maximum flow record for a gauge consists of the maximum instantaneous flow value for each year, recorded every year over a given time period. If annual maximum flow values were recorded only for years in which large events occurred, then the record would no longer be systematic. Gaps (missing years) in the systematic record do not preclude use of such data so long as the gaps are the result of missing data, and not the result of filtering the data based on flow magnitude.
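As a minimal sketch of the gap check described above, the following Python snippet lists the years missing from an annual maximum series. The function name `find_gaps` and the sample record are hypothetical and used only for illustration. The code only locates gaps; deciding whether a gap reflects missing data or magnitude-based filtering still requires reviewing the gauge history.

```python
def find_gaps(annual_peaks):
    """Return the years missing between the first and last year of record.

    annual_peaks: dict mapping year -> annual maximum instantaneous flow.
    """
    years = sorted(annual_peaks)
    if not years:
        return []
    present = set(years)
    return [y for y in range(years[0], years[-1] + 1) if y not in present]


# Hypothetical record: year -> annual maximum instantaneous flow (cfs)
peaks = {1990: 5200.0, 1991: 830.0, 1992: 12400.0, 1995: 640.0, 1996: 7100.0}

print(find_gaps(peaks))  # [1993, 1994]
```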
Historical data are flow estimates for events not included in the systematic record. These data typically consist of historically significant events, and thus are a sample of extreme events observed by locals. Historical data should be included in the analysis when possible. In cases where only a short systematic record is available, historical data are particularly valuable. Use of historical data also ensures that the results of the analysis will be consistent with the experience of the local community (USGS 2018). Bulletin 17C incorporates new procedures on how to better include historical data in the analysis.
Paleoflood and botanical information can also be part of a statistical stream gauge analysis. Paleofloods are different from historical floods in that they are determined by geologic and physical evidence of past floods rather than human records or referenced from built infrastructure. Geomorphic surfaces, like terraces adjacent to rivers, can be used to place limits on flood discharges to estimate nonexceedance bounds. Paleoflood data are treated similarly to historical flood data for flood frequency analysis. Botanical information consists of vegetation that records evidence of flood(s) or stability of a geomorphic surface over time. Examples include corrasion scars, adventitious sprouts, tree age, and tree ring anomalies. For flood frequency analysis, it is common to describe botanical information as binomial-censored observations. Bulletin 17C includes guidance on how to incorporate this information.
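One way to hold these different data types in a single record is to store each observation as a flow interval: a gauged peak has equal lower and upper limits, a nonexceedance bound has a finite upper limit only, and a binomial-censored observation records only whether a threshold was exceeded. The sketch below is illustrative; the class name, field names, and helper functions are assumptions and do not reproduce the input format of any particular software.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class FlowObservation:
    year: int
    lower_cfs: float   # smallest flow consistent with the evidence
    upper_cfs: float   # largest flow consistent with the evidence
    source: str        # e.g. "systematic", "historical", "paleoflood", "botanical"


def gauged(year, flow_cfs, source="systematic"):
    """A measured or estimated peak: the interval collapses to a point."""
    return FlowObservation(year, flow_cfs, flow_cfs, source)


def nonexceedance_bound(year, bound_cfs, source="paleoflood"):
    """Evidence that the peak stayed below a bound (e.g. a stable terrace)."""
    return FlowObservation(year, 0.0, bound_cfs, source)


def binomial_censored(year, threshold_cfs, exceeded, source="botanical"):
    """Only known: whether the peak exceeded the threshold or not."""
    if exceeded:
        return FlowObservation(year, threshold_cfs, math.inf, source)
    return FlowObservation(year, 0.0, threshold_cfs, source)


# Hypothetical observations for illustration only
record = [
    gauged(1998, 7400.0),
    nonexceedance_bound(1850, 30000.0),
    binomial_censored(1921, 12000.0, exceeded=True),
]
```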
For highway drainage design purposes, a statistical analysis of stream gauge data is typically applied only when adequate data from stream gauging stations are available. The definition of adequate data comes from USGS practice and is provided in Table 4-3.
| Desired average recurrence interval (ARI) | Minimum record length (years) |
|---|---|
| 10-year | 8 | 
| 25-year | 10 | 
| 50-year | 15 | 
| 100-year | 20 | 
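The table can be applied as a simple lookup. The following Python sketch assumes that record length is counted as the number of years with a recorded annual peak; the names `MIN_RECORD_YEARS` and `is_record_adequate` are hypothetical.

```python
# Minimum record lengths from Table 4-3, keyed by ARI in years
MIN_RECORD_YEARS = {10: 8, 25: 10, 50: 15, 100: 20}


def is_record_adequate(ari_years, record_length_years):
    """Return True if the record meets the Table 4-3 minimum for the given ARI."""
    if ari_years not in MIN_RECORD_YEARS:
        raise ValueError(f"Table 4-3 has no entry for a {ari_years}-year ARI")
    return record_length_years >= MIN_RECORD_YEARS[ari_years]


print(is_record_adequate(50, 12))   # False
print(is_record_adequate(25, 12))   # True
```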
For TxDOT application, sources for annual peak flow data include:
- USGS National Water Information System (NWIS).
- US Department of the Interior, USGS - Texas, Surface Water. These publications are prepared annually, and each contains records for one water year. As a result, abstracting annual peaks for a long record is time-consuming.
- water bulletins.
- River authority and municipal sources such as Lower Colorado River Authority (LCRA).
If the available data sources allow the designer to construct a sufficiently large sample of annual peak streamflow values, then the following conditions must also be satisfied or accounted for before undertaking the statistical analysis:
- The data must be representative of the design condition of the watershed.
- The data must not be significantly affected by upstream regulation (such as dams, reservoirs, and diversions).
- The systematic record must be stationary, with no general trend of increasing or decreasing flows resulting from changes to the watershed.
- The data must be homogeneous, with flow values resulting from the same types of events. If annual peak flows can result from either rainfall or snowmelt, then a mixed population analysis may be required.
- Errors in flow measurements must not be significant relative to other uncertainties in the analysis.
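The stationarity condition can be screened with a trend test on the annual peak series. The sketch below uses the Mann-Kendall test, a common nonparametric choice; it is offered as one possible check, not as a method prescribed by this section. It assumes the peaks are supplied in chronological order and omits the variance correction for tied values, so it is approximate for records with many equal peaks.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test (no tie correction).

    values: annual peak flows in chronological order.
    Returns (S statistic, Z score, two-sided p-value).
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # sign of each pairwise difference

    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)     # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return s, z, p_value


# Hypothetical annual peaks (cfs) for illustration only
peaks = [3100, 2800, 4500, 3900, 5200, 4800, 6100, 5900, 7000, 7400]
s, z, p = mann_kendall(peaks)
if p < 0.05:
    print("Possible trend; review watershed changes before analysis")
```

A small p-value only flags a possible trend. Whether the trend results from changes to the watershed, and whether the record needs adjustment, remains an engineering judgment.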