PLUSWIND: A new hourly wind speed and generation database for US wind plants

In A Thousand Mile Walk to the Gulf, John Muir elegantly discussed the limits of scientific characterization of the wind, saying: “The substance of the winds is too thin for human eyes, their written language is too difficult for human minds, and their spoken language mostly too faint for the ears.” Over 100 years later, scientists still face challenges describing, tracking, and predicting wind speeds. These challenges are especially important given the vast energy generating capacity of the current United States wind fleet – wind power now provides roughly 10% of all US electricity generation – and the critical role wind will play in future decarbonization.

While extensive ground-level wind speed monitoring exists, wind plants gather energy from wind far above the ground, well beyond the reach of the public network of surface wind speed monitors. Observations of wind speeds at the heights relevant for wind power generation (80 to 120 meters above the ground) are rare, though a limited number of tall towers and remote sensing measurements provide insight in certain locations. Plant operators track wind speeds and energy output over time, but these data are privately held and closely guarded as trade secrets. Therefore, scientists, developers, and others involved in energy markets often rely on meteorological models of wind speed both to understand past trends and to develop near-term forecasts of wind energy. But how accurately do these models represent wind speed trends, and how does that accuracy affect estimates of wind generation? A new study and data repository released by Berkeley Lab, appearing in the journal Scientific Data, helps to shed light on some of these questions. The repository (called PLUSWIND) is publicly available and contains hourly wind speed and generation estimates covering 2018 – 2021 for existing wind plants located within the contiguous United States (Figure 1). PLUSWIND contains three separate estimates of wind speed and generation based on three commonly used meteorological models (MERRA2, ERA5, and HRRR).

Figure 1. Wind plants included in the PLUSWIND data repository (created with Google Maps).

An important benefit of the PLUSWIND repository is that it gives users quick and simple access to hourly wind speeds and estimated generation in easy-to-use .csv files. This saves users the time required to download, interpret, and process raw meteorological data from the different models. Generation for each plant is estimated based on the characteristics of the plant's turbines. The data can be used to evaluate modeling strategies, to explore trends in wind speed and energy generation, and, paired with other data, to support further investigations. For example, generation trends can be paired with energy prices to explore trends in value over time.

A second key feature of the PLUSWIND repository is the validation included in the study: generation estimates are compared to recorded wind generation data across many regions. The study finds that accuracy varies widely at the individual plant level but improves when results are aggregated to a regional level. Users of data from these meteorological models will need to determine whether the accuracy described here is fit for their purpose. Figure 2 below shows the spread in annual bias across plants in the major electricity markets in the United States. In the ERCOT (Electric Reliability Council of Texas), MISO (Midcontinent Independent System Operator), PJM (the regional transmission operator for the mid-Atlantic region), and SPP (Southwest Power Pool) regions, HRRR and MERRA2 have relatively small annual biases in the median case, but ERA5 has a consistent low bias. All models have larger biases in regions with more complex topography, such as California and the Northeast.
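
The annual bias comparison described above can be illustrated with a short sketch. It assumes the normalized bias is (modeled - recorded) / recorded over the year, which matches the Figure 2 caption (a value of -0.5 means modeled generation was half of recorded generation); the study's exact calculation may differ. The column names plant_id, modeled_mwh, and recorded_mwh and the sample values are hypothetical and are not taken from the actual PLUSWIND files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a generation .csv file. The column names and
# numbers are illustrative assumptions, not the real PLUSWIND layout.
SAMPLE_CSV = """plant_id,modeled_mwh,recorded_mwh
A,40.0,80.0
A,10.0,20.0
B,60.0,50.0
B,50.0,50.0
"""


def normalized_bias(modeled_total, recorded_total):
    """Assumed definition: (modeled - recorded) / recorded."""
    return (modeled_total - recorded_total) / recorded_total


def plant_totals(csv_text):
    """Sum modeled and recorded generation for each plant."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["plant_id"]][0] += float(row["modeled_mwh"])
        totals[row["plant_id"]][1] += float(row["recorded_mwh"])
    return dict(totals)


totals = plant_totals(SAMPLE_CSV)

# Bias for each individual plant.
plant_bias = {p: normalized_bias(m, r) for p, (m, r) in totals.items()}

# Bias after aggregating all plants into one "region".
regional_bias = normalized_bias(
    sum(m for m, _ in totals.values()),
    sum(r for _, r in totals.values()),
)

print(plant_bias)
print(regional_bias)
```

In this toy data, the regional value falls between the two plant-level values because over- and under-estimates partly offset when plants are summed, which is one reason aggregated estimates can look better than plant-level ones.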
Figure 2. Mean normalized annual bias for individual plants across major electricity markets in the United States (for example, a value of -0.5 indicates that modeled generation was half of the recorded generation for the year). The black horizontal bar denotes the median value for each model-ISO combination. The regions represent electricity system operators: CAISO, the California Independent System Operator; ERCOT, the Electric Reliability Council of Texas; MISO, the Midcontinent Independent System Operator; PJM, the regional transmission operator for the mid-Atlantic region; SPP, the Southwest Power Pool, covering a region ranging roughly from Oklahoma to North Dakota; ISO-NE, the Independent System Operator of New England; and NYISO, the New York Independent System Operator.

The study also examined the ability of the models to create realistic representations of the daily cycle of wind speed and energy production in each region. In this case, the HRRR model clearly provided the best results. Figure 3 below shows hourly average estimated and recorded total production in CAISO (California) and SPP (roughly Oklahoma to North Dakota). The key comparison in Figure 3 is the similarity (or lack of similarity) between the overall shapes of the daily cycles of recorded and estimated generation (the total bias is best observed in Figure 2). In SPP, generation estimated from both the HRRR and ERA5 models has a shifted but similar shape to the recorded daily wind generation in the region, while MERRA2-based generation is not well correlated with recorded generation. In CAISO, only HRRR provided a reasonable approximation of the daily cycle in wind energy, while MERRA2 and ERA5 have strong low biases and show little correlation with the daily shape of wind energy generation.
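
One simple way to compare the shape of a daily cycle separately from overall bias is to average each series by hour of day and correlate the two 24-hour profiles. The sketch below illustrates that idea with synthetic data; it is not the method used in the study, and the function names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def diurnal_profile(hourly_values):
    """Average (hour_of_day, value) pairs into a 24-point daily profile.

    Assumes every hour of the day (0-23) appears at least once.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, value in hourly_values:
        sums[hour] += value
        counts[hour] += 1
    return [sums[h] / counts[h] for h in range(24)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic two-day hourly series (illustrative only).
hours = [h % 24 for h in range(48)]
recorded = [(h, 100 + 30 * math.cos(2 * math.pi * h / 24)) for h in hours]

# A modeled series with a strong low bias but the same daily shape.
low_bias_same_shape = [(h, 0.6 * v) for h, v in recorded]

# A modeled series whose daily cycle peaks when recorded output is lowest.
opposite_shape = [(h, 100 - 30 * math.cos(2 * math.pi * h / 24)) for h in hours]

r_same = pearson(diurnal_profile(recorded), diurnal_profile(low_bias_same_shape))
r_opp = pearson(diurnal_profile(recorded), diurnal_profile(opposite_shape))

print(r_same, r_opp)
```

Because correlation ignores scale and offset, the low-biased series still correlates almost perfectly with the recorded profile, while the out-of-phase series does not. That is why shape and bias are best assessed separately.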
Figure 3. Seasonal and average daily variation of recorded and modeled generation for 2021 for SPP and CAISO. This figure should be used to compare the shape of the daily cycle but not overall bias, because of uncertainty in the set of plants included in each system operator's report and because plants that began operation in 2021 are not modeled. Overall bias is best assessed with Figure 2.

The full article contains comparisons across all market regions as well as additional types of comparisons to recorded generation. The article also discusses parameterizations of losses and other modeling details. It is free and publicly available at:

Scientific Data article: Link to article

The Plant-Level US multi-model WIND and generation (PLUSWIND) data repository is publicly available as well. The repository is hosted at the US Department of Energy Wind Data Hub:

PLUSWIND data repository: http://doi.org/10.21947/1903602

For additional reading, please see two relevant prior articles published in the journal Wind Energy: “Limitations of reanalysis data for wind power applications” and “What can surface wind observations tell us about interannual variation in wind energy output?”

We thank the Wind Energy Technology Office within the US Department of Energy for supporting this work.

Please reach out with questions about the data repository.

Regards,
Dev Millstein