To calculate the water consumption of each generated response, we follow the method of “Making AI Less Thirsty” by Li et al. (2025). This requires the onsite and offsite water usage effectiveness (WUE) of the data center and its power usage effectiveness (PUE), which combine to give the total operational water footprint: \[{WaterOperational} = E \cdot [{WUE}_{onsite} + {PUE} \cdot {WUE}_{offsite}]\] where $E$ is the energy consumed by AI and \[{WUE}_{offsite} = \frac{\sum_k b_{k,t} \cdot {EWIF}_{k}}{\sum_k b_{k,t}}\] where $b_{k,t}$ denotes the amount of electricity generated from fuel type $k$ at time $t$ for the grid serving the data center and ${EWIF}_k$ denotes the electricity water intensity factor for fuel type $k$.
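As a minimal sketch of the two formulas above, the Python below computes ${WUE}_{offsite}$ as a generation-weighted average and then the operational water footprint. The function names, fuel labels, and every numeric value are hypothetical placeholders chosen for illustration; they are not values from any of the datasets used in this work.

```python
from typing import Mapping


def wue_offsite(generation_by_fuel: Mapping[str, float],
                ewif_by_fuel: Mapping[str, float]) -> float:
    """Generation-weighted average water intensity of the grid (L/kWh).

    generation_by_fuel: electricity generated per fuel type (b_k), any
        consistent unit, or percentage shares of the grid mix.
    ewif_by_fuel: electricity water intensity factor per fuel type (L/kWh).
    A fuel missing from ewif_by_fuel raises KeyError.
    """
    total = sum(generation_by_fuel.values())
    if total <= 0:
        raise ValueError("total generation must be positive")
    weighted = sum(gen * ewif_by_fuel[fuel]
                   for fuel, gen in generation_by_fuel.items())
    return weighted / total


def operational_water(energy_kwh: float, wue_onsite: float,
                      pue: float, wue_off: float) -> float:
    """Operational water footprint in litres: E * (WUE_on + PUE * WUE_off)."""
    return energy_kwh * (wue_onsite + pue * wue_off)


# Placeholder inputs for illustration only (not real data).
example_mix = {"gas": 60.0, "coal": 20.0, "solar": 20.0}   # share of grid
example_ewif = {"gas": 1.0, "coal": 2.0, "solar": 0.1}     # L/kWh
example_wue_off = wue_offsite(example_mix, example_ewif)
example_water = operational_water(energy_kwh=2.0, wue_onsite=0.3,
                                  pue=1.2, wue_off=example_wue_off)
print(f"WUE_offsite = {example_wue_off:.3f} L/kWh")
print(f"Operational water = {example_water:.3f} L")
```

Because the denominator normalizes the mix, the same function accepts either absolute generation or percentage shares, which is convenient when one grid dataset reports shares and another reports generation totals.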
To determine PUE (power usage effectiveness, a measure of the data center's energy efficiency) and ${WUE}_{onsite}$ (the onsite water efficiency of the data center), we use values self-reported by Microsoft for its data centers, as OpenAI reportedly leases Microsoft data centers for ChatGPT models (Microsoft 2025). The locations for which Microsoft reports values are limited, so we are forced to use high-level values for the Americas, Asia Pacific, or Europe, Middle East & Africa, or the global value. This is a source of uncertainty: the specific, local-level data we wanted was not available, so the estimates rest on high-level values and assumptions.
To determine ${WUE}_{offsite}$ (the water efficiency of offsite energy production), we use datasets describing the electricity mix of the energy grid by location and the water efficiency of different fuel types. For the energy grid in the U.S., we use the percentage of each fuel type in the grid by state for the year 2023 from the U.S. EPA's Emissions & Generation Resource Integrated Database (eGRID), a comprehensive source of data on almost all electric power generated in the U.S. This dataset is based on available plant-specific data for all U.S. electricity generating plants that provide power to the electric grid (U.S. Environmental Protection Agency 2025a). Outside of the U.S., we use yearly, country-level electricity generation data for over 200 geographies, collected from multi-country datasets and national sources for the year 2024 (Ember 2026). For water efficiency, we use a dataset containing L/kWh values for the following fuel types: wind, solar, gas, geothermal, nuclear, coal, oil, hydropower, and biomass (Jin et al. 2019; Visualizing Energy 2023). We combine these datasets to calculate ${WUE}_{offsite}$ as the generation-weighted electricity water intensity factor of each grid.
The next step is determining the Adjusted Water Impact (AWI) of our users' water consumption. This is calculated for the aggregated and scaled-up data, rather than for each individual user. To calculate it, we use the following formula from Wu et al. (2025): \[{AWI} = (W_{on} + W_{off}) \cdot {WSF}_b\] where $W_{on}$ and $W_{off}$ refer to the onsite and offsite water consumption (calculated in the previous step) and ${WSF}_b$ refers to the water stress factor at the hydrological basin $b$. This water stress factor can be calculated with prioritization of short-term or long-term effects. For this project, we calculate short-, mid-, and long-term values for comparison. For the mid- and long-term values, this means setting the discount rate $\gamma$ to 10% (mid-term impact prioritization) and 1% (long-term impact prioritization). For short-term impacts, the water stress factor is the current water stress: \[{WSF}_{b}^{short} = {WS}_{t_0,b}\] which denotes the water stress at basin $b$ at time $t_0$. For mid- and long-term impacts, it is calculated as: \[{WSF}_b^{long} = \sum_{t \in T} w_t \cdot {WS}_{t,b}\] where ${WS}_{t,b}$ refers to the water stress at time $t$ and hydrological basin $b$, and $w_t$ is a weight that accounts for the timescale we are concerned with. For our calculations, we consider the following years: 2019 as our baseline and 2030, 2050, and 2080 as our set of future years. To determine the hydrological basin, we query the Aqueduct 4.0 dataset for the watershed ID that is unique to each watershed basin (World Resources Institute 2024; Data Center Map 2024). Both current and projected water stress values for 2019, 2030, 2050, and 2080 can then be retrieved from the Aqueduct 4.0 dataset to complete the calculation of the Adjusted Water Impact (Kuzma et al. 2023). These AWI calculations are completed for each watershed we have collected data from. As some counties are located within the same watershed, their water consumption totals are combined for the AWI calculations.
The Adjusted Water Impact metric from Wu et al. (2025) accounts for temporal variation in water stress by using a discounting approach inspired by environmental economics. We currently only include AWI calculations for the United States and Canada. We also adopt the business-as-usual projection of water stress from Aqueduct 4.0, as do Wu et al. (2025).
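The AWI steps can be sketched in Python as follows. The exact form of the weights $w_t$ is not spelled out here, so this sketch assumes normalized exponential discount weights, $w_t \propto (1+\gamma)^{-(t - t_0)}$, and assumes that the set $T$ includes the baseline year; both are illustrative assumptions rather than the definition used by Wu et al. (2025). All function names, watershed IDs, and numeric values are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def discount_weights(years: List[int], base_year: int,
                     gamma: float) -> List[float]:
    """ASSUMED weighting: exponential discounting, normalized to sum to 1."""
    raw = [(1.0 + gamma) ** -(year - base_year) for year in years]
    total = sum(raw)
    return [r / total for r in raw]


def wsf_short(ws_by_year: Mapping[int, float], base_year: int) -> float:
    """Short-term factor: water stress in the baseline year t0."""
    return ws_by_year[base_year]


def wsf_discounted(ws_by_year: Mapping[int, float], base_year: int,
                   gamma: float) -> float:
    """Mid/long-term factor: weighted sum of water stress over all years."""
    years = sorted(ws_by_year)
    weights = discount_weights(years, base_year, gamma)
    return sum(w * ws_by_year[y] for w, y in zip(weights, years))


def totals_by_watershed(
    records: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Sum (W_on, W_off) over counties that share a watershed ID."""
    sums = defaultdict(lambda: [0.0, 0.0])
    for watershed_id, w_on, w_off in records:
        sums[watershed_id][0] += w_on
        sums[watershed_id][1] += w_off
    return {ws_id: (on, off) for ws_id, (on, off) in sums.items()}


def awi(w_on: float, w_off: float, wsf: float) -> float:
    """Adjusted Water Impact: (W_on + W_off) * WSF_b."""
    return (w_on + w_off) * wsf


# Placeholder data for illustration only (not Aqueduct values).
example_ws = {2019: 1.0, 2030: 2.0, 2050: 3.0, 2080: 4.0}
example_records = [("basin_A", 1.0, 2.0), ("basin_A", 3.0, 4.0),
                   ("basin_B", 5.0, 6.0)]

example_totals = totals_by_watershed(example_records)
w_on_a, w_off_a = example_totals["basin_A"]
for label, factor in [
    ("short", wsf_short(example_ws, 2019)),
    ("mid (gamma=10%)", wsf_discounted(example_ws, 2019, 0.10)),
    ("long (gamma=1%)", wsf_discounted(example_ws, 2019, 0.01)),
]:
    print(f"basin_A {label}: AWI = {awi(w_on_a, w_off_a, factor):.3f}")
```

Under this assumed weighting, a larger $\gamma$ shifts weight toward the near-term years, so the 10% setting emphasizes mid-term stress and the 1% setting gives more influence to the 2050 and 2080 projections.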