Spatial Stream Network Analysis of Nechako Watershed Stream Temperatures 2022b

0001-01-01

Draft: 2024-10-25 09:29:43

The suggested citation for this analytic appendix is:

Hill, N.H., Thorley, J.L., & Irvine, A. (2024) Spatial Stream Network Analysis of Nechako Watershed Stream Temperatures 2022b. A Poisson Consulting Analysis Appendix. URL: https://www.poissonconsulting.ca/f/1295467017.

Background

The primary goal of the current analyses is to answer the following question:

How can we model stream temperature to include spatial correlation through a stream network?

Data Preparation

Sub-hourly water temperature data collected in the Nechako Watershed in northern British Columbia between 2019 and 2021 were downloaded as csv files from Zenodo (Morris et al. 2022).

Hourly air temperature data (at two metres above ground level) for the Nechako Watershed for the years 2019-2021 were downloaded from the ERA-5-Land simulation using the Copernicus Climate Change Service (C3S) Climate Data Store as NetCDF files (Muñoz Sabater 2019).

Daily baseflow and surface runoff data for the Nechako Watershed for the years 2019-2021 using the ACCESS1-0_rcp85 climate scenario were downloaded from the Pacific Climate Impacts Consortium’s Gridded Hydrologic Model Output as NetCDF files (Victoria Pacific Climate Impacts Consortium 2020).

The data were prepared for analysis using R version 4.4.1 (R Core Team 2022).

Key assumptions of the data preparation included:

All stream temperature data are correct, except those flagged “Fail”, which were excluded from analysis (see Gilbert et al. (2022) for details).
The simulated air temperature and discharge data from their respective modeled simulations are reasonable approximations of the truth.
The daily discharge at each stream temperature site was calculated by taking the weighted average of the sum of the baseflow and runoff for the watershed area upstream of the site; these were then averaged over weekly periods for each site.
One site that corresponded to just four consecutive observed data points proved insufficient to capture the annual-scale fluctuations in stream temperature; this site (WHC) was dropped from the analysis.
There were 146 observations with negative stream temperatures measurements, which were all set to 0˚C.

Statistical Analysis

Model parameters were estimated using Bayesian methods. The estimates were produced using Stan (Carpenter et al. 2017). For additional information on Bayesian estimation the reader is referred to McElreath (2020).

Unless stated otherwise, the Bayesian analyses used weakly informative prior distributions (Gelman, Simpson, and Betancourt 2017). The posterior distributions were estimated from 1500 Markov Chain Monte Carlo (MCMC) samples thinned from the second halves of 3 chains (Kery and Schaub 2011, 38–40). Model convergence was confirmed by ensuring that the potential scale reduction factor \(\hat{R} \leq 1.05\) (Kery and Schaub 2011, 40) and the effective sample size (Brooks et al. 2011) \(\textrm{ESS} \geq 150\) for each of the monitored parameters (Kery and Schaub 2011, 61).

The sensitivity of the posteriors to the choice of prior distributions was evaluated by doubling the standard deviations of the priors and then using \(\hat{R}\) to evaluate whether the samples were drawn from the same posterior distribution (Thorley and Andrusak 2017).

The parameters are summarized in terms of the point estimate, lower and upper 95% compatibility limits (Rafi and Greenland 2020) and the surprisal s-value (Greenland 2019). Together a pair of lower and upper compatibility limits (CLs) are referred to as a compatibility interval (CI). The estimate is the median (50th percentile) of the MCMC samples while the 95% CLs are the 2.5th and 97.5th percentiles. The s-value indicates how surprising it would be to discover that the true value of the parameter is in the opposite direction to the estimate (Greenland 2019). An s-value of \(>\) 4.32 bits, which is equivalent to a p-value \(<\) 0.05 (Kery and Schaub 2011; Greenland and Poole 2013), indicates that the surprise would be equivalent to throwing at least 4.3 heads in a row on a fair coin.

Variable selection was based on the heuristic of directional certainty (Kery and Schaub 2011; Murtaugh 2014; Castilho and Prado 2021). Fixed effects were included if their s-value was \(>\) 4.32 bits (Kery and Schaub 2011). Based on a similar argument, random effects were included if their standard deviation had a lower 95% CL \(>\) 5% of the median estimate.

The analyses were implemented using R version 4.4.1 (R Core Team 2022) and the embr family of packages.

Model Descriptions

Stream Temperature

The data were analyzed using a Spatial Stream Network model (Ver Hoef and Peterson 2010; Peterson and Hoef 2010), with code adapted from the SSNbayes package (Santos-Fernandez et al. 2022). The necessary stream network distances and connectivity were calculated using the BC Freshwater Atlas. Air and stream temperature data were averaged by site and week; modeling was done on this weekly time scale.

The expected stream temperatures were modeled using the 4-parameter version of the air2stream model (Toffolon and Piccolroaz 2015). The average stream temperature (in ˚C) in the first week, \(W_{s,j=1}\) was estimated by the model and assumed to be the same for all sites.

For all subsequent weeks (i.e., \(j > 1\)), the change in the stream temperature (in ˚C) between week \(j - 1\) and week \(j\) for the \(s^{th}\) site, \(\Delta W_{s,j}\), was modeled as follows:

\[\begin{equation} \Delta W_{s,j} = \frac{1}{(\theta_{s,j})^{a4_s}}(a1_s + a2_s A_{s,j} - a3_s W_{s,j - 1}), \end{equation}\]

where \(a1_s\), \(a2_s\), \(a3_s\), and \(a4_s\) are the parameters of the air2stream model for the \(s^{th}\) site, \(A_{s,j}\) is the air temperature (in ˚C) for the \(s^{th}\) site in the \(j^{th}\) week, \(W_{s, j - 1}\) is the expected stream temperature at the \(s^{th}\) site in the previous week, and \(\theta_{s,j}\) is the dimensionless discharge for the \(s^{th}\) site in the \(j^{th}\) week. \(\theta_{s,j}\) was calculated as follows:

\[\begin{equation} \theta_{s,j} = \frac{d_{s,j}}{\bar{d_s}} \end{equation}\]

where \(d_{s,j}\) is the discharge for the \(s^{th}\) site in the \(j^{th}\) week, and \(\bar{d_s} = \frac{\sum_{j = 1}^{J}(d_{s,j})}{J}\) is the mean discharge across all \(J\) weeks for the \(s^{th}\) site.

The expected stream temperature for the \(s^{th}\) site in the \(j^{th}\) week was then calculated:

\[\begin{equation} W_{s,j} = W_{s,j - 1} + \Delta W_{s,j}. \end{equation}\]

Growing Season Degree Days (GSDD) are the accumulated thermal units (in ˚C) during the growing season based on the mean daily water temperature values, which is a useful predictor of age-0 rainbow and westslope cutthroat trout size at the beginning of winter. The start and end of the growing season were based on the definitions of Coleman and Fausch (2007):

Start: the beginning of the first week that average stream temperatures exceeded and remained above 5˚C for the season.
End: the last day of the first week that average stream temperature dropped below 4˚C.

GSDD were derived for each site and year by assuming that the daily stream temperatures at each site were the predicted weekly mean stream temperature for every day in the given week.

Key assumptions of the model include:

The stream network is dendritic, not braided.
The expected stream temperatures were set to 0˚C if they were estimated to be negative by the model.
The stream temperature in the first week is the same for all sites.
The parameters of the air2stream model (\(a1\), \(a2\), \(a3\), and \(a4\)) vary randomly by site.
The residual variation is multivariate normally distributed.
The covariance structure of the residual variation combines the following covariance components:
- Nugget (allows for variation at a single location)
- Exponential tail-down (allows for spatial dependence between flow-connected and flow-unconnected locations)

Preliminary analysis found that:

The exponential tail-down model was better at explaining the spatial correlation in the data than exponential tail-up or euclidean distance models (Ver Hoef and Peterson 2010; Peterson and Hoef 2010).
The full 8-parameter air2stream model did not converge.
Preliminary analysis found that allowing the initial stream temperature to vary randomly by site produced unrealistic stream temperatures for January (> 10˚C).

Model Templates

Stream Temperature

.data {
  int nsite;
  int nweek;

  int <lower=0> N_y_obs; // number observed values
  int <lower=0> N_y_mis; // number missing values
  int <lower=1> i_y_obs [N_y_obs] ;  // [N_y_obs,T]
  int <lower=1> i_y_mis [N_y_mis] ;  // [N_y_mis,T]
  vector [N_y_obs] y_obs;  // matrix[N_y_obs,1] y_obs[T];
  real discharge [nsite * nweek];
  real air_temp [nsite * nweek];
  int<lower=0> site[nsite * nweek];
  int<lower=0> week[nsite * nweek];

  matrix [nsite, nsite] D;
  matrix [nsite, nsite] I;
  matrix [nsite, nsite] H;
  matrix [nsite, nsite] flow_con_mat;

parameters {
  vector<lower=0, upper=30>[N_y_mis] y_mis; // declaring the missing y

  real<lower=0> sigma_nug; // sd of nugget effect
  real<lower=0> sigma_td; // sd of tail-down
  real<lower=0> alpha_td; // range of the tail-down model
  real<lower=0> bInitialTemp;
  real<lower=0> s1;
  real<lower=0> s2;
  real<lower=0> s3;
  real<lower=0> s4;
  real<lower=-5, upper=15> m1;
  real<lower=-5, upper=1.5> m2;
  real<lower=-5, upper=5> m3; 
  real<lower=-1, upper=1> m4;
  real<lower=-5, upper=15> a1[nsite];
  real<lower=-5, upper=1.5> a2[nsite]; 
  real<lower=-5, upper=5> a3[nsite]; 
  real<lower=-1, upper=1> a4[nsite];

transformed parameters {
  vector[nsite * nweek] y; // long vector of y
  vector[nsite] Y[nweek]; // array of y
  matrix[nsite, nsite] C_td; // tail-down cov
  real <lower=0> var_nug; // nugget
  real <lower=0> var_td; // partial sill tail-down

  vector[nsite * nweek] eTempDiff;
  vector<lower=0, upper=30>[nsite * nweek] eTemp;
  vector[nsite] mu [nweek];
  y[i_y_obs] = y_obs;
  y[i_y_mis] = y_mis;
  var_nug = sigma_nug^2; // variance nugget
  var_td = sigma_td^2; // variance tail-down
  // Place observations into matrices
  for (t in 1:nweek){
    Y[t] = y[((t - 1) * nsite + 1):(t * nsite)];
  }
  eTemp[1:nsite] = rep_vector(bInitialTemp, nsite);
  for (i in (nsite + 1):(nweek * nsite)) {
    eTempDiff[i] = (1/(discharge[i]^a4[site[i]])) * (a1[site[i]] + a2[site[i]] * air_temp[i] - a3[site[i]] * eTemp[i - nsite]);
    
    eTemp[i] = eTemp[i - nsite] + eTempDiff[i];
    if (eTemp[i] < 0) {
      eTemp[i] = 0.0;
    }
  }
  // Define 1st mu
  mu[1] = eTemp[1:nsite];
  // Define rest of mu; ----
  for (t in 2:nweek){
    mu[t] = eTemp[((t - 1) * nsite + 1):(t * nsite)];
  }
  // Covariance matrices ----
  // Tail-down exponential model
    for (i in 1:nsite) {
    for (j in 1:nsite) {
      if (flow_con_mat[i, j] == 1) { // if points are flow connected
        C_td[i, j] = var_td * exp(-3 * H[i, j] / alpha_td);
      }
      else{ // if points are flow unconnected
        C_td[i, j] = var_td * exp(-3 * (D[i, j] + D[j, i]) / alpha_td);
      }
    }
  }

model {
  sigma_nug ~ exponential(0.05); // sd nugget
  sigma_td ~ exponential(2); // sd tail-down
  alpha_td ~ normal(0, 20000) T[0, ]; // range tail-down
  bInitialTemp ~ normal(0, 0.1) T[0, ];

  s1 ~ exponential(50);
  s2 ~ exponential(50);
  s3 ~ exponential(50);
  s4 ~ exponential(50);
  m1 ~ normal(0.8, 1);
  m2 ~ normal(0.4, 1);
  m3 ~ normal(0.4, 1);
  m4 ~ normal(0.1, 1);
  a1 ~ normal(m1, s1);
  a2 ~ normal(m2, s2);
  a3 ~ normal(m3, s3);
  a4 ~ normal(m4, s4);

  for (t in 1:nweek) {
    target += multi_normal_cholesky_lpdf(Y[t] | mu[t], cholesky_decompose(C_td + var_nug * I + 1e-6));
  }

Block 1. Model description.

Results

Tables

Stream Temperature

Table 1. Parameter descriptions.

Parameter	Description
`C_td`	Covariance matrix of the tail-down exponential model
`D`	Downstream hydrologic distance matrix
`H`	Total hydrologic distance matrix
`I`	The identity matrix
`N_y_mis`	Number of missing water temperature values
`N_y_obs`	Number of observed water temperature values
`Y[t]`	Vector of water temperature values for all sites in the `t`^th week
`a1[i]`	Intercept-type parameter of the air2stream model for the `i`^th site
`a2[i]`	Effect of `air_temp[i]` on `eTempDiff[i]` for the `i`^th site
`a3[i]`	Effect of the previous week’s expected water temperature (`eTemp[i - nsite]`) on `eTempDiff[i]`, for the `i`^th site
`a4[i]`	Effect of `discharge[i]` on `eTempDiff[i]` for the `i`^th site
`air_temp[i]`	The `i`^th air temperature value (˚C)
`alpha_td`	The variance of spatially independent points
`bInitialTemp`	Expected average water temperature for the week starting 01-01-2019 for all sites
`discharge[i]`	Dimensionless discharge for the `i`^th observation (discharge for that observation divided by the mean discharge across all observations for that site)
`eTempDiff[i]`	Expected difference in average water temperature from the previous week
`eTemp[i]`	Expected value of `water_temp[i]`
`flow_con_mat`	Site connectivity matrix
`i_y_mis`	Indexes of missing water temperature values
`i_y_obs`	Indexes of observed water temperature values
`m1`	Mean of the site-wise random effect for the `a1` parameter
`m2`	Mean of the site-wise random effect for the `a2` parameter
`m3`	Mean of the site-wise random effect for the `a3` parameter
`m4`	Mean of the site-wise random effect for the `a4` parameter
`mu[t]`	Vector of `eTemp` values for all sites in the `t`^th week
`nsite`	Number of sites
`nweek`	Number of weeks
`s1`	Standard deviation of the site-wise random effect for the `a1` parameter
`s2`	Standard deviation of the site-wise random effect for the `a2` parameter
`s3`	Standard deviation of the site-wise random effect for the `a3` parameter
`s4`	Standard deviation of the site-wise random effect for the `a4` parameter
`sigma_nug`	Standard deviation of the nugget effect
`sigma_td`	Standard deviation of the exponential tail-down covariance model
`site[i]`	The `i`^th site
`var_nug`	Variance of the nugget effect
`var_td`	Variance of the exponential tail-down covariance model
`week[i]`	The `i`^th week
`y[i]`	The `i`^th water temperature value (˚C)
`y_mis`	Vector of missing water temperature values
`y_obs`	Vector of observed water temperature values

Table 2. Model coefficients.

term	estimate	lower	upper	svalue
alpha_td	9.77e+03	6.16e+03	1.69e+04	10.6
bInitialTemp	7.14e-02	4.01e-03	2.36e-01	10.6
m1	6.30e-01	4.65e-01	7.96e-01	10.6
m2	3.22e-01	2.67e-01	3.76e-01	10.6
m3	3.21e-01	2.73e-01	3.74e-01	10.6
m4	2.79e-01	1.80e-01	3.93e-01	10.6
s1	2.67e-01	2.07e-01	3.45e-01	10.6
s2	1.18e-01	9.08e-02	1.59e-01	10.6
s3	1.08e-01	8.46e-02	1.45e-01	10.6
s4	2.01e-01	1.55e-01	2.66e-01	10.6
sigma_nug	4.20e-01	3.85e-01	4.58e-01	10.6
sigma_td	1.37e+00	1.19e+00	1.59e+00	10.6

Table 3. Model convergence.

n	K	nchains	niters	nthin	ess	rhat	converged
3297	2126	3	500	5	543	1.013	TRUE

Table 4. Model sensitivity.

all	analysis	sensitivity	bound
all	1.013	1.076	1.132

Figures

Stream Temperature

figures/temperature-air2stream/water-temp.png — Figure 2. Predicted water temperature by date (with 95% CIs). The points are the observed data.

figures/temperature-air2stream/gsdd-annual-site.png — Figure 3. Predicted GSDD by year and site (with 95% CIs).

figures/temperature-air2stream/gsdd-map.png — Figure 4. GSDD median estimate and width of 95% CI by year and site. The black lines are the stream network.

Acknowledgements

The organisations and individuals whose contributions have made this analytic appendix possible include:

Hillcrest Geographics
- Simon Norris

References

Brooks, Steve, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, eds. 2011. Handbook for Markov Chain Monte Carlo. Boca Raton: Taylor & Francis.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan : A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.

Castilho, Leonardo Braga, and Paulo In’acio Prado. 2021. “Towards a Pragmatic Use of Statistics in Ecology.” PeerJ 9 (September): e12090. https://doi.org/10.7717/peerj.12090.

Coleman, Mark A., and Kurt D. Fausch. 2007. “Cold Summer Temperature Limits Recruitment of Age-0 Cutthroat Trout in High-Elevation Colorado Streams.” Transactions of the American Fisheries Society 136 (5): 1231–44. https://doi.org/10.1577/T05-244.1.

Gelman, Andrew, Daniel Simpson, and Michael Betancourt. 2017. “The Prior Can Often Only Be Understood in the Context of the Likelihood.” Entropy 19 (10): 555. https://doi.org/10.3390/e19100555.

Gilbert, Derek E., Jeremy E. Morris, Anna R. Kaveney, and Stephen J. D’ery. 2022. “Sub-Hourly Water Temperature Data Collected Across the Nechako Watershed, 2019-2021.” Data in Brief 43 (August): 108425. https://doi.org/10.1016/j.dib.2022.108425.

Greenland, Sander. 2019. “Valid p -Values Behave Exactly as They Should: Some Misleading Criticisms of p -Values and Their Resolution With s -Values.” The American Statistician 73 (sup1): 106–14. https://doi.org/10.1080/00031305.2018.1529625.

Greenland, Sander, and Charles Poole. 2013. “Living with p Values: Resurrecting a Bayesian Perspective on Frequentist Statistics.” Epidemiology 24 (1): 62–68. https://doi.org/10.1097/EDE.0b013e3182785741.

Kery, Marc, and Michael Schaub. 2011. Bayesian Population Analysis Using WinBUGS : A Hierarchical Perspective. Boston: Academic Press. http://www.vogelwarte.ch/bpa.html.

McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. CRC Texts in Statistical Science. Boca Raton: Taylor; Francis, CRC Press.

Morris, Jeremy, Derek Gilbert, Anna Kaveney, and Stephen D’ery. 2022. “Sub-Hourly Water Temperature Collected by UNBC’s Northern Hydrometeorology Group (NHG) Across the Nechako Watershed, 2019-2021.” https://doi.org/10.5281/zenodo.6426023.

Muñoz Sabater, J. 2019. “ERA5-Land Hourly Data from 1950 to Present.” Copernicus Climate Change Service (C3S) Climate Data Store (CDS). https://doi.org/10.24381/cds.e2161bac.

Murtaugh, Paul A. 2014. “In Defense of p Values.” Ecology 95 (3): 611–17. https://doi.org/10.1890/13-0590.1.

Peterson, Erin E., and Jay M. Ver Hoef. 2010. “A Mixed‐model Moving‐average Approach to Geostatistical Modeling in Stream Networks.” Ecology 91 (3): 644–51. https://doi.org/10.1890/08-1668.1.

R Core Team. 2022. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rafi, Zad, and Sander Greenland. 2020. “Semantic and Cognitive Tools to Aid Statistical Science: Replace Confidence and Significance by Compatibility and Surprise.” BMC Medical Research Methodology 20 (1): 244. https://doi.org/10.1186/s12874-020-01105-9.

Santos-Fernandez, Edgar, Jay M. Ver Hoef, Erin E. Peterson, James McGree, Daniel Isaak, and Kerrie Mengersen. 2022. “Bayesian Spatio-Temporal Models for Stream Networks.” Computational Statistics & Data Analysis 170 (June): 107446. https://doi.org/10.1016/j.csda.2022.107446.

Thorley, Joseph L., and Greg F. Andrusak. 2017. “The Fishing and Natural Mortality of Large, Piscivorous Bull Trout and Rainbow Trout in Kootenay Lake, British Columbia (2008–2013).” PeerJ 5 (January): e2874. https://doi.org/10.7717/peerj.2874.

Toffolon, Marco, and Sebastiano Piccolroaz. 2015. “A Hybrid Model for River Water Temperature as a Function of Air Temperature and Discharge.” Environmental Research Letters 10 (11): 114011. https://doi.org/10.1088/1748-9326/10/11/114011.

Ver Hoef, Jay M., and Erin E. Peterson. 2010. “A Moving Average Approach for Spatial Statistical Models of Stream Networks.” Journal of the American Statistical Association 105 (489): 6–18. https://doi.org/10.1198/jasa.2009.ap08248.

Victoria Pacific Climate Impacts Consortium, University of. 2020. “Gridded Hydrologic Model Output.” https://data.pacificclimate.org/portal/hydro_model_out/map/.