| Title: | Bayesian Regression Robustness Assessment Test |
|---|---|
| Description: | Tests for a linear relationship in the log ratio between an observed and simulated series and an independent variable. Typically this the error in modelled streamflow at an annual time scale, and a rainfall input. The approach allows for multiple sites as random factors and for multiple replicates of the simulated values. The approach is outlined in Gibbs et al. (2026) Hydrological Processes, 40(4) e70525. |
| Authors: | Matt Gibbs [aut, cre] |
| Maintainer: | Matt Gibbs <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.0.2 |
| Built: | 2026-05-27 06:04:36 UTC |
| Source: | https://github.com/csiro/hydro_brrat_package |
Function to apply the Bayesian Regression Robustness Assessment Test, to evaluate if model simulations are independent of the climate input used, suggesting a robust model.
This package makes use of Stan to fit the linear regression model, and the precompiled model created for the package with rstantools.
Maintainer: Matt Gibbs [email protected]
Stan Development Team (NA). RStan: the R interface to Stan. R package version 2.32.6. https://mc-stan.org
Gibbs et al. (2026) in review
Useful links:
Report bugs at https://github.com/csiro/hydro_BRRAT_Package/issues
Function to apply the BRRAT. A linear regression is fit between the error between the transformed Sim
and Obs data and the corresponding Clim data, to test for a dependent
relationship between the error and climate, indicating a model that is not
robust to the input climate data. Stan is used to fit the linear regression,
and hence a distribution of slope values are returned.
BRRAT(Sim, Obs, ...)BRRAT(Sim, Obs, ...)
Sim |
Data frame type containing time series of simulated values (typically annual streamflow). |
Obs |
Data frame type containing time series of observed values (typically annual streamflow and rainfall). |
... |
Arguments passed to
An optional |
Multiple catchments are supported through random factors in the hierarchical
linear regression model,indicated by different values in an ID column.
Multiple replicates of the Simulated time series are supported, matched based
on the year column.
a list with members:
rstan model object for testing of convergence etc
list of data in stan format used to fit the model, after transformation
order of random factors if used
samples of slope values from the posterior distribution
statistics indicating the probability that the slope is: greater than 0 prob_beta_gt0, less than 0 prob_beta_lt0, and non-zero p_nonzero
#generate 2 models of annual streamflow samples <- 50 #generate some annual rainfall totals Clim <- sort(pmax(0,rnorm(samples,600,100))) #calculate annual runoff from tanh curve observed <- pmax(0,(Clim - 150) - 450 * tanh((Clim - 150)/450)) Obs <- data.frame(P=Clim,Q=observed,year=1:length(Clim)) #non-robust model, underestimate the wetter years Sim1 <- observed*seq(1,0.7,length.out=length(observed))*rnorm(samples,1,0.05) Sim1 <- data.frame(Q=Sim1,year=1:length(Clim)) #robust model, only random error Sim2 <- observed*rnorm(samples,1,0.05) Sim2 <- data.frame(Q=Sim2,year=1:length(Clim)) #apply BRRAT test to both, reduce chains and iters to speed up example. fit1 <- BRRAT(Sim1,Obs,chains=1,iter=500) fit2 <- BRRAT(Sim2,Obs,chains=1,iter=500) dat <- data.frame(b = c(fit1$beta_draws,fit2$beta_draws), id = c(rep("fit1",length(fit1$beta_draws)), rep("fit2",length(fit2$beta_draws)))) #fit2 is centered on 0, fit1 has a negative slope ggplot2::ggplot(dat)+ ggplot2::geom_density(ggplot2::aes(b,fill=id)) #can also be viewed using the plotting function plot_BRRAT_population(fit1) plot_BRRAT_population(fit2) #Summarise statistics of the mean can be viewed using summarise_BRRAT(fit1) summarise_BRRAT(fit2)#generate 2 models of annual streamflow samples <- 50 #generate some annual rainfall totals Clim <- sort(pmax(0,rnorm(samples,600,100))) #calculate annual runoff from tanh curve observed <- pmax(0,(Clim - 150) - 450 * tanh((Clim - 150)/450)) Obs <- data.frame(P=Clim,Q=observed,year=1:length(Clim)) #non-robust model, underestimate the wetter years Sim1 <- observed*seq(1,0.7,length.out=length(observed))*rnorm(samples,1,0.05) Sim1 <- data.frame(Q=Sim1,year=1:length(Clim)) #robust model, only random error Sim2 <- observed*rnorm(samples,1,0.05) Sim2 <- data.frame(Q=Sim2,year=1:length(Clim)) #apply BRRAT test to both, reduce chains and iters to speed up example. fit1 <- BRRAT(Sim1,Obs,chains=1,iter=500) fit2 <- BRRAT(Sim2,Obs,chains=1,iter=500) dat <- data.frame(b = c(fit1$beta_draws,fit2$beta_draws), id = c(rep("fit1",length(fit1$beta_draws)), rep("fit2",length(fit2$beta_draws)))) #fit2 is centered on 0, fit1 has a negative slope ggplot2::ggplot(dat)+ ggplot2::geom_density(ggplot2::aes(b,fill=id)) #can also be viewed using the plotting function plot_BRRAT_population(fit1) plot_BRRAT_population(fit2) #Summarise statistics of the mean can be viewed using summarise_BRRAT(fit1) summarise_BRRAT(fit2)
Writes the output of summarise_BRRAT() to two CSV files: one for the
population slope (file_global) and one for per-site coefficients (file_per_id).
export_BRRAT_csv(fit_obj, file_global, file_per_id, level = 0.95)export_BRRAT_csv(fit_obj, file_global, file_per_id, level = 0.95)
fit_obj |
A fitted object returned by |
file_global |
Path/filename for the global table CSV. No default is
provided; the user must supply an explicit path (e.g. via |
file_per_id |
If applicable to the model, path/filename for the per-site table CSV. No default is provided; the user must supply an explicit path. |
level |
Credible interval level passed to |
Export summary tables to CSV files
(Invisibly) a list with the two data frames written:
list(global = <data.frame>, per_id = <data.frame>).
set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) export_BRRAT_csv(res, file_global = tempfile(fileext = ".csv"), file_per_id = tempfile(fileext = ".csv"), level = 0.95 )set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) export_BRRAT_csv(res, file_global = tempfile(fileext = ".csv"), file_per_id = tempfile(fileext = ".csv"), level = 0.95 )
Plots site-specific regression lines using draws of intercept_id[j] and
slope_id[j] for each site ID. Requires more than 1 ID in the initial
regression by BRRAT(). The output overlays lines and credible bands
across sites; optionally facetted into separate panels for readability.
plot_BRRAT_by_id(fit_obj, n_grid = 150, level = 0.8, facet = TRUE)plot_BRRAT_by_id(fit_obj, n_grid = 150, level = 0.8, facet = TRUE)
fit_obj |
A fitted object returned by |
n_grid |
Integer; number of x points per site for the prediction grids
(default |
level |
Credible interval level (default |
facet |
Logical; if |
Plot per-site lines (random intercepts & slopes) with credible intervals
For site , the line is computed as:
evaluated on a grid of x values transformed to the centered/scaled space used
during fitting. The function reconstructs equal-tail credible bands by propagating
posterior draws of intercept_id[j] and slope_id[j].
A ggplot2 object showing per-site lines and credible intervals.
set.seed(1) samples <- 30 make_site <- function(id) { Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) list( Obs = data.frame(P = Clim, Q = observed, year = 1:samples, ID = id), Sim = data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples, ID = id) ) } s1 <- make_site("A") s2 <- make_site("B") Obs <- rbind(s1$Obs, s2$Obs) Sim <- rbind(s1$Sim, s2$Sim) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) p_sites <- plot_BRRAT_by_id(res, n_grid = 150, level = 0.80, facet = TRUE) print(p_sites)set.seed(1) samples <- 30 make_site <- function(id) { Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) list( Obs = data.frame(P = Clim, Q = observed, year = 1:samples, ID = id), Sim = data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples, ID = id) ) } s1 <- make_site("A") s2 <- make_site("B") Obs <- rbind(s1$Obs, s2$Obs) Sim <- rbind(s1$Sim, s2$Sim) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) p_sites <- plot_BRRAT_by_id(res, n_grid = 150, level = 0.80, facet = TRUE) print(p_sites)
Plots the population-level regression line and its
equal-tail credible band by transforming the x-grid to the centered/scaled space
used during fitting. The plot overlays a (potentially downsampled) scatter of the
original observations for context.
plot_BRRAT_population(fit_obj, n_grid = 200, level = 0.95)plot_BRRAT_population(fit_obj, n_grid = 200, level = 0.95)
fit_obj |
A fitted object returned by |
n_grid |
Integer; number of x points for the prediction grid (default |
level |
Credible interval level for the band (default |
Plot population-level regression Qe ~ P with uncertainty band
The function extracts posterior draws of alpha and beta from the Stan fit and
evaluates along a grid of x values on the original
scale, internally converting to the centered/scaled space via
.
A ggplot2 object showing the population regression line and credible band.
set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) p_pop <- plot_BRRAT_population(res, n_grid = 200, level = 0.95) print(p_pop)set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) p_pop <- plot_BRRAT_population(res, n_grid = 200, level = 0.95) print(p_pop)
Computes tidy posterior summaries for the population slope (beta) and
per-site intercepts and slopes (from intercept_id and slope_id generated
quantities) returned by the Stan fit. The function reports posterior means,
standard deviations, equal-tail credible intervals at a chosen level, and a
two-sided tail probability for non-zero slope at the population level.
summarise_BRRAT(fit_obj, level = 0.95)summarise_BRRAT(fit_obj, level = 0.95)
fit_obj |
A fitted object returned by |
level |
Credible interval level (default |
Summarise global slope and per-site coefficients into tidy tables
The two-sided tail probability is computed as:
which is analogous to a two-sided p-value in spirit, but is a Bayesian posterior tail probability. It shrinks toward 0 as the posterior mass concentrates away from zero.
A list with one or two data frames:
global: one row summarising the population slope beta (mean, sd,
lower/upper credible interval, prob_gt0, prob_lt0, and p_nonzero).
per_id: if applicable (more than one site ID in the data) one
row per site ID with posterior summaries for
intercept_id and slope_id (mean, sd, lower/upper credible intervals).
Assumes the Stan model emits intercept_id and slope_id in generated quantities,
i.e., per-site fully-composed coefficients:
set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) tabs <- summarise_BRRAT(res, level = 0.95) tabs$global head(tabs$per_id)set.seed(1) samples <- 30 Clim <- sort(pmax(0, rnorm(samples, 600, 100))) observed <- pmax(0, (Clim - 150) - 450 * tanh((Clim - 150) / 450)) Obs <- data.frame(P = Clim, Q = observed, year = 1:samples) Sim <- data.frame(Q = observed * rnorm(samples, 1, 0.05), year = 1:samples) res <- BRRAT(Sim, Obs, chains = 1, iter = 500) tabs <- summarise_BRRAT(res, level = 0.95) tabs$global head(tabs$per_id)