API Reference

biolith.models.occu(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, false_positives_constant: bool = False, false_positives_unoccupied: bool = False, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f99219ed370 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f9923bbb9b0 with batch shape () and event shape ()>, regressor_occ: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, regressor_det: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, prior_prob_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f9921bd2ba0 with batch shape () and event shape ()>, prior_prob_fp_unoccupied: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f992190ce10 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f992190d590 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a1ad70 with batch shape () and event shape ()>, site_random_effects: bool = False, obs_random_effects: bool = False, prior_site_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a1aea0 with batch shape () and event shape ()>, prior_obs_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921892e70 with batch shape () and event shape ()>) None

Bernoulli occupancy model inspired by MacKenzie et al. (2002) with optional false positives inspired by Royle and Link (2006).

References

  • MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.

  • Royle, J.A., and W.A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835-841.

Parameters:
  • site_covs (jnp.ndarray) – An array of site-level covariates of shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – An array of observation-level covariates of shape (n_sites, n_revisits, n_obs_covs).

  • coords (jnp.ndarray, optional) – Coordinates for a spatial random effect when provided.

  • ell (float) – Spatial kernel length scale used if coords is provided.

  • false_positives_constant (bool) – Flag indicating whether to model a constant false positive rate.

  • false_positives_unoccupied (bool) – Flag indicating whether to model false positives in unoccupied sites.

  • obs (jnp.ndarray, optional) – Observation matrix of shape (n_sites, n_revisits) or None.

  • prior_beta (numpyro.distributions.Distribution) – Prior distribution for the site-level regression coefficients.

  • prior_alpha (numpyro.distributions.Distribution) – Prior distribution for the observation-level regression coefficients.

  • regressor_occ (Type[AbstractRegression]) – Class for the occupancy regression model, defaults to LinearRegression.

  • regressor_det (Type[AbstractRegression]) – Class for the detection regression model, defaults to LinearRegression.

  • prior_prob_fp_constant (numpyro.distributions.Distribution) – Prior distribution for the constant false positive rate.

  • prior_prob_fp_unoccupied (numpyro.distributions.Distribution) – Prior distribution for the false positive rate in unoccupied sites.

  • prior_gp_sd (numpyro.distributions.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (numpyro.distributions.Distribution) – Prior distribution for the spatial kernel length scale.

  • site_random_effects (bool) – Flag indicating whether to include site-level random effects.

  • obs_random_effects (bool) – Flag indicating whether to include observation-level random effects.

  • prior_site_re_sd (numpyro.distributions.Distribution) – Prior distribution for the site-level random effect standard deviation.

  • prior_obs_re_sd (numpyro.distributions.Distribution) – Prior distribution for the observation-level random effect standard deviation.

Examples

>>> from biolith.models import occu, simulate
>>> from biolith.utils import fit
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_cop(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, session_duration: ~jax.Array | None = None, false_positives_constant: bool = False, false_positives_unoccupied: bool = False, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f99219187d0 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f99219e3310 with batch shape () and event shape ()>, regressor_occ: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, regressor_det: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, prior_rate_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Exponential object at 0x7f99218b0590 with batch shape () and event shape ()>, prior_rate_fp_unoccupied: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Exponential object at 0x7f992190e710 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f991c7aabe0 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f991c7aacf0 with batch shape () and event shape ()>, site_random_effects: bool = False, obs_random_effects: bool = False, prior_site_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f992195c850 with batch shape () and event shape ()>, prior_obs_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f992195c750 with batch shape () and event shape ()>) None

Count occupancy model using a Poisson detection process, inspired by Pautrel et al. (2024).

References

  • Pautrel, L., Moulherat, S., Gimenez, O., & Etienne, M.-P. (2024). Analysing biodiversity observation data collected in continuous time: Should we use discrete- or continuous-time occupancy models? Methods in Ecology and Evolution, 15, 935–950.

Parameters:
  • site_covs (jnp.ndarray) – Per-site covariates, shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – Observation covariates, shape (n_sites, time_periods, n_obs_covs).

  • coords (Optional[jnp.ndarray], optional) – Site coordinates for spatial effects, shape (n_sites, 2).

  • ell (float) – Spatial correlation length for the GP prior.

  • session_duration (Optional[jnp.ndarray], optional) – Duration of each sampling session, shape (n_sites, time_periods).

  • false_positives_constant (bool) – If True, model a constant false positive rate for all sites.

  • false_positives_unoccupied (bool) – If True, model a false positive rate only for unoccupied sites.

  • obs (Optional[jnp.ndarray], optional) – Observed counts, shape (n_sites, time_periods).

  • prior_beta (dist.Distribution) – Prior distribution for occupancy coefficients.

  • prior_alpha (dist.Distribution) – Prior distribution for detection coefficients.

  • regressor_occ (Type[AbstractRegression]) – Class for the occupancy regression model, defaults to LinearRegression.

  • regressor_det (Type[AbstractRegression]) – Class for the detection regression model, defaults to LinearRegression.

  • prior_rate_fp_constant (dist.Distribution) – Prior for the constant false positive rate parameter.

  • prior_rate_fp_unoccupied (dist.Distribution) – Prior for the false positive rate parameter in unoccupied sites.

  • prior_gp_sd (dist.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (dist.Distribution) – Prior distribution for the spatial kernel length scale.

  • site_random_effects (bool) – Flag indicating whether to include site-level random effects.

  • obs_random_effects (bool) – Flag indicating whether to include observation-level random effects.

  • prior_site_re_sd (dist.Distribution) – Prior distribution for the site-level random effect standard deviation.

  • prior_obs_re_sd (dist.Distribution) – Prior distribution for the observation-level random effect standard deviation.

Examples

>>> from biolith.models import occu_cop, simulate_cop
>>> from biolith.utils import fit
>>> data, _ = simulate_cop()
>>> results = fit(occu_cop, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_cs(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f9921a4a360 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f9921a4a410 with batch shape () and event shape ()>, regressor_occ: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, regressor_det: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, prior_mu: ~numpyro.distributions.distribution.Distribution | ~typing.Tuple[~numpyro.distributions.distribution.Distribution] = <numpyro.distributions.continuous.Normal object at 0x7f991c7bd630 with batch shape () and event shape ()>, prior_sigma: ~numpyro.distributions.distribution.Distribution | ~typing.Tuple[~numpyro.distributions.distribution.Distribution] = <numpyro.distributions.continuous.Gamma object at 0x7f99218b1400 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921919c70 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921919d60 with batch shape () and event shape ()>, site_random_effects: bool = False, obs_random_effects: bool = False, prior_site_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921881b70 with batch shape () and event shape ()>, prior_obs_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f99218819b0 with batch shape () and event shape ()>) None

Continuous-score occupancy model inspired by Rhinehart et al. (2022), modeling classification scores as being drawn from true or false positive distributions.

References

  • Rhinehart, T. A., Turek, D., & Kitzes, J. (2022). A continuous-score occupancy model that incorporates uncertain machine learning output from autonomous biodiversity surveys. Methods in Ecology and Evolution, 13, 1778–1789.

Parameters:
  • site_covs (jnp.ndarray) – Site-level covariates (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – Observation covariates (n_sites, time_periods, n_obs_covs).

  • coords (Optional[jnp.ndarray]) – Site coordinates for spatial effects (n_sites, 2) or None.

  • ell (float) – Optional distance matrix parameter.

  • obs (Optional[jnp.ndarray]) – Observations (n_sites, time_periods) or None.

  • prior_beta (dist.Distribution) – Prior for occupancy coefficients.

  • prior_alpha (dist.Distribution) – Prior for detection coefficients.

  • regressor_occ (Type[AbstractRegression]) – Class for the occupancy regression model, defaults to LinearRegression.

  • regressor_det (Type[AbstractRegression]) – Class for the detection regression model, defaults to LinearRegression.

  • prior_mu (dist.Distribution or Tuple[dist.Distribution]) – Prior for mean continuous scores.

  • prior_sigma (dist.Distribution or Tuple[dist.Distribution]) – Prior for standard deviations of continuous scores.

  • prior_gp_sd (dist.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (dist.Distribution) – Prior distribution for the spatial kernel length scale.

  • site_random_effects (bool) – Flag indicating whether to include site-level random effects.

  • obs_random_effects (bool) – Flag indicating whether to include observation-level random effects.

  • prior_site_re_sd (dist.Distribution) – Prior distribution for the site-level random effect standard deviation.

  • prior_obs_re_sd (dist.Distribution) – Prior distribution for the observation-level random effect standard deviation.

Examples

>>> from biolith.models import occu_cs, simulate_cs
>>> from biolith.utils import fit
>>> data, _ = simulate_cs()
>>> results = fit(occu_cs, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_rn(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, false_positives_constant: bool = False, max_abundance: int = 100, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f99219312b0 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f9921931320 with batch shape () and event shape ()>, regressor_abu: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, regressor_det: ~typing.Type[~biolith.regression.abstract.AbstractRegression] = <class 'biolith.regression.linear.LinearRegression'>, prior_prob_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f992190e5d0 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a3b2b0 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a03dd0 with batch shape () and event shape ()>, site_random_effects: bool = False, obs_random_effects: bool = False, prior_site_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a03d10 with batch shape () and event shape ()>, prior_obs_re_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f9921a4a620 with batch shape () and event shape ()>) None

Occupancy model inspired by Royle and Nichols (2003), relating observations to the number of individuals present at a site.

References

  • Royle, J. A. and Nichols, J. D. (2003) Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology, 84(3) pp. 777–790.

Parameters:
  • site_covs (jnp.ndarray) – An array of site-level covariates of shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – An array of observation-level covariates of shape (n_sites, n_revisits, n_obs_covs).

  • coords (jnp.ndarray, optional) – Coordinates for a spatial random effect when provided.

  • ell (float) – Spatial kernel length scale used if coords is provided.

  • false_positives_constant (bool) – Flag indicating whether to model a constant false positive rate.

  • max_abundance (int) – Maximum abundance cutoff for the Poisson distribution.

  • obs (jnp.ndarray, optional) – Observation matrix of shape (n_sites, n_revisits) or None.

  • prior_beta (numpyro.distributions.Distribution) – Prior distribution for the site-level regression coefficients.

  • prior_alpha (numpyro.distributions.Distribution) – Prior distribution for the observation-level regression coefficients.

  • regressor_abu (Type[AbstractRegression]) – Class for the abundance regression model, defaults to LinearRegression.

  • regressor_det (Type[AbstractRegression]) – Class for the detection regression model, defaults to LinearRegression.

  • prior_prob_fp_constant (numpyro.distributions.Distribution) – Prior distribution for the constant false positive rate.

  • prior_prob_fp_unoccupied (numpyro.distributions.Distribution) – Prior distribution for the false positive rate in unoccupied sites.

  • prior_gp_sd (numpyro.distributions.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (numpyro.distributions.Distribution) – Prior distribution for the spatial kernel length scale.

  • site_random_effects (bool) – Flag indicating whether to include site-level random effects.

  • obs_random_effects (bool) – Flag indicating whether to include observation-level random effects.

  • prior_site_re_sd (dist.Distribution) – Prior distribution for the site-level random effect standard deviation.

  • prior_obs_re_sd (dist.Distribution) – Prior distribution for the observation-level random effect standard deviation.

Examples

>>> from biolith.models import occu_rn, simulate_rn
>>> from biolith.utils import fit
>>> data, _ = simulate_rn()
>>> results = fit(occu_rn, **data)
>>> print(results.samples['abundance'].mean())
biolith.models.simulate(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, prob_fp_unoccupied: float = 0.0, prob_fp_constant: float = 0.0, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.1, max_observation_rate: float = 0.5, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2, site_random_effects: bool = False, obs_random_effects: bool = False, site_re_sd: float = 0.5, obs_re_sd: float = 0.3) tuple[dict, dict]

Generate a synthetic dataset for the occu() model.

Returns (data, true_params) suitable for fit().

Examples

>>> from biolith.models import simulate
>>> data, params = simulate()
>>> list(data.keys())
['site_covs', 'obs_covs', 'obs', 'coords', 'ell']
biolith.models.simulate_cop(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.5, max_observation_rate: float = 10.0, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_cop().

Returns (data, true_params) for use with fit().

Examples

>>> from biolith.models import simulate_cop
>>> data, params = simulate_cop()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'session_duration', 'site_covs']
biolith.models.simulate_cs(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_cs().

Returns (data, true_params) for fit().

Examples

>>> from biolith.models import simulate_cs
>>> data, params = simulate_cs()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'site_covs']
biolith.models.simulate_rn(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, prob_fp: float = 0.0, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.1, max_observation_rate: float = 0.5, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_rn().

Returns (data, true_params) for fit().

Examples

>>> from biolith.models import simulate_rn
>>> data, params = simulate_rn()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'site_covs']
biolith.utils.fit(model_fn: Callable, site_covs=None, obs_covs=None, obs=None, session_duration=None, num_samples: int = 1000, num_warmup: int = 1000, random_seed: int = 0, num_chains: int = 5, kernel: Literal['nuts', 'hmc', 'mixed_hmc', 'discrete_hmc_gibbs', 'hmcecs'] | None = None, timeout: int | None = None, **kwargs) FitResult

Fit a NumPyro model using the provided data.

Parameters:
  • model_fn – The model function to fit.

  • site_covs – Array or Pandas DataFrame containing site-level covariates.

  • obs_covs – Array or Pandas DataFrame containing observation-level covariates.

  • obs – Array or Pandas DataFrame containing the observed data.

  • session_duration – Array or Pandas DataFrame containing the session duration for each observation.

  • num_samples – Number of posterior samples to draw.

  • num_warmup – Number of warmup steps for the sampler.

  • random_seed – Seed used for the random number generator.

  • num_chains – Number of MCMC chains to run.

  • kernel – Name of the sampling kernel to use. Possible values include "nuts", "hmc", "mixed_hmc", "discrete_hmc_gibbs", or "hmcecs". Defaults to "nuts" for most models.

  • timeout – Optional timeout (in seconds) for the sampling step.

  • **kwargs – Additional keyword arguments passed to model_fn.

Returns:

A tuple-like object containing the posterior samples and the MCMC object itself.

Return type:

FitResult

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit
>>> data, _ = simulate()
>>> results = fit(occu, **data)
biolith.utils.predict(model_fn: Callable, mcmc, site_covs=None, obs_covs=None, obs=None, session_duration=None, num_samples: int = 1000, random_seed: int = 0, infer_discrete: bool = False, timeout: int | None = None, **kwargs) dict

Generate posterior predictive samples using a fitted model.

Parameters:
  • model_fn – The model function used for inference.

  • mcmc – An MCMC object returned by fit().

  • site_covs – Array or Pandas DataFrame containing site-level covariates.

  • obs_covs – Array or Pandas DataFrame containing observation-level covariates.

  • obs – Array or Pandas DataFrame containing the observed data.

  • session_duration – Array or Pandas DataFrame containing the session duration for each observation.

  • num_samples – Number of predictive draws to generate.

  • random_seed – Seed used for the random number generator.

  • infer_discrete – Whether to only sample discrete latent variables.

  • timeout – Optional timeout (in seconds) for the prediction step.

  • **kwargs – Additional keyword arguments passed to model_fn.

Returns:

Dictionary of posterior predictive samples.

Return type:

dict

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> data, _ = simulate()
>>> results = fit(occu, **data, num_samples=10, num_warmup=10, num_chains=1)
>>> preds = predict(occu, results.mcmc, **data, num_samples=5)
biolith.evaluation.deviance(model_fn: Callable, posterior_samples: Dict[str, Array], **kwargs) float

Calculates deviance as a scoring rule following spOccupancy and Hooten and Hobbs (2015).

The deviance is calculated as:

\[D = -2 \log\left(\frac{1}{Q} \sum_{q=1}^{Q} \prod_{i=1}^{n} \prod_{j=1}^{J_i} p(y_{ij} | z_i^{(q)}, p_{ij}^{(q)})\right)\]

where \(Q\) is the number of posterior samples, \(n\) is the number of sites, \(J_i\) is the number of visits at site \(i\), \(y_{ij}\) is the observed detection at site \(i\) and visit \(j\), \(z_i^{(q)}\) is the latent occupancy state for site \(i\) in sample \(q\), and \(p_{ij}^{(q)}\) is the detection probability for site \(i\) and visit \(j\) in sample \(q\).

References

  • Hooten, M.B. and Hobbs, N.T. (2015), A guide to Bayesian model selection for ecologists. Ecological Monographs, 85: 3-28.

Parameters:
  • model_fn (The model function used to fit the data.)

  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • observation_keys (A set of keys that represent the observed data in the posterior samples.)

  • **kwargs (Additional keyword arguments passed to the log_likelihood or model function.)

Returns:

float

Return type:

The deviance value as a scalar.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import deviance
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> dev = deviance(preds, data["obs"])
biolith.evaluation.deviance_manual(posterior_samples: Dict[str, Array], data: Dict[str, Array]) float

Calculates deviance manually for a non-false positive, Bernoulli occupancy model.

Parameters:
  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • data (A dictionary containing observed data.)

Returns:

float

Return type:

The deviance value as a scalar.

Examples

>>> from biolith.models import occu, simulate
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import lppd_manual
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> posterior_samples = predict(occu, results.mcmc, **data)
>>> dev = deviance_manual(posterior_samples, data)
biolith.evaluation.log_likelihood(model_fn: Callable, posterior_samples: Dict[str, Array], observation_keys: set = {'s', 'y'}, **kwargs) dict[str, Array]

Calculates the log likelihood of observations given a fitted model.

Parameters:
  • model_fn (The model function used to fit the data.)

  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • observation_keys (A set of keys that represent the observed data in the posterior samples.)

  • **kwargs (Additional keyword arguments passed to the log_likelihood or model function.)

Returns:

dict[str, jnp.ndarray]

Return type:

The log likelihood for each sample and observation.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import lppd
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> log_likelihood(occu, preds, **data)
biolith.evaluation.lppd(model_fn: Callable, posterior_samples: Dict[str, Array], **kwargs) float

Calculates the log pointwise predictive density (lppd) for a fitted model.

The lppd is calculated as:

\[\text{lppd} = \sum_{i=1}^{n} \log\left(\frac{1}{Q} \sum_{q=1}^{Q} p(y_i | \theta^{(q)})\right)\]

where \(n\) is the number of sites, \(Q\) is the number of posterior samples, \(y_i\) is the observed detection history for site \(i\) across \(J_i\) revisits, and \(\theta^{(q)}\) represents the model parameters in posterior sample \(q\).

This function computes the lppd using the log likelihood of the model given the posterior samples and the observed data. It uses the log_likelihood function from numpyro to evaluate the model’s likelihood for the provided observations.

Parameters:
  • model_fn (The model function used to fit the data.)

  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • **kwargs (Additional keyword arguments passed to the log_likelihood or model function.)

Returns:

float

Return type:

The log pointwise predictive density (lppd) value.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import lppd
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> lppd(occu, preds, **data)
biolith.evaluation.lppd_manual(posterior_samples: Dict[str, Array], data: Dict[str, Array]) float

Calculates the log pointwise predictive density (lppd) manually for a non-false positive, Bernoulli occupancy model.

Parameters:
  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • data (A dictionary containing observed data.)

Returns:

float

Return type:

The log pointwise predictive density (lppd) value.

Examples

>>> from biolith.models import occu, simulate
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import lppd_manual
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> posterior_samples = predict(occu, results.mcmc, **data)
>>> lppd_manual(posterior_samples, data)
biolith.evaluation.posterior_predictive_check(posterior_samples: Dict[str, Array], obs: Array, group_by: Literal['site', 'revisit'] = 'site', statistic: Literal['freeman-tukey', 'chi-squared'] = 'freeman-tukey') float

Performs posterior predictive checks of a Bayesian occupancy model.

The Bayesian p-value is calculated as:

\[p_B = P(T(y^{\text{rep}}, \theta) > T(y, \theta) | y)\]

where \(T(\cdot, \cdot)\) is the test statistic, \(y\) is the observed data, \(y^{\text{rep}}\) is replicated data from the posterior predictive distribution, and \(\theta\) represents the model parameters.

For Freeman-Tukey statistic:

\[T_{\text{FT}} = \sum (\sqrt{y} - \sqrt{E})^2\]

For Chi-squared statistic:

\[T_{\chi^2} = \sum \frac{(y - E)^2}{E}\]

where \(E\) is the expected value and the sum is over the grouping dimension.

This function calculates a Bayesian p-value to assess the goodness-of-fit of the model. It compares the observed data with replicated data generated from the model’s posterior predictive distribution using a specified discrepancy statistic.

The discrepancy is calculated for both the real and replicated data for each posterior sample. The Bayesian p-value is the proportion of times the discrepancy of the replicated data exceeds that of the observed data.

Parameters:
  • (dict) (posterior_samples) –

    call or a similar source. It must contain the following keys: - ‘y’: Replicated observation data from the posterior predictive

    distribution. Shape: (num_samples, num_revisits, num_sites).

    • ’psi’: Posterior samples for the occupancy probability.

      Shape: (num_samples, num_sites).

    • ’prob_detection’: Posterior samples for the detection probability. This is

      necessary to compute the expected values for the GOF tests. Shape: (num_samples, num_sites, num_revisits).

  • (jnp.ndarray) (obs)

  • (str) (statistic) – statistic. Must be either ‘site’ or ‘revisit’.

  • (str) – Must be either ‘freeman-tukey’ or ‘chi-squared’.

Returns:

float – model fit, while values near 0 or 1 suggest misfit.

Return type:

The Bayesian p-value. A value close to 0.5 suggests a good

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import posterior_predictive_check
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> posterior_predictive_check(preds, data["obs"])
Raises:
  • ValueError – If ‘group_by’ or ‘statistic’ are not valid options.:

  • KeyError – If ‘posterior_samples’ is missing the required keys: (‘y’, ‘psi’, ‘prob_detection’).

biolith.evaluation.residuals(posterior_samples: Dict[str, Array], obs: Array) Tuple[Array, Array]

Calculates occupancy and detection residuals from posterior samples.

The residuals are calculated as:

Occupancy residuals:

\[o_i^{(q)} = z_i^{(q)} - \psi_i^{(q)}\]

Detection residuals:

\[d_{ij}^{(q)} = y_{ij} - p_{ij}^{(q)} \quad \text{conditional on} \quad z_i^{(q)} = 1\]

where \(z_i^{(q)}\) is the latent occupancy state for site \(i\) in posterior sample \(q\), \(\psi_i^{(q)}\) is the occupancy probability, \(y_{ij}\) is the observed detection at site \(i\) and visit \(j\), and \(p_{ij}^{(q)}\) is the detection probability.

This function implements the residual definitions from Wright et al. (2019) to help diagnose occupancy model fit. It separates residuals for the occupancy process from the detection process.

References

  • Wright, W. J., K. M. Irvine, and M. D. Higgs. 2019. Identifying occupancy model inadequacies: can residuals separately assess detection and presence? Ecology 100(6):e02703. 10.1002/ecy.2703

Parameters:
  • posterior_samples (A dictionary containing posterior samples from a fitted model.) – Must include ‘z’ (latent occupancy state), ‘psi’ (occupancy probability), and ‘prob_detection’ (detection probability).

  • obs (Ground truth observations of shape (n_sites, n_visits).)

Returns:

  • A tuple containing

  • - jnp.ndarray (Occupancy residuals of shape (n_samples, n_sites).)

  • - jnp.ndarray (Detection residuals of shape (n_samples, n_sites, n_visits).) – For a given posterior draw, residuals at sites considered unoccupied (z=0) are returned as np.nan.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import residuals
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> occ_res, det_res = residuals(preds, data["obs"])