API Reference

biolith.models.occu(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, false_positives_constant: bool = False, false_positives_unoccupied: bool = False, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352bb9e50 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352dffa80 with batch shape () and event shape ()>, prior_prob_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f5352de4ad0 with batch shape () and event shape ()>, prior_prob_fp_unoccupied: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f5352aec910 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352aed090 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352c462c0 with batch shape () and event shape ()>) None

Bernoulli occupancy model inspired by MacKenzie et al. (2002) with optional false positives inspired by Royle and Link (2006).

References

  • MacKenzie, D. I., J. D. Nichols, G. B. Lachman, S. Droege, J. Andrew Royle, and C. A. Langtimm. 2002. Estimating Site Occupancy Rates When Detection Probabilities Are Less Than One. Ecology 83: 2248-2255.

  • Royle, J.A., and W.A. Link. 2006. Generalized site occupancy models allowing for false positive and false negative errors. Ecology 87:835-841.

Parameters:
  • site_covs (jnp.ndarray) – An array of site-level covariates of shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – An array of observation-level covariates of shape (n_sites, n_revisits, n_obs_covs).

  • coords (jnp.ndarray, optional) – Coordinates for a spatial random effect when provided.

  • ell (float) – Spatial kernel length scale used if coords is provided.

  • false_positives_constant (bool) – Flag indicating whether to model a constant false positive rate.

  • false_positives_unoccupied (bool) – Flag indicating whether to model false positives in unoccupied sites.

  • obs (jnp.ndarray, optional) – Observation matrix of shape (n_sites, n_revisits) or None.

  • prior_beta (numpyro.distributions.Distribution) – Prior distribution for the site-level regression coefficients.

  • prior_alpha (numpyro.distributions.Distribution) – Prior distribution for the observation-level regression coefficients.

  • prior_prob_fp_constant (numpyro.distributions.Distribution) – Prior distribution for the constant false positive rate.

  • prior_prob_fp_unoccupied (numpyro.distributions.Distribution) – Prior distribution for the false positive rate in unoccupied sites.

  • prior_gp_sd (numpyro.distributions.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (numpyro.distributions.Distribution) – Prior distribution for the spatial kernel length scale.

Examples

>>> from biolith.models import occu, simulate
>>> from biolith.utils import fit
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_cop(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, session_duration: ~jax.Array | None = None, false_positives_constant: bool = False, false_positives_unoccupied: bool = False, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f535018e7a0 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f535018e580 with batch shape () and event shape ()>, prior_rate_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Exponential object at 0x7f5352de7770 with batch shape () and event shape ()>, prior_rate_fp_unoccupied: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Exponential object at 0x7f5352aee350 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352c46060 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352bc6690 with batch shape () and event shape ()>) None

Count occupancy model using a Poisson detection process, inspired by Pautrel et al. (2024).

References

  • Pautrel, L., Moulherat, S., Gimenez, O., & Etienne, M.-P. (2024). Analysing biodiversity observation data collected in continuous time: Should we use discrete- or continuous-time occupancy models? Methods in Ecology and Evolution, 15, 935–950.

Parameters:
  • site_covs (jnp.ndarray) – Per-site covariates, shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – Observation covariates, shape (n_sites, time_periods, n_obs_covs).

  • coords (Optional[jnp.ndarray], optional) – Site coordinates for spatial effects, shape (n_sites, 2).

  • ell (float) – Spatial correlation length for the GP prior.

  • session_duration (Optional[jnp.ndarray], optional) – Duration of each sampling session, shape (n_sites, time_periods).

  • false_positives_constant (bool) – If True, model a constant false positive rate for all sites.

  • false_positives_unoccupied (bool) – If True, model a false positive rate only for unoccupied sites.

  • obs (Optional[jnp.ndarray], optional) – Observed counts, shape (n_sites, time_periods).

  • prior_beta (dist.Distribution) – Prior distribution for occupancy coefficients.

  • prior_alpha (dist.Distribution) – Prior distribution for detection coefficients.

  • prior_rate_fp_constant (dist.Distribution) – Prior for the constant false positive rate parameter.

  • prior_rate_fp_unoccupied (dist.Distribution) – Prior for the false positive rate parameter in unoccupied sites.

  • prior_gp_sd (dist.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (dist.Distribution) – Prior distribution for the spatial kernel length scale.

Examples

>>> from biolith.models import occu_cop, simulate_cop
>>> from biolith.utils import fit
>>> data, _ = simulate_cop()
>>> results = fit(occu_cop, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_cs(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352afc230 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352afd7c0 with batch shape () and event shape ()>, prior_mu: ~numpyro.distributions.distribution.Distribution | ~typing.Tuple[~numpyro.distributions.distribution.Distribution] = <numpyro.distributions.continuous.Normal object at 0x7f5352bfe7b0 with batch shape () and event shape ()>, prior_sigma: ~numpyro.distributions.distribution.Distribution | ~typing.Tuple[~numpyro.distributions.distribution.Distribution] = <numpyro.distributions.continuous.Gamma object at 0x7f5352aa86e0 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f535018e9c0 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f535018ead0 with batch shape () and event shape ()>) None

Continuous-score occupancy model inspired by Rhinehart et al. (2022), modeling classification scores as being drawn from true or false positive distributions.

References

  • Rhinehart, T. A., Turek, D., & Kitzes, J. (2022). A continuous-score occupancy model that incorporates uncertain machine learning output from autonomous biodiversity surveys. Methods in Ecology and Evolution, 13, 1778–1789.

Parameters:
  • site_covs (jnp.ndarray) – Site-level covariates (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – Observation covariates (n_sites, time_periods, n_obs_covs).

  • coords (Optional[jnp.ndarray]) – Site coordinates for spatial effects (n_sites, 2) or None.

  • ell (float) – Optional distance matrix parameter.

  • obs (Optional[jnp.ndarray]) – Observations (n_sites, time_periods) or None.

  • prior_beta (dist.Distribution) – Prior for occupancy coefficients.

  • prior_alpha (dist.Distribution) – Prior for detection coefficients.

  • prior_mu (dist.Distribution or Tuple[dist.Distribution]) – Prior for mean continuous scores.

  • prior_sigma (dist.Distribution or Tuple[dist.Distribution]) – Prior for standard deviations of continuous scores.

  • prior_gp_sd (dist.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (dist.Distribution) – Prior distribution for the spatial kernel length scale.

Examples

>>> from biolith.models import occu_cs, simulate_cs
>>> from biolith.utils import fit
>>> data, _ = simulate_cs()
>>> results = fit(occu_cs, **data)
>>> print(results.samples['psi'].mean())
biolith.models.occu_rn(site_covs: ~jax.Array, obs_covs: ~jax.Array, coords: ~jax.Array | None = None, ell: float = 1.0, false_positives_constant: bool = False, max_abundance: int = 100, obs: ~jax.Array | None = None, prior_beta: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352c17350 with batch shape () and event shape ()>, prior_alpha: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Normal object at 0x7f5352c17650 with batch shape () and event shape ()>, prior_prob_fp_constant: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.Beta object at 0x7f5352aee490 with batch shape () and event shape ()>, prior_gp_sd: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352b2c650 with batch shape () and event shape ()>, prior_gp_length: ~numpyro.distributions.distribution.Distribution = <numpyro.distributions.continuous.HalfNormal object at 0x7f5352b2c350 with batch shape () and event shape ()>) None

Occupancy model inspired by Royle and Nichols (2003), relating observations to the number of individuals present at a site.

References

  • Royle, J. A. and Nichols, J. D. (2003) Estimating Abundance from Repeated Presence-Absence Data or Point Counts. Ecology, 84(3) pp. 777–790.

Parameters:
  • site_covs (jnp.ndarray) – An array of site-level covariates of shape (n_sites, n_site_covs).

  • obs_covs (jnp.ndarray) – An array of observation-level covariates of shape (n_sites, n_revisits, n_obs_covs).

  • coords (jnp.ndarray, optional) – Coordinates for a spatial random effect when provided.

  • ell (float) – Spatial kernel length scale used if coords is provided.

  • false_positives_constant (bool) – Flag indicating whether to model a constant false positive rate.

  • max_abundance (int) – Maximum abundance cutoff for the Poisson distribution.

  • obs (jnp.ndarray, optional) – Observation matrix of shape (n_sites, n_revisits) or None.

  • prior_beta (numpyro.distributions.Distribution) – Prior distribution for the site-level regression coefficients.

  • prior_alpha (numpyro.distributions.Distribution) – Prior distribution for the observation-level regression coefficients.

  • prior_prob_fp_constant (numpyro.distributions.Distribution) – Prior distribution for the constant false positive rate.

  • prior_prob_fp_unoccupied (numpyro.distributions.Distribution) – Prior distribution for the false positive rate in unoccupied sites.

  • prior_gp_sd (numpyro.distributions.Distribution) – Prior distribution for the spatial random effect scale.

  • prior_gp_length (numpyro.distributions.Distribution) – Prior distribution for the spatial kernel length scale.

Examples

>>> from biolith.models import occu_rn, simulate_rn
>>> from biolith.utils import fit
>>> data, _ = simulate_rn()
>>> results = fit(occu_rn, **data)
>>> print(results.samples['abundance'].mean())
biolith.models.simulate(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, prob_fp_unoccupied: float = 0.0, prob_fp_constant: float = 0.0, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.1, max_observation_rate: float = 0.5, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Generate a synthetic dataset for the occu() model.

Returns (data, true_params) suitable for fit().

Examples

>>> from biolith.models import simulate
>>> data, params = simulate()
>>> list(data.keys())
['site_covs', 'obs_covs', 'obs', 'coords', 'ell']
biolith.models.simulate_cop(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.5, max_observation_rate: float = 10.0, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_cop().

Returns (data, true_params) for use with fit().

Examples

>>> from biolith.models import simulate_cop
>>> data, params = simulate_cop()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'session_duration', 'site_covs']
biolith.models.simulate_cs(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_cs().

Returns (data, true_params) for fit().

Examples

>>> from biolith.models import simulate_cs
>>> data, params = simulate_cs()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'site_covs']
biolith.models.simulate_rn(n_site_covs: int = 1, n_obs_covs: int = 1, n_sites: int = 100, deployment_days_per_site: int = 365, session_duration: int = 7, prob_fp: float = 0.0, simulate_missing: bool = False, min_occupancy: float = 0.25, max_occupancy: float = 0.75, min_observation_rate: float = 0.1, max_observation_rate: float = 0.5, random_seed: int = 0, spatial: bool = False, gp_sd: float = 1.0, gp_l: float = 0.2) tuple[dict, dict]

Simulate data for occu_rn().

Returns (data, true_params) for fit().

Examples

>>> from biolith.models import simulate_rn
>>> data, params = simulate_rn()
>>> sorted(data.keys())
['coords', 'ell', 'obs', 'obs_covs', 'site_covs']
biolith.utils.fit(model_fn: callable, site_covs=None, obs_covs=None, obs=None, session_duration=None, num_samples: int = 1000, num_warmup: int = 1000, random_seed: int = 0, num_chains: int = 5, kernel: Literal['nuts', 'hmc', 'mixed_hmc', 'discrete_hmc_gibbs', 'hmcecs'] = 'nuts', timeout: int | None = None, **kwargs) FitResult

Fit a NumPyro model using the provided data.

Parameters:
  • model_fn – The model function to fit.

  • site_covs – Array or Pandas DataFrame containing site-level covariates.

  • obs_covs – Array or Pandas DataFrame containing observation-level covariates.

  • obs – Array or Pandas DataFrame containing the observed data.

  • session_duration – Array or Pandas DataFrame containing the session duration for each observation.

  • num_samples – Number of posterior samples to draw.

  • num_warmup – Number of warmup steps for the sampler.

  • random_seed – Seed used for the random number generator.

  • num_chains – Number of MCMC chains to run.

  • kernel – Name of the sampling kernel to use. Possible values include "nuts", "hmc", "mixed_hmc", "discrete_hmc_gibbs", or "hmcecs".

  • timeout – Optional timeout (in seconds) for the sampling step.

  • **kwargs – Additional keyword arguments passed to model_fn.

Returns:

A tuple-like object containing the posterior samples and the MCMC object itself.

Return type:

FitResult

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit
>>> data, _ = simulate()
>>> results = fit(occu, **data)
biolith.utils.predict(model_fn: callable, mcmc, site_covs=None, obs_covs=None, obs=None, session_duration=None, num_samples: int = 1000, random_seed: int = 0, infer_discrete: bool = False, timeout: int | None = None, **kwargs) dict

Generate posterior predictive samples using a fitted model.

Parameters:
  • model_fn – The model function used for inference.

  • mcmc – An MCMC object returned by fit().

  • site_covs – Array or Pandas DataFrame containing site-level covariates.

  • obs_covs – Array or Pandas DataFrame containing observation-level covariates.

  • obs – Array or Pandas DataFrame containing the observed data.

  • session_duration – Array or Pandas DataFrame containing the session duration for each observation.

  • num_samples – Number of predictive draws to generate.

  • random_seed – Seed used for the random number generator.

  • infer_discrete – Whether to only sample discrete latent variables.

  • timeout – Optional timeout (in seconds) for the prediction step.

  • **kwargs – Additional keyword arguments passed to model_fn.

Returns:

Dictionary of posterior predictive samples.

Return type:

dict

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> data, _ = simulate()
>>> results = fit(occu, **data, num_samples=10, num_warmup=10, num_chains=1)
>>> preds = predict(occu, results.mcmc, **data, num_samples=5)
biolith.evaluation.lppd(model_fn: callable, posterior_samples: Dict[str, Array], **kwargs) float

Calculates the log pointwise predictive density (lppd) for a fitted model.

This function computes the lppd using the log likelihood of the model given the posterior samples and the observed data. It uses the log_likelihood function from numpyro to evaluate the model’s likelihood for the provided observations.

Parameters:
  • model_fn (The model function used to fit the data.)

  • posterior_samples (A dictionary containing posterior samples from a fitted model.)

  • **kwargs (Additional keyword arguments passed to the log_likelihood or model function.)

Returns:

float

Return type:

The log pointwise predictive density (lppd) value.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import lppd
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> lppd(occu, preds, **data)
biolith.evaluation.posterior_predictive_check(posterior_samples: Dict[str, Array], obs: Array, group_by: Literal['site', 'revisit'] = 'site', statistic: Literal['freeman-tukey', 'chi-squared'] = 'freeman-tukey') float

Performs posterior predictive checks of a Bayesian occupancy model.

This function calculates a Bayesian p-value to assess the goodness-of-fit of the model. It compares the observed data with replicated data generated from the model’s posterior predictive distribution using a specified discrepancy statistic.

The discrepancy is calculated for both the real and replicated data for each posterior sample. The Bayesian p-value is the proportion of times the discrepancy of the replicated data exceeds that of the observed data.

Parameters:
  • (dict) (posterior_samples) –

    call or a similar source. It must contain the following keys: - ‘y’: Replicated observation data from the posterior predictive

    distribution. Shape: (num_samples, num_revisits, num_sites).

    • ’psi’: Posterior samples for the occupancy probability.

      Shape: (num_samples, num_sites).

    • ’prob_detection’: Posterior samples for the detection probability. This is

      necessary to compute the expected values for the GOF tests. Shape: (num_samples, num_sites, num_revisits).

  • (jnp.ndarray) (obs) – Shape: (num_sites, num_revisits).

  • (str) (statistic) – statistic. Must be either ‘site’ or ‘revisit’.

  • (str) – Must be either ‘freeman-tukey’ or ‘chi-squared’.

Returns:

float – model fit, while values near 0 or 1 suggest misfit.

Return type:

The Bayesian p-value. A value close to 0.5 suggests a good

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import posterior_predictive_check
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> posterior_predictive_check(preds, data["obs"])
Raises:
  • ValueError – If ‘group_by’ or ‘statistic’ are not valid options.:

  • KeyError – If ‘posterior_samples’ is missing the required keys: (‘y’, ‘psi’, ‘prob_detection’).

biolith.evaluation.residuals(posterior_samples: Dict[str, Array], obs: Array) Tuple[Array, Array]

Calculates occupancy and detection residuals from posterior samples.

This function implements the residual definitions from Wright et al. (2019) to help diagnose occupancy model fit. It separates residuals for the occupancy process from the detection process.

References

  • Wright, W. J., K. M. Irvine, and M. D. Higgs. 2019. Identifying occupancy model inadequacies: can residuals separately assess detection and presence? Ecology 100(6):e02703. 10.1002/ecy.2703

Parameters:
  • posterior_samples (A dictionary containing posterior samples from a fitted model.) – Must include ‘z’ (latent occupancy state), ‘psi’ (occupancy probability), and ‘prob_detection’ (detection probability).

  • obs (The original 2D observation data array of shape (n_sites, n_visits)) – used to fit the model.

Returns:

  • A tuple containing

  • - jnp.ndarray (Occupancy residuals of shape (n_samples, n_sites).)

  • - jnp.ndarray (Detection residuals of shape (n_samples, n_sites, n_visits).) – For a given posterior draw, residuals at sites considered unoccupied (z=0) are returned as np.nan.

Examples

>>> from biolith.models import simulate, occu
>>> from biolith.utils import fit, predict
>>> from biolith.evaluation import residuals
>>> data, _ = simulate()
>>> results = fit(occu, **data)
>>> preds = predict(occu, results.mcmc, **data)
>>> occ_res, det_res = residuals(preds, data["obs"])