Glossary – Environmental Modelling

from Beven, K J, Environmental Modelling: An Uncertain Future? Routledge:London, 2009

Aleatory Uncertainty The way in which a quantity varies in some random stochastic way in a system. Often used in contrast to epistemic uncertainty.

Alpha (α)-cut Used in fuzzy set theory to define a level of uncertainty with respect to the range of a degree of membership function (normally [0-1]).
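
As a minimal sketch of the idea, assume a triangular membership function; the function names and the triangular shape are illustrative assumptions, not taken from the source. The alpha-cut is then the interval of values whose degree of membership is at least alpha:

```python
def triangular_membership(x, low, mode, high):
    """Degree of membership in [0, 1] for an assumed triangular fuzzy number.

    Requires low < mode < high.
    """
    if x <= low or x >= high:
        return 0.0
    if x <= mode:
        return (x - low) / (mode - low)
    return (high - x) / (high - mode)


def alpha_cut(alpha, low, mode, high):
    """Interval of x whose membership is >= alpha, for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    left = low + alpha * (mode - low)
    right = high - alpha * (high - mode)
    return left, right
```

Raising alpha narrows the interval towards the single most possible value; as alpha approaches zero the interval widens towards the support of the fuzzy set.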

Autocorrelated Errors A time series of model residuals that exhibit correlation at successive time steps. In distributed models, errors might be correlated in both space and time.
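
One common way to check for this, sketched here under the assumption of an evenly spaced residual series, is the sample autocorrelation coefficient at a given lag. The function name is hypothetical:

```python
def lag_autocorrelation(residuals, lag=1):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    deviations = [r - mean for r in residuals]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator
```

Values near zero suggest little correlation at that lag; values near +1 or -1 suggest strongly persistent or strongly alternating errors.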

Auxiliary conditions The set of variables (and sometimes hypotheses) that need to be specified to run a model for a particular case. Includes initial conditions, boundary conditions and parameters representing system characteristics.

Axiom A proposition accepted as being fundamentally true on the basis of principle, past research, or purely hypothetical speculation. Axioms usually specify that something is or is not true and are the foundation for deductive reasoning.

Bayes Equation Equation for calculating a posterior probability given a prior probability and a likelihood function. Used in the GLUE methodology to calculate posterior model likelihood weights from subjective prior weights and a likelihood measure chosen for model evaluation.
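
The weight update described here can be sketched as follows: each model's posterior weight is proportional to its prior weight multiplied by its likelihood measure, rescaled so that the weights sum to one. The function name and the discrete set of models are assumptions made for illustration only:

```python
def posterior_weights(prior_weights, likelihoods):
    """Combine prior weights and likelihood values, then normalise to sum to 1."""
    if len(prior_weights) != len(likelihoods):
        raise ValueError("need one likelihood value per prior weight")
    products = [p * l for p, l in zip(prior_weights, likelihoods)]
    total = sum(products)
    if total <= 0:
        raise ValueError("all models have zero posterior weight")
    return [x / total for x in products]
```

A model given a likelihood of zero (a non-behavioural simulation) receives zero posterior weight whatever its prior weight.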

Bayesianism / Bayes Explanation Approach in which subjective, empirical (inductive) and theoretical (deductive) information can be combined. Fits in well with current ideas about the sociological context of science and the way it is done in practice.

Behavioural Simulation A simulation that gives an acceptable reproduction of any observations available for model evaluation. Simulations that are not acceptable are non-behavioural.

Blind Validation Evaluation of a model using parameter values estimated before the modeller has seen any output data.

Boundary Conditions Constraints and values of variables required to run a model for a particular flow domain and time period. May include input variables such as rainfalls and temperatures; or constraints such as specifying a fixed head (Dirichlet boundary condition), an impermeable boundary or specified flux rate (Neumann boundary condition), or a head-dependent flux (Cauchy boundary condition).

Calibration The process of adjusting parameter values of a model to obtain a better fit between observed and predicted variables. May be done manually or using an automatic calibration algorithm

Coherence A principle of probability theory that expresses that observations should be used in the best possible way to condition probabilities of a possible outcome. Can be applied most rigorously in ideal cases where input errors and model structural errors are negligible. Where such uncertainties are known to be significant it is not always obvious what is the best possible way. A less formal, but more generally useful, definition is that observations should not be used to condition probabilities of an outcome in a way that is clearly worse than some other conditioning process.

Conditioning The process of refining a model structure, or a distribution of parameter values of a model structure as more data become available (see also data assimilation and real-time forecasting).

Confirmation A process of evaluating model outputs to check that a model is still providing acceptable simulations. Now preferred to the terms validation or verification.

Copula A method of transformation from a space of scaled unit axes to a complex multivariate distribution with dependencies.

Covariance Matrix A way of expressing the statistical uncertainty for a set of multiple parameters or variables as a square matrix of coefficients. The diagonal elements in the matrix represent the variance of each individual member of the set; the off-diagonal elements the covariation between pairs of members. The higher the degree of covariance, the greater the interaction between pairs of parameters or variables. The covariance can be scaled to represent correlation between the members.
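
The scaling from covariance to correlation can be sketched as below, assuming the matrix is estimated from joint samples with the usual n - 1 divisor. The function names are hypothetical:

```python
import math


def covariance_matrix(samples):
    """Sample covariance matrix; each row of `samples` is one joint sample."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def correlation_matrix(cov):
    """Scale each covariance by the two standard deviations involved."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]
```

The diagonal of the correlation matrix is one by construction, and the off-diagonal elements lie between -1 and +1.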

Crisp set A set for which the boundaries of the set are defined such that any potential member of the set will either be in the set or excluded.

Data Assimilation The process of using observational data to update model predictions (see Chapter 5, also Real-Time Forecasting and Updating)

Deduction Inference from specific premises to some prediction. Theories based on physical assumptions without resort to empirical generalisations are examples of deductive reasoning. Examples common in mathematics and logic but rare in environmental science.

Degree of membership An expression of the strength with which a member of a fuzzy set is associated with that set. Normally takes the range [0-1], with values greater than zero defining the range of support for the fuzzy set.

Disaggregation / Downscaling The process of distributing variables calculated at large scales to estimate appropriate values at smaller scales. Required, for example, when modelling the impacts of climate change at scales of local interest from predictions available only at the grid scale of a GCM.

Entropy Used here as a measure of information due to Claude Shannon in 1948 using an analogy with thermodynamic entropy.
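
For a discrete probability distribution, Shannon's measure is H = -Σ p log2(p), in bits. A minimal sketch, with a hypothetical function name:

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    if any(p < 0 for p in probabilities) or abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p*log(p) is 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

Entropy is zero when one outcome is certain and largest when all outcomes are equally probable.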

Epistemology Study of the possibility and theory of knowledge. Evolutionary epistemology embodies the idea that knowledge will be revised and improved over time.

Epistemic Uncertainty The way in which the response of a system varies in ways that cannot be simply described by random stochastic variation. Often used in contrast to aleatory uncertainty. Also known as Knightian uncertainties (after Frank Knight (1885-1972) who himself referred to “true uncertainties” that could not be insured against, as opposed to risk that could be assessed probabilistically, see Knight, 1921).

Equifinality 1. The concept that there may be many models of a system that are acceptably consistent with the observations available, derived from the General Systems theory of Ludwig von Bertalanffy (1968) and adopted in environmental modelling by Beven (1993, 2006).

Equifinality 2. The adaptation of the von Bertalanffy concept to geomorphology by Culling (1957), namely that similar landforms might arise from different processes and histories.

Equifinality 3. The nonlinear dynamical systems version later expressed by Culling (1987, rejecting his earlier view). Culling distinguished between strict equifinality where a perturbed system will return to its original form after some transition time and weaker forms in which equifinality implies only persistence or stability of some property of the system in its trajectory in state space (as might be observed, for example, if there is some attractor in state space).

Extension Principle The extension principle is used in fuzzy theory and allows the extrapolation of degrees of membership from members of a fuzzy set to functions of the values of those members. Thus if the fuzzy set A is defined by the membership values of a discrete set of points in X, {x1, x2, x3, …, xn} with membership values {μ(x1), μ(x2), μ(x3), …, μ(xn)}, then for any other fuzzy set f(A) that is a function of A, membership will be defined by the membership values {μ(f(x1)), μ(f(x2)), μ(f(x3)), …, μ(f(xn))}.
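
For a discrete fuzzy set this can be sketched as carrying each membership value from x to f(x). Where several points map to the same value of f, the sketch keeps the largest membership; this is the usual convention in Zadeh's formulation and is an addition to the definition given here. The function name and the dictionary representation are assumptions:

```python
def extend(fuzzy_set, f):
    """Map a discrete fuzzy set {x: membership} through the function f."""
    image = {}
    for x, membership in fuzzy_set.items():
        y = f(x)
        # If several x values give the same y, keep the largest membership.
        image[y] = max(membership, image.get(y, 0.0))
    return image
```

For example, squaring the set {-1: 0.5, 0: 1.0, 1: 0.8, 2: 0.3} maps both -1 and 1 to the value 1, which takes the larger of the two memberships.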

Falsification Answer to the problem of induction, primarily due to Karl Popper. The idea that science proceeds by setting up theories and then seeking evidence to falsify them. In a strong version, whereas no amount of evidence can completely confirm a theory, one false prediction might be sufficient to falsify it. This is the standard model of the “scientific method”, which is now recognised to be largely an ideal that is rarely followed. To avoid falsification, it is more usual to neglect some evidence as “outliers” or modify a theory to take account of the new evidence, perhaps by changing auxiliary hypotheses or calibration parameters.

Formal likelihood A quantitative measure of the acceptability of a particular model or parameter set in reproducing the system response being modelled based on a formal parametric function to represent the structure of the errors.

Fuzzy Logic A system of logical rules involving variables associated with a continuous fuzzy measure (normally in the range 0 to 1) rather than the binary measure (right/wrong, 0 or 1) of traditional logic. Rules are available for operations such as addition and multiplication of fuzzy measures and for variables grouped in fuzzy sets. Such rules can be used to reflect imperfect knowledge of how a variable will respond in different circumstances in terms of the possibilities of potential outcomes.

Fuzzy Measure A degree of membership of a quantity to a fuzzy set (see fuzzy logic).

Fuzzy Set A set of quantities, thought to have something in common, but for which membership of the set cannot be described precisely but only through a degree or grade of membership or fuzzy measure.

Genetic algorithm A method of optimisation based on treating parameter sets as “genes” that then “evolve” by melding, mutation, and off-spring processes.

Global optimum A set of parameter values that gives the best fit possible to a set of observations

Heteroscedastic Errors A time series of model residuals that exhibit a changing variance over a simulation period (see also Autocorrelated Errors)

Hypothesis A set of propositions about how a system works. Can be expressed either qualitatively or as a theory or model.

History Matching The calibration of a model by adjusting parameter values to reduce the differences between observations and predicted variables

Identifiability The ease with which particular parameter values in a model might be calibrated or conditioned by comparing model outputs to observed variables (see also calibration and conditioning)

Incoherence The use of observations to condition probabilities in a way that does not properly reflect the information content of the data (see coherence).

Incommensurate/incommensurability Used here to refer to variables or parameters with the same name that refer to different quantities because of a change in scale

Independence Two variables are independent if a change in the value of one variable has no effect on the value of the other.

Induction The inference from experimental evidence to general theory. The “problem of induction” was originally identified by David Hume (1711-1776). This is that past evidence may not necessarily be a good guide to future experience (all swans are white…; groundwater flow is Darcian….) so that no theory can ever be verified by induction.

Informal Likelihood A quantitative, but subjectively chosen, measure of the acceptability of a particular model or parameter set in reproducing the system response being modelled

Initial Conditions The auxiliary conditions required at the start of a run of a model to define all initial model states.

Inverse Problem see History Matching

Learning set A set of observed data used in the calibration of a Neural Net model

Likelihood Measure A quantitative measure of the acceptability of a particular model or parameter set in reproducing the system response being modelled

Linearity A model (or model component) is linear if the outputs are in direct proportion to the inputs.

Linguistic uncertainties result from the fact that language, including the scientific vocabulary, is often underspecific, ambiguous, vague, context dependent, or exhibits theoretical indeterminacies. Linguistic uncertainties often overlap with epistemic uncertainties.

Local Optimum A local peak in the parameter response surface where a set of parameter values gives a better fit to the observations than all parameter sets around it, but not as good a fit as the global optimum

Marginal distribution In a multivariate distribution, the marginal distribution is obtained by integrating over all dimensions except the one associated with the particular variable for which the marginal distribution is required. It is the distribution of that variable conditioned on the distributions of all the other variables in the multivariate distribution function.

Model A set of constructs, derived from explicit assumptions, about how a system responds to specified inputs. Quantitative models are normally expressed as sets of assumptions and mathematical equations and implemented as computer codes.

Model Space A hyperspace defined by the ranges of feasible models and parameter values, with dimensions for each parameter within each model.

Monte Carlo Simulation Simulation involving multiple runs of a model using different randomly chosen sets of parameter values
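
A minimal sketch of the procedure, assuming uniform sampling within specified parameter ranges and a toy linear model. Both `toy_model` and `monte_carlo` are hypothetical names, and the uniform distribution is only one possible choice:

```python
import random


def toy_model(params, inputs):
    """Hypothetical model: a linear response to each input value."""
    return [params["gain"] * u + params["offset"] for u in inputs]


def monte_carlo(model, ranges, inputs, n_runs, seed=0):
    """Run the model n_runs times with uniformly sampled parameter sets."""
    rng = random.Random(seed)  # seeded so that the sample can be reproduced
    runs = []
    for _ in range(n_runs):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        runs.append((params, model(params, inputs)))
    return runs
```

Each run pairs a sampled parameter set with its simulated output, so the runs can later be evaluated against observations, for example to separate behavioural from non-behavioural simulations.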

Multiple Working Hypotheses A scientific method, in the earth sciences commonly assigned to T. Chamberlin (1896) and Gilbert (1897), that is based on considering all possible explanations of a phenomenon and then subjecting each hypothesis to test. Given the limitations of data and measurement techniques it may not always be possible to reduce the multiple hypotheses to a single explanation.

Nomological System A formally defined system of theories and concepts in science.

Non-identifiability An expression of the problem of identifying parameter values in a model, given limited observational data (see also equifinality).

Nonlinear A model is nonlinear if the outputs are not in direct proportion to the inputs but may vary with intensity or volume of the inputs or with antecedent conditions

Nonparametric Method A method of estimating distributions without making any assumptions about the mathematical form of the distribution

Nonstationarity A system in which the characteristics are expected to change over time; a model in which the parameters are expected to change over time.

Non-uniqueness An expression of the problem of identifying parameter values in a model, given limited observational data (see also equifinality).

Normally Distributed A variable is normally distributed if its distribution can be adequately fitted by the Normal or Gaussian distribution function that is symmetrical about the mean, bell-shaped and with infinite tails.

Objective Function (Performance measure, goodness-of-fit) A measure of how well a simulation fits the available observations

Open system A system defined by uncontrolled boundary conditions that involve the exchanges of fluxes (of mass, energy, momentum etc.) with the rest of the world. The boundaries of open systems are often defined for convenience (for example where a measurement is being made), rather than because they can be physically well-defined (e.g. a lake).

Optimisation The process of finding a parameter set that gives the best fit of a model to the observations available. May be done manually or using an automatic calibration algorithm.

Over-parameterisation Problem induced by trying to calibrate the parameter values of a model that has more parameters than can be supported by the information content of the calibration data.

Parameter A constant that must be defined before running a model simulation.

Parameter Space A space defined by the ranges of feasible model parameters, with one dimension for each parameter

Pareto Optimal Set The set of models in a multi-objective evaluation that are not dominated by any other model on at least one evaluation measure, i.e. there is no model that performs better on that evaluation measure and on another evaluation measure (named after Vilfredo Pareto, 1848-1923, who originated the concept of Pareto efficiency in economics).
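
A sketch of how the set might be extracted, using the standard dominance test and assuming that every evaluation measure is an error to be minimised (smaller is better). The function names are hypothetical:

```python
def dominates(a, b):
    """True if a is no worse than b on every measure and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(scores):
    """Indices of the models whose score vectors no other model dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]
```

With two error measures, a model scoring (3, 3) is excluded when another scores (2, 2), whereas (1, 5) and (5, 1) both remain because each is best on one measure.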

Pareto front The surface or manifold in the model space that joins all models in the Pareto optimal set.

Parsimony The concept, sometimes known as Occam’s razor, that a model should be no more complex than necessary to predict the observations sufficiently accurately to be useful

Perceptual Model A qualitative description of the processes thought to be controlling the hydrological response of an area

Performance Measure (Objective function, goodness-of-fit) A measure of how well a simulation fits the available observations

Possibility A non-statistical measure of the potential for an outcome or occurrence of an event as an alternative to probability theory. Used in fuzzy set theory, but also has a more general usage (see George Klir, 2006)

Posterior distribution The statistical distribution of a model parameter or output variable after conditioning on the basis of observed data, for example, using a calculated likelihood in Bayes equation. In a Bayesian learning process, the posterior distribution after one conditioning step may become the prior distribution when new observational data are made available.

Prior distribution The statistical distribution of a model parameter or output variable assumed or calculated on the basis of only knowledge about the characteristics of the system before data are collected.

Probability A statistical measure of the potential for an outcome or occurrence of an event. Many statisticians, most notably Dennis Lindley (2006), believe that probability is the only way of expressing uncertainty in a potential outcome. There are a variety of foundations for the estimation and interpretation of probabilities of which the main examples are the Frequentist and the Bayesian views. Frequentists hold that probabilities represent the likelihood of outcomes that would be found if it were possible to take a large number of samples over all potential outcomes. Bayesians recognise that this can only ever be an ideal and that prior (often subjective) estimates of probabilities might be useful as an input to estimating probabilities based on limited amounts of evidence.

Random Sample A set of realisations of a model or variable generated by making choices from a feasible range of possibilities drawn by selecting pseudo-random numbers from specified distributions.

Real-time forecasting The use of a model to make predictions into the near future (over some lead time), taking account of data becoming available as the forecasting period progresses (see also data assimilation and updating). Includes numerical weather prediction and models used for flood forecasting (see Chapter 5).

Recursive estimation A form of model calibration and uncertainty estimation based on updating time step by time step as new observations become available. Commonly used in data assimilation and real-time forecasting.

Response Surface The surface defined by the values of an objective function as it changes with changes in parameter values. May be thought of conceptually as a surface with “peaks” and “troughs” in the multidimensional space defined by the parameter dimensions, where the “peaks” represent good fits to the observations and the “troughs” represent poor fits to the observations (see also Parameter Space).

Risk Uncertainty about responses of a real world system that can be characterised in terms of probabilities. There is an ISO Standard on Risk Management Terminology (ISO, 2002) that uses the definition that risk is the combination of the probability of an event and its consequence, but the term is often used more generally.

Risk Analysis The process of identifying sources of risk and assigning values of risk.

Risk Communication The process of exchanging and translating information about risk between different stakeholder groups and decision-makers

Robustness A decision is robust to future uncertainty if it leaves open the possibility of alternative strategies as the future unfolds.

Sensitivity The response of a model to a change in a parameter or input variable. Can either be assessed locally in the model space (when it is normally quantified as a gradient or as a normalised rate of change of a model output to the rate of change of the parameter or input variable) or over a global sample of points in the model space.
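
A local, normalised sensitivity can be sketched with a central finite difference, here assuming a model that returns a single scalar output from a dictionary of parameters. The function name, the step size and the model interface are all assumptions:

```python
def normalised_sensitivity(model, params, name, rel_step=1e-4):
    """Relative change in output per relative change in one parameter."""
    base = model(params)
    p0 = params[name]
    if p0 == 0 or base == 0:
        raise ValueError("normalisation needs a non-zero parameter and output")
    h = rel_step * abs(p0)
    up = dict(params, **{name: p0 + h})
    down = dict(params, **{name: p0 - h})
    gradient = (model(up) - model(down)) / (2 * h)  # central difference
    return gradient * p0 / base
```

For an output of the form a * b**2, this gives values close to 1 for a and 2 for b, reflecting the stronger local influence of b.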

Simulated annealing A method of optimisation based on an analogy with the organisation of molecules in cooling liquid metal. Initially there is a random scatter of parameter sets but as the “temperature” cools, the optimisation becomes more and more structured. Useful when searching for a global optimum when there may be many local optima.
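
A minimal single-parameter sketch is given below. It assumes the objective is an error measure to be minimised, a Gaussian proposal step, a geometric cooling schedule and the Metropolis acceptance rule; these are common choices rather than anything specified here, and the function name is hypothetical:

```python
import math
import random


def simulated_annealing(objective, x0, step, n_iter=5000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Minimise objective(x) for a scalar x; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best_x, best_fx = x, fx
    for k in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (k / max(n_iter - 1, 1))
        candidate = x + rng.gauss(0.0, step)
        fc = objective(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temperature):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx
```

Accepting some worse moves while the temperature is high is what allows the search to escape local optima; as the temperature falls the search behaves more like a simple descent.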

Stakeholder An individual, group or community who might be affected by the outcome of a decision making process.

State Space The space of potential trajectories of a model (or, in a more limited sense, of a particular variable in a model).

Stochastic model A model that contains random elements as a way of expressing uncertainty in inputs, system characteristics, or model response. Often constructed by proposing a certain (simple) structure for perturbations around some mean or control simulation. Introduces additional parameters within the stochastic structure to define the magnitude of the likely perturbations. Additive perturbations are often assumed for simplicity (but multiplicative perturbations can be transformed to an additive form by taking logs). If it is expected that the nature of the perturbations will change (e.g. with heteroscedastic variance) then more parameters will be required to represent such changes.

Support of a Fuzzy Set The range of possible values that will have degree of membership greater than zero in a fuzzy set.

System A part of the world that has been identified for study or for a decision. Occupies some physical space, some time period, and is separated from the rest of the world by the specification of certain boundary conditions. Environmental systems are usually open systems.

Theory A set of constructs, derived from explicit assumptions, about the nature of the world. Formal theories are normally expressed as mathematical axioms and equations.

Trans-science The idea, proposed by Alvin Weinberg in 1972, that some subjects, generally held to be within the realm of science, may not be open to rigorous scientific study. Arguably, this applies to very many environmental systems where boundary conditions cannot be controlled, experiments are necessarily place and time specific (see uniqueness of place), and details of processes are not fully known.

Transfer Function A representation of the output from a system due to a unit input

Utility function An expression of the benefits arising from implementing different levels of investment (costs) in formal decision making.

Underdetermination thesis A theory (or model) is underdetermined if there is at least one other theory (or model) that is equally compatible with the available empirical evidence (observations with which a model can be compared).

Unknowability The concept that because of limitations of current measurement techniques there is much about environmental systems that cannot feasibly be known.

Updating The process of using data available now to condition forecasts of a variable into the future by changing values of parameters or state variables in the model. A form of data assimilation, commonly used in real-time forecasting. Includes different types of Kalman filter, and Variational data assimilation.

Validation 1. A process of evaluation of models to confirm that they are acceptable representations of a system. Philosophers of science have some problems with the concept of validation and verification (e.g. Oreskes et al., 1994) and it may be better to use “evaluation” or “confirmation” rather than validation or verification (which imply a degree of truth in the model).

Validation 2. Validation is sometimes used in the much more restrictive sense of validation of a computer code (model) to show that it does produce accurate solutions of the equations on which it is based. For complex models this can also be difficult to show in practice.

Variogram A spatial correlation function used in geostatistics, normally plotted as the variance at different distances between points in a random field. Normally increases with distance, up to the “range” where variance becomes constant. If variance continues to increase with distance, the field may be nonstationary (or fractal). Sometimes fitted by a functional form that includes a random effect at zero distance, the “nugget variance”. Underlies the Kriging method of spatial interpolation in geostatistics. To estimate a variogram without a significant degree of uncertainty requires a large number of measured values, more if the measurements suggest that the correlation might vary with direction.
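
As a sketch of how an empirical variogram might be estimated, the function below uses the classical estimator, half the mean squared difference between pairs of values separated by a given lag, for values measured at evenly spaced points along a transect. The function name and the one-dimensional, evenly spaced layout are simplifying assumptions:

```python
def empirical_semivariogram(values, max_lag):
    """Semivariance at integer lags 1..max_lag along an evenly spaced transect."""
    n = len(values)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(values) - 1")
    gamma = {}
    for h in range(1, max_lag + 1):
        squared_diffs = [(values[i + h] - values[i]) ** 2 for i in range(n - h)]
        gamma[h] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma
```

The number of pairs falls as the lag grows, so estimates at large lags are the most uncertain. A series with a steady trend gives semivariances that keep increasing with lag, which is the signature of possible nonstationarity noted above.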

Verification see Validation

Verisimilitude A term used to express the concept of relative truthfulness of false theories. The term was used by Karl Popper (1963), but his notion of verisimilitude has since been criticised (e.g. by Tichy, 1974).