October 5, 2020

GoldSim Contaminant Transport Module Online Course, Excerpt #3: The GoldSim Contaminant Transport Modeling Philosophy

 Posted by Rick Kossik

As I noted in previous blog posts, I am currently working on an online CT Course. In order to provide some useful material for users in the meantime, I plan to post excerpts from time to time as I slowly progress through the development of the Course.  

The first excerpt simply provided the outline describing how the Course will be organized. The second excerpt discussed the key decisions required before starting to build a contaminant transport model. In this excerpt from one of the introductory Units of the Course, I discuss the problem of uncertainty in contaminant transport models and then summarize the underlying philosophy of the GoldSim Contaminant Transport Module.

The Problem of Uncertainty in Contaminant Transport Models

For most real-world systems, at least some of the controlling parameters, processes and events are often uncertain (i.e., poorly understood or quantified) and/or stochastic (i.e., inherently temporally variable). It is for this reason that GoldSim was specifically designed as a powerful probabilistic simulator.

Although this “problem of uncertainty” applies to any kind of system you may want to simulate (and it is unfortunately often ignored), due to the nature of the systems involved, this issue is especially important when dealing with environmental systems, and particularly so when trying to carry out contaminant transport modeling.

Sources of Uncertainty in Contaminant Transport Models

If you give it a little thought, the reasons for the high uncertainty associated with contaminant transport modeling should be clear. Engineered systems (such as a factory or a machine) are, for the most part, well-defined and understood (e.g., you can typically measure the important variables and properties of these systems, may have many similar systems that you can look at to better understand performance, and can often build and test prototypes). They still have uncertainty in their performance (primarily due to exogenous environmental variables that may be important, or perhaps behavior over very long time periods that is difficult to prototype), but this uncertainty is typically not extremely large (and that is why complex machines like airplanes are quite safe and predictable).

For environmental systems, however, we often have a much poorer understanding of the system.  A major reason for this is that they often cannot be easily characterized (i.e., the relevant parameters cannot be easily measured). For example, if trying to predict contaminant transport through groundwater, it is not practical or feasible to completely characterize the subsurface environment and determine the relevant properties (instead, you may have only a handful of data points). This is complicated by the fact that the properties themselves (e.g., hydraulic conductivity, chemical environment) are spatially variable. This spatial variability is important because some of the key parameters controlling mass transport for a contaminant, such as its solubility and partition coefficients to various solids, may be extremely sensitive to local environmental conditions that are difficult to characterize and potentially highly variable (e.g., pH and redox).  Relatively small changes in local environmental conditions could result in order of magnitude changes in these parameters.

Some parameters used to describe contaminant transport are quite difficult to measure at all. An example of this is the dispersivity. Dispersivity is unusual in that, unlike a property such as porosity, it is not really meaningful to say that it has some value at a particular point in space. This is because its value is typically considered to be scale dependent (due to the concept of macrodispersion). Hence, it can be thought of as a property of the entire system. This, of course, can make it difficult to quantify (doing so may require a large-scale field experiment that may not be feasible or practical).

In some cases, the processes themselves may be extremely difficult to quantify and poorly understood.  For example, if your model included a pond with many chemical constituents, and during the simulation the pond went dry due to evaporation, concentrations in the pond would be very high (and spatially variable over small distances) and hence the various precipitation reactions taking place while the pond evaporates would be quite difficult to predict accurately.  As a result, the behavior of such a system would have lots of uncertainty.

In many cases, extremely important variables required for predicting performance may be almost entirely unavailable and/or need to be estimated using very poor information.  For contaminant transport models, particularly for existing waste sites, the classic example of this is the source term.  There may be very limited information available regarding what was disposed and when it was disposed.  This imposes a very large uncertainty on any predictive models.

In addition to these issues, it is often not practical or feasible to carry out experiments or evaluate alternative designs for environmental systems you are trying to model. In many cases, it is simply not possible to build and test alternative designs for a proposed system (such as a mine) – the system is simply too large to build and test realistic prototypes. The long time frames involved for some systems also make this impossible. For example, when disposing of radioactive waste, highly engineered waste packages are often used. Laboratory tests can be carried out on these for months or perhaps years to evaluate their performance. But their design life is typically thousands of years, and it is very difficult to design experiments to extrapolate performance over such time frames. Models with long time frames have many other difficulties. For example, climatic factors (e.g., rainfall) will typically play an important role in environmental models. But predicting future climate for thousands (or even tens or hundreds) of years into the future is difficult. And, of course, even for very short duration models (months or years), the stochastic nature of weather adds uncertainty to many environmental models.

All of these factors result in very large uncertainties (in some cases, several orders of magnitude) in many of the parameters, processes and events associated with contaminant transport models.

Why is Dealing with Uncertainty So Important?

Unfortunately, the large uncertainties discussed above are often ignored. If uncertainties are acknowledged at all, the modeler often does so by selecting a single value for each parameter, labeling it a “best estimate” or perhaps a “worst case estimate”. These inputs are evaluated in the simulation model, which then outputs a single deterministic result, which presumably represents a “best estimate” or “worst case estimate”. The modeler may also carry out some simple sensitivity analyses on some of the parameters.

Unfortunately, dealing with large uncertainties in this way when making predictions about the performance of an environmental system is highly problematic.

Oftentimes, the potential behaviors you care about most (e.g., exceeding an environmental regulation) arise from an unknown combination of parameter values and assumptions. The “best estimate” approach may not capture this. Because of this, defending “best estimate” approaches is often very difficult. In a confrontational environment (e.g., demonstrating that a particular facility will meet certain regulations), “best estimate” analyses will typically evolve into “worst case” analyses.  However, “worst case” analyses can be extremely misleading. Worst case analyses of a system are likely to be grossly conservative and therefore completely unrealistic (i.e., by definition, picking lots of unlikely “pessimistic” values will almost certainly have an extremely low probability of actually representing the future behavior of the system). And it is not possible in a deterministic simulation to quantify how conservative a “worst case” simulation actually is (i.e., define its probability). Using a highly improbable estimate to guide policy-making (e.g., “is the design safe?”) is likely to result in very poor decisions.
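The point that a deterministic “worst case” run carries no probability can be made concrete with a small Monte Carlo sketch. The toy model, parameter names, and value ranges below are all hypothetical (they are not from GoldSim or any real site); the sketch simply shows that stacking pessimistic values produces an outcome that essentially none of the sampled futures reach.

```python
import random

random.seed(1)

# Hypothetical toy model: peak concentration is a source inventory
# divided by a dilution flow. Both inputs are uncertain.
def peak_concentration(inventory, flow):
    return inventory / flow

# "Worst case" deterministic run: the pessimistic end of every input range.
worst = peak_concentration(inventory=200.0, flow=1.0)

# Probabilistic run: sample each uncertain input over its assumed range
# and look at the resulting distribution of outcomes.
samples = [
    peak_concentration(random.uniform(50.0, 200.0), random.uniform(1.0, 10.0))
    for _ in range(10_000)
]
exceed = sum(s >= worst for s in samples) / len(samples)

print(f"worst-case result: {worst:.1f}")
print(f"fraction of sampled futures at least that bad: {exceed:.4f}")
```

The probabilistic run produces a full distribution, so the “worst case” can be placed on it: here essentially none of the sampled futures are as bad, which is exactly the information a single deterministic run cannot provide.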

Due to the large uncertainties in the parameters, processes and events associated with contaminant transport models, and the problems with deterministic approaches described above, GoldSim’s modeling philosophy revolves around the belief that explicitly representing these uncertainties (by carrying out probabilistic simulations) is critical. And as we will see below, realistically acknowledging and representing uncertainty has important implications for how you should build and structure your models.

GoldSim Modeling Philosophy

The large uncertainties associated with contaminant transport modeling discussed above strongly inform the modeling philosophy embodied in GoldSim. This philosophy revolves around the idea that the complexity and detail that you include in your model should be consistent with the amount of uncertainty in the system. In particular, when building a model and deciding whether to add detail to a certain process, you should not just ask if you can (e.g., can you use a more detailed equation or more discretization?), but whether you should (does the amount of uncertainty I have in this process justify a more detailed model?).

An easy way to illustrate this is to consider a trivial example. Imagine that we had a process that was described using the equation Y = A + B, where A and B have similar magnitudes. A is a parameter that has three orders of magnitude of uncertainty (that cannot be easily reduced). B, on the other hand, only has an uncertainty of one order of magnitude. By using a more detailed model, the uncertainty in B could be cut in half. Should we add more detail to the model of B? What we need to ask is how reducing our uncertainty in B will reduce our uncertainty in the “answer” (i.e., Y). Of course, in this trivial example, reducing the uncertainty in B has no significant impact on the uncertainty in Y (because it is dominated by the uncertainty in A).
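The Y = A + B example can be checked numerically. The sampling sketch below uses assumed log-uniform ranges (A spanning three orders of magnitude, B spanning one, halved in the refined case) purely for illustration; the specific distributions are my own choice, not part of the example above.

```python
import random
import statistics

random.seed(0)
N = 20_000

def log_uniform(lo_exp, hi_exp):
    """Sample a value whose base-10 exponent is uniform in [lo_exp, hi_exp]."""
    return 10 ** random.uniform(lo_exp, hi_exp)

# A spans three orders of magnitude (1 to 1000); B spans one (10 to 100).
A = [log_uniform(0.0, 3.0) for _ in range(N)]
B = [log_uniform(1.0, 2.0) for _ in range(N)]
# A refined model of B that halves its uncertainty (in log space).
B_refined = [log_uniform(1.25, 1.75) for _ in range(N)]

# Spread of Y = A + B before and after refining B (same A samples in both).
sd_before = statistics.stdev(a + b for a, b in zip(A, B))
sd_after = statistics.stdev(a + b for a, b in zip(A, B_refined))

print(f"spread of Y with the original B: {sd_before:.0f}")
print(f"spread of Y with the refined B:  {sd_after:.0f}")
```

Running this shows the two spreads are nearly identical: halving the uncertainty in B changes the uncertainty in Y by well under a few percent, because the variance of Y is dominated by the variance of A.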

In the real world, of course, the models are not so trivial. But the basic idea still applies: we should only add detail (to reduce uncertainty) to a process or parameter if by doing so we can reduce our uncertainty in the ultimate measure that we are trying to predict or the question that we are trying to answer. (In fact, probabilistic modeling supports uncertainty and sensitivity analyses that allow you to specifically determine if this is the case.)

This discussion may seem obvious, but anyone who has reviewed environmental models will readily acknowledge that most models have details that simply cannot be justified. The problem is not just that you are wasting time and money by adding these details. The problem is that such models can be misleading. The extreme detail can often act to mask the fact that there are huge uncertainties in the model, making the model seem more “correct” than it really is.

Top-Down Modeling

To avoid this problem, GoldSim supports and embodies a top-down modeling approach. 

In general terms, there are two ways to approach any kind of modeling problem: from the "bottom-up", or from the "top-down".  “Bottom-up” modeling approaches attempt from the outset to model the various processes in great detail, and typically make use of complex process-level models for the various system components.  The emphasis is on understanding and explaining all processes in great detail in order to eventually describe the behavior of the entire system.  

While such an approach may seem at first glance to be "scientifically correct", for the following reasons it is generally not the best way to solve many real world problems:

  • The level of detail in a model developed from the bottom-up often becomes inconsistent with the amount of available information and the uncertainty involved.  That is, a model is only as good as its inputs, and if you don't have much information, a detailed model is generally no better than a simple one.  
  • It is often difficult to appropriately integrate and capture interdependencies among the various model components in a bottom-up model, since it is often impossible (or computationally impractical) to dynamically couple the various detailed process-level models used for the components of the system.  As a result, important interactions in the system are often intentionally or unintentionally ignored in a bottom-up model.  
  • It is easy for a bottom-up modeling project to lose sight of the "big picture" (i.e., to "not see the forest for the trees").  As a result, such an approach can be very time-consuming and expensive, with much effort being spent on modeling processes that prove to have little or no impact on the ultimate modeling objectives.  
  • Finally, such models tend to be very difficult to understand and explain (and hence be used) outside of the group of people who create them.

In a "top-down" modeling approach, on the other hand, the controlling processes may initially be represented by approximate (i.e., less detailed or “abstracted”) models and parameters. The model can then evolve by adding detail (and reducing uncertainty) for specific components as more is learned about the system.  Such an approach can help to keep a project focused on the objectives of the modeling exercise without getting lost in what may prove to be unnecessary details. Moreover, because a properly designed top-down model tends to be only as complex as necessary, is well-organized, and is hierarchical, it is generally easier to understand and explain to others.

There are four key points in the application of a top-down modeling approach:

  • Top-down models must incorporate a representation of the model and parameter uncertainty, particularly that due to any simplifications or approximations.
  • As opposed to representing all processes with great detail from the outset, details are only added when justified (e.g., if additional data are available, and if simulation results indicate that performance is sensitive to a process that is currently represented in a simplified manner).  That is, details are only added to those processes that are identified as being important with respect to your modeling objectives and where additional detail will reduce the uncertainty resulting from model simplifications.
  • A top-down model does not have to be “simple”. Whereas a “simple” model might completely ignore a key process, a well-designed top-down model approximates the process at an appropriate level while explicitly incorporating the resulting uncertainty that is introduced. 
  • A top-down model lends itself to being a “total system” model, in which all relevant processes are integrated into a single coupled model.

Iterative Modeling

It is important to understand that in some cases a top-down model may become very complex indeed. The key point, however, is that it generally does not start out as a complex model.  Rather, it evolves over time to a level of complexity that is appropriate given the modeling objectives, information about the system, and the uncertainty in that information.

This is accomplished by carrying out the various modeling steps iteratively, responding to new information and/or preliminary modeling results. That is, modeling any system should be an iterative process.

The basic concept is that as new data are obtained (e.g., through a data collection program or research) and/or as new insights into the behavior of the system are obtained (based on preliminary model results), you should reevaluate and refine the model. In some cases, you might “loop back upward” in the middle of the process.  For example, development of a conceptual and corresponding mathematical model are dependent on the type of data available. If you determine that your available data is highly uncertain, the conceptual and mathematical models may need to be adjusted accordingly (e.g., if data is sparse and uncertain, a highly detailed model would likely be inappropriate).

Simulation models that are constructed and continuously modified in this manner can then do more than only provide predictions of performance; they can provide a systematic framework for organizing and evaluating the available information about the system, and can act as a management tool to aid decision-making with regard to data collection and resource allocation (what should be studied, when, and in what detail?).

How GoldSim Supports Iterative, Top-Down Modeling

So what does a “top-down”, iterative approach actually mean in terms of building contaminant transport models in GoldSim?

For those of you who are experienced with building detailed, high resolution models of environmental systems (e.g., using finite element or finite difference tools), you will find that GoldSim is not like those tools at all.  It is different in three fundamental ways:
  • Spatial Resolution: Whereas a finite element or finite difference model would typically have a very high level of discretization (effectively thousands of finite volumes), a GoldSim model would have a much lower level of spatial resolution (perhaps tens or hundreds of finite volumes). Due to this lower spatial resolution, processes will typically be “abstracted”, “lumped” and/or averaged to some extent.
  • Integration: Detailed, high resolution models are typically designed to do one thing very well (e.g., model groundwater flow and transport or model geochemistry). It is often difficult to appropriately integrate and capture interdependencies among the various model components when using such an approach, since it is often impossible (or computationally impractical) to dynamically couple the various detailed models used for the components of the system.  As a result, important interactions in the system are often intentionally or unintentionally ignored.  GoldSim, on the other hand, allows you to build a single model that focuses on integrating and coupling all system components.  
  • Uncertainty: For computational reasons, it is quite difficult to deal with uncertainty in detailed, high resolution models. GoldSim, on the other hand, allows you to explicitly represent the (typically high) uncertainty present in contaminant transport models.

So how do we represent complex multi-dimensional environmental systems in GoldSim using a low level of spatial resolution? In GoldSim we will build abstracted and lumped representations of complex multi-dimensional systems by appropriately connecting together zero-dimensional (mathematically, well-mixed tank volumes) and one-dimensional components. In subsequent Units we will illustrate how these components can be combined to represent higher-dimensional complex systems. 
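The idea of chaining well-mixed volumes can be sketched in a few lines. The two-cell system below (a source cell draining into a downstream cell, with assumed volumes, flow rate, and initial mass that are entirely hypothetical) is not GoldSim code; it just illustrates the mathematics of lumped, well-mixed cells: each cell obeys dM/dt = inflow − (Q/V)·M, and spatial behavior emerges from how the cells are connected.

```python
# Hypothetical two-cell sketch: cell 1 drains into cell 2 at flow rate Q.
# Each cell is well mixed, so the mass flux leaving a cell is Q * (M / V).

dt = 0.1                     # time step (days)
Q = 50.0                     # flow through both cells (m3/day)
V1, V2 = 1_000.0, 2_000.0    # cell volumes (m3)

m1, m2 = 10.0, 0.0           # initial contaminant mass in each cell (kg)
history = []                 # concentration "breakthrough" record
for step in range(1000):     # simulate 100 days with explicit Euler steps
    out1 = Q * (m1 / V1)     # mass flux leaving cell 1 (kg/day)
    out2 = Q * (m2 / V2)     # mass flux leaving cell 2 (kg/day)
    m1 += -out1 * dt
    m2 += (out1 - out2) * dt
    history.append((m1 / V1, m2 / V2))  # concentrations (kg/m3)

print(f"final masses after 100 days: cell 1 = {m1:.3f} kg, cell 2 = {m2:.3f} kg")
```

Even this crude sketch reproduces the expected lumped behavior: the source cell empties exponentially while the downstream cell rises and then decays, with mass conserved except for what flows out of the last cell. Adding spatial resolution in this framework simply means chaining more cells.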

As will be seen in these later Units, the GoldSim Contaminant Transport Module can certainly be used to build very complex models.  As discussed above, however, the key philosophy embodied in GoldSim is that in almost all situations, you should design your models from the top-down, keeping them as simple as possible, only adding complexity when it is warranted.  That is, you are encouraged to keep in mind the words of the late British statistician George Box:

Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration… Just as the ability to devise simple but evocative models is the signature of the great scientist, so overelaboration and overparameterization is often the mark of mediocrity.
