
Before we describe what models are in science, it's best to know, and to never forget, that all models only say what they are told to say.

Models are lists of statements of the form "If this, then that". No matter how large they grow, or how sophisticated, or how mathematical, or how computerized, or how much data is put into them, or from what sources, their natures are not altered. Models are always lists of "If this, then that."

This applies to all models. It does not matter what names those models are given: artificial intelligence, statistical, probability, physical, meteorological, air transport, crop production, chemical, sociological, psychological, genetic, quantum mechanical, machine learning, and on and on. All are the same in essence: all models in every field have the same nature. And all say just what they are told to say---and nothing more.

This is not a limitation or a flaw. It is the way things are.

A die
Here is a simple, common, and most useful model, used by casinos the world over: "If this die has six sides, then the probability that any given side is up on a throw is one in six."

That model says exactly what we want it to say, and only what we told it to say. It is an accurate model, too. It matches reality well; indeed, it makes beautiful predictions, especially because casinos keep a watchful eye on dice throws. Vast sums of money are made using this model.
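The die model is small enough to write out in full. What follows is only a sketch: the function names and the twelve recorded throws are invented for illustration, and the comparison is one simple choice among many.

```python
from collections import Counter
from fractions import Fraction


def die_model(sides=6):
    """If the die has `sides` sides, then each side has probability 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def observed_frequencies(throws):
    """Relative frequency of each face in a record of throws."""
    counts = Counter(throws)
    return {face: Fraction(n, len(throws)) for face, n in counts.items()}


model = die_model()

# Hypothetical record of twelve throws, invented for this example.
throws = [3, 1, 6, 2, 5, 4, 4, 2, 6, 1, 3, 4]
observed = observed_frequencies(throws)

# Largest disagreement between what the model says and what was seen.
largest_gap = max(abs(model[f] - observed.get(f, Fraction(0))) for f in model)
print(largest_gap)
```

The function returns one in six for every side because that is what it was told to return. Nothing in it mentions what makes a particular side land up.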

The model says nothing, not one thing, about what causes any side to be up on any toss. Cause cannot be inferred from examining the model. No cause was built into the model. That is, none of the "If this, then that" statements (and there is only one) mentioned cause.
But the model is still good.

We conclude that models can be good and useful yet be silent on cause. The opposite is also true: a model can perform well in practice, but we cannot from that good performance conclude it has identified the cause of things. Ensuring cause has been identified is a much more difficult task, as anybody who has painfully designed an experiment only for it to go incomprehensibly wrong can attest.

Since all models only say what they are told to say, we can always create a model to say anything we want. We can have the model speak in the complex language of mathematics or physics, or we can have it discourse in arcane computer code; we can have it project in pictures, words, or numbers. We have the complete freedom to make any model say anything we want it to say.

We have the freedom to specify the "If this" parts of the model, from which we sometimes can deduce, and sometimes must guess, the "Then that" parts. Or we can work backward, starting from desirable "Then that", and picking compatible "If this" parts.

Complete freedom
We have the freedom to say which "data" go into the model, how much, and from where, and which "If this, then that" statements those data are married to. We have the freedom to embrace any simplifications or approximations we want. We can, and an increasing minority even do, cheat.

In short, we have complete freedom over all aspects of all models.

This freedom comes with a cost. Since any model can be made to say anything at all, it means models can't really be trusted until they are tested against reality.

Models certainly cannot be trusted because of the authority of whoever builds, or rather creates, them. That is a fallacy. And they can't be trusted because "We need to do something and there is nothing else." That is also a fallacy: there are always other options.

Models can only be believed when they are tested independent of the information used to create them, and independent of the people who created them.

If the model works only in the hands of those who created it, or was built using only information controlled by its creators, we likely have a case of confirmation bias. Everybody in science knows what confirmation bias is, but everybody also believes it always happens to the other guy. Again, models must be independently compared with Reality to prove themselves.

One way trip
Here's another simple thought experiment showing why all models must be independently tested.

A team of eminent engineers, all with many awards and high positions, claim to have discovered a new theory that can be used to build a machine to safely transport people (and only people) through space to distant habitable planets, by dematerializing them here, and rematerializing them there.

The trip is one way, though, because there is no machine built to the theory's specifications on the other side. One day, if the machine works, and enough people can be transported successfully, an industrialized civilization capable of recreating the machine can grow, and people might be able to return. We also cannot communicate with the people on the distant planet, because again the machine is only one-way, and the planets are thousands of light-years away.

In our understanding and definition, there is no difference between "theory" and "model".

So, will you step into the machine to be dematerialized?

Or will you first demand some kind of proof the machine works? That is, proof that the model matches reality. Or will you take the creators' word for it, and be convinced by their impressive demonstrations of math and science, and their insistence that most of their colleagues (say, 97%) agree with them? If so, safe journey!

Even though it may not look like it, especially to non-mathematicians and non-coders, it is simplicity itself to create a complex model. An enormous list of equations, all accurate in themselves, can be written down with very little trouble. And, with even greater ease, we can say those equations apply to some physical thing, some measurable aspect of reality.

For it is we who say the X in our model means "atmospheric quantity of ammonia", "temperature", "crop yield", "income", or whatever we like. It is we who say all those fancy calculations mean "If this, then X does that."

This is always raw assertion. This assertion may have any number of good, and even excellent, reasons for believing it, or at least putting confidence in it. But the assertion about "X" cannot be believed finally until the model proves itself.

Meaning and costs
We must always remember the math and complex symbols inside a model are just that: math and symbols. They only take on meaning when we assign it to them. They have no meaning in themselves. Although you are likely tired of reading it by now, what this means is that the only true test of a model is to check its boasts about those symbols against what actually happens.

Testing a model against reality is not easy. If a model says, "If this set of conditions holds, then X does that", then we either wait until that set of conditions arises and then examine X, or we design, as carefully as possible, an experiment to bring about those conditions.

That takes time, and is costly.

Which is why this step is usually skipped. Instead, something like the certainty of the model's "If this, then that" statements, or the model's inner coherence, is offered as proof enough of the model's worth. But this move does not work.

Perfect and useful
The ideal model is one which makes perfect predictions. This happens when every one of its "If this, then that" statements is itself true, and can be proved true, where that "every" is strict, and when the complete cause of "X" is within that list of statements. Since every "If this, then that" statement is true, and we know the complete cause, the model must predict perfectly.

Useful predictions can also happen when not every "If this, then that" statement is true; that is, when some of them are false or uncertain. Or when they all are true, but when we don't know the complete cause. This applies to the dice throwing model above.

Usefulness is a subjective criterion, but that it is subjective is not necessarily a shortcoming. A model may be useful to one man, but useless to another. It depends on to what uses the model will be put.

For an extreme example, a model that is wrong all the time can be very useful to the man who successfully markets and sells that model. Whereas the model will have no value to those who buy it. This is why buyers must insist on independent demonstrations of model performance.

Complexity
The more complex a model grows, the more difficult it is to witness enough or all the situations envisioned by the model.
A psychic (with her internal "mind" model) claims to be able to guess if you are thinking of an odd or an even number. You think of odd, she predicts odd: a success. Is her model correct? That is, does the woman truly have psychic powers?

Maybe. But one test is clearly insufficient to tell. And even if she guessed right a large number of times, we would not have decisive evidence her model was correct. Winning poker players know why: a cause other than that asserted by the model (psychic ability) might also account for good results. This is why testing models against reality is so difficult.
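A short calculation shows why one success says so little. The sketch below assumes a rival explanation that the essay does not name explicitly: blind guessing, right half the time. The function name is invented for illustration.

```python
from fractions import Fraction
from math import comb


def chance_of_at_least(hits, trials, p=Fraction(1, 2)):
    """Probability of `hits` or more correct calls in `trials` tries,
    assuming each call is an independent guess that is right with chance p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(hits, trials + 1)
    )


print(chance_of_at_least(1, 1))    # one try, one success
print(chance_of_at_least(8, 10))   # eight or more right out of ten
print(chance_of_at_least(10, 10))  # ten out of ten
```

Even a long run that blind guessing would rarely produce only rules out blind guessing. It does not rule out other causes, such as reading the sitter's face, which is the point the poker players understand.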

Again, the difficulty only grows with the model's complexity. A model of guessing odd or even numbers is trivial. A model which purports to say what the temperature or level of some atmospheric chemical will be a year hence is hideously elaborate.

The more sophisticated the model, the greater its number of "If this, then that" statements. For testing, compromise in limiting that set to something manageable is almost always required. With that compromise should come an increasing uncertainty of the model's value, though, even after it is tested. If the model cannot be tested everywhere, it should never be wholly trusted.

Sometimes something very like the opposite is true in practice. More complex models are trusted more, even when imperfectly tested.

Pygmalion
We have to consider those who create these complex models love those models. And why not? Creators spend vast quantities of time and energy and honest sweat in designing, shaping, and polishing those models, like Pygmalion adoring his statue. Their models are beautiful creations, at least in their eyes.

The people involved in these large modeling efforts are intelligent, even highly intelligent. It is always therefore an affront to cast doubt on their work, a kind of insult. Nobody likes to be suspected of error, least of all those who believe themselves our best thinkers. Harsh questions that could be asked about a model are, for this reason, sometimes not asked.

And, as mentioned above, the cost and time required to do robust testing balloon with model complexity. When model users feel model-based decisions have to be made, and perhaps the funds and will to do so are low, the temptation to bypass independent testing is often not resisted.

Even when it is resisted, and independent testing is done, the temptation to resort to shortcuts is too often indulged. Or certain mistakes in verification creep in unnoticed.

Suppose we have a model that predicts the atmospheric deposition of some thing. An independent test is done, as is proper. Many points never before used in building the model, in any way, are predicted. These predictions turn out to be good---with good defined by the decisions model users make, and the gains and losses they experience with the model.

A quick judgement comparing the averages of the predictions and the observations appears to say the model is fine. But a closer examination reveals that the majority of points tested were of trivial size, concentrations that no one really cares about, because these happened to be the bulk of observations taken during the test. Those small amounts are what happened in the world.

A complete look, using better measures than just comparing averages, reveals larger values of deposition, the ones important to model users, are not predicted well; indeed, the error in the model is seen to grow as the value of the deposition grows.
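The trap can be shown with a handful of numbers. Everything in this sketch is invented for illustration: the observations, the predictions, the cut-off separating "small" from "large" deposition, and the use of mean absolute error as the better measure.

```python
# Invented values for illustration only; these are not measurements.
observations = [1, 1, 1, 1, 1, 1, 1, 1, 10, 20]
predictions = [1.25] * 8 + [13, 15]

THRESHOLD = 5  # hypothetical cut-off between trivial and important values


def mean(values):
    return sum(values) / len(values)


def mean_abs_error(pairs):
    """Average size of the miss over (observation, prediction) pairs."""
    return mean([abs(pred - obs) for obs, pred in pairs])


pairs = list(zip(observations, predictions))
small = [(o, p) for o, p in pairs if o < THRESHOLD]
large = [(o, p) for o, p in pairs if o >= THRESHOLD]

mean_obs = mean(observations)
mean_pred = mean(predictions)
mae_small = mean_abs_error(small)
mae_large = mean_abs_error(large)

print(mean_obs, mean_pred)   # the two averages agree
print(mae_small, mae_large)  # the misses do not
```

In this made-up case the averages match exactly, yet the typical miss on the large values is sixteen times the miss on the small ones.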

This means we have to do the independent tests at points where important decisions would be made by model users. We can't rely on simple tests of goodness, because it might easily turn out that any model would have done well, and we just happened to test ours.

Sophistication versus the average
This brings in the last idea, that of skill. It is simple to grasp. Suppose we had a good guess of the average values of, say, nitrogen deposition over a time period of interest to model users. Maybe the mean of old observations is taken. We could take that average and use it as if it were a model. We can make predictions with it, just as we can with any model. All the predictions happen to be the same value, which is the old average.

Now the deposition model we supposed a moment ago is considered to be a sophisticated, physically and chemically accurate model. It appears to explain the physics and chemistry and other components of the atmosphere in a satisfying way. A large group of important users implement this sophisticated model. Much is invested in it. Important decisions are made using it. Lawmakers embrace the model and require it to be used.

But then suppose we compare that sophisticated model to the average model and the average model beats the sophisticated model, using the verification measures we thought important.

Which model is better? Well, as just said: the average model. The sophisticated model does not have skill with respect to the average model.
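One common way to put a number on skill is to compare the error of the model with the error of the reference. The sketch below assumes mean squared error as the verification measure and a skill score of one minus the ratio of the two errors. Both are choices made for this example, not something the text prescribes, and all values are invented.

```python
# Invented values for illustration only.
observations = [2, 4, 6, 8]
old_average = 5  # assumed to come from earlier observations

average_model = [old_average] * len(observations)
sophisticated_model = [6, 1, 9, 4]  # hypothetical output of the complex model


def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


mse_average = mean_squared_error(average_model, observations)
mse_sophisticated = mean_squared_error(sophisticated_model, observations)

# Positive: the sophisticated model beats the average model.
# Zero or negative: it does not, and so has no skill relative to it.
skill = 1 - mse_sophisticated / mse_average

print(mse_average, mse_sophisticated, skill)
```

With these numbers the skill comes out negative, so a user would have done better predicting the old average every time.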

Since we would be better off using the average model, we should not use the sophisticated model, no matter how important it was thought to be, or how much was invested in it, or how "official" it is, or even how firmly it is believed that the sophisticated model explains the physics and so forth better than the average model.

In truth, the sophisticated model, because it cannot, and did not, beat the simple average model, cannot explain the physics better. It can only appear that way to those who love the model. Because if it did explain the physics better, it would have also made better predictions. It is as simple as that.

The final lesson is simple: put all models to the test.
