Before we describe what models are in science, it's best to know, and to never forget, that all models only say what they are told to say.
Models are lists of statements of the form "If this, then that". No matter how large they grow, or how sophisticated, or how mathematical, or how computerized, or how much data is put into them, or from what sources, their natures are not altered. Models are always lists of "If this, then that."
This applies to all models. It does not matter what names those models are given: artificial intelligence, statistical, probability, physical, meteorological, air transport, crop production, chemical, sociological, psychological, genetic, quantum mechanical, machine learning, and on and on. All are the same in essence: all models in every field have the same nature. And all say just what they are told to say---and nothing more.
This is not a limitation or a flaw. It is the way things are.
A die
Here is a simple, common, and most useful model, used by casinos the world over: "If this die has six sides, then the probability that any given side is up in a throw is one in six."
That model says exactly what we want it to say, and only what we told it to say. It is an accurate model, too. It matches reality well; indeed, it makes beautiful predictions, especially because casinos keep a watchful eye on dice throws. Vast sums of money are made using this model.
The model says nothing, not one thing, about what causes any side to be up on any toss. Cause cannot be inferred from examining the model. No cause was built into the model. That is, none of the "If this, then that" statements (and there is only one) mentioned cause.
But the model is still good.
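As a minimal sketch of how this one-statement model can be written down and compared with throws, consider the following. The function names `die_model` and `observed_frequencies` are illustrative assumptions, and the throws are simulated with a seeded random generator, which only stands in for the real records a casino would keep.

```python
import random
from collections import Counter
from fractions import Fraction

def die_model(sides=6):
    """The whole model: 'If this die has `sides` sides, then each side has probability 1/sides.'"""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}

def observed_frequencies(throws):
    """Relative frequency of each face seen in a list of throws."""
    counts = Counter(throws)
    n = len(throws)
    return {face: counts[face] / n for face in sorted(counts)}

# Simulated throws are an assumption standing in for real observations.
rng = random.Random(0)
throws = [rng.randint(1, 6) for _ in range(60_000)]

model = die_model()
freqs = observed_frequencies(throws)

# Largest gap between what the model says and what was observed.
max_gap = max(abs(freqs.get(face, 0.0) - float(p)) for face, p in model.items())
print(max_gap)
```

Nothing in `die_model` mentions why any side lands up. The comparison only shows whether the stated probabilities match the observed frequencies.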
We conclude that models can be good and useful yet be silent on cause. The opposite is also true: a model can perform well in practice, but we cannot from that good performance conclude it has identified the cause of things. Ensuring cause has been identified is a much more difficult task, as anybody who has painfully designed an experiment only for it to go incomprehensibly wrong can attest.
Since all models only say what they are told to say, we can always create a model to say anything we want. We can have the model speak in the complex language of mathematics or physics, or we can have it discourse in arcane computer code; we can have it project in pictures, words, or numbers. We have the complete freedom to make any model say anything we want it to say.
We have the freedom to specify the "If this" parts of the model, from which we sometimes can deduce, and sometimes must guess, the "Then that" parts. Or we can work backward, starting from desirable "Then that", and picking compatible "If this" parts.
Complete freedom
We have the freedom to say which "data" go into the model, how much, and from where, and which "If this, then that" statements they are married to. We have the freedom to embrace any simplifications or approximations we want. We can even cheat, and an increasing minority do.
In short, we have complete freedom over all aspects of all models.
This freedom comes with a cost. Since any model can be made to say anything at all, it means models can't really be trusted until they are tested against reality.
Models certainly cannot be trusted because of the authority of who builds, or rather creates, them. That is a fallacy. And they can't be trusted because "We need to do something and there is nothing else." That is also a fallacy: there are always other options.
Models can only be believed when they are tested independent of the information used to create them, and independent of the people who created them.
If the model only works for those who created it, or was built using only the information controlled by its creators, we likely have a case of confirmation bias. Everybody in science knows what confirmation bias is, but everybody also believes it always happens to the other guy. Again, models must be independently compared with Reality to prove themselves.
One way trip
Here's another simple thought experiment showing why all models must be independently tested.
A team of eminent engineers, all with many awards and high positions, claim to have discovered a new theory that can be used to build a machine to safely transport people (and only people) through space to distant habitable planets, by dematerializing them here, and rematerializing them there.
The trip is one way, though, because there is no machine built to the theory's specifications on the other side. One day, if the machine works, and enough people can be transported successfully, an industrialized civilization capable of recreating the machine can grow, and people might be able to return. We also cannot communicate with the people on the distant planet, because again the machine is only one-way, and the planets are thousands of light years away.
In our understanding and definition, there is no difference between "theory" and "model".
So, will you step into the machine to be dematerialized?
Or will you first demand some kind of proof the machine works? That is, proof that the model matches reality. Or will you take the creators' word for it, and be convinced by their impressive demonstrations of math and science, and their insistence that most of their colleagues (say, 97%) agree with them? If so, safe journey!
Even though it may not look like it, especially to non-mathematicians and non-coders, it is simplicity itself to create a complex model. An enormous list of equations, all accurate in themselves, can be written down with very little trouble. And, with even greater ease, we can say those equations apply to some physical thing, some measurable aspect of reality.
For it is we who say the X in our model means "atmospheric quantity of ammonia", "temperature", "crop yield", "income", or whatever we like. It is we who say all those fancy calculations mean "If this, then X does that."
This is always raw assertion. This assertion may have any number of good, and even excellent, reasons for believing it, or at least putting confidence in it. But the assertion about "X" cannot be believed finally until the model proves itself.
Meaning and costs
We must always remember the math and complex symbols inside a model are just that: math and symbols. They only take on meaning when we assign it to them. They have no meaning in themselves. Although you are likely tired of reading it by now, what this means is that the only true test of a model is when we witness its boasts about those symbols.
Testing a model against reality is not easy. If a model says, "If this set of conditions holds, then X does that", then we either wait until that set of conditions arises and then examine X, or we design an experiment, as carefully as possible, to bring about those conditions.
That takes time, and is costly.
Which is why this step is usually skipped. Instead, something like the certainty of the model's "If this, then that" statements, or the model's inner coherence, is offered as proof enough of the model's worth. But this move does not work.
Perfect and useful
The ideal model is one which makes perfect predictions. This happens when every one of its "If this, then that" statements is itself true, and can be proved true, where that "every" is strict, and when the complete cause of "X" is within that list of statements. Since every "If this, then that" statement is true, and we know the complete cause, the model must predict perfectly.
Useful predictions can also happen when not every "If this, then that" statement is true; that is, when some of them are false or uncertain. Or when they all are true, but when we don't know the complete cause. This applies to the dice throwing model above.
Usefulness is a subjective criterion, but that it is subjective is not necessarily a shortcoming. A model may be useful to one man, but useless to another. It depends on the uses to which the model will be put.
For an extreme example, a model that is wrong all the time can be very useful to the man who successfully markets and sells that model. Whereas the model will have no value to those who buy it. This is why buyers must insist on independent demonstrations of model performance.
Complexity
The more complex a model grows, the more difficult it is to witness enough or all the situations envisioned by the model.
A psychic (with her internal "mind" model) claims to be able to guess if you are thinking of an odd or an even number. You think of odd, she predicts odd: a success. Is her model correct; i.e. does the woman truly have psychic powers?
Maybe. But one test is clearly insufficient to tell. And even if she guessed right a large number of times, we would not have decisive evidence her model was correct. Winning poker players know why: a cause other than that asserted by the model (psychic ability) might also account for good results. This is why testing models against reality is so difficult.
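The arithmetic behind "one test is clearly insufficient" can be sketched as follows, under the assumption that a person with no psychic ability guesses odd or even correctly half the time. The function name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k correct guesses in n tries by chance alone,
    assuming each guess is independently correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(1, 1))    # 0.5: one success is as likely as a coin flip
print(prob_at_least(10, 10))  # 0.0009765625: ten straight successes are rare by luck
```

A small chance probability argues only against luck. It does not single out psychic ability from other causes, such as cheating or reading cues, that would produce the same run of successes.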
Again, the difficulty only grows with the model's complexity. A model of guessing odd or even numbers is trivial. A model which purports to say what the temperature or level of some atmospheric chemical will be a year hence is hideously elaborate.
The more sophisticated the model, the greater its number of "If this, then that" statements. For testing, compromise in limiting that set to something manageable is almost always required. With that compromise should come an increasing uncertainty of the model's value, though, even after it is tested. If the model cannot be tested everywhere, it should never be wholly trusted.
Sometimes something very like the opposite is true in practice. More complex models are trusted more, even when imperfectly tested.
Pygmalion
We have to consider that those who create these complex models love those models. And why not? Creators spend vast quantities of time and energy and honest sweat in designing, shaping, and polishing those models, like Pygmalion adoring his statue. Their models are beautiful creations, at least in their eyes.
The people involved in these large modeling efforts are intelligent, even highly intelligent. It is always therefore an affront to cast doubt on their work, a kind of insult. Nobody likes to be suspected of error, least of all those who believe themselves our best thinkers. Harsh questions that could be asked about a model are, for this reason, sometimes not asked.
And, as mentioned above, the cost and time required to do robust testing balloon with model complexity. When model users feel model-based decisions have to be made, and perhaps the funds and will to do so are low, the temptation to bypass independent testing is often not resisted.
Even when that temptation is resisted and independent testing is done, shortcuts are too often taken. Or certain mistakes in verification creep in unnoticed.
Suppose we have a model that predicts the atmospheric deposition of some thing. An independent test is done, as is proper. Many points never before used in building the model, in any way, are predicted. These predictions turn out to be good, with good defined by the decisions model users make, and the gains and losses they experience with the model.
A quick judgement comparing the averages of the predictions and the observations appears to say the model is fine. But a closer examination reveals that the majority of points tested were of trivial size, concentrations that no one really cares about, because these happened to be the bulk of observations taken during the test. Those small amounts are what happened in the world.
A complete look, using better measures than just comparing averages, reveals larger values of deposition, the ones important to model users, are not predicted well; indeed, the error in the model is seen to grow as the value of the deposition grows.
This means we have to do the independent tests at points where important decisions would be made by model users. We can't rely on simple tests of goodness, because it might easily turn out that any model would have done well, and we just happened to test ours.
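The deposition example can be made concrete with a small sketch. Every number below is invented purely for illustration, and the cut-off `THRESHOLD` separating "trivial" from "important" deposition values is likewise a hypothetical choice, not something taken from any real model.

```python
from statistics import mean

def mean_abs_error(pairs):
    """Average absolute gap between prediction and observation."""
    return mean(abs(pred - obs) for obs, pred in pairs)

# Hypothetical (observed, predicted) deposition pairs, invented for illustration.
# Most points are small, because small amounts are what happened during the test.
pairs = [(0.1, 0.2), (0.2, 0.3), (0.1, 0.2), (0.3, 0.4), (0.2, 0.3),
         (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), (4.0, 5.6), (8.0, 5.6)]

THRESHOLD = 1.0  # hypothetical level above which users care about deposition

# Quick judgement: compare the averages only.
mean_obs = mean(obs for obs, _ in pairs)
mean_pred = mean(pred for _, pred in pairs)

# Closer look: error at the trivial points versus the important ones.
small = [p for p in pairs if p[0] < THRESHOLD]
large = [p for p in pairs if p[0] >= THRESHOLD]
mae_small = mean_abs_error(small)
mae_large = mean_abs_error(large)

print(mean_obs, mean_pred)   # the two averages agree
print(mae_small, mae_large)  # the error is far larger where decisions matter
```

With these made-up numbers the averages agree almost exactly, yet the error at the large values is many times the error at the small ones. A check of averages alone would have missed it.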
Sophistication versus the average
This brings in the last idea: skill. It is simple to grasp. Suppose we had a good guess of the average values of, say, nitrogen deposition over a time period of interest to model users. Maybe the mean of old observations is taken. We could take that average and use it as if it were a model. We can make predictions with it, just as we can with any model. All the predictions happen to be the same value, which is the old average.
Now the deposition model we supposed a moment ago is considered to be a sophisticated, physically and chemically accurate model. It appears to explain the physics and chemistry and other components of the atmosphere in a satisfying way. A large group of important users implement this sophisticated model. Much is invested in it. Important decisions are made using it. Lawmakers embrace the model and require it to be used.
But then suppose we compare that sophisticated model to the average model and the average model beats the sophisticated model, using the verification measures we thought important.
Which model is better? Well, as just said: the average model. The sophisticated model does not have skill with respect to the average model.
Since we would be better off using the average model, we should not use the sophisticated model, no matter how important it was thought to be, or how much was invested in it, or how "official" it is. Nor should we use it because it is believed the sophisticated model explains the physics and so forth better than the average model.
In truth, the sophisticated model, because it cannot, and did not, beat the simple average model, cannot explain the physics better. It can only appear that way to those who love the model. Because if it did explain the physics better, it would have also made better predictions. It is as simple as that.
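One common way to put a number on skill, offered here as an assumption since the text does not name a verification measure, is to compare mean squared errors: skill = 1 - MSE(model) / MSE(reference). A positive value means the model beats the reference; zero or negative means it does not. All data below are invented for illustration, and the names `mse` and `skill` are hypothetical.

```python
from statistics import mean

def mse(predictions, observations):
    """Mean squared error of predictions against observations."""
    return mean((p - o) ** 2 for p, o in zip(predictions, observations))

def skill(model_preds, reference_preds, observations):
    """1 - MSE(model) / MSE(reference); undefined if the reference is perfect."""
    return 1.0 - mse(model_preds, observations) / mse(reference_preds, observations)

old_observations = [2.0, 3.0, 4.0, 3.0]   # hypothetical past record
new_observations = [3.0, 2.5, 3.5, 3.0]   # hypothetical independent test points
sophisticated = [4.0, 1.5, 4.5, 2.0]      # hypothetical sophisticated-model output

# The "average model" predicts the old average everywhere.
average_model = [mean(old_observations)] * len(new_observations)

score = skill(sophisticated, average_model, new_observations)
print(score)  # -7.0: the sophisticated model has no skill relative to the average
```

The choice of squared error is itself a decision. Model users should substitute whatever verification measure reflects the gains and losses they actually face.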
The final lesson is simple: put all models to the test.
Piet, models are thought-determinations. That much is clear from this series so far.
What you call their nature is precisely the dangerous illusion.
Dick #17, models ought to be built with reliable data. Through the combined annual return (gecombineerde opgave/meitelling), farmers supply all kinds of data, from which whole series of models are built for agriculture. Farmers' data are reliable, because there are consequences when an individual farmer submits incorrect data.
Set the nitrogen model alongside that, and the question arises how reliable the data underlying the nitrogen model are. These days there is quite a bit of wrangling over the reliability of the data for the nitrogen model. Farmers, myself included, keep asking for additional data to make the nitrogen model more reliable. Data that, unfortunately, are not provided. Farmers must supply reliable data for the agricultural models, while the reverse, supplying nitrogen data to farmers, is not done or is refused.
As a farmer, it just makes you angry.
Piet, #15, models of complex systems almost certainly cannot be validated.
The remark Dennis just made is important: it is the arrogant denial of this that makes it dangerous. After all, models are also our only way of mapping complexity.
The argument picks out the point that is also at the top of my list: without validation, the value of a model (or its output) cannot be judged. That does not mean, incidentally, that it is wrong; it only means you do not know how accurately the model approximates reality.
The authors take a further step: if you validate several models, that is, hold them up against reality, the model that comes closest is the best, regardless of how simple or advanced it is. Testing against reality is rigorous! If we did not do that, we would in effect be letting go of reality.
One caveat is that advanced models often need more tuning, and that it can also be a coincidence that a simple model sits closer to reality. A simple example is the weather forecast. Taken over a whole year, the model that says "tomorrow's weather is identical to today's" is often very accurate. But that accuracy drops off quickly when you try to look further ahead. What this example shows is that, as the authors also write, validating models is not simple either and also involves a step of interpretation. A choice is made about which result is taken as the proper representation of the model's accuracy.
In practice these kinds of considerations are extremely important, but you also see that model validation is more and more often omitted altogether. Awareness of the need for validation therefore remains very useful precisely for that simple distinction (has anything been validated or not?). The further nuances only come into play once it is claimed that a model has been validated: then the crucial question is: how?
Finally, what I find a very valuable addition in this piece is that it also addresses how human behavior determines the way models are handled. The fact that questioning advanced models is often felt to be "not done" says enough. The opposite should be true: not questioning should be "not done". But standing, authority, power, and so on play a large role in how these things are handled. This sensitive point is often covered over by throwing around "trust the experts", or even by ridiculing the questioner. The behavior around Covid measures was a major example of this. Beyond understanding the possibilities and limitations of models, making human behavior open to discussion is therefore also a crucial point.
Making policy on the basis of models is one thing. But what about models from the past that were never tested for soundness but were used to make laws? The law is the law, and then you are stuck with unsound laws that can no longer be reversed.