With so many people freaking out about computer models, this seems like a good time for some perspective!
Here is JG’s Comment of the Day on the post, “Saturday Ethics Warm-Up, 4/4/2020: Letting The Perfect Be The Enemy Of The Good, And Other Blunders,” (Item #2):
I’m a math major from the late 60s so I do love numbers but wasn’t subjected to subsequent generations of educators preaching the genius of computer generated models (CGM).
My first job after college was in the research department of a well-known consulting company. One of the whiz kid consultants from Stanford was all about computer models (this is when we were still using teletypes to communicate with the mainframe).
I’ll never forget how impressed I was with his modeling and he, very wisely, cautioned me that it was difficult to remove personal bias from a computer model. He told me that our brains work more efficiently than any computer and, when faced with a problem that may require analyzing huge data series (i.e., a perfect project for a computer), we, with our human brains, already have a “gut” conclusion which creeps into the modeling and very well may influence its neutrality. That piece of advice has stayed with me lo these 50 years and makes me skeptical about the reliability of any computer-generated model.
The discrepancy that we are observing in the virus models is remarkable because the medical field has had for years the benefit of the most advanced statisticians, who are specifically trained in recognizing the nature of different types of medical data and how differently they behave, which allows for generally pretty accurate results. This is one case where the data has not been available and, thus, garbage-in, garbage-out.
Now, to my main point: climate science models are much less reliable because there is not an army of statisticians, thus far, trained specifically in the analysis of climate-associated data. My understanding about many of the models that support the IPCC reports is that there are numerous variables for which there is not an adequate time series available, so they’re left out of the model altogether. Thus, we don’t even have garbage-in to criticize; rather, we have half-baked results, kind of like baking a cake and leaving out half of the ingredients.
Great COD!!!
Nicely put! Bias in the model is important to keep in mind for anyone using a science mindset. I also like the alt-text in xkcd’s Scenario 4 from Friday: “Remember, models aren’t for telling you facts, they’re for exploring dynamics.”
Great comment, JG!
I really enjoyed this. Models always seem to be the grandest of all appeals to authority. When someone says Dr. X states that . . . we might question the good doctor’s qualifications, yet we take computer models as unassailable.
I am always amused at the hurricane models that prognosticate a storm’s trajectory. They all have a similar path initially, but the deviations get wider the farther into the future they project. These deviations all reflect the differences in the underlying assumptions, and the assumptions reflect our biases and the depth of our understanding.
I recall Dr. Fauci telling a badgering congressperson that all the projections he was basing his assumptions upon (along the lines of 20,000 dead or 200,000 dead) were based on computer models and computer models are the direct result of the assumptions they make. Of course the media simply focused on “Dr. Fauci said 200,000 will DIE!”
“Monger” is a term I’ve come to hold in high regard. Good German term, I assume.
There was not this level of panic during the swine flu pandemic.
When looking at a simulation/model, the first question you must ask is what is the purpose of the model, and then whether the conclusion is appropriate based on the purpose of the model.
The models so far have had a very narrow public health purpose: to estimate the absolute worst-case scenario to get any sort of feel for the magnitude of resources needed. Information is limited, so the conclusions that can be drawn from the models must also be limited.
Lately, I’ve been hearing in the news media about the dire new predictions that up to 100-200 thousand Americans could die from the virus. However, I don’t think the models are being used appropriately. Earlier models showed that up to 1-2 million (1,000-2,000 thousand, i.e., 10 times more) Americans might die.
The new models should be reassuring to the public that the limited data is showing the virus to be less deadly than initially predicted. Instead, the models appear to be used to worry the public.
The newer models are still limited, and the conclusions are not necessarily stronger. We should not assume the upper limit is more accurate just because the model is “newer”; we should instead be relieved that the upper limit appears to be lower as better data becomes available. Always interpret a model within its original context, and be skeptical of predictions being used inappropriately.
Heh. Great analogy. All the good data in the world won’t help a model that has variables you have to fill with question marks, or worse, SWAGs.
The part that’s really the most annoying isn’t the insufficiency of the data or the models, it’s that so many are willing to accept them as meaningful guidance while knowing that they simply cannot be.
This is the bias that your consultant friend was worried about — the bias of hubris, and “close enough-ism.” Add an uncritical and uninformed media willing to bow down to “experts” as though they had some kind of mysterious divinely-provided insight, and you have the cult of faux science that now dominates our body politic.
It would be great if models could provide real-world guidance, but they often can’t, especially models of complex chaotic systems like the environment and epidemiological transmission of diseases. Like most chaotic systems, these processes have far too many significant and complex variables that cannot be accurately measured for anyone to model them with any confidence.
What we are really modeling is guesses, and when we can’t admit that, it is just another case of the blind leading the blind — authoritatively.
One of my biggest frustrations with this whole epoch is that a bevy of information is given without context to understand it, and a bunch of armchair apologists swoop in to try to make the politicians not look like jerks. This is how we got “flattening the curve” (Oh, we’re not trying to prevent infections, we’re trying to slow the rate of exposure so hospitals are not overwhelmed), and other buzz phrases. Sadly, I am about to become one such secondhand apologist for modeling.
When you have a complex phenomenon, you need to understand the limits of the issue and be able to compare various options. When little information is available, you have to keep the model simple, so that factors of unknown significance don’t drown out potentially meaningful variables.
Let’s look at a simple public health model (note, I am not a public health expert; this is my lay opinion of how to approach such a model).
We are concerned about the potential number of infected, “N”, and the number of deaths “D”. To build our model, we need to make some assumptions about what variables will be relevant.
Let us assume that on a daily basis, the average person comes into close contact (less than 6′) with “X1” people; of these, a fraction “Y1” will get infected, and of those infected, a fraction “Z1” will die.
We have a population size of P (in the US this would be 300 million), and an initial percentage of the population infected, i (number of cases/population).
Over a number of days “T”, the following can be predicted:
Number of infected: N = P*(i*X1*Y1)^T
Number of deaths: D = P*(i*X1*Y1*Z1)^T
(There would also be dummy variables for calibration, but we will ignore those.)
In the earliest models, N=100 million, and D=1-2 million. These numbers may or may not be accurate, but that is not relevant. These numbers are a baseline to compare the effectiveness of potential interventions.
If people, for instance, stay home or stay more than 6′ apart, X1 can be reduced, i.e., the number of people someone comes into contact with daily goes down. Plugging the reduced value, “X2”, into our formula changes the results by a factor of 10.
This is shown in the newest models: N=1 million, D=100-200 thousand. These numbers again may not be accurate, and again that is not relevant. But it shows that so-called “social distancing” can reduce the number of deaths by 90%.
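To make the comparison concrete, here is a minimal sketch of this kind of sensitivity test. It is not the formula exactly as written above: it assumes a simple compounding variant in which each infected person causes X*Y new infections per day, capped at the population. The function name `project` and every input value are hypothetical, chosen only so the example runs; none of them are real estimates.

```python
def project(population, initial_fraction, contacts, p_infect, p_death, days):
    """Project infections and deaths under simple daily compounding.

    Assumption (not from the comment above): each infected person causes
    contacts * p_infect new infections per day, and the total is capped
    at the population size.
    """
    infected = population * initial_fraction * (1 + contacts * p_infect) ** days
    infected = min(infected, population)
    deaths = infected * p_death
    return infected, deaths


# Hypothetical inputs, for illustration only.
P = 300_000_000   # population
i = 1e-5          # initial fraction of the population infected
Y = 0.02          # chance that a close contact leads to infection
Z = 0.01          # chance that an infected person dies
T = 60            # number of days projected

baseline = project(P, i, contacts=10, p_infect=Y, p_death=Z, days=T)   # X1
distanced = project(P, i, contacts=8, p_infect=Y, p_death=Z, days=T)   # X2

print(f"baseline:  infected={baseline[0]:,.0f}  deaths={baseline[1]:,.0f}")
print(f"distanced: infected={distanced[0]:,.0f}  deaths={distanced[1]:,.0f}")
print(f"deaths reduced by a factor of {baseline[1] / distanced[1]:.1f}")
```

The absolute numbers mean little; the point is the ratio between the two runs, which shows how strongly the result responds to a modest change in daily contacts.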
Unlike, say, political polling, these models are not meant to perfectly predict mortality. They are kept deliberately crude to test the “sensitivity” of the model to different factors. It is irresponsible for these models to be thrown around by the media, treating them like “polling” numbers when they serve a much more limited purpose. I doubt most in the media are capable of understanding this.
The models do provide “real world” guidance, but only through allowing a rough “order of magnitude” estimate of how different public health measures might impact the spread of the disease. At the start of a pandemic, the percentage of deaths may be higher, or at least appear higher, because the total number of infections is not known yet. Thus early models will trend towards higher infection (Y) and mortality (Z) rates. They can be refined over time, such that better estimates for Y (percentage of people who get infected by a contagious person) or Z (percentage of people who die if infected) can be plugged into the model to get a better estimate.
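A small arithmetic sketch of why the early mortality rate can appear higher: if only the confirmed cases are counted, the same number of deaths is divided by a smaller number. The counts below are hypothetical and are not data from any real outbreak.

```python
def fatality_rate(deaths, infections):
    """Fraction of infected people who died (the Z in the model)."""
    return deaths / infections


# Hypothetical counts, for illustration only.
deaths = 1_000
confirmed_cases = 20_000    # infections that were tested and counted early on
true_infections = 200_000   # assumed total once mild, uncounted cases are included

apparent_z = fatality_rate(deaths, confirmed_cases)
refined_z = fatality_rate(deaths, true_infections)

print(f"apparent Z early on: {apparent_z:.1%}")
print(f"refined Z later:     {refined_z:.1%}")
```

With these made-up counts, the early estimate of Z is ten times the refined one, even though the number of deaths never changed.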
Over time, better treatments and medicines can also affect the variables. The models provide insight into which treatments and interventions will have the most impact. The goal is to intervene, such that the worst case scenarios do not come true!
“Many Americans agree that something hasn’t smelled right about this pandemic from the beginning. Besides there not being sufficient data to justify the extremity of the reactions, and without downplaying the threat to the lives of the sick and elderly, there is something in the air that smacks of calculated panic and political engineering. It is not the stuff of conspiracy theories to suspect—or even expect—that movement leftists will take full advantage of this upending of American calm, American lives, and the American economy to lay a steaming pile of blame at the President’s feet in their determination to destroy his chances at a second term.”
Article here.