Rich (in CT) adds a superb and learned enhancement to the day’s post about President Obama’s dubious rape claims during the Grammy Awards. It raises a question I hadn’t considered before: is part of the problem that researchers are as clumsy in their understanding of language as liberal arts types are in their use of statistics and numbers? The word “rape” has meaning; this is no place for Humpty Dumpty’s habit of using words to mean whatever one pleases. [“‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’ ‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’ ‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’” (Lewis Carroll, “Through the Looking Glass”)] Rich writes, “This data is important, as mental health and sexual disease propagation is affected by such contact, even if the traditional criteria imaged for “rape” is not met.” I’ll concede that the data is important, but shouldn’t important data be clearly and accurately described? The data isn’t about rape! It’s about a variety of conduct linked by the researchers that they chose to call “rape,” knowing, presumably, that people who never read the data will take the misleading “rape” description and use it to confuse, persuade, deceive, and engage in scaremongering for political gain.
Rich writes that “not enough evidence is given to suggest that either study is unethical in and of itself.” Isn’t using vague, overly broad and misleading terminology for a study that is going to be made public intrinsically unethical—irresponsible, incompetent, untrustworthy?
Here is Rich (in CT)’s enlightening Comment of the Day on the post “The President’s Irresponsible And Untrue ‘One in Five Women Are Raped’ Claim”:
“No survey’s perfect.”
This is the truest statement in the article, although not when used as an excuse for using misleading data.
“Later in the article, Kessler reveals that the CDC study was based on interviews with more than 14,000 people, with a response rate of 33 percent….less than 5,000 people, from which the study extrapolated what would happen to 23 million women. Res ipsa loquitur.”
Statistically speaking, it is perfectly valid to use a sample of 5000 people to draw conclusions about a very large population (even in the millions). However, any estimate drawn from such a sample comes with a confidence interval, which is vitally necessary for interpreting the data.
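To illustrate why 5000 respondents can stand in for millions, here is a minimal sketch of the textbook margin-of-error formula for a proportion, with the finite population correction included. It assumes a simple random sample and the worst-case proportion of 50%; neither assumption comes from the surveys themselves, which use more elaborate designs, and the function name `margin_of_error` is purely illustrative. It also says nothing about bias from a low response rate, which a margin of error does not measure.

```python
import math


def margin_of_error(n, population, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    n          -- number of respondents (assumed to be a simple random sample)
    population -- size of the population being described
    p          -- assumed proportion; 0.5 is the worst case (widest interval)
    z          -- z-score for the confidence level (1.96 is roughly 95%)
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    # The finite population correction shrinks the error only when the
    # sample is a sizeable fraction of the population.
    correction = math.sqrt((population - n) / (population - 1))
    return z * standard_error * correction


if __name__ == "__main__":
    for population in (100_000, 1_000_000, 100_000_000):
        moe = margin_of_error(5000, population)
        print(f"population {population:>11,}: +/- {100 * moe:.2f} points")
```

Under these assumptions the margin barely moves as the population grows from one million to one hundred million, because the sample size, not the population size, dominates the calculation.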
Even the two apparently conflicting government surveys are not totally unreasonable estimates when considered together. As a very rough and crude explanation, 350,000 total rapes might be the low-end estimate; 1,900,000 total rapes the high end. The actual number of rapes for 2012 might be somewhere in between.
(I acknowledge a clinical tone in the following when speaking of rape; I do not wish to diminish the suffering caused by rape, but rather to focus on discussing the underlying statistics in the two studies.)
Ideally, both studies independently published the confidence intervals for their estimates. To clarify my rough example above, each study produces a predicted number of annual rapes (0.35 and 1.9 million for the NCVS and CDC studies respectively). They then each publish a confidence interval, which produces a high-end and a low-end estimate, based on the size of the surveyed group. The size of this interval varies, based on how “confident” the researchers wish to be that the actual number of rapes falls between the upper and lower bounds.
For the 2012 data, we see rates of 0.35% and 1.9% of women raped (given a population of 100 million used by the CDC). Given a sample size of 5000 for the CDC study, we expect a confidence interval of +/- 1.1% or 1.4%, depending on our needs (http://www.surveysystem.com/sscalc.htm). Assuming a similar sample size for the former, we estimate ranges of [0% to 1.7%] and [0.4% to 2.3%]. There is thus considerable overlap between the two studies.
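The arithmetic behind this comparison can be sketched as follows. This is only an illustration under stated assumptions: a simple random sample of 5000 for both surveys, the common worst-case formula with a proportion of 50%, and standard z-scores for 90%, 95% and 99% confidence. Whether the linked calculator uses exactly this formula is an assumption, although the 90% and 95% margins it yields land near the figures quoted above. The helper names are hypothetical, and the printed ranges need not match the rounded ranges in the text.

```python
import math

# Standard z-scores for common two-sided confidence levels.
Z_SCORES = {"90%": 1.645, "95%": 1.960, "99%": 2.576}


def margin(p, n, z):
    """Normal-approximation margin of error for a proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


def interval(estimate, moe):
    """Confidence interval around an estimate, clipped to the range [0, 1]."""
    return (max(0.0, estimate - moe), min(1.0, estimate + moe))


def overlaps(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    n = 5000            # assumed sample size for both surveys
    ncvs_rate = 0.0035  # 0.35% of women, from the text
    cdc_rate = 0.019    # 1.9% of women, from the text

    for level, z in Z_SCORES.items():
        moe = margin(0.5, n, z)  # worst case: assumes p = 0.5
        ncvs = interval(ncvs_rate, moe)
        cdc = interval(cdc_rate, moe)
        print(
            f"{level}: +/- {100 * moe:.2f} points | "
            f"NCVS [{100 * ncvs[0]:.2f}%, {100 * ncvs[1]:.2f}%] | "
            f"CDC [{100 * cdc[0]:.2f}%, {100 * cdc[1]:.2f}%] | "
            f"overlap: {overlaps(ncvs, cdc)}"
        )
```

The worst-case setting is deliberately conservative. For rates as small as these, a margin computed from the observed rate itself (passing `cdc_rate` or `ncvs_rate` as `p`) is considerably narrower, so the degree of overlap depends on which convention is chosen.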
The two studies further appear to use separate criteria to serve two different purposes (NEITHER, I should stress, meant to give the president facile political talking points). The CDC study wished to capture all poorly planned and/or unwanted sexual contact, and used broadly phrased language to capture this. This data is important, as mental health and sexual disease propagation is affected by such contact, even if the traditional criteria imaged for “rape” is not met. The NCVS, meanwhile, wished to capture the number of victims of criminal sexual contact that may need services.
Yet despite the different purposes, the two confidence intervals still overlap considerably. The absolute values admittedly appear to fluctuate considerably, but when considered as relative percentages of the total, they only differ by a few percentage points. A meta-study combining the two may be able to narrow the range further.
The numbers are estimates of the needs of slightly different populations. Using the unqualified results, without clear explanation, to scare the public into accepting the political cause du jour is certainly unethical; however, not enough evidence is given to suggest that either study is unethical in and of itself. I do not dispute that there may be flaws, such as the vague timeline alluded to in some questions.
Statistics is a complicated and nuanced field. It rarely produces hard numbers that are easily digested by the public; it produces ranges of values that likely contain the true value sought. It is very distressing that as many as 2.3% of adult women have had some sort of unwanted sexual contact in 2012.
It is also distressing, although less distressing in the absolute sense, that the President extrapolated “had some sort of unwanted contact” to “were raped”.