According to the headlines on the internet, a new study by Todd Kendall of Clemson University (using regression analyses of panel data) finds that having computer access to the internet at home, and thus better access to internet porn, is consistent with the hypothesis that more porn REDUCES the incidence of rape (tip to Instapundit).
What neither the author of the study nor online commentators seem to have noticed is that the data also are consistent with the hypothesis that owning a computer INCREASES the incidence of rape. Indeed, the observed rape increasing effect of computer access is even more highly significant than the observed rape decreasing effect of internet access (p. 43).
Since, by 2003, 87% of households with computers had access to the internet, and since the two effects are about equal in size, the "rape reducing" effect of internet access (-.730, p. 43) is almost completely offset by the "rape increasing" effect of owning a computer (.641, p. 43). Since it appears that everyone in the study who lived in a household with computer internet access also lived in a household with a computer, the net effect on rape of having household access to both a computer and the internet was nearly zero and probably not even close to being significant.
The supposed rape reducing effect of internet access that the study and commentators are talking about is the effect of household internet access, CONTROLLING for the observed rape inducing effect of household computer access. That these two highly intercorrelated variables tend to have implausible offsetting effects when significant is supported by other models in Kendall's paper; for example, one model shows a huge increase in prostitution arrests associated with computer access and a huge offsetting drop in prostitution arrests associated with internet access (p. 51).
Without noting that internet and computer access go together and that the observed effects seemingly offset each other, one might wrongly conclude (as Todd Kendall appears to do) that his reported regression analyses of rape are consistent with the hypothesis that internet access reduces the measured incidence of rape. IMO, he should rerun his analyses using his internet access variable WITHOUT including the variable computer access, and then post all the coefficients in his main models. I would be interested in seeing those results.
It is amazing how many scholarly papers I've read in the last year either fail to report, misreport, or misleadingly present the use of control variables.
Just to be clear, the title of this post is tongue-in-cheek. Just from Kendall's regression analyses, I am skeptical of any meaningful effect one way or the other.
UPDATE: There are some excellent comments below, and a few that don't fully understand the main problem with the Kendall study. Although it is possible to have a 10% increase in internet access without ANY increase in computer access, that is extraordinarily unlikely.
Since, by 2003, about 87% of those with computer access had internet access, what would be the likely effect on the incidence of rape of a state having a 10% increase in computer access and a corresponding 8.7% increase in internet access? One would multiply .641 by 10 and add the product to -.730 multiplied by 8.7. The result (6.41 - 6.351 = 0.059) would be an insignificant INCREASE in the incidence of rape. With different assumptions, one can get a slightly different net result, but there is no plausible combination that should lead to an overall effect that significantly reduced or increased the incidence of rape.
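The back-of-the-envelope arithmetic above can be written out as a short Python sketch. It assumes, as the calculation does, that the two coefficients simply add and that 87% of new computer households also get internet access. It reproduces only the point estimate: judging significance would need the coefficients' standard errors, which are not given here. The function name net_effect is hypothetical.

```python
# Point estimates quoted in the post from Kendall's rape regressions (p. 43),
# applied per point of access, as in the post's calculation.
COMPUTER_COEF = 0.641
INTERNET_COEF = -0.730


def net_effect(computer_increase: float, internet_share: float = 0.87) -> float:
    """Combined predicted change when computer access rises by `computer_increase`
    points and `internet_share` of that rise also becomes internet access.

    Assumes the two effects simply add, as in the post's calculation.
    Ignores standard errors, so by itself it says nothing about significance.
    """
    internet_increase = computer_increase * internet_share
    return COMPUTER_COEF * computer_increase + INTERNET_COEF * internet_increase


print(round(net_effect(10), 3))  # 6.41 - 6.351 = 0.059
```

Changing internet_share moves the result a little in either direction, which is the post's point about "different assumptions."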
In my econometrics classes it was always strongly asserted that tests for multicollinearity should be performed, because correlated variables lead to divergent and spurious results. In one regression I ran, I (mistakenly) included two variables that were nearly perfectly correlated. The result was a very large positive coefficient on one of those variables and a very large negative coefficient on the other, as the two variables were "working against each other" in the regression. Elimination of one variable led to a much smaller coefficient on the remaining variable.
In this case, I would like to see the scatterplot of home-based internet use versus home computer ownership.
However, I am not as willing to toss out the whole study, for three reasons:
(1) The "rape versus internet use" relationship appears to exist in a bunch of simple, "back of the envelope" tests.
(2) There are significantly different results for different age groups, and
(3) Internet usage is not a significant variable in other violent crimes, only rape.
Besides, the author includes many, many caveats about the potential reliability of the study. He merely says that the results appear to support the underlying hypothesis. I suggest that he is not being deceptive. But if he did not examine the correlation between all pairs of exogenous variables, he made a very amateurish omission.
http://www.slate.com/id/2152487/?nav=ais
"The bottom line on these experiments is, "More Net access, less rape." A 10 percent increase in Net access yields about a 7.3 percent decrease in reported rapes. States that adopted the Internet quickly saw the biggest declines."
The article cites Kendall, actually, but apparently not toward any end he would approve of. This is all just as specious, but I certainly prefer the conclusion that internet porn saves the world from rape.
I'd have to check with some experts, but I feel reasonably certain that people who are napping are not simultaneously committing violent acts.
2. The fact that computer access looks as though it increases violent crime is neither here nor there. It appears as though owning a computer without access to internet pornography increases rape. This, if anything, is completely consistent with the author's conclusion that internet access has a negative effect on rape.
Great comment.
Aaron C.,
You wrote:
1. Given the high correlation between computer access and internet access, including computer access in the model, if anything, makes it more difficult to find a significant coefficient on internet access.
No, it doesn't. Imagine instead that these coefficients were for individuals, not state averages. To get the effect on rape of internet access for an individual, you would have to add the effect of an individual having internet access to the effect of having a computer, and these would nearly cancel out. This is more complicated for state averages, but a similar problem exists.
Consider an analogy. If I predict income using two highly correlated predictors of education: years of education and highest degree, I sometimes get a negative effect for one measure of education and a positive effect for the other measure of education. But the effect of either measure alone is really positive. You can't have a high degree without a lot of years of education, just as you can't have computer internet access at home without computer access at home.
Really, all Kendall needs to do is delete the computer access variable and show that the coefficient for internet access is little changed. I have done tens of thousands of regression analyses, and when I get a result such as Kendall got (highly correlated offsetting effects), the effect almost always disappears (or is GREATLY reduced) when the model is specified properly.
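The respecification suggested here (drop computer access, refit, compare) can be sketched in plain Python. The toy data are invented and deliberately extreme: the outcome depends only on the tiny gap between two almost identical predictors. Fitting both gives large offsetting slopes of about -5 and +5; fitting the second predictor alone gives a slope near zero (about 0.02). This illustrates the pattern described in the post, not what Kendall's data would actually do, and the helper names are hypothetical.

```python
from statistics import fmean


def _cross(a: list[float], b: list[float]) -> float:
    """Sum of cross-products of deviations from the mean."""
    ma, mb = fmean(a), fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))


def ols_one(x, y):
    """Slope from regressing y on x (with an intercept)."""
    return _cross(x, y) / _cross(x, x)


def ols_two(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with an intercept),
    solved from the 2x2 normal equations on mean-centered data."""
    s11, s22, s12 = _cross(x1, x1), _cross(x2, x2), _cross(x1, x2)
    s1y, s2y = _cross(x1, y), _cross(x2, y)
    det = s11 * s22 - s12 * s12  # near zero when x1 and x2 are nearly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Invented data: "computer" and "internet" are almost the same series.
computer = [1, 2, 3, 4, 5]
internet = [1.1, 1.9, 3.0, 3.9, 5.1]
outcome = [10.5, 9.5, 10.0, 9.5, 10.5]

b_comp, b_net = ols_two(computer, internet, outcome)
print(f"both in model:  computer {b_comp:+.2f}, internet {b_net:+.2f}")
print(f"internet alone: {ols_one(internet, outcome):+.2f}")
```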
I mean, I suspect that even controlling for socioeconomic status, having your power cut off for lack of payment would correlate highly with criminal activity of all kinds. Internet access might be similar.
Yeah, it is amazing what kinds of studies get published.
You cannot apply the model in that fashion, at least not without certain restrictive (and highly unrealistic) assumptions.
Mr. Lindgren's argument suffers from the same problem (as does the original researcher's article.)
This problem is known as the "ecological fallacy". See this article by my former dissertation advisor:
http://www.stanford.edu/class/ed260/freedman549.pdf
I see where you're going with this, and I agree that it would have been nice to see the model without both highly correlated variables in there. But all that being said, the odds are very good that removing 'computer access' from the model will only serve to strengthen the coefficient on 'internet access.'
In your example of 'yrs. of education' and 'highest degree', including both variables in the model may indeed change the sign of one of the variables compared to a model containing that variable alone. But what typically happens is that the coefficient that changes sign becomes insignificant. This is because the model cannot separate the effect of one variable from the effect of the other on the dependent variable. But if the coefficient on yrs. of education is significant even after controlling for 'highest degree', then that's pretty informative. It means that even though much of the usable variation in 'yrs. of education' is being sucked out by 'highest degree', we STILL can find a significant impact on the dependent variable.
Bryan Caplan, a professor of Economics at GMU, has a nice discussion of this point here:
The (potentially) larger problem with the model in this paper is that computer access and internet access are endogenously related to each other (internet access is determined by computer access), which may result in biased standard errors.
It depends on what statement the author is making. If the author generalizes to individuals based on aggregated data, you might have a case for the ecological fallacy. If the author is simply saying the relationship holds true at the county level, he's safe.
http://econlog.econlib.org/archives/2005/09/multicollineari.html
For some reason, I can't seem to post the URL....
I was well aware of the ecological fallacy. Indeed, I discuss it briefly in my latest manuscript on another topic.
But doing state by state analyses is so common that to single out Kendall for this would be to dismiss much of several subfields of economic and legal research.
-dk
But if the author wasn't generalizing to individuals, then what in the world would the point of the study be?
No need to speculate though, just read the article.
What a peculiar take on science. "We know this method produced flawed results, but so many people use it that we keep using it anyway."
Is it any wonder that the hard sciences don't take social science seriously?
The county-level analysis has implications for predicting future crime rates in various jurisdictions.
Then why fit a model in which most of your predictors are contemporaneous with the variable you're trying to predict?
If you really wanted to predict future crime rates, you'd fit a model with Y(t) predicted by X at (t-1) (or some such lag value). Out of all the models presented here, only one (teen birth rates) is set up in this fashion, and clearly that isn't the focus of the paper.
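A sketch of the lagged setup described here, pairing each state's predictor in year t-1 with its outcome in year t. The nested-dict panel layout and the function name lagged_pairs are assumptions for illustration, and the figures are made up.

```python
def lagged_pairs(panel_x, panel_y, lag=1):
    """Pair each unit's X from `lag` periods earlier with its current Y.

    panel_x, panel_y: dicts mapping unit -> {year: value}.
    Years lacking either the lagged X or the current Y are skipped.
    Returns (unit, year, x_lagged, y) tuples for a regression of Y(t) on X(t-lag).
    """
    pairs = []
    for unit, y_by_year in panel_y.items():
        x_by_year = panel_x.get(unit, {})
        for year, y in sorted(y_by_year.items()):
            x_prev = x_by_year.get(year - lag)
            if x_prev is not None:
                pairs.append((unit, year, x_prev, y))
    return pairs


# Made-up figures for one hypothetical state: internet access share and a crime rate.
internet = {"StateA": {1998: 20.0, 1999: 25.0, 2000: 31.0}}
crime = {"StateA": {1998: 3.1, 1999: 3.0, 2000: 2.8}}

for row in lagged_pairs(internet, crime):
    print(row)
```

The first year drops out because it has no prior-year predictor, which is the usual cost of a lagged specification.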
Look, I thought it was pretty clear from the first sentence of his conclusion that he's interested in generalizing to individuals:
"VII. Conclusion
The results above suggest that potential rapists perceive pornography as a substitute for rape."
There is a bit of a gap between the experimental and nonexperimental studies on this. Comparisons of different countries and states, looking for evidence that an increase in pornography causes an increase in rape rates, suggest that it doesn't. But some of the experimental work finds that exposure to pornography (apparently violent pornography in particular) makes both men and women more accepting of rape, and more prepared to see women as being like porn actresses. It might well be that for a healthy, well-adjusted adult, porn won't make much of a difference in behavior or attitudes. I wouldn't be so sure about a 14-year-old kid, or a messed-up adult.