Making Data Available:

The Tulane Law Review controversy brings up an important point, which commenter Frog Leg noted: Shouldn't law reviews make a practice of including the raw data supporting an article's assertions in an Appendix, at least so long as the data wouldn't take more than several pages?

That way, law reviews would be reminded of their responsibility to check the data, and readers would find it more consistently accessible. Putting it on the Web is nice, but it involves various risks, including a risk that the law review editors won't feel it to be their responsibility to check it -- and the risk that it will get taken down. As it happens, though the law review article states that "The table will be available without charge on the Tulane Law Review's Web site ... for one year from publication," the law review has taken down the original table in the wake of the errors that have been discovered. This makes it harder for future researchers to closely follow the course of the controversy; even if the revised table is eventually posted, it will be hard for people to see what errors had originally been made.

Had all the cases been included in a short appendix, the data would have been permanently available the same way the text is permanently available. Of course, if there were a practice of putting the data online while still having it be cite-checked, and still having a firm promise on the law review's part that the data will be permanently retained -- perhaps in some centralized repository from which the data couldn't vanish as a result of law review decisions, or for that matter law review technical errors -- that might be as good or better. But for now, putting the material in the article's text remains the most traditional and most reliable way of preserving the data, and seems quite sensible for datasets that don't take more than several pages.

Frank M Howland (mail):
Making raw data available is of course a good idea. I just wish to point out that in my field, Economics, making available the data one uses is only gradually becoming the norm. The fact that only a few journals followed this practice until quite recently is something of a scandal.
9.22.2008 6:20pm
A. Zarkov (mail):
Researchers hate to provide the raw data -- and for a good reason. They're afraid someone will prove them wrong. Hansen tried to keep everything secret until Congress forced him to provide his computer source code.
9.22.2008 6:26pm
Dan Weber (www):
Researchers also do not like to present their data because someone else may scoop them on a big result they haven't finished researching yet.
9.22.2008 6:36pm
I vote for not filling pages with useless data. Provide the relevant raw data in digital format on the internet and leave the paper unsullied.

There's nothing interesting you can learn by staring at numbers anyway.
9.22.2008 6:37pm
Curt Fischer:
Law reviews don't have "supplementary information" files or materials, available (exclusively) with online versions of articles, for all subscribers to read?
9.22.2008 7:22pm
Sevesteen (mail):
How about publishing the data on a third party website, where the law review can't later take it away?
9.22.2008 7:53pm
gerbilsbite (mail):
I'm just transitioning from the world of political science to the world of law, and I'm fairly stunned that statistical arguments made in law reviews wouldn't already be providing the raw data. It seems like such a natural part of the academic dialog to allow someone access to the same set of givens that you've worked from, that I have a difficult time imagining why ego would ever be allowed to trump thoroughness.

If, for example, I had first read Mansfield & Snyder's "Democratization and the Danger of War" without the accompanying statistical arguments, I don't know that I ever could have really brought myself to believe their theses. But the numbers made it a lot simpler to clarify their claims and hone their actual arguments into something workable and functionally debatable. Otherwise, isn't it simply an issue where the doubters will deny and the readers predisposed to the premise will swallow the lot, and those looking for rational discourse will tap their toes and look at their watches, waiting for verifiable (or falsifiable, if you're into that sort of thing) data to come down the pike?
9.22.2008 7:56pm
A. Zarkov (mail):

"There's nothing interesting you can learn by staring at numbers anyway."

Tell that to Johannes Kepler, who stared at Tycho Brahe's numbers, more formally known as The Rudolphine Tables, and came up with the three laws of planetary motion.
9.22.2008 8:13pm
Dan Weber makes a point well-known in experimental science: researchers mine their data sets over time, and would like to publish serially, before the whole analysis is complete. The norm should be that data becomes public once the researcher is "done" with it.
9.22.2008 9:14pm
theobromophile (www):
Would it be possible to make the data sets available in the Lexis-Nexis and Westlaw versions, if not the print ones? Subsequent corrections and amendments could be posted in later issues, thereby eliminating the problem of losing the original data.
9.22.2008 9:16pm

Tell that to Johannes Kepler, who stared at Tycho Brahe's numbers, more formally known as The Rudolphine Tables, and came up with the three laws of planetary motion.

You know he stole that data -- Brahe didn't want to give it to him . . .
9.22.2008 11:13pm
Curt Fischer:
I take issue with the claims of Lior and Dan Weber that scientists do not completely disclose their data in publications describing it.

At least in my field, this is not the norm. Of course no researcher publishes all the data they have ever collected in their most recent article. One reason for this trivium is that most researchers have several pots on the stove, and aren't sure how to dish out all their results into publications yet. They also like having more publications rather than having fewer.

But, that strikes me a lot differently than what Lior and Dan Weber are suggesting. Here, the data withheld from publication appears to be a complete data set. Not even the subset of the data necessary to replicate the authors' statistical analyses was provided.

Withholding data may be more common in other areas, such as clinical medicine (Vioxx?) or environmental field studies (Hansen). The norm in my field, and to my knowledge, most of laboratory-based experimental science, though, is that the level of detail in a publication should enable readers skilled in the art to replicate the published work.

If you don't have the raw data, you can't possibly have replication.
9.23.2008 1:21am
Dan Weber (www):
I don't seek to excuse, merely to explain.
9.25.2008 11:21am