Correlation Between Grades and Essay Answer Length:

Last semester, for the first time, I recorded the length of each answer in my exam scoring spreadsheet. This let me compute the correlation between length and grade.

Note that my exam had 13 multiple choice questions (which amounted to 1/3 of the grade) and one long essay (which amounted to 2/3 of the grade, and for which the median answer was about 3750 words). The students had four hours to do the exam, and the exam was open book and open notes.

The correlation coefficient of the total score (which combined the essay score and the multiple choice score) and the essay word count was 0.60, which is huge as correlations go. So longer is better, by a lot, right? Not quite: the correlation between the total score and the word count for exams longer than the median exam was basically zero.

In fact, I sorted the spreadsheet by word count, and then added a column that measured, for each exam, the correlation between total score and essay word count across that exam and all longer exams (the Excel formula for the 5th shortest exam of the 81 total, for instance, was =CORREL(B5:B$81,K5:K$81), where column B was the total score and column K was the essay length). The column started at 0.60, got steadily smaller until the median, and then immediately past the median exam the column fell to basically 0 (-0.01, to be precise) and pretty much stayed that way as the exams got longer.
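For readers who would rather script this than fill down a spreadsheet column, here is a minimal Python sketch of the same idea: sort by word count, then compute the correlation for each exam together with all longer exams. The function names and the min_points cutoff are assumptions for illustration, not part of the original spreadsheet; the cutoff just stops before the tail gets too short for a correlation to mean anything.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either column is constant
    return sxy / math.sqrt(sxx * syy)


def tail_correlations(scores, word_counts, min_points=3):
    """Correlation of score vs. word count for each exam and all longer ones.

    Entry i covers the exams from the (i+1)-th shortest through the longest.
    min_points is a hypothetical cutoff, not something from the original post.
    """
    pairs = sorted(zip(word_counts, scores))  # sort by word count, ascending
    results = []
    for start in range(len(pairs) - min_points + 1):
        tail = pairs[start:]
        lengths = [length for length, _ in tail]
        grades = [grade for _, grade in tail]
        results.append(pearson(grades, lengths))
    return results
```

Calling tail_correlations(total_scores, essay_lengths) on two equal-length lists gives the fill-down column as a Python list, with the first entry being the all-exams correlation.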

I also did the same with the correlation between the essay score and the word count. For that, the inflection point didn't appear until exam 50 out of 81, rather than 42 out of 81. That makes sense: Time spent on the essay is time not spent on the multiple choice, so there's some tendency for the longer essays (past a certain length) to have slightly smaller multiple choice scores.

Likewise, the correlation between essay word length and multiple choice score was mildly positive if we looked at all exams (0.12), but fell to basically 0 once one set aside the 17 shortest exams -- and once one set aside the 35 shortest exams, the correlation between essay word length and multiple choice score got to be -0.10 and stayed pretty much there (with some fluctuations).

Is this of any use to students? I highly doubt it -- it's hard to act on the advice, "write at least as many words as your median classmate," and in any event simply trying to make your exam longer is unlikely to make it better (even if longer is usually better, up to a point). Still, it struck me as an interesting data point; and perhaps some students might be happy to know that, past a certain level, quantity and quality aren't even correlated.

In any case, this is just one set of data; in past years, I didn't include the word counts in my spreadsheets, so I couldn't do the same analysis. But I'd love to see what other law professors find.

That sounds right to me, and is something I have wondered about.

The lesson, I think: you need to write enough to provide a thorough analysis of the basic points. But once you've done that, extra writing can just as easily help as hurt. True, you might come up with a new and helpful point. But you might add something wrong, irrelevant, or repetitive.
2.1.2008 7:48pm
Michael Guttentag (mail):
Maybe you've talked about this elsewhere, but I'm always curious about the correlation coefficients between student performance on different elements of the exam. For Business Associations, I give an exam that is 1/3 multiple choice, 1/3 short answer questions, and 1/3 essay. The correlation coefficients are around .3 between the multiple choice questions and the short answer questions, .4 between the short answer questions and the essay, and .2 between the multiple choice questions and the essay. These are decent correlation coefficients, but low enough to make me hesitant to drop any one category of questions.

I have not tried to segregate correlation coefficients by whether the students are above or below the median. Now that I have the quick way to do it in Excel I'll give it a try.
2.1.2008 7:55pm
D. Thomas (mail):
When I was a student I always found that how much I wrote on an essay was dependent on what professor I was in class with. Students DO compare answers and it's not difficult to find out what a professor is grading on. My overall approach was to follow what Mr. Kerr suggests, but there were certain professors that I knew needed ....ahem...additional help.
2.1.2008 9:21pm
Bill Dyer (mail) (www):
This is interesting. But what's odd is that you haven't commented on how the data compares to your intentions.

I'm assuming that your intention is to reward thorough answers, but not to award length per se, and to penalize inadequately thorough answers. Do you also have an intention to penalize students who ramble (i.e., include everything that a model "thorough" answer would include, plus filler)?
2.1.2008 9:37pm
Tracy Johnson (www):
Seems like the old adage of tossing the papers off the roof to see which one lands furthest. More ink should decrease the drag on the paper.
2.1.2008 9:58pm
Curt Fischer:
Sounds like Prof. Volokh is doing some fast and loose statistics. I somewhat agree with the qualitative conclusion, but...

If the correlation coefficient declines to zero *at* the median, and was steadily decreasing until then, it implies to me that even essays well below the median length were *weakening* the correlation. That is, the breakdown in the correlation between length and grade would seem to be increasing far below the median.

I'm not a statistician, but before concluding too much more I would want to know the distribution of essay lengths (and essay scores), and ideally, just see a scatter plot of the data.

Also, if I remember correctly the correlation coefficient will be large only for *linearly* correlated data. What if there is a more complex (but not random) relationship between the essay length and the score?
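To illustrate the point about linearity, here is a minimal Python sketch using made-up numbers (not the exam data). It assumes a hypothetical inverted-U relationship in which the score is fully determined by length, rising to a peak and then falling. The correlation over the whole range comes out at essentially zero even though nothing about the relationship is random.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (measures linear association only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: lengths in arbitrary units, score peaks at length 5.
lengths = list(range(1, 10))
scores = [-(x - 5) ** 2 for x in lengths]

r_full = pearson(lengths, scores)            # whole range: about 0
r_rising = pearson(lengths[:5], scores[:5])  # rising half only: strongly positive

print(f"full range: {r_full:.2f}, rising half: {r_rising:.2f}")
```

This is one reason a scatter plot is worth looking at before reading much into a single coefficient.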
2.1.2008 10:14pm
theobromophile (www):

Note that my exam had 13 multiple choice questions (which amounted to 1/3 of the grade) and one long essay (which amounted to 2/3 of the grade, and for which the median answer was about 3750 words). The students had four hours to do the exam, and the exam was open book and open notes.

Generally, the advice I've heard is to allocate time based on number of points - so you would do an hour twenty minutes for multiple choice and two hours forty minutes for the essay.

That said, 3,750 sounds really high for that amount of time. Maybe most students don't spend ten minutes each on multiple choice (the few law school exams I've taken were usually paced out for about four minutes per question), but, even so, 3,750 words for three hours twenty minutes just seems like a lot. I looked through my old exams, and I've usually done about 1,000 words/hour. Of course, I'm one of the few people who generally takes about 1/4th the allotted time to read the question, write an outline, and (if open note) research in my notes to make sure I haven't missed anything and/or nail down grey areas.
2.1.2008 10:35pm
jsalvati (mail) (www):
Why would you not just post the graph of score vs. word length?
2.1.2008 10:46pm
Brian K (mail):
Being of a hard sciences and medicine background, i'm curious to know what kind of multiple choice question would require so much time. On any of my tests I have on average 50-70 seconds to answer a single multiple choice question. What do the legal multiple choice questions entail?
2.1.2008 11:43pm
I think it all depends on the professor. Some are "check mark" graders--that is, they just put a check next to each correct point and then count up the check marks. There is no penalty for mistakes. In that case, it certainly pays to write as much as possible.

I generally write long exams and have noted the more I write, the better I do. To that end, I've written 7,000 words for a three-hour exam and done quite well. On exams that are word limited (for example, a 3000 word limit over 3 hours), by contrast, I do less well.
2.2.2008 12:15am
Simon Spero (mail) (www):
Information Science to the rescue
Larkey, L. S. 1998. Automatic essay grading using text categorization techniques. In Proceedings of the 21st Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Melbourne, Australia, August 24 - 28, 1998). SIGIR '98. ACM, New York, NY, 90-95. DOI=

Our results differ from previous work, which always found some kind of essay length variable to be extremely important. In [12], a large proportion of the variance was always accounted for by the fourth root of the essay length, and in [5], a vector length variable was very important. In contrast, our results only found length variables to be prominent when Bayesian classifiers were not included in the regression. In all three data sets, the regression selected Rootwds, the fourth root of the essay length in words, as an important variable when only text complexity variables were included. In contrast, when Bayesian classifiers were included in the regression equation, at least two Bayesian classifiers were always selected, and length variables were not consistently selected. We speculate that our Bayesian classifiers captured the same variance in the data. An essay that received a high scores from a Bayesian classifier would contain a large number of terms with positive weights for that classifier, and would thus have to be long enough to contain that large number of terms.
(Larkey, 1998)

[5] T. Landauer, D. Laham, B. Rehder, and M. Schreiner. How well can passage meaning be derived without using word order? A comparison of latent semantic analysis and humans. In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, 1997.

[12] Ellis B. Page. Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62(2):127-142, 1994.
2.2.2008 10:23am
DiverDan (mail):
As a former law student, I would suggest that this correlation and the results in terms of median length say as much about the predilections of the professor doing the grading as they do about the quality of the student essays. The good students in law school almost always did their homework on their professors - a 1L talks to several 2L & 3L students, not just about what topics are most likely to come up on the exam, but also about "what does Professor ________ like in an exam essay". We knew before the test that some professors were the "check graders" -- organization and coherence meant little, and it was not necessary or even desirable to focus only on the meat of the issue; in Torts, for example, our professor gave the highest grades to the students who merely listed every possible tort claim that could possibly be asserted based on a factual scenario, even if some claims were really a stretch, and then listed every possible affirmative defense or privilege, regardless of how well they fit the facts. Complete sentences were unnecessary; a simple numbered list would do. Any reasoned analysis of which claims or defenses best fit the facts and which ones were really iffy was unnecessary and would not improve your grade. On the other hand, my Professors in Contracts and Property wanted coherent, well organized analysis; a short, but well organized essay that got right to the heart of the issue and demonstrated sufficient knowledge to understand the issue was much more kindly graded than a rambling essay that touched on every issue the student could think of. Writing ability mattered almost as much as knowledge of the subject matter (which is entirely appropriate, since lawyers are expected to be able to communicate effectively in writing), and organization and a coherent argument mattered more than essay length.
I would venture to guess that many of your students already had a good idea of what you were looking for in terms of essay length, issue spotting, analysis, organization, etc., before the test, and your statistical analysis simply reflects a feedback loop. I would be more interested in knowing how the correlation works if you gave your exams to another professor, one who knows the subject matter as well as you, but has different idiosyncrasies in test grading.
2.2.2008 11:29am
Doug Berman (mail) (www):
Great topic, Eugene, and I hope you do a lot more blogging/discussion on this topic because I have long feared that, for in-class exams, "quantity and quality" are highly positively correlated (whereas they often seem to me to be negatively correlated in the practice of law).

You state from your small sample that "past a certain level, quantity and quality aren't even correlated." But this seems to mean that, up to a certain level, the amount of words written is a (strong?) predictor of exam success. In turn, this suggests a real (big?) risk that (below the mean) slow exam writers are hurt simply for being slow exam writers.

I suppose it is not entirely troubling that (below the mean) slow exam writers, all else being equal, get worse grades on your exam. But is that your intent when deciding to give an in-class exam? Do you think it provides a good/fair/justifiable predictor of lawyering talent/knowledge?
2.2.2008 11:52am
Benjamin Davis (mail):
We have in first year courses a two side 30 minute question on the exam. That tends to favor the folks who can crisply do an analysis and tries to mirror the bar exam.
On long essays in the first year or other classes, I never put word limits. My personal view is that a student should be able to say as much as they think they need to say and not be bound by some word limit that I arbitrarily place on them. The match between what the question asks and what the response needs to be must vary from essay to essay. Some students do excellent jobs in few words and others do excellent jobs with many words. Literary types versus engineers. I started to write more about "length doesn't matter" but it seemed I was unconsciously going to a place that I really did not intend to go.
2.2.2008 1:19pm
theobromophile (www):

Being of a hard sciences and medicine background, i'm curious to know what kind of multiple choice question would require so much time. On any of my tests I have on average 50-70 seconds to answer a single multiple choice question. What do the legal multiple choice questions entail?

I have a similar background, so I was thrown off to see law school multiple choice exams. The most recent one I've taken was in business law - 40 questions for about 2.5 hours. Not nearly enough time - and I'm usually very, very good at multiple choice.

The fact pattern is about a half of a page long. There is then a question. The answers are things like, "Yes, because this rule applies." Often, the answers for A-D or A-E will ALL be "yes" or "no," and you have to distinguish the rationale. So you go to your code book (UPA, RUPA, MBCA, etc), look up the relevant section, and eliminate wrong answers.

I had a similar exam on international intellectual property, where the answers were all about clauses in various treaties and conventions. I think that one was 20 questions, 3 hours, but we were allowed to write our rationale in the margins.

I've had two word-limited exams (both were 24 hour take-home exams, 3,000 words) and one space-limited exam (handwritten, in class, handwriting must stay on lines provided). The 24-hour take homes were pure hell; it was, IIRC, 19 hours straight of looking up footnotes in the book, writing, drafting, re-writing, and then condensing to get it all within the word limit.

Some of my professors have said that they prefer a well-organised exam; one of them even encouraged bolded, enumerated topic headings. Some don't seem to care.
2.2.2008 3:51pm
Brian K (mail):
thanks for the answer, theobromophile.

the multiple choice questions sound very similar in format to many of the ones i have to answer (esp. the clinical ones). but it seems like the legal questions require you to base your answer on vastly more information and resources than the questions i answer. it now makes complete sense to me why legal multiple choice questions require so much time.

haha...and i'm very glad non-multiple choice exams for me are few and far between. they do not sound enjoyable at all.
2.2.2008 4:21pm
MR (mail) (www):
Responses to DiverDan and Doug Berman: "I don't think so" on both. My final last fall was the first final I have ever given, anywhere. There was no ability for the students to know what I would want. Further, I made it an 8 hour take home with a 4,000 word limit. While it is possible that those who wrote less were slower writers, I think it unlikely that any of my students was so slow as to only be able to write 3000 words in 8 hours - if they understood the material.

My results? For word count under 3,000 the correlation between word count and exam score was .79 (!). For word count over 3000 (up to 4,000), correlation was -.01

I note that if I exclude the two highest word counts - both of whom exceeded the limit, the correlation goes to .16. I am not inclined to discount the result with the whole data set - if you write too much, it should negatively correlate with quality.
2.2.2008 4:23pm
Doug B. (mail) (www):
MR: An 8-hour take home with a word limit is a much different --- and I think much more valid --- test mechanism than the traditional in-class exam Eugene and I are talking about.
2.2.2008 6:32pm
lucia (mail) (www):
Here's my theory: students who can't think of anything to say at all generally get poor grades. Likely, this group includes many who don't know the answers, and so can't put together an essay.

However, among students who know the answers, essay style makes less of a difference. You may be wordy, or concise. But if you put together a decent answer, it's a decent answer.

Does that fit the student work?
2.4.2008 10:11pm