Responding to Critics (2): "Second-choice" students

This is the second in a series of postings further explaining my work on the use and effects of racial preferences in law schools, and responding to critics of my work. One of the central claims in my research is that black law students are often "mismatched" by large racial preferences, which place them at schools where they do poorly and actually learn less than they would at a school with a smaller preference or no preference at all.

On Friday, I posted a new analysis that strongly corroborates the "mismatch" story: for a large sample of blacks admitted to law schools, those who passed up their "first choice" law school and went to a lower-ranked school -- in other words, a school where they would have been admitted with a smaller preference -- had dramatically better outcomes (grades, graduation, and bar passage) than blacks who made no such choice. Today I want to address some questions raised by this analysis.

First, are the results significant and reliable? The database for this analysis includes 1,757 black students entering law school in 1991. Just under one-tenth of these students (171) were admitted to their first-choice law school but chose to go to another school. This is a pretty large sample, and it means that any outcome on which the success rates of the two groups of blacks differ by more than six or seven percentage points (e.g., 80% vs. 87%) will be statistically significant. Pretty much all of the outcomes for black second-choice students are, in fact, better than the outcomes for other black students, by at least that margin (and sometimes by as much as 20 percentage points). So the answer to the first question is a resounding Yes.
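
For readers who want to check the arithmetic, here is a minimal sketch of the kind of two-proportion significance test involved (assuming Python with scipy; the 87% vs. 80% figures are the illustrative numbers above, not actual outcomes computed from the BPS data):

```python
from math import sqrt
from scipy.stats import norm

# Illustrative only: 87% vs. 80% are the example figures from the text above,
# not results from the BPS data.
n_second, n_other = 171, 1586            # 171 second-choice students out of 1,757
p_second, p_other = 0.87, 0.80

# Standard two-proportion z-test for the difference in success rates
p_pool = (p_second * n_second + p_other * n_other) / (n_second + n_other)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_second + 1 / n_other))
z = (p_second - p_other) / se
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-sided
print(f"z = {z:.2f}, p = {p_value:.3f}")  # about z = 2.2, p = 0.03
```

With groups of this size, a seven-point gap is roughly two standard errors wide, which is why even modest differences clear the conventional significance threshold.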

Second, are there differences between the black second-choice students and other black law students that might account for their different rates of success? There is one important difference -- the blacks who chose their second-choice school have, as a group, slightly higher average credentials than other black students. That difference accounts for about one-seventh of their higher performance. Otherwise, the black second-choice students are largely indistinguishable from other blacks at the outset of their law school careers. They are about equally likely to have a parent who attended law school (6% for the second-choicers vs. 7% for other blacks), to have a "burning desire" to become a lawyer (30% vs. 30%), to be "very concerned" about getting good grades (89% vs. 88%), and to believe they experienced discrimination during college (68% vs. 64%).

The factor that makes second-choice blacks truly different is simply that they are less mismatched with their classmates than other blacks are. Because they have turned down their "first-choice" school, they are at a school where, on average, their "academic index" is only 93 points below the class mean, compared with a 140-point deficit for other blacks. This in turn means that they get significantly higher grades, on average -- and that, in all likelihood, makes all the difference for their future outcomes.

Going back to the technical discussion, controlling for differences in entering credentials makes one of the six interesting outcomes for these two groups statistically insignificant (ultimate bar passage). But the other five (first-year grades, third-year grades, graduation rate, first-time bar passage, and rate at which matriculants become lawyers) remain significant, and all six outcomes are much better for the second-choice blacks. One can debate what the proper controls should be -- which factors and comparison groups provide the fairest comparison -- but in every analysis I have seen, the second-choice blacks substantially outperform the comparison black group, and at least some of the differences are highly statistically significant.
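
To make the idea of "controlling" concrete, here is a schematic sketch -- not the actual BPS analysis; the data are synthetic and the variable names (second_choice, acad_index, passed_bar) are made up for illustration -- of a regression of one outcome on a second-choice indicator with entering credentials as a control:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data: these column names and coefficients are invented for
# illustration and are NOT the BPS variables or my estimates.
rng = np.random.default_rng(0)
n = 1757
df = pd.DataFrame({
    "second_choice": rng.binomial(1, 171 / 1757, size=n),   # 1 = passed up first choice
    "acad_index": rng.normal(0.0, 1.0, size=n),             # standardized entering credentials
})
true_logit = -0.5 + 0.6 * df["second_choice"] + 0.8 * df["acad_index"]
df["passed_bar"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# Regress the outcome on the second-choice indicator, controlling for credentials;
# the question is whether the second_choice coefficient survives the control.
result = smf.logit("passed_bar ~ second_choice + acad_index", data=df).fit()
print(result.summary())
```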

Moreover, the mismatch theory was derived from an entirely different analysis (comparing blacks and whites), yet it predicts with great precision the actual improvements in outcomes for the black second-choice students. It would be hard to imagine a more compelling confirmation of its basic theses.

Responding to comments:

"Mahan Atma" says the results are "nonsense" because the blacks going to second-choice schools are not randomly selected; without randomization, there can be no true statistical significance. Not so. It is of course possible to determine the signficance of a difference between two groups that have not been randomly selected—all that significance in this context means is that the difference almost certainly is not due to randomness, but to some real distinction between the two samples. The crucial issue then is what variable accounts for this difference. The point of all regression analysis in the social sciences is to control for plausible differences that might explain why two groups have different outcomes. I find that when one uses these controls, the performance gap between the black second-choice students and others is largely intact -- and statistically significant.

"Michael" contends that the BPS dataset is too noisy to be useful; some respondents do not understand the questions properly and miscategorize themselves. But I counted as "second-choice" students only those who said that they had been admitted to more than one law school, and who did not attend their first-choice school for an identified reason (usually geographic or financial constraints). Moreover, we can accurately estimate the size of the mismatch these students faced at their schools. Certainly it's possible that some of the students I've identified as black "second-choice" students had their hearts set on going to UCLA, but went to their second-choice, Harvard, because Harvard offered them more money. But there can't be many such students (or the average size of the mismatch these students face wouldn't show up as being as small as it does), and to the extent such noise exists in the data, it simply implies that the results were strong enough to show through that noise.

"Donald" and several others wondered how the "second-choice" effects would play out for whites. I discuss this issue in some detail in my "Reply to Critics". Here's a short answer. The substantial number of whites who indicated they turned down their first-choice school (largely for the same reasons as blacks) tended to end up with a "positive mismatch" -- that is, they had higher credentials than most of their classmates. This led, predictably, to higher grades in law school -- well above the class median. In the top half of the class at most law schools, however, there isn't much difference in graduation and bar outcomes -- the vast majority of students graduate and pass the bar. Consequently, the benefits from a "positive mismatch" are a lot smaller than the harms of a large "negative mismatch". So "theory" predicts that whites going to second-choice schools will see little if any improvement in graduation and bar passage rates, and that's borne out by the data. (The white "second-choice" students may see significant job market benefits, but I haven't tested that idea yet.)

More coming up….

Mahan Atma (mail):
""Mahan Atma" says the results are "nonsense" because the blacks going to second-choice schools are not randomly selected; without randomization, there can be no true statistical significance. Not so."

It isn't just that the two groups aren't randomly selected; the problem is that NONE of the respondents in the LSAC-BPS dataset were randomly selected AT ALL.

" It is of course possible to determine the signficance of a difference between two groups that have not been randomly selected—all that significance in this context means is that the difference almost certainly is not due to randomness, but to some real distinction between the two samples."

You don't get it: In order to even consider the null hypothesis that the difference is due to randomness, you have to assume the statistics you are comparing have probability distributions. If your data were gathered by a random sample of some sort, that would be a good assumption.

But with the LSAC-BPS data, there isn't a random sample anywhere in sight. It's meaningless to even consider the null hypothesis.

This is why your reference to statistical significance is so silly. You apparently fail to understand the framework in which the test works. The framework assumes you're dealing with probability distributions -- e.g. observational data gathered in a random sample, OR experimental data where the control group is randomly assigned. You don't have either in your dataset. So you have no business applying inferential statistical tests; they simply don't make any sense in this context.

"The point of all regression analysis in the social sciences is to control for plausible differences that might explain why two groups have different outcomes."

When social scientists do this, they usually have data generated from a random sample of some sort. If they don't then what they are doing is nonsense (and there is plenty of it).
6.14.2005 10:36am
Anonymous Law Student:
Someone's never seen a distribution approximating normal in social science research, it appears.
6.14.2005 11:08am
Mahan Atma (mail):
"Someone's never seen a distribution approximating normal in social science research, it appears."

Are you referring to me? I've seen plenty of normal distributions, and I've got many publications in peer-reviewed social science journals that do statistical analysis using normal approximations.

One way you get approximately normal distributions is by taking a random sample from a population, in which case certain statistics calculated from your sample will be normally distributed. That's why you can use a normal distribution when doing statistical inference.
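
Here is a toy illustration of that point (a minimal sketch in Python with numpy and scipy, nothing to do with the BPS data): draw a decidedly non-normal population, take repeated random samples, and the sample means come out approximately normal.

```python
import numpy as np
from scipy.stats import skew

# Toy demonstration, unrelated to the BPS data: random sampling is what makes
# the sampling distribution of a statistic (here, the mean) approximately normal.
rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)    # heavily skewed population

sample_means = np.array([rng.choice(population, size=200, replace=False).mean()
                         for _ in range(2_000)])

# The population itself is far from normal; the means of random samples are
# nearly symmetric and bell-shaped.
print(skew(population), skew(sample_means))
```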

But if your data aren't taken from a random sample, or you aren't randomly assigning subjects to a control group, or if there isn't any randomness anywhere at all, then doing statistical significance tests is meaningless.

That's the problem with trying to do statistical inference using the LSAC-BPS dataset. It wasn't a random sample taken from anything. It was simply data supplied by a (self-selecting) group of schools on a voluntary basis. This is what statisticians call "a sample of convenience".

Look, you could apply statistical significance tests to a bunch of data gathered on internet-based surveys like you see on CNN's website, or whatever. Of course, it isn't a representative sample from anything (like the LSAC-BPS data, it's self-selecting). And in fact, such data isn't even random in any meaningful sense of the word. But you can apply statistical tests to it nonetheless.

Just because you can apply a statistical test, does that make the test appropriate or meaningful?
6.14.2005 11:24am
Mahan Atma (mail):
Suppose you stand up in appellate court one day, and you make an eloquent, beautifully constructed argument to the panel that your client's First Amendment rights were violated. You present yourself with perfect grace. Your argument is full of completely accurate references to the facts and the case law. You do a wonderful job of distinguishing the cases that go against you, and so on and so on.

Then one of the judges interrupts you and says: "Look, you've brought an action against your neighbor, a private individual. The First Amendment only applies when there's state action; there isn't any state action in sight! What are you doing here???"

You'd feel (and you'd be) pretty stupid, right?

Well, guess what: A statistical inference test only applies when you have a probability distribution; and here, there isn't any probability distribution in sight!
6.14.2005 11:44am
Dirk Jenter (mail):
What Mahan Atma seems to be referring to is the question whether sample selection bias affects the results in the Rick Sander studies. For sample selection bias to cause the observed results, the underlying selection mechanism behind the LSAC-BPS data set would have to somehow select for black students who do better in their second-choice school than in their first-choice school. Said differently, one would have to argue that the inclusion criterion for the LSAC-BPS data set leads to more black students with this particular outcome than are present in the population. This seems unlikely.

The more severe problem with the study in question results from the endogeneity of the black students' decision to take their second-choice school over their first-choice school. What Rick Sander finds is that a group of black students unexpectedly chooses a lower-ranked school over a higher-ranked school, and that these students then do better than predicted at the lower-ranked school. The endogeneity bias comes from the non-randomness of the first decision, i.e., the decision to attend the lower-ranked school. Whatever unobserved factor causes that first unpredicted decision may very well also cause the performance difference at the lower-ranked school. To put it very simply, observing that someone does unexpectedly well after enrolling in a tournament that person was not predicted to enroll in is not surprising: the person likely had a good reason to enroll in the tournament, and that reason is correlated with (or even causal for) the subsequent performance.
6.14.2005 12:02pm
Mahan Atma (mail):
What Mahan Atma seems to be referring to is the question whether sample selection bias affects the results in the Rick Sander studies.

That's certainly a potential problem, but the problem I'm referring to is far more fundamental.

And it's easy to recognize if you think through it carefully:

(1) State the assumptions required for the test of statistical significance used here (probably a standard t-test).

(2) Note that the test assumes there are probability distributions for the statistics involved.

(3) Now ask yourself: Exactly how are those assumptions satisfied? More precisely, where do the probability distributions come from?

If you have data that are generated in a random sample, that's one answer to (3). Or, if you have a randomly assigned control group, that would be another answer.

But here, you don't have a random sample. And the control group isn't randomly assigned. So tell me -- what's the answer to (3)?
6.14.2005 12:21pm
Dirk Jenter (mail):
Mahan Atma: Are you suggesting that one cannot perform statistical analysis of two samples unless someone somewhere explicitly tosses a coin? This would imho be a misunderstanding of applied statistics.

Whenever the assumption of random sample selection underlying the basic tests for statistical significance is violated, one gets some form of sample selection bias in the estimates and standard errors. This is the standard situation in the social sciences. The task is then to think harder about the sample selection issues at hand, and to figure out the likely direction of the bias. This is what I started doing in my above post.
6.14.2005 12:50pm
Michael @ CIR (mail):
I'm not sure you entirely responded to my concerns.

I think the problem with your efforts to control for problems with the LSAC-BPS database is that you have no guarantee that your "second choice" students were actually admitted to their first choice. (Unless there was a question asking them exactly that, which you are not identifying in your response to me.) It could be that they never even applied to their "first-choice" school. There seem to be enough differences of opinion among the respondents about what "first choice" means that someone may have simply had Harvard as their first choice, not applied to it because of "geographic reasons" or "financial constraints," and then nonetheless gone to their first choice among the several schools that admitted him or her. You would include this individual in your "second choice" group, but (s)he has no business being there. Given that there were only 171 blacks in your "second choice" group, a significant amount of "noise" suggests a problem.

It's also a bit disingenuous for you to say that the "second-choice" group of blacks had an average academic index "only 93 points below the *class* mean, compared with a 140-point deficit for other blacks." (My emphasis.) As you point out in your reply, the LSAC destroyed the data for individual schools, so you don't have any information about people's scores relative to others in their specific law school class (which is how most readers would interpret the above phrase). Your "averages" refer to the Tier Means, the group of clusters that Linda Wightman created. Your reply accurately points out that there is a lot of overlap in the tiers, and that they are a highly imperfect measure of eliteness. (Your draft reply states that the weakness of the "tier" variable in measuring eliteness was one of the two "fundamental errors" you made in your initial analysis, at least as a method of selecting individual students to compare.) I'm not a statistician, but couldn't at least some of the variation that you identify in the "index gap" be attributable to the random distribution of blacks within a given tier? And since relative credentials explain law school grades under your theory, this could explain some of the differences in outcomes.
6.14.2005 1:00pm
Mahan Atma (mail):
"Mahan Atma: Are you suggesting that one cannot perform statistical analysis of two samples unless someone somewhere explicitly tosses a coin?"

If by "statistical analysis" you mean "tests based on statistical inference", then you need some source of randomness — it may or may not come from "tossing a coin".

Modern samples are usually drawn by computer-generated selection out of a population (e.g. a randomly selected list of names, randomly generated phone numbers, etc). Look at any major polling organization, for instance, using random-digit dialing. Or, look at major social science datasets like the Current Population Survey, which are based on carefully-drawn random samples from the population.

Look — What do you think a "sampling distribution" of a statistic is? How can you have a sampling distribution when the sample isn't random???

"This would imho be a misunderstanding of applied statistics."

Well I'm sorry, but you're wrong. Go get advanced degrees in probability theory and statistics, like I did, and you'll find out for yourself.

If you still think I'm wrong, then please address my question (3) above. State the assumptions of a test of statistical significance, and explain how they can possibly be satisfied when you have no probability distribution anywhere in sight.

"Whenever the assumption of random sample selection underlying the basic tests for statistical significance are violated, one gets some form of sample selection bias in your estimates and standard errors. This is the standard situation in the social sciences. The task is then to think harder about the sample selection issues at hand, and to figure out the likely direction of the bias. This is what I started doing in my above post."

Bias (by which I mean selection bias — there is another technical definition of "bias" as statisticians use it) is certainly a problem in any sample, random or otherwise.

But please explain what you mean by "standard errors" when you have no random sample whatsoever.

You all need to think more carefully about the probability theory underlying the statistical analysis. Statistics based on random samples have "sampling distributions", which is a type of probability distribution. If you have no probability distribution, you have no sampling distribution, and hence no statistical inference tests.

Look at any explanation of what a "t-statistic" is. It usually talks about two "independent samples". How can samples be statistically independent (necessarily a probability-based concept) when there are no probabilities involved (because there is no randomness involved)?
6.14.2005 1:23pm
Dirk Jenter (mail):
Mahan Atma: Compare the following two situations:

(1) You have a random sample of the underlying population. You create two subsets of the sample based on an observable criterion (e.g. first-choice blacks versus second-choice blacks). You perform a test of equality of means for some variable, for example a t-test.

(2) You have a non-random sample of the underlying population. You again create two subsets of the sample based on an observable criterion (e.g. first-choice blacks versus second-choice blacks). You again perform a test of equality of means for some variable, for example a t-test.

You are arguing that the t-test is valid in the first case, but invalid in the second. I am arguing that the t-test may be valid in the second case as well, and that its validity depends on the nature of the selection criterion for the sample in case (2). If the (unobserved) variable(s) the sample selection is based on are uncorrelated with the difference in means between the two subsamples then the t-test would still be valid.

None of this has anything to do with there being probability distributions in the first case and none in the second. On a different note, arguing from authority (advanced degree in blah....) is generally not a good thing to do, but it definitely does not work when posting anonymously on the internet.
6.14.2005 1:50pm
Mahan Atma (mail):
Look, here's an easier-to-understand example of "nonsense" using statistical inference. Here are two lists of 10 numbers:

(1) 1, 4, 1, 5, 9, 2, 6, 5, 3, 5

(2) 8, 9, 7, 9, 3, 2, 3, 8, 4, 6

Dataset (1) has a mean of 4.1. Dataset (2) has a mean of 5.9. The difference in the two means is 5.9 - 4.1 = 1.8

If I wanted to, I could plug those two datasets into my computer, and conduct a statistical significance test to see whether the difference in the two means is "statistically significant". The computer would spit out some P-value, which I could interpret (if I was careless) as "the probability that the difference between the two means is due to random chance."

But that would be ridiculous. Why? Because the numbers in Dataset (1) are simply the first ten digits following the decimal point in the constant Pi. The numbers in Dataset (2) are the next ten digits. Hence, they are completely determined; they have no probability distribution whatsoever.

So it's meaningless to talk about the "probability" that the difference between the two means is due to random chance, because THERE IS NO RANDOM CHANCE INVOLVED! Actually, the "probability" that the difference is due to random chance is 0, regardless of the P-value, because the numbers are and always will be what they are: They are the first twenty digits following the decimal point of Pi!

But of course, the computer that crunches the numbers probably doesn't recognize them as such. And the computer cannot test whether the data was taken from some random sample, so it doesn't tell me that what I'm doing is nonsense. It simply spits out some P-value, and it assumes I know what I'm doing with that P-value. With a statistically naive researcher, that's obviously a big mistake.
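
For concreteness, a minimal sketch (assuming Python with scipy) of exactly that computation:

```python
from scipy.stats import ttest_ind

d1 = [1, 4, 1, 5, 9, 2, 6, 5, 3, 5]   # first ten digits of Pi after the decimal point
d2 = [8, 9, 7, 9, 3, 2, 3, 8, 4, 6]   # the next ten digits

# The software dutifully runs an independent-samples t-test and reports a P-value,
# even though the inputs are fixed constants with no probability distribution.
t_stat, p_value = ttest_ind(d1, d2)
print(t_stat, p_value)
```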

Now, the LSAC-BPS data aren't quite as set in stone as are the digits of Pi, but the logic of the argument is the same: Without some probability distribution underlying the statistics, it makes no sense to consider the null hypothesis whether the differences are due to random chance.
6.14.2005 2:11pm
Mahan Atma (mail):
"If the (unobserved) variable(s) the sample selection is based on are uncorrelated with the difference in means between the two subsamples then the t-test would still be valid."

No, it wouldn't. For one thing, lack of correlation alone doesn't do it for you. You can have two uncorrelated variables that are still statistically dependent. (What you really mean is statistical independence, not lack of correlation.*)

Second, you have absolutely no way of proving that the unobserved variable the sample selection is based on is statistically independent of the difference in the means.

But more importantly, the only way you can do a valid statistical test on these two groups is by assigning the subjects randomly to the two groups. Otherwise, you have no probability distribution, and hence you have no way of defining "statistical independence"; it's a meaningless concept in the absence of any probability distributions.

So I ask you again: State the assumptions of the t-test. Now tell me how those assumptions are satisfied. Specifically, tell me where the probability distributions are coming from.

You can't do it.

* Definition of statistical independence: Random variables X and Y are statistically independent if and only if Prob(X given Y) = Prob(X), and Prob(Y given X) = Prob(X).
6.14.2005 2:23pm
Mahan Atma (mail):
Whoops, typo there: The last line should read:

* Definition of statistical independence: Random variables X and Y are statistically independent if and only if Prob(X given Y) = Prob(X), and Prob(Y given X) = Prob(Y).
6.14.2005 2:24pm
Mahan Atma (mail):
"On a different note, arguing from authority (advanced degree in blah....) is generally not a good thing to do but it definitely does not work when posting anonymously on the internet."

Unfortunately, neither does logic.
6.14.2005 2:32pm
John Jacobs Barton (mail):
I wonder whether there is a directly proportional relationship between the ranking of law school where a law professor teaches and the prestige of the law reviews where he publishes (i.e., the lower-ranked the law school, the lower-ranked the journals where he publishes) and an inverse relationship between how controversial his law review articles are and the rank of the law review where they are published (i.e., the lower-ranked the law review, the more controversial the article). If my hypothesis is true (let's call it the Sander Mismatch Hypothesis), then the most controversial law review articles should be published in lower-ranked journals by law professors who teach at lower-ranked law schools. By contrast, law professors with great prestige and professional esteem would write rather boring and staid journal articles about obscure topics in highly-ranked journals of the most elite law schools.
6.14.2005 2:54pm
Mahan Atma (mail):
"On a different note, arguing from authority (advanced degree in blah....) is generally not a good thing to do, but it definitely does not work when posting anonymously on the internet."

Of course, if you don't find me or my purported credentials persuasive, you can always look at the literature in the science of statistics.

Here's a particularly good article on this very issue by a good friend of mine who was once my dissertation advisor:

Statistical Assumptions as Empirical Commitments
6.14.2005 3:19pm
Jake (mail):
But that would be ridiculous. Why? Because the numbers in Dataset (1) are simply the first ten digits following the decimal point in the constant Pi. The numbers in Dataset (2) are the next ten digits. Hence, they are completely determined; they have no probability distribution whatsoever.

This is true, but running the battery of statistical tests will at least inform you that there are significant differences between the two samples. This is all that Professor Sander has done here. Figuring out what the difference means (or, as in the digits of pi example, that it is meaningless) requires further thought.

The sample of (black students at second-choice schools) performs differently than the sample of (black students at their first-choice schools). This difference cannot be chalked up to random noise. The difference could come from how the students are selected for each group, or it could come from some effect caused by being in each group (such as the mismatch effect proposed by Prof. Sander).

If you want to make the claim that selection bias is responsible for the difference in results, you need to make some argument for why this is so. For example, in the digits of pi sample the result is entirely due to selection effect, as you simply chose numbers that are different. In the case of the black law students, however, there is no obvious reason that the second-choice attendees should differ from the first-choice attendees in some way not captured in the academic credentials discussed by Prof. Sander.

The fact that a sample is self-selected is not enough to make it useless -- before you start throwing around the word "nonsense," you need to offer some reason to believe that the self-selection has thrown off the results.
6.14.2005 3:52pm
Mike1830 (mail):
Responding to Mahan's example taking two sets of ten digits from Pi, what if we didn't know whether you had chosen those randomly or from Pi? Could we do a t-test to see what the probability of getting those numbers would be if you drew them out of a hat?

Getting back to the topic of Professor Sander's study, even taking all of Mahan's criticisms into account, doesn't Sander's study at least tell us that the two groups' differences are more likely not simply caused by randomness? And that, coupled with the other information we know about the groups, tells us something useful.
6.14.2005 3:59pm
Dirk Jenter (mail):
Mahan Atma: Thank you very much for the article, which is indeed neat. It exactly describes the sample selection issues I discussed in my 12:50pm post (substituting "independent" for my sloppy "uncorrelated"), and confirms my analysis of the situation under discussion here.
6.14.2005 4:07pm
Mahan Atma (mail):
"It exactly describes the sample selection issues I discussed in my 12:50pm post (substituting "independent" for my sloppy "uncorrelated"), and confirms my analysis of the situation under discussion here."

Exactly what part of the paper are you referring to?

And how can you have statistically independent variables in a dataset that is not systematically random in any way?

Remember, when one talks about statistically independent variables, one is necessarily assuming the variables have probability distributions (see the definition above). So without defining a probability distribution, the concept of statistical independence is meaningless. That's why your argument falls apart.
6.14.2005 4:21pm
Mahan Atma (mail):
"This is true, but running the battery of statistical tests will at least inform you that there are significant differences between the two samples."

What do you mean by "significant"? The "battery of tests" that test for statistical significance imply something very specific: The tests are designed to determine the probability that the difference in statistics is due to random chance. That's it.

As I already demonstrated with the Pi numbers (and as you seem to admit), that's an absurd question if you aren't dealing with data that incorporate randomness of some sort.

"The fact that a sample is self-selected is not enough to make it useless- before you start throwing around the word "nonsense," you need to offer some reason to believe that the self selection has thrown off the results."

Be careful: My use of the word "nonsense" specifically applied to Sander's use of statistical inference tests, not the percentages or tables he put together based purely on descriptive statistics. I have no problem with the use of descriptive statistics, per se.

But apart from that, it seems to me that as a scientist purporting to advance a certain hypothesis, the burden is on Mr. Sander to demonstrate the representativeness of his data. I don't think it's very scientific to shift the burden onto his critics to prove a negative by showing that the sample is biased through self-selection (unless Sander has already made the "prima facie case", so to speak).
6.14.2005 4:41pm
Shelby (mail):
Speaking as a law-school graduate and NOT a statistician, I think Dirk Jenter (in his first post) and Mike1830 have an important point, which also occurred to me independently. The group of subjects attending lower-ranked schools CHOSE to do so. That choice signifies something -- I would characterize it (provisionally) as indicating sophistication or judgment different from (superior to?) that of those attending the higher-ranked schools. Assuming they are, as a group, self-knowledgeable and aware of the challenges of law school, they selected a school more appropriate for their abilities. The characteristics leading to such a choice seem likely to be linked to the characteristics making for better law-school students and better bar-passage rates.

Thus, Rick's study may simply identify students who, as a group, tend to display characteristics that are associated with law-school success.
6.14.2005 10:05pm