Ninth Circuit Appellate Clerk Gender Split:

For the clerkship year that's just ending, the Ninth Circuit Clerk of Court's office reports that it's 73 men and 73 women.

UPDATE: Just to clarify, these are "elbow clerks," who are hired by each individual judge to work for him or her, rather than "staff attorneys," who are hired by the court to perform somewhat different roles than the elbow clerks serve. My sense is that to most people "clerk" means "elbow clerk," and staff attorneys are called staff attorneys; but some commenters asked about this, so I thought I'd note it.

But how many each are gay? And should that change the count?

What are the odds that this result could be reached through a random sample, even allowing for equal-opportunity guidelines and gender-preference considerations? Greater than 0, I'm sure, but quite small. It points to an overeager bean counter, I'd say.

Didn't they break it out further?

Didn't they make sure they had the right numbers of ALL the groups?

Stupid Lib thinking, and they wonder why anyone with a clue laughs at them.


It turns out that StarOffice's spreadsheet has a binomdist() function to calculate this. (I suspect other spreadsheets also have this function.)

binomdist(73;146;0.5;0) = 0.0659204

In other words, the chance that exactly 73 of 146 flips of an unbiased coin will turn up heads is about 6.6%.

if the coin turns up heads 55% of the time, the probability that you'll get 73 of 146 is 0.0316511 (3.1%)

if the coin turns up heads 60% of the time, the probability that you'll get 73 of 146 is 0.0033483 (0.3%)

if the coin turns up heads 65% of the time, the probability that you'll get 73 of 146 is 0.0000675 (0.007%)

if the coin turns up heads 70% of the time, the probability that you'll get 73 of 146 is extremely small (less than one in a million)
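These spreadsheet figures can be checked in a few lines of Python; a sketch using the standard-library math.comb rather than a spreadsheet's binomdist():

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduce the binomdist() figures for an exact 73-73 split
for p in (0.5, 0.55, 0.6, 0.65, 0.7):
    print(f"p = {p}: P(73 of 146) = {binom_pmf(73, 146, p):.7f}")
```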

Well, gee, Bobbie, that is sort of my question.

Also, after we resolve the gender pigeonholing (that sounds like a more interesting exercise than I had in mind, by the way) we can move on to making sure that the ethnic/racial/religious numbers work out right.

Were they 12.3% African American, 12.5% Hispanic and 3.6% Asian?

With a 50% coin, a constant number of clerks, and one-year clerkships, the likelihood of this happening once in ten years of clerk hiring is

1 - ((1 - 0.0659204)^10) = ~0.4944 (about 49%)
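A quick check of this at-least-once-in-ten-years arithmetic, reusing the fair-coin figure computed earlier:

```python
# Chance that an exact 73-73 split shows up at least once in ten
# independent one-year hiring cycles, given a 0.0659204 chance each year
p_exact_split = 0.0659204
p_at_least_once = 1 - (1 - p_exact_split) ** 10
print(f"{p_at_least_once:.4f}")
```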

That this would happen sometime is hardly freakish or suspect.

Or, you know, it could be a massive conspiracy of federal appellate judges to wind up exactly 50-50.

Incidentally, Bill... what are the odds of the Supreme Court's distribution, assuming a roughly 55-45 distribution of top-qualified law grads?

Do federal judges really need to worry about such things? Is it not true they can select anyone they want regardless of the outcome? Is it not true they serve for life, and almost never get removed from the bench?

Bench Memos: +1 per, +2 if over 15 pages, -3 if over 30 pages.

Oral Argument Questions: +1 for questions that stump oralists, +2 for questions that make own judge look brilliant, -3 for questions that stump own judge.

Assorted personal tasks (fetching drycleaning, washing car, babysitting kids, etc.): +15 per.

Opinions: +1 per, summary denials of habeas petitions +0.001 (score normalized), +30 if reversed by Supreme Court.

Actually, what you need to know is that each individual judge hires on their own without coordination, and so this is a complete coincidence.

"Actually, what you need to know is that each individual judge hires on their own without coordination, and so this is a complete coincidence."

Query: if this is a complete coincidence, is the Supreme Court clerk hiring mentioned in the recent Linda Greenhouse article also a complete coincidence?

The same principle applies to each individual judge. Now you would get the sum of hypergeometric counts, and that sum has some distribution. It would be interesting to see how the individual judges break down in their hiring. But even if we found a strong bias in some judges or in the Circuit as a whole, no one could do anything. They do what they want.
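A rough simulation of this sum-of-hypergeometrics picture (the judge, pool, and hiring counts below are made-up illustrations, not Ninth Circuit data):

```python
import random

random.seed(1)

# Hypothetical setup: 29 judges, each hiring 5 clerks from their own
# applicant pool of 40 (half women, half men).  Per judge, the number of
# women hired is hypergeometric; the Circuit-wide total is the sum.
N_JUDGES, POOL, WOMEN_IN_POOL, HIRES = 29, 40, 20, 5

def total_women_hired():
    total = 0
    for _ in range(N_JUDGES):
        pool = ["F"] * WOMEN_IN_POOL + ["M"] * (POOL - WOMEN_IN_POOL)
        total += random.sample(pool, HIRES).count("F")  # draw without replacement
    return total

sims = [total_women_hired() for _ in range(5000)]
mean = sum(sims) / len(sims)
print(f"mean women hired across simulations: {mean:.1f} of {N_JUDGES * HIRES}")
```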

If you are asking whether I think there was a conspiracy on the Supreme Court to hire fewer female law clerks this cycle, then my answer is definitely "no, I don't think there was any such conspiracy".

If you want to try to evaluate the hiring patterns of each individual judge, knock yourself out. My point was just that since there is no such thing as the whole Circuit hiring clerks as a group, there is no significance to this happening to be an exact 50-50 split Circuit-wide.

Bill Sommerfeld (and Freddy Hill),

Your math proves nothing, nada. Any specific breakdown of clerks by gender has a low probability. Try 74M-72F, or 80F-66M. You'll find even lower probabilities. So what? There has to be some breakdown.
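This point is easy to verify numerically; a sketch under the same fair-coin model used earlier in the thread:

```python
from math import comb

def split_prob(men, women):
    """Probability of this exact men/women split under a fair-coin model."""
    n = men + women
    return comb(n, men) * 0.5**n

# 73-73 is the single most likely split; other specific splits are rarer still
for men, women in [(73, 73), (74, 72), (80, 66)]:
    print(f"{men}M-{women}F: {split_prob(men, women):.5f}")
```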

Zarkov,

Why presume that the applicant pool is as disproportionate as that? And why set up equal numbers as the alternative hypothesis? Again, of course it has a low probability, like any other specific breakdown.

I chose those numbers merely as an illustration that any valid calculation must incorporate information about the applicant pool. The equal outcome (73, 73) does not by itself provide enough information.

You are correct in that any specific outcome might have a low probability, and you need to calculate the probabilities for a range of outcomes.

"any valid calculation must incorporate information about the applicant pool."

True. Law school enrollments are about 52.5% male. Assuming, plausibly, that the applicant pool is the same, then getting a number of female clerks equal to or greater than the number of male clerks is wholly unremarkable.
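Under that assumption, the tail probability is straightforward to compute; a sketch (the 47.5% female figure is just the complement of the quoted 52.5% male enrollment):

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If the applicant pool is 52.5% male (47.5% female), how often do women
# end up with at least half of 146 clerkships?
tail = binom_tail(73, 146, 0.475)
print(f"P(women >= 73 of 146) = {tail:.3f}")
```

The answer comes out to roughly three chances in ten, which is indeed unremarkable.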

And in fact, in some sense we can apply Bayes' Theorem already because we know that the prior probability that the Ninth Circuit is deliberately fixing the number of male and female clerks is negligibly small (because the Ninth Circuit isn't controlling the process at all). So, applying Bayes' Theorem will simply result in what we would expect: even if this particular distribution is unlikely as an a priori matter, it won't do much to change the negligibly small probability that the Ninth Circuit is actually fixing the number of male and female clerks.
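A toy numerical version of this Bayesian argument (all three input numbers are illustrative assumptions, not measurements):

```python
# Hypothetical numbers, purely to illustrate the point:
prior_fixing = 1e-6           # prior probability of a Circuit-wide quota
p_data_if_fixing = 1.0        # a quota would guarantee the 50-50 split
p_data_if_random = 0.0659204  # fair-coin chance of an exact 73-73 split

# Bayes' Theorem: P(fixing | observed 73-73 split)
posterior = (p_data_if_fixing * prior_fixing) / (
    p_data_if_fixing * prior_fixing + p_data_if_random * (1 - prior_fixing)
)
print(f"posterior probability of fixing: {posterior:.2e}")
```

The observation raises the posterior by a factor of roughly fifteen, but fifteen times a negligible prior is still negligible.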

You meant, I assume, "gay, lesbian, bisexual, transgendered." And why shouldn't it change the count?

If your assumption is correct, then yes, the outcome is unremarkable. My point was one of methodology. Data on the applicant pool is important, as the following story demonstrates.

At one time men outnumbered women in applying to law school. However, some schools, like Georgetown, thought their entering class should be roughly 50% women, and they adjusted their admission policy accordingly. As a result the men admitted were of higher caliber, and this impacted things like class standing and law review. Then the women screamed discrimination. A clerk in the admissions office leaked the admissions data showing the grades and LSAT scores of the applicants and admitted students, proving Georgetown had lower admissions standards for women. The administration went ballistic even though the leaked data did not identify individuals by name. Ultimately they identified the clerk and really came down hard on him. Of course Georgetown was embarrassed, because they would never admit that they had lower standards for women.

I am a fan of Bayesian methods myself. However, in this case it's easier to use the frequentist approach because we can use the hypergeometric distribution as the null hypothesis. Under this hypothesis the clerks are a simple random sample without replacement from the applicant pool. Then we reject the null hypothesis if the observed outcome exceeds some threshold. For example, we could reject the null hypothesis if the number of women exceeds (say) 80. Or we can assign a significance level to the actual outcome by appropriately summing over the hypergeometric probabilities.
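A sketch of such a test (the applicant-pool size and composition below are invented for illustration):

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k women among n clerks drawn without replacement from a pool of
    N applicants, K of whom are women)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical pool: 1000 applicants, 475 of them women; 146 clerks hired.
# Significance of "80 or more women" under the random-sampling null:
N, K, n = 1000, 475, 146
p_value = sum(hypergeom_pmf(k, N, K, n) for k in range(80, n + 1))
print(f"P(women >= 80) = {p_value:.4f}")
```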

To take the Bayesian approach we would need a distribution with parameter(s) that determine the inclusion probabilities. In other words, the inclusion probabilities would not all be equal to n/N as they are in simple random sampling without replacement. Then we would need a prior for the parameter(s). I would use the non-informative (Jeffreys) prior because this provides invariance under different parameterizations. Using a different prior would make the analysis somewhat dependent on how you parameterized the model and make the result vulnerable to criticism. Using a prior that assumes the discrimination is small is not recommended because this is the very thing under investigation. You would be accused of stacking the deck.

"But how many each are gay? And should that change the count?"

"You meant, I assume, 'gay, lesbian, bisexual, transgendered.' And why shouldn't it change the count?"

Please do share the scientific reasons your assumptions are based on here. If you're going to be offensive, at least provide us a laugh.

There is no way you can avoid being accused of "stacking the deck" if any assessment of the prior probability of the thing being investigated is "stacking the deck". The basic problem is that there is a huge amount of information on this subject already (things like the actual way in which clerks are hired). So, if you attempt to set your priors by ignoring that other information, then you are "stacking the deck" just as much as if you set your priors in light of that other information.

Indeed, your proposal ("Then we reject the null hypothesis if the observed outcome exceeds some threshold. For example we could reject the null hypothesis if the number of women exceeds (say) 80.") is clearly "stacking the deck" in its own way, once you properly understand inductive inference, because it effectively assumes certain values for the terms in Bayes' Theorem. So you haven't escaped the problem of "stacking the deck", but rather have just buried it in your act of threshold-setting.

In short, there is no way out of this basic problem: you simply cannot use the a priori probabilities of your observations by themselves to draw inferences about these sorts of matters. Instead, one way or another, you have to evaluate your observations within the greater context of the available information about the contested issues.

If you use a prior that puts a high probability on the value of the parameter you favor, then you have indeed "stacked the deck." As an extreme case you could put a probability mass (a Dirac delta function) right at the value you want. On the other hand, if you use a uniform prior, a maximum entropy prior or a non-informative (Jeffreys) prior, you have taken a conservative approach. You don't need to use prior information about the way the clerks are selected to get a valid, defensible analysis. Here's why. As you get more data, the prior washes out. Think of it as a kind of initial condition that creates a transient in the posterior distribution. Using a diffuse prior will cause that transient to die out more slowly than a more realistic prior would. In other words, ultimately the data dominates the posterior. With very little data you merely spit back the assumptions as represented by the prior. Now I agree it's always better to use the correct prior if you have high confidence in it. But in an adversarial situation it's often better to use the most conservative prior.
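The prior-washes-out point can be illustrated with a conjugate Beta-binomial sketch (the priors and data below are hypothetical):

```python
def beta_posterior_mean(a, b, heads, tails):
    """Posterior mean of p under a Beta(a, b) prior, after observing the data."""
    return (a + heads) / (a + b + heads + tails)

# With little data, the prior dominates...
print(beta_posterior_mean(1, 1, 5, 5))    # flat Beta(1, 1) prior
print(beta_posterior_mean(80, 20, 5, 5))  # peaked prior, pulls toward 0.8

# ...but with lots of data the prior washes out and both agree on ~0.5
print(beta_posterior_mean(1, 1, 500, 500))
print(beta_posterior_mean(80, 20, 500, 500))
```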

"For example we could reject the null hypothesis is the number of women exceeds (say) 80.") is clearly "stacking the deck" in its own way, once you properly understand inductive inference, because it effectively assumes certain values for the terms in Bayes' Theorem."

What you say is true within the framework of a Bayesian approach. But not everybody accepts the Bayesian framework as the appropriate way to do statistical inference. Remember something like 90% of statisticians are frequentists, not Bayesians. They put the sample space on the data, not the parameter(s). They don't accept the fundamental notion of subjective probability and so on. For example in simple random sampling the implicit prior is the uniform distribution. But I can tell you from experience most statisticians won't accept that. To them all the Bayesian notions don't even exist!

There is nothing wrong with what I proposed (if the hypergeometric model applies). The method of hypothesis testing is what people use most of the time, including in the legal arena.

It is true that what you are calling a "conservative" approach amounts to overweighting new evidence and underweighting (or simply ignoring) whatever old evidence you might have, which means that new evidence will more quickly (or even immediately) come to dominate your a posteriori probability assessment.

But how is this not "stacking the deck"? And what makes it "conservative"? Indeed, given most definitions of "conservative", it seems awfully UNconservative to assume away the significance of older evidence and go just with whatever the latest evidence indicates in isolation.

Incidentally, I am well aware that there are many people with reservations about Bayesianism in general, but much of that has to do with the subjectivism in what is known as Bayesianism. And in any event, you don't need to decide the merits of subjectivism to recognize the validity of Bayes' Theorem itself, because Bayes' Theorem is just a deductive consequence of the Laws of Probability. And thus what Bayes' Theorem indicates about the interaction between new evidence and old evidence remains valid no matter what general probability theory you accept.

Finally, even if it were true that many statisticians and legal authorities completely ignored these basic principles, at that point I would simply cite authorities such as Benjamin Disraeli and Mark Twain, who uttered variations on the theme that "there are three kinds of lies: lies, damn lies, and statistics." And while I don't think that is completely accurate, I do think it captures the common misuse of statistics in making various sorts of argument.

"It is true that what you are calling a "conservative" approach amounts to overweighting new evidence and underweighting (or simply ignoring) whatever old evidence you might have …"No, a conservative approach has nothing to do with "overweighting" new evidence. If you really believe you know the distribution of the parameter under consideration, then there is no need to take data. Usually the prior simply reflects your state of knowledge about the parameter absent any data or any new data. The data serves to update your knowledge about the parameter as expressed in terms of the posterior distribution.

Suppose we have a coin and we want to know something about the probability of "heads." What do I pick for a prior? If I think the coin is biased towards heads, then I can use an asymmetrical prior to reflect that belief. After I toss the coin a number of times, the distribution for the "heads" probability will converge to a delta function at the true value of the parameter. But suppose my prior belief was faulty and the probability of heads was not greater than 50%. Then a flat prior would have been better, because I get a more accurate approximation to the heads probability for a given number of tosses. In this sense the flat prior is "conservative." If I chose a very peaked function centered at p = 0.8, then it would take many tosses to converge if the real probability was p = 0.2. This would not be a conservative prior. But if the real probability were p = 0.84, then it was a wise choice because I would refine that initial p = 0.8 very quickly. So it all depends on how confident you are of your prior. If physics gives you the prior, use it.
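A small numerical version of this coin example, using Beta priors as stand-ins for "flat" and "peaked" (the Beta(16, 4) prior, with mean 0.8, is my illustrative choice):

```python
def posterior_mean(a, b, heads, n):
    """Posterior mean of P(heads) under a Beta(a, b) prior after n tosses."""
    return (a + heads) / (a + b + n)

# True P(heads) is 0.2; idealized data come in at exactly that rate
for n in (10, 100, 1000):
    heads = round(0.2 * n)
    flat = posterior_mean(1, 1, heads, n)     # flat Beta(1, 1) prior
    peaked = posterior_mean(16, 4, heads, n)  # peaked prior centered near 0.8
    print(f"n = {n:4}: flat = {flat:.3f}, peaked = {peaked:.3f}")
```

The flat prior homes in on 0.2 quickly, while the misplaced peaked prior is still noticeably off even after a thousand tosses.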

"And thus what Bayes' Theorem indicates about the interaction between new evidence and old evidence remains valid no matter what general probability theory you accept."

Bayesian inference is much more than simply using Bayes' formula. The objections people have go beyond the idea of subjective probability. The reason Bayesian methods have become more popular over the last ten years has to do with the Markov chain Monte Carlo computing technique. Now we can generate samples from the posterior distribution numerically when we can't do the integrals. As I said, I'm a big fan of Bayesian inference, and I use it all the time.
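A minimal Metropolis sampler illustrates the MCMC idea; a toy sketch where the target is the Beta(74, 74) posterior you would get from a flat prior and the thread's 73-73 data:

```python
import math
import random

random.seed(42)

def log_post(p):
    """Log of the unnormalized Beta(74, 74) posterior density."""
    return 73 * math.log(p) + 73 * math.log(1 - p)

p, samples = 0.5, []
for _ in range(20000):
    prop = p + random.uniform(-0.05, 0.05)  # symmetric random-walk proposal
    # Metropolis accept/reject step; no normalizing integral needed
    if 0 < prop < 1 and math.log(random.random()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

mean = sum(samples) / len(samples)
print(f"posterior mean ≈ {mean:.3f}")  # Beta(74, 74) mean is exactly 0.5
```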

I hope that clears up my position.