DNA Matches and Statistics:

Patterico criticizes the use of statistics in this L.A. Times article:

[I]n 2004, a search of California's DNA database of [338,000] criminal offenders yielded an apparent breakthrough [in a 1972 rape/murder case]: Badly deteriorated DNA from the assailant's sperm was linked to John Puckett, an obese, wheelchair-bound 70-year-old with a history of rape.

The DNA "match" was based on fewer than half of the genetic markers typically used to connect someone to a crime, and there was no other physical evidence.

Puckett insisted he was innocent, saying that although DNA at the crime scene happened to match his, it belonged to someone else.

At Puckett's trial earlier this year, the prosecutor told the jury that the chance of such a coincidence was 1 in 1.1 million.

Jurors were not told, however, the statistic that leading scientists consider the most significant: the probability that the database search had hit upon an innocent person.

In Puckett's case, it was 1 in 3.... In every cold hit case, the [scientific expert advisory] panels advised, police and prosecutors should multiply the Random Match Probability (1 in 1.1 million in Puckett's case) by the number of profiles in the database (338,000).

I'm not knowledgeable enough about these things to speak with confidence about just how these things can be explained accurately and comprehensibly to a jury. I may also be mistaken even about the more basic things (I've forgotten far too much about statistics, I'm sorry to report). Still, I'm pretty sure that both "the chance of such a coincidence was 1 in 1.1 million" and "the probability that the database search had hit upon an innocent person ... was 1 in 3" aren't quite right.

To begin with, if I'm right that the "1 in 1.1 million" number means roughly that 1 in 1.1 million people have the particular DNA markers that Puckett had, that's about 6000 people worldwide. To say that the defendant is one of only 6000 people who may have committed a crime (or one of only 3000, if the 1 in 1.1 million figure means 1 in 1.1 million males) doesn't by itself tell you much. It certainly doesn't tell you that there's only a 1 in 1.1 million chance that "although DNA at the crime scene happened to match his, it belonged to someone else."

Now it may well be that, coupled with other evidence, the DNA match information might be quite probative. To take a simple example, imagine that only 1 in 100 men have red hair, and it's discovered that the killer had red hair. That the defendant had red hair is surely relevant evidence, coupled with various other evidence. But it doesn't say that there's only a 1% chance that "although the hair color at the crime scene happened to match [the red-haired defendant's], it belonged to someone else."

On the other hand, "the probability that the database search had hit upon an innocent person ... was 1 in 3" also strikes me as wrong. Most obviously, I don't think it can be the case that you should just "multiply the Random Match Probability (1 in 1.1 million in Puckett's case) by the number of profiles in the database (338,000). That's the same as dividing 1.1 million by 338,000" to yield the "chance that the search would link an innocent person to the crime." Say that the database had 2.2 million profiles; under that calculation, the chance that the search would link an innocent person to the crime would be 200%, obviously nonsensical.

Now I think that the multiplication might be someone's oversimplification of a different formula — the chance that a database of 338,000 people would yield a match with an innocent person, when there's a 1/1,100,000 chance that any particular innocent person would have the DNA markers. That formula is 1-(1-1/1,100,000)^338,000, which yields 26.5%, rather than 338,000/1,100,000 (30.7%). Nor is it an accident that the two percentages are close; when n is relatively low compared to a, 1-(1-1/a)^n is relatively close to n/a.
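To check the arithmetic, here is a minimal Python sketch comparing the two figures. The function names are purely illustrative, and the sketch assumes, as the formula itself does, that each profile in the database matches independently with probability 1/a.

```python
def prob_at_least_one_match(a, n):
    """Chance that at least one of n independent innocent profiles matches,
    when each one matches with probability 1/a (an assumed model)."""
    return 1.0 - (1.0 - 1.0 / a) ** n


def linear_shortcut(a, n):
    """The simple multiplication n * (1/a). Note that it is not capped at 1."""
    return n / a


a = 1_100_000
for n in (338_000, 2_200_000):
    exact = prob_at_least_one_match(a, n)
    shortcut = linear_shortcut(a, n)
    print(f"n={n}: exact={exact:.3f}, shortcut={shortcut:.3f}")
```

With the hypothetical 2.2 million profile database, the shortcut returns 2 while the exact formula stays below 1, which is the sense in which the shortcut cannot be a probability.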

But even taking account of this oversimplification, this strikes me as mistaken. 1-(1-1/1,100,000)^338,000 is the probability that, if the rapist is not in the database, a database search would still come up with someone (who would then be innocent, since by hypothesis the rapist is not in the database). It is not "the probability that the database search had hit upon an innocent person."

Here's one way of seeing this: Let's say that the prosecution comes up with a vast amount of other evidence against Puckett: he admitted the crime in a letter to a friend; items left at the murder site are eventually tied to him; and more. He would still, though, have been found through a search of a 338,000-item DNA database, looking for a DNA profile that is possessed by 1/1,100,000 of the population, and under the article's assertion, "the probability that the database search had hit upon an innocent person" would still have been "1 in 3."

Despite all the other evidence that the police would have found, and even if the prosecutors didn't introduce the DNA evidence, there would be, under the article's description, a 1/3 chance that the search had hit upon an innocent person (Puckett), and thus a 1/3 chance that Puckett was innocent, presumably more than enough for an acquittal. That can't, of course, be right. But that just reflects the fact that 1/3 is not "the probability that the database search had hit upon an innocent person." It's the probability that a search would have come up with someone innocent if the rapist wasn't in the database.

So, as I said, I'm not sure what juries should be told about these statistics, and how to weigh them together with the other probative evidence that's introduced at trial. But it seems to me that both of the options given in the quote — "the chance [that although DNA at the crime scene happened to match [defendant's], it belonged to someone else] was 1 in 1.1 million" and "the probability that the database search had hit upon an innocent person ... was 1 in 3" — are incorrect.

Zathras (mail):
Your math is correct throughout. And without some odds on the probability that the actual criminal is in the database, no precise number can be computed.

The only additional relevant thing which might help the prosecution here is a location restriction: how many of all the people in the DB were in the relevant area at the time of the crime? It would bring the numbers towards the prosecutor's, although probably not near to the 1 in 1.1 million number used.

BTW, most calculators have trouble computing (1-1/x)^y precisely when x and y are both very large. Fortunately, there is a very good approximation in this case: (1-1/x)^y ~ e^(-y/x), where e = 2.7182818...
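
A rough Python illustration of that approximation follows. The function names are invented for this sketch, and the use of math.log1p is just one numerically careful way to evaluate (1-1/x)^y, not the only one.

```python
import math


def no_match_direct(x, y):
    """(1 - 1/x)^y evaluated naively with floating-point arithmetic."""
    return (1.0 - 1.0 / x) ** y


def no_match_exp_approx(x, y):
    """The approximation e^(-y/x)."""
    return math.exp(-y / x)


def no_match_log1p(x, y):
    """(1 - 1/x)^y via exp(y * log1p(-1/x)), which avoids forming 1 - 1/x."""
    return math.exp(y * math.log1p(-1.0 / x))


x, y = 1_100_000, 338_000
print(no_match_direct(x, y), no_match_exp_approx(x, y), no_match_log1p(x, y))
```

For these values of x and y, all three results should agree to several decimal places.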
5.5.2008 5:41pm
Ron Billings (mail):
The government lies with statistics??!! Say it ain't so...

http://fieldsobrietytest.info/raw.html
5.5.2008 5:58pm
Curt Fischer:

On the other hand, "the probability that the database search had hit upon an innocent person ... was 1 in 3" also strikes me as wrong. Most obviously, I don't think it can be the case that you should just "multiply the Random Match Probability (1 in 1.1 million in Puckett's case) by the number of profiles in the database (338,000). That's the same as dividing 1.1 million by 338,000" to yield the "chance that the search would link an innocent person to the crime." Say that the database had 2.2 million profiles; under that calculation, the chance that the search would link an innocent person to the crime would be 200%, obviously nonsensical.


The direct multiplication by 338,000 makes sense to me if we interpret the result as the expected number of matches in the database. For example, say the Random Match Probability is 1 in 3. If we check 12 people, we would expect, on average, 4 of them to match. If the RMP is 1 in 1.1 million, and we check 338,000 random people, sometimes we get a match, sometimes we don't. If we can repeat the experiment multiple times, on average a given sample of 338,000 people will yield a match about a third of the time.


Now I think that the multiplication might be someone's oversimplification of a different formula -- the chance that a database of 338,000 people would yield a match with an innocent person, when there's a 1/1,100,000 chance that any particular innocent person would have the DNA markers. That formula is 1-(1-1/1,100,000)^338,000, which yields 26.5%, rather than 338,000/1,100,000 (30.7%). Nor is it an accident that the two percentages are close; when n is relatively low compared to a, 1-(1-1/a)^n is relatively close to n/a.


I do not understand the 1-(1-1/a)^n formula. If 1/a is the chance of a single random person matching the pattern, then (1-1/a) is the chance of a single person not matching the pattern, and (1-1/a)^n is the chance of not a single one of n people matching the pattern. 1 - (1-1/a)^n is therefore the chance of getting one or more matches, period. It isn't clear to me why it has anything to do with guilt or innocence.
5.5.2008 6:04pm
Christopher Hundt (www):
What the "expert"-provided calculation does compute is the expected number of "cold hits" on innocent people when you search with that level of DNA match.

So you are absolutely correct that it would be wrong to say there was a 1 in 3 chance that he was innocent. However, it would be correct to say that they could expect, on average, to get a hit on 1 innocent person for every 3 times that they do a similar search.
5.5.2008 6:05pm
Christopher Hundt (www):
Beaten by one minute :)

But there is an inaccuracy in your post, Curt:

If we can repeat the experiment multiple times, on average a given sample of 338,000 people will yield a match about a third of the time.

I'm sure that's not what you meant to say, since it's equivalent to the false "one in three" claim. Rather, "on average a given sample of 338,000 people will yield one match for every three repetitions" would be more accurate.
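
The distinction between "expected number of matches" and "chance of any match" can be sketched in a few lines of Python. The function names are illustrative only, and the sketch assumes every person checked matches independently with the same probability p.

```python
def expected_matches(p, n):
    """Average number of matches when n people are checked, each with match chance p."""
    return n * p


def prob_some_match(p, n):
    """Chance that at least one of the n people matches (independence assumed)."""
    return 1.0 - (1.0 - p) ** n


# The 12-person example with a 1-in-3 match probability, then the Puckett figures.
for p, n in ((1.0 / 3.0, 12), (1.0 / 1_100_000, 338_000)):
    print(f"p={p:.3g}, n={n}: expected={expected_matches(p, n):.3f}, "
          f"some match={prob_some_match(p, n):.3f}")
```

In the 12-person example the expected count is 4, but the chance of seeing at least one match is a probability and so stays below 1; the two quantities only look alike when n*p is small.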
5.5.2008 6:09pm
David Chesler (mail) (www):
Say that the database had 2.2 million profiles; under that calculation, the chance that the search would link an innocent person to the crime would be 200%, obviously nonsensical.

The expected number of matches would be 200% of a person, or 2 people. That doesn't strike me as nonsensical. (If it's a given that the actual rapist is not in the database, that's N innocent people; if it's a given that he is in the database, that's N-1 innocent people plus 1 guilty person.)

What percentage of the population understands Bayesian probabilities?

The relevant probability might be "Given the match" [DNA markers or red hair] "what are the odds that the match happened randomly?" If the defendant is only there because of the match, then it becomes a Monty Hall problem: He represents all of the closed doors, because the prosecutor was going to charge whoever in that database matched. But if he's there independently, the chances that a random individual matched is relevant. And if they investigated anybody who matched, and looked for inculpating evidence on those people only, I don't know.

And if you use magic pig bladder powder to regrow your finger tips, do you keep your same fingerprints as before? Are there really 60 billion distinguishable fingerprint patterns?
5.5.2008 6:11pm
Christopher Hundt (www):

I do not understand the 1-(1-1/a)^n formula. If 1/a is the chance of a single random person matching the pattern, then (1-1/a) is the chance of a single person not matching the pattern, and (1-1/a)^n is the chance of not a single one of n people matching the pattern. 1 - (1-1/a)^n is therefore the chance of getting one or more matches, period. It isn't clear to me why it has anything to do with guilt or innocence.

It is the chance of a match to an innocent person because the chance of matching the guilty person is (modulo stuff about errors in matching) simply the probability that the guilty person is in the database, not 1/a.
5.5.2008 6:14pm
Aultimer:
Stats aside, is a DNA match that's 1:1.1M odds to be random standing alone "beyond a reasonable doubt"? I hope not, but fear so.
5.5.2008 6:20pm
Realist Liberal:
I worked on the Puckett case at the SF District Attorney's Office as an intern so I have to be careful what I say but everything I will say is public record (reporting what experts testified and what we put in our briefs, etc.)

The Times article really misleads its readers when it talks about the 1 in 3 statistic. It makes it seem as if the DNA comparison that was done in the cold-hit (the 1 in 3 probability of coincidence) is the same comparison as the final comparison (the 1 in 1.1 million probability). In fact, a computer does the modeling in the cold hit case and the odds are affected merely by changing the number of profiles examined. Because the crime lab searched every male who has ever been in prison and was alive in 1972, the odds are much greater of a match and thus greater that it is a coincidence because only 3 markers are used. Once the lab found the potential match, a DNA analyst performed another comparison using a different statistical method, known as Random Match Probability. It is the second comparison that matters. If the crime lab had said it was a 1 in 3 chance of coincidence and then the subsequent testing would have said Puckett was not a potential donor, we would not have prosecuted him.

It seems to me that Patterico does a very effective job of explaining the facts that the L.A. Times leaves out. In terms of what juries should be told, I think the California Court of Appeals put it best when it said "the database search merely provides law enforcement with an investigative tool, not evidence of guilt." "The means by which a particular person comes to be suspected of a crime . . . is irrelevant to the issue to be decided at trial, i.e., that person's guilt or innocence, except insofar as it provides independent evidence of guilt or innocence." People v. Johnson (2006) 139 Cal. App. 4th 1135, 1150.
5.5.2008 6:22pm
Eugene Volokh (www):
I agree that n/a may be the expected number of matches in a database, even if the guilty man isn't in the database. But the article spoke of "the probability that the database search had hit upon an innocent person," and not of the expected number -- unsurprisingly, because the question in a criminal case is indeed "how likely is it that the defendant is an innocent person?," and in particular whether the likelihood rises to the level of reasonable doubt about guilt.
5.5.2008 6:24pm
Ben P (mail):
Ah, I see now.

The argument is that if one person in 1.1 million will have those same markers, then if there were 2.2 million people in the sample you'd theoretically find 2 people with those markers, and if there were 4.4 million people you'd theoretically find 4 people with those markers.



I can see how that would be difficult to present to a jury in a relatively unbiased light. Even if a defense attorney were to tell me that "that means there are 6000 other people in the world that could have also committed this crime," or even that there were 10 other people in Los Angeles who could have committed this crime, I don't think I would find it convincing. But if the only evidence were that DNA, I suppose I could find reasonable doubt.

That's why I guess it has to be coupled with at least a minimum of outside evidence. Not just that this match occurred, but that this match occurred and the defendant was in the area at the time.
5.5.2008 6:44pm
Cornellian (mail):
I've come round to the view that an understanding of basic statistics is so fundamental to operating in modern society that it really ought to be a major, required component of high school math courses, and I'd even toss trigonometry to make room for it, if necessary. It goes without saying that it should be required as part of any hard science or social science degree as well.
5.5.2008 6:50pm
Christopher Hundt (www):

It seems to me that Patterico does a very effective job of explaining the facts that the L.A. Times leaves out. In terms of what juries should be told, I think the California Court of Appeals put it best when it said "the database search merely provides law enforcement with an investigative tool, not evidence of guilt." "The means by which a particular person comes to be suspected of a crime . . . is irrelevant to the issue to be decided at trial, i.e., that person's guilt or innocence, except insofar as it provides independent evidence of guilt or innocence." People v. Johnson (2006) 139 Cal. App. 4th 1135, 1150.

I'm not so convinced. Consider just the following hypothetical fact:
- The defendant's DNA matches the perpetrator's in a way that is very unlikely to occur by random chance (say 1:1,000,000)

Now consider two alternative "methods of investigation" by which the defendant could have been identified and his DNA tested:
1. The defendant was found in a DNA database of 3,000,000 people
2. An acquaintance of the defendant heard him say something mildly incriminating (but not admissible as evidence against him), the police questioned him, and he consented to having his DNA tested.

I would consider there to be a massive difference in the probability of guilt between those two cases. The idea being that there is only a very small pool of leads that the police can investigate, so in case 2 the fact that they were suspicious specifically of him couples (in the mind of a reasonable person, if not according to the law) with the fact of DNA match to create overwhelming probability of guilt. But of course my reaction is not the same in case 1.

So if you simply present the "evidence" and not the "method of investigation" a juror is left to speculate how the defendant came to be investigated in the first place, and they may come to opposite conclusions regarding guilt depending on whether they assume something closer to case 1 or case 2.
5.5.2008 7:00pm
Lior:
The issue here is that not everyone understands the difference between a-priori probabilities and statistical inference.

As Curt Fischer points out, we should drop the word "innocent" from the calculation. What Prof. Volokh calculated exactly, and the newspaper (following a recommendation from a board of statisticians) calculated using a linear approximation, is the probability of a random match in the database, under the following assumptions:

--- People were chosen to be included in the database independently and uniformly at random. Independently is both from each other and from the fact that we are searching for the match for the DNA at that crime scene.

What we would really like to know, on the other hand, is the probability that X is a criminal, given the other data. That is harder to determine -- it requires knowing the probability that the criminal is in the database.

There are two possible reasons for the positive match (presumably it was unique):

1. The criminal is in the database
2. The criminal was not there, but someone else matched the profile.
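
One way to see how these two explanations trade off is a small Bayes calculation, sketched here in Python. Everything about it is an assumption made for illustration: q, the prior probability that the criminal is in the database, is not known for this case; the match is taken to be unique and error-free; and innocent profiles are taken to match independently with probability 1/a. The function name is hypothetical.

```python
def prob_hit_is_criminal(q, a, n):
    """Posterior probability that a unique database hit is the criminal.

    Explanation 1 (criminal in the database, no one else matches) has weight
        q * (1 - p)**(n - 1)
    Explanation 2 (criminal absent, exactly one innocent match) has weight
        (1 - q) * n * p * (1 - p)**(n - 1)
    The common factor (1 - p)**(n - 1) cancels in the ratio.
    """
    p = 1.0 / a
    return q / (q + (1.0 - q) * n * p)


# q values below are invented; no estimate of q appears in the article.
for q in (0.1, 0.5, 0.9):
    print(q, round(prob_hit_is_criminal(q, 1_100_000, 338_000), 3))
```

The output swings widely with q, which illustrates why no single number can be given without some estimate of how likely the criminal is to be in the database.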

Normally, police start with a suspect (for simplicity, not in any database). Then, when the DNA match is positive, they can say: the events of "this person matching the DNA sample", and "this person becoming the suspect" are independent. Then, the probability of this match being an innocent match is more-or-less the probability of a random person off the street matching the profile. This is very low.

Here, the situation is different. The suspect was found by searching the database. The best way of looking at the situation is called "statistical significance" or "confidence level" in science, where the question studied is "is the perpetrator in the database"? and the null hypothesis is "the perpetrator is not in the database".
Since, even if the null hypothesis is true, we have a 30% chance of a positive result, we cannot say that the match is significant (that the database contains the suspect).

You can't publish a scientific paper at the 70% confidence level. That someone got convicted based on such evidence boggles the mind.

The person directly to blame here is the judge, who denied the jury the information. The reasoning was that the information might confuse the jury, and that explaining why the defendant was in the database would expose the jury to the fact of his prior conviction. If a member of the public might be confused by the analysis above, then that person lacks basic competence to serve as a juror, and the judge shouldn't have seated that person. The most appalling part is that the jury specifically asked how the police came by the suspect, and the judge refused to tell them this was by searching the database. Compounded with the judge only allowing them to hear the estimate that the a-priori odds of a single person matching the profile were 1 in a million, this prevented the "fact finder" from knowing all the relevant facts.

Indirectly to blame are the people who put the jury in the box in the first place. The jury system requires so many contortions because of the complicated rules designed to make the jury more amenable to manipulation by the lawyers. Moreover, the jury is not accountable. Had they produced a written justification to their verdict (like any judge is required to do as a matter of course), it would have been easy to tell if they understood the statistics correctly or not.

An honest verdict not accompanied by its attendant reasoning cannot be distinguished from an arbitrary and capricious verdict. As such I think a verdict without justification should be presumed to be arbitrary and not given any deference.
5.5.2008 7:23pm
MikeM (mail):
I'm not up on the statistics, but what if the 338K individuals aren't independent? Suppose that 500 of the people in the sample are from the same family -- how many of the markers are likely to be the same?
5.5.2008 7:24pm
Curt Fischer:

But even taking account of this oversimplification, this strikes me as mistaken. 1-(1-1/1,100,000)^338,000 is the probability that, if the rapist is not in the database, a database search would still come up with someone (who would then be innocent, since by hypothesis the rapist is not in the database). It is not "the probability that the database search had hit upon an innocent person."


This is related to the point I was making about the formula not being able to resolve guilt or innocence. 1-(1-1/1,100,000)^338,000 is the probability of finding a random person. Even if the guilty person is in the database, the chance of having an extra match come up to someone random remains 1-(1-1/1,100,000)^338,000.

I see now from Mr. Hundt's correction to my earlier comment, and from further thought, that the idea that the database search returns exactly one match may be important. If multiple matches are returned, will juries be convinced that the test is inculpatory? Will investigators even introduce DNA evidence if more than one match is returned?

If not, the 1-(1-1/a)^n formula is inadequate as it yields the probability of at least one random match. The formula for exactly one match is n*[(1-1/a)^(n-1)]*(1/a).
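
A short Python sketch of the two formulas, using the figures from the post. The function names are illustrative, and the sketch assumes the n profiles match independently with probability 1/a each.

```python
def prob_at_least_one_match(a, n):
    """1 - (1 - 1/a)^n: one or more random matches among n profiles."""
    return 1.0 - (1.0 - 1.0 / a) ** n


def prob_exactly_one_match(a, n):
    """n * (1 - 1/a)^(n - 1) * (1/a): exactly one random match among n profiles."""
    p = 1.0 / a
    return n * (1.0 - p) ** (n - 1) * p


a, n = 1_100_000, 338_000
at_least_one = prob_at_least_one_match(a, n)
exactly_one = prob_exactly_one_match(a, n)
# The gap between the two is the chance of two or more random matches.
print(round(at_least_one, 4), round(exactly_one, 4), round(at_least_one - exactly_one, 4))
```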

I suppose this distinction is largely academic, unless someone can find an example of a case where a database search result was used even though there were multiple hits.
5.5.2008 7:25pm
Lior:
@Realist Liberal:
In terms of what juries should be told, I think the California Court of Appeals put it best when it said "the database search merely provides law enforcement with an investigative tool, not evidence of guilt." "The means by which a particular person comes to be suspected of a crime . . . is irrelevant to the issue to be decided at trial, i.e., that person's guilt or innocence, except insofar as it provides independent evidence of guilt or innocence." People v. Johnson (2006) 139 Cal. App. 4th 1135, 1150.


This is well said, but in that case the DNA evidence should have been left out. Since the DNA was only used for the database search, i.e. to make the person a suspect, it should not be used again as evidence of guilt. The point is that, if you find the suspect by searching the DNA database, the probability of the suspect matching the DNA sample is 1, not one in a million.
5.5.2008 7:30pm
Lior:
@Prof. Volokh:
I agree that n/a may be the expected number of matches in a database, even if the guilty man isn't in the database. But the article spoke of "the probability that the database search had hit upon an innocent person," and not of the expected number -- unsurprisingly, because the question in a criminal case is indeed "how likely is it that the defendant is an innocent person?," and in particular whether the likelihood rises to the level of reasonable doubt about guilt.


The LA Times article explains that the "formula" $n/a$ was proposed to the justice system by statisticians, and makes it clear that they realized that this is an approximation. This approximation, however, is easier to explain to a jury of laypeople than the exact calculation. As long as $n$ is sufficiently small compared with $a$, then $(1-1/a)^n$ is very close to $1-n/a$. As you can see, even here where $n/a$ is about one-third, the difference is not very large. For the purposes for which the number is going to be used, the difference between 25%, 27%, or 35% is not important. It's not like $n$ and $a$ are known to such great accuracy.

"About 30%" is the right answer here. The lower-order digits are not significant anyway.
5.5.2008 7:35pm
Hedberg:
I don't believe that I like convicting someone on this evidence at all. If the 1.1 million figure is correct, then there are somewhere in excess of 100 people in the United States, and maybe 10 or 15 in California, against whom the exact same evidence could be presented (alive and old enough in 1972, same match, etc). The only reason that Puckett stands accused and none of the others does is that his DNA happened to be in the database that was searched.

So, we need to evaluate the probability relationship between guilt and presence in the database. If there is no discernible reason that it is more likely that the rapist would be in the database than not in the database, the database should be considered to be a random sample of 338,000 drawn from the population at large. If that is the case, any of those 100 (or so) in the US or 10-15(or so) in California have the same likelihood of being identified. If this test were repeated many times, the actual rapist would only rarely be identified. The probability of any particular match actually being the rapist would be about 10% -- maybe much less.

It seems very likely to me that the probability that the rapist is in the actual database of 338,000 is greater than the probability that the actual rapist would be in a database of 338,000 chosen at random, but I don't know this to be true, and if it is true, I don't know how much more likely. Because it is the prosecution that is using the statistical argument that it is likely that Puckett is guilty based on the DNA evidence, it seems to me that it is the prosecution's responsibility to demonstrate this. Have they?
5.5.2008 7:36pm
Curt Fischer:
The biology as well as the statistics of DNA testing is also worth examining. The biology underlies calculation of the 1-in-1.1 million number. To make sure I was understanding what a "Random Match Probability" was, I googled the term.

One result was this one, where the discussion made it seem that calculation of this number relied on a number of biological assumptions. Chief among these are that the database of alleles are at Hardy-Weinberg equilibrium, and that all the alleles in the database are at linkage equilibrium with each other. As a previous commenter has already suggested, if many members from the same family are in the database, these assumptions will almost certainly be unwarranted. The assumptions could be violated in far less obvious ways as well.

Violation of either assumption would likely decrease the 1.1-million number and thus increase odds of a random match.
5.5.2008 7:37pm
Shaun Martin (mail):
Timely post, Eugene. I hope you saw the 9th Circuit case on nearly this precise issue that came out this morning (Brown v. Farwell):
5.5.2008 7:41pm
byomtov (mail):
This is well said, but in that case the DNA evidence should have been left out. Since the DNA was only used for the database search, i.e. to make the person a suspect, it should not be used again as evidence of guilt.

I don't understand. Isn't whatever makes someone a suspect evidence that should be considered?

Suppose a security camera records a crime and the police think they recognize the criminal and make an arrest. Shouldn't the tape be introduced, along with whatever other evidence there is?

What's the difference?
5.5.2008 7:42pm
Ryan Waxx (mail):

Had they produced a written justification to their verdict (like any judge is required to do as a matter of course), it would have been easy to tell if they understood the statistics correctly or not.


You want to have 12 separate people provide a single, written justification for their group decision? Not even the Supreme Court manages to do that.

Or do you want to have 12 different opinions for defense lawyers to scrutinize for retrial grounds? Or for newspaper articles to 'expose'? Ordinary folks do not have the necessary training to keep their words from sounding damning when reduced to a sound bite. Not even politicians can do this consistently.

If you want to ban juries, perhaps you should just suggest it forthrightly.
5.5.2008 7:46pm
Ryan Waxx (mail):

I don't understand. Isn't whatever makes someone a suspect evidence that should be considered?


You'd think so, but as you may know past convictions can make a person a suspect, but it cannot be considered by a jury.
5.5.2008 7:52pm
Christopher Hundt (www):
Shaun, good catch on that Ninth Circuit case. Here is a great quote that precisely identifies the problem:

Here, [DNA expert Renee] Romero initially testified that [defendant Troy Don Brown]'s DNA matched the DNA found in [rape victim Jane Doe]'s underwear, and that 1 in 3,000,000 people randomly selected from the population would also match the DNA found in Jane's underwear (random match probability). After the prosecutor pressed her to put this another way, Romero testified that there was a 99.99967 percent chance that the DNA found in Jane's underwear was from Troy's blood (source probability). This testimony was misleading, as it improperly conflated random match probability with source probability. In fact, the former testimony (1 in 3,000,000) is the probability of a match between an innocent person selected randomly from the population; this is not the same as the probability that Troy's DNA was the same as the DNA found in Jane's underwear, which would prove his guilt. Statistically, the probability of guilt given a DNA match is based on a complicated formula known as Bayes's Theorem, see id. at 170-71 n.2, and the 1 in 3,000,000 probability described by Romero is but one of the factors in this formula.

Although I would be hesitant to call Bayes' Theorem "complicated." :)
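
As a rough illustration of how short the formula is, here is a Python sketch of the relationship between random match probability and source probability. It is not taken from the opinion: the function name is hypothetical, the prior values are invented, and the sketch assumes the true source always matches while anyone else matches with probability equal to the random match probability.

```python
def source_probability(prior, rmp):
    """Bayes' Theorem: P(defendant is the source | his DNA matches).

    prior: assumed probability, before the DNA result, that the defendant is the source.
    rmp:   random match probability, i.e. P(match | defendant is not the source).
    P(match | defendant is the source) is taken to be 1.
    """
    return prior / (prior + (1.0 - prior) * rmp)


rmp = 1.0 / 3_000_000
# Invented priors: the same match means very different things under each one.
for prior in (1e-6, 1e-3, 0.5):
    print(prior, source_probability(prior, rmp))
```

The random match probability is only one input; the answer also depends heavily on the prior, which is the point the court makes in distinguishing the two probabilities.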
5.5.2008 7:59pm
Fub:
byomtov wrote at 5.5.2008 6:42pm:
Suppose a security camera records a crime and the police think they recognize the criminal and make an arrest. Shouldn't the tape be introduced, along with whatever other evidence there is?

What's the difference?
I think that if the court is allowed to say "The police identification of the person on the tape is accurate to one part in a million, so you are not allowed to know if anyone else looks like the person on the tape", then the analogy to this case would be closer.
5.5.2008 8:12pm
byomtov (mail):
You'd think so, but as you may know past convictions can make a person a suspect, but it cannot be considered by a jury.

Ryan,

OK. You got me. But shouldn't the video tape in my example, and some other bases for suspicion, be admitted as evidence?
5.5.2008 8:12pm
Lior:
Patterico is almost right, but not quite. He gets the point that, once you found the person using the DNA database, the odds of a DNA match are 100%. However, when he tries to frame the question as
2. What are the chances that any one person whose DNA matches a DNA profile is indeed the person who left the DNA from which the profile is taken?

He misses the fact that we are not talking about "any one person", presumably meaning a person chosen independently of the DNA profile.

The truth of the matter is that the probability of a database match being guilty is exactly equal to the probability that the perpetrator is a member of the database: given that we know already that there is a match in the database, the only question left is if the match is the perpetrator.

While these odds have nothing to do with the odds of a false positive, they are rarely available. I doubt that the police have done any studies trying to see what fraction of reported rapes in 1972 were committed by people who would be in the DNA database today.

Not knowing this probability, we take a common scientific fallback: we ask a different question, that being "does the signal we have distinguish the innocent from the guilty". Note that this is a different question from "what is the guilt probability given the data?".

The latter question we know how to answer: if the perpetrator was in the database, we'd see a signal 100% of the time. Even if the perpetrator was not in the database, we'd see a signal about 30% of the time.
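The "about 30%" figure can be checked with a short calculation. The following is only a minimal Python sketch, not anything from the case record: it assumes the random match probability of 1 in 1.1 million and the database size of 338,000 quoted in the article, treats every profile in the database as an independent draw, and uses function names that are purely illustrative.

```python
import math


def expected_chance_matches(rmp: float, database_size: int) -> float:
    """Expected number of coincidental matches among innocent profiles."""
    return rmp * database_size


def prob_at_least_one_chance_match(rmp: float, database_size: int) -> float:
    """Chance that at least one innocent profile matches.

    Computes 1 - (1 - rmp) ** database_size in a numerically stable way,
    assuming each profile matches independently with probability rmp.
    """
    return -math.expm1(database_size * math.log1p(-rmp))


# Figures quoted in the article (assumed accurate for this sketch).
RMP = 1 / 1.1e6
DATABASE_SIZE = 338_000

print(expected_chance_matches(RMP, DATABASE_SIZE))
print(prob_at_least_one_chance_match(RMP, DATABASE_SIZE))
```

Under those assumptions the expected number of coincidental matches is about 0.31 and the chance of at least one is closer to 26%, so "about 30%" is a fair round figure.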

A better way to look at the data is to raise yet another question: "how many living people were there in 1972 who matched the profile?". Say there were no more than 5000 (this depends on whether 1/10^6 is among men or among all people). Then no matter how we found him, our suspect is one of these 5000 [here it is important that we are assuming that "match/not match" is a binary metric with no errors]. Probably most of these people have never been to California, were not of the right age etc. We also know (because we found him in the database), that he had other convictions for similar crimes. Ignoring the prior convictions first, and assuming enough independence of the genetic markers among all humanity, probably you can say the suspect is one of tens with high degree of confidence. Now the question of reasonable doubt centers around: several tens of people match the markers and could have been the perpetrator. We found one of them and he has priors for rape. Is this enough to convict?
5.5.2008 8:17pm
anym_avey (mail):
OK. You got me. But shouldn't the video tape in my example, and some other bases for suspicion, be admitted as evidence?

Except the analogue of the situation you are calling for would be to examine the low-resolution camera output with a computer algorithm, compare it to the records of 338,000 high-resolution suspect photos previously examined by the same algorithm, and then claim a successful match at some very high probability. I should think that would change the ways in which the evidence can be handled and presented to a jury?
5.5.2008 8:44pm
Realist Liberal:
@ Lior~


This is well said, but in that case the DNA evidence should have been left out. Since the DNA was only used for the database search, i.e. to make the person a suspect, it should not be used again as evidence as guilt. The point is that, if you find the suspect by searching the DNA database, the probability of the suspect matching the DNA sample is 1, not one in a million.


You are correct in claiming that the database search should not be evidence of guilt. In fact it was not evidence of guilt; it was excluded as being misleading and requiring an undue consumption of time under Evidence Code 352 (CA's equivalent to FRE 403). The second, separate examination uses a different statistical model (called random match probability). This model compares the number of occurrences of that DNA profile in the population at large, not within a set database (such as the CODIS database). In fact, the odds are not 1 because CODIS only identifies possible matches. I have also been involved in cases where CODIS identified a possible suspect but after using standard DNA comparison, the analyst excluded the suspect as being a potential donor (meaning that the suspect was not the person who left the DNA).
5.5.2008 8:46pm
Lior:
@Ryan Waxx: of course I was making an argument for eliminating the jury.

@Realist Liberal: We are all assuming that the "random match" probability is the probability for a match with a random person from the population. The problem is that the suspect here was not a random person from the population -- he was that member of the database that matched the sample.

As I said above, once you know there is a match in the database, the probability that the matching person is guilty is precisely equal to the probability that the guilty person was included in the database (to see this condition on the two cases of the guilty person in and out of the database). Note that this has nothing to do with the random match probability.

Given that a full DNA test match is not independent of the match in the CODIS search, the "random match probability" number is misleading. What should be given (and is relevant) is the "probability of a random match conditioned on the CODIS data", that is: what are the odds that a random person off the street will match the full DNA profile, given that the person matches the part of the profile that's included in CODIS? Knowing that number would tell us a lot, given that we don't know the probability of a 1972 rapist having later committed another offence and ended up in the database today.
5.5.2008 9:00pm
Lior:
In my post above, "condition" is used as a verb in the imperative mood, not a noun. This may be confusing.
5.5.2008 9:03pm
Christopher Hundt (www):
@Lior: I'm not sure I agree with you as to what the precise quantity of interest is (there are still a few steps to get from "odds that a random person off the street will match the full DNA profile" to "odds that the defendant is guilty"), but you are certainly correct that by itself "random match probability" tells you almost nothing useful to a juror about the probability of guilt, and in fact can easily (although mistakenly) be interpreted to tell you more than it does.
5.5.2008 9:32pm
Crackmonkeyjr (www):
I recently came across a very similar problem. A woman came into my office having been accused of having someone else forge her name on an attendance sheet (it wound up being an ethics violation and she was risking losing a shot at her professional license). The attendance sheet contained names and the last 4 digits of each person's social security number. There were three relevant names on the sheet: person A with a fairly common Hispanic name, my potential client, and person B with the same fairly common Hispanic name and the last 4 digits of my client's SS#. Somehow this was interpreted as my client having someone forge her name (I don't quite understand that).

My theory was that there just happened to be 2 people with the same name and one of them happened to have the same last 4 digits in her SS# as my client. No one bought it, they claimed that the 1 in 10000 chance of having the same last 4 digits was too improbable (especially when combined with the chance of two people having the same name). In fact, the probability wasn't outrageous. The chance of two people in a room of 30 having the same last 4 digits is about 1 in 20. I'm not sure what the chance of two people having the same fairly common name is, but even if it is 1 in 100, that's only 1 in 2,000 classes. The chance of this happening to my client is slim, but the chance of it happening to someone at some class is probably not all that low. My guess is if I ran the numbers it would almost certainly happen to someone at some point, and whoever that is would be the one who walked into my office.
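The "1 in 20" estimate is a birthday-problem calculation, and it can be sketched in a few lines of Python. This assumes the last four digits are uniformly distributed and independent from person to person, which is an idealization; the function name is illustrative only.

```python
def prob_shared_value(group_size: int, possible_values: int) -> float:
    """Chance that at least two people in a group share the same value.

    Classic birthday-problem product: multiply the chances that each
    successive person avoids every value already taken, then subtract
    the result from 1.
    """
    prob_all_distinct = 1.0
    for already_taken in range(group_size):
        prob_all_distinct *= 1.0 - already_taken / possible_values
    return 1.0 - prob_all_distinct


# 30 people in the room, 10,000 possible four-digit endings.
print(prob_shared_value(30, 10_000))
```

For 30 people and 10,000 possible endings this comes out at a little over 4%, or roughly 1 in 23, which is close to the rough figure above.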

People really need to learn that even if something is improbable, if you have enough iterations, it will almost certainly happen.
5.5.2008 10:16pm
Patterico (mail) (www):
Patterico is almost right, but not quite. He gets the point that, once you found the person using the DNA database, the odds of a DNA match are 100%. However, when he tries to frame the question as

2. What are the chances that any one person whose DNA matches a DNA profile is indeed the person who left the DNA from which the profile is taken?

He misses the fact that we are not talking about "any one person", presumably meaning a person chosen independently of the DNA profile. The truth of the matter is that the probability of a database match being guilty is exactly equal to the probability that the perpetrator is a member of the database: given that we know already that there is a match in the database, the only question left is if the match is the perpetrator.


When I said "any one person whose DNA matches a DNA profile" I wasn't referring to a database match but rather a match between a suspect's DNA and the DNA from the reference sample from the crime scene.
5.5.2008 10:32pm
Sean O'Hara (mail) (www):

I've come round to the view that an understanding of basic statistics is so fundamental to operating in modern society that it really ought to be a major, required component of high school math courses, and I'd even toss trigonometry to make room for it, if necessary.


The problem runs far deeper than that -- the human brain simply isn't wired to comprehend probabilities. Even statisticians have trouble wrapping their minds around things like the Monty Hall Problem, because it runs counter to our hard-wired instincts.
5.5.2008 10:41pm
jccamp:
From reading the entire article about the trial, the prosecution did get additional evidence introduced. Although there's no way to tell what weight the additional evidence was given by the jury, this might be the way the prosecutor framed his argument:
DNA tells us that about 138 (or 275 if the sample was already adjusted for gender) people in the U S could have committed this crime. Now, how many of those people were in San Francisco (?) where this murder occurred at the same time as the murder? The def. was. How many people were in the same neighborhood as the victim's place of work, and could have followed her home? The def. was. How many had committed other rapes where nearly identical injuries to those on the murdered woman were left on the victims? The def. did, not just once but 3 times. The man who murdered the victim told a witness "It's OK. We're making love." What did the def. tell his last victim? "It's OK. I just want to make love."

I believe that much was made of the similar injuries (scratches to the neck) of the dead woman and the def.'s surviving victims. The prosecutor called these "his signature." Without seeing the actual descriptions or photos, this is hard to gauge, but a judge let it in.

There was some argument about the def.'s similarity to the witness's description. The witness died before trial and the police never showed her the def.'s photo.

All that was introduced at trial. I'm not sure it's "beyond a reasonable doubt", but it is more compelling than just the naked DNA.
5.5.2008 10:52pm
Patterico (mail) (www):
Sorry, I mean "evidence sample" from the crime scene. "Reference sample" has a different meaning.
5.5.2008 10:55pm
jccamp:
"...a match between a suspect's DNA and the DNA from the reference sample from the crime scene."

If I understand the thrust of Bayes' Theorem, at least when referring to statistical values of, say, DNA at a crime scene matching a specific individual, this suggests that the DNA occurrence does not occur in a vacuum, and that the statistical value is influenced by the statistical value of other variables, such as, what are the odds that the suspect was...whatever (alive, in the area of the crime at the time of the crime, had an alibi, etc). I am probably grossly oversimplifying Bayes', but this does conform with DNA evidence being only one part of a larger picture painted by the totality of evidence - which is as it should be.

Again, I'm probably oversimplifying, but if the DNA suggests that only one person in 250 million would be a match, this number is not absolute. If the suspect had a photo of himself posing with the Pope in Rome at the time and date of the murder in San Francisco, I'd say the value of the DNA is around...oh, maybe zero (assuming the veracity of the photo and an accurate time of the crime).

In looking up Bayes' Theorem, I did see formulae descriptive of multiple variables, but in trying to follow the logic, I got a headache, and decided instead to go with the "You don't believe me? Ask the Pope." logic.
5.5.2008 11:23pm
Lior:
@Patterico:
When I said "any one person whose DNA matches a DNA profile" I wasn't referring to a database match but rather a match between a suspect's DNA and the DNA from the [evidence] sample from the crime scene.


It was exactly this assertion that I was criticizing. In this case, referring to the person as a "suspect" omits the (crucial) information that the suspect already matches the DNA sample. You should stop thinking of him as "Mister X" (a fixed person) and start thinking of him as "the person in the database that matched the profile". These two are the same person, of course, but the second point of view will clarify the probabilistic analysis you need to do.

Let's think of it another way. Say we tested the DNA of every human being on earth, looking for the person who committed the rape. At 1/million we expect about 6000 matches. Are you saying that each of these 6000 matches is almost surely guilty because the probability of "any person" matching the sample is 1/million? Of course not. Some other information will be required to decide who among these 6000 is guilty. Beyond this, it is possible that the criminal has died, and isn't available to be tested today. So we aren't even sure that the criminal is actually one of these 6000 positive matches. In other words, since we essentially know there are many people alive today who match the profile, just finding a person that matches the profile doesn't tell us anything we didn't know yet.

Now in the case at hand, only about 300,000 people were tested, so only one match was made. As you say on your blog, this means that (before developing other evidence) our level of confidence in this person's guilt is even lower than our level of confidence that some person living today is the criminal. In fact, since there was an a-priori 30% probability of a positive match even if everyone in the database is innocent, the fact of finding the match in the database is not very surprising and by itself tells us little -- since the police could have said from the start "let's only investigate that person in the database that matches the profile" and be 30% certain of investigating someone, you need to be pretty sure that the guilty person is actually in the database; otherwise you are likely investigating an innocent man.

Thus, the genes of "Mister X" were (in some sense) randomly chosen at his conception. The genes of "the person in the database that matches the sample", on the other hand, are literally fixed. Since our suspect is the latter person, the DNA evidence tell us little about him.

Carrying this analysis a little further, you get the result that the probability that the "person in the database that matched" is guilty is exactly equal to the a-priori probability that the guilty person was in the database (if the guilty person was in the database, then we found him; if not then the person at hand is innocent).

At the end, the prosecution seems to have had a smattering of other evidence beyond the DNA. According to the LA Times, however, this evidence significantly included claims about past rapes by the "suspect". That is very problematic because that was already to be expected given the database that was searched. Consider the following scenario: in order to solve a rape, go over the database of sexual offender DNA samples and prosecute the first match you find. This person will be known to match the DNA and have a history of sexual assault, which is almost enough for conviction. Since a bit of police digging will find dirt on anyone, you are probably good to go: given what the judge allowed the jury to hear, this strategy seems likely to succeed in closing the case with a conviction — with approximately 30% chance of working even if the perpetrator isn't in the database (assuming 1/million random match probability and a database of 300,000 people).

What do I learn from this? That "beyond a reasonable doubt" does not mean in practice what it should mean in theory.
5.6.2008 12:25am
Patterico (mail) (www):
It was exactly this assertion that I was criticizing. In this case, referring to the person as a "suspect" omits the (crucial) information that the suspect already matches the DNA sample.

I don't see how.

Let's think of it another way. Say we tested the DNA of every human being on earth, looking for the person who committed the rape. At 1/million we expect about 6000 matches. Are you saying that each of these 6000 matches is almost surely guilty because the probability of "any person" matching the sample is 1/million? Of course not.

Indeed. Of course I am not saying that.

You really have lost me entirely.
5.6.2008 12:47am
Realist Liberal:
Lior~

I'm sorry for not adequately addressing your previous point. I don't think I quite understood but I think I do now. It seems that your complaint (I'm sorry if that is not the correct word) is that the use of RMP after a CODIS hit is misleading because in essence we already had a suspect (namely John Puckett). What do you make of the fact that RMP is always used even when the police/ prosecution already have a suspect? In fact, that is the only way to do the comparison. RMP never truly compares one person's DNA to the entire population for the reason that no database includes the entire world's DNA because so many of us have never given a DNA sample. It seems to me (and again sorry if I am still misunderstanding you) that you are arguing that the CODIS hit will always result in the suspect being a potential donor (the term that DNA analysts use). This is not accurate, there are times when the second analysis either results in such a weak match (for example a 1:1,000) or not a match at all.

You are completely correct in the argument that the essential flip side of the 1:1.1 million statistic is that one would expect that there are about 6,000 people (assuming my horrible math skills are correct and estimate of the world's population is accurate). That is because of the degradation of the DNA in our case (for what it is worth the talk at the office and of several DA's in the area was that our DNA was the exception not the rule and that even old DNA usually has much less degradation than ours). That is the reason that we had to prove more to the jury. One question to ask is, of those 6,000 people, how many are white males who were alive in 1972 and in San Francisco. Further, how many of those 6,000 are rapists who used knives to the throat of the victim. In addition, Puckett lied in several ways to the inspectors (I know the obvious counter argument was that he just forgot/ got mixed up. The problem with that is that he actually admitted that he lied in at least a few ways to the inspectors.) Our office has tried 3 cold hit cases so far. Puckett had the "worst" DNA match by far. The other two cases were 1:49 million Caucasian males and 1:97 million African American males. Still not the 1 in a quadrillion of fresh DNA but certainly much better.
5.6.2008 12:59am
Laura S.:
IMO, this is potentially a growing problem. Consider the UK, where they've effectively decided to create a DNA database of everyone in the country. It really is a serious issue that a search against such a database will nearly always lead to a hit by mere chance alone.

This is a far cry from targeted testing. If you test ONE suspect and get a match that's a very significant result. Conversely the database match is essentially meaningless.

Second, we should be concerned about learning from this case. This man is in the database because he's a prior rapist. Thus, the fact that he is a prior rapist is a piece of circular information.

Lastly, there is another element of probability here. The real calculation we want is: what is the chance that the defendant is guilty given the evidence? Interestingly, this involves the prior odds of being guilty--i.e., the chance that any man is a rapist.
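A rough sketch of that calculation in Python, under loudly stated assumptions: the true source of the DNA always matches, an unrelated person matches with the random match probability, and before the DNA is considered every member of some candidate pool is equally likely to be the source. The pool sizes below are hypothetical, chosen only to show how heavily the answer depends on the prior, and the sketch ignores the separate question of how the suspect was found.

```python
def posterior_source_probability(prior: float, rmp: float) -> float:
    """Bayes' theorem: P(person is the source | their DNA matches).

    prior -- probability the person is the source before the DNA result
    rmp   -- random match probability for an unrelated person
    """
    p_match_if_source = 1.0      # assumption: no false negatives
    p_match_if_not_source = rmp  # assumption: coincidence is the only error
    numerator = prior * p_match_if_source
    denominator = numerator + (1.0 - prior) * p_match_if_not_source
    return numerator / denominator


RMP = 1 / 1.1e6  # figure quoted in the article

# Hypothetical candidate pools: roughly the world, roughly the U.S.,
# one million people, and one thousand people.
for pool_size in (6_600_000_000, 300_000_000, 1_000_000, 1_000):
    prior = 1 / pool_size
    print(pool_size, posterior_source_probability(prior, RMP))
```

With the whole world as the pool the posterior is about 1 in 6,000, which is the "one of about 6000 people" figure discussed in this thread; it approaches certainty only when other evidence shrinks the pool of plausible candidates.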

There is a great discussion of this on wikipedia http://en.wikipedia.org/wiki/Prosecutor%27s_fallacy

This should be taught in every crim pro class.
5.6.2008 1:53am
Lior:
Realist Liberal:

Apologies if I'm misusing the terminology.

1. What I am saying is that a CODIS hit increases the chances of a person being the donor, compared to a random person in the general population. This makes the "RMP" number misleading in a case where you reached the suspect through a CODIS search. But this is a really minor point. The important point is the next one.

2. Since even if the real perpetrator was not in the database, there was still (by your figures) a 30% chance of finding a database hit at RMP of 1/million, the DNA didn't tell you a lot you didn't know before: before starting the search, you were 30% sure to find a rapist in the database with a 1/million RMP match even if the real culprit was not in the database (compared to 100% sure if the culprit was in the database). In other words, the DNA evidence should have been thought of as "what led you to the suspect" and not "evidence the suspect is guilty" (unlike, say, presence in SF).

The fallacy is the following: knowing nothing about John Puckett's DNA, before the search you could say "John Puckett has a 1/million chance of matching the DNA as well as he did". But you were not specifically searching for John Puckett. It's not the case that you were either going to charge him or no-one. Rather, you were pretty (30%) sure that someone in the database would be a match even if they were all innocent. Once you found that match, you can't go back and say: hey -- it's pretty unlikely this particular person would match.

It's the same as with the lottery: any individual ticket has a small chance of winning the lottery, but somebody has to win. Does winning the lottery make the winning ticket any different than any other ticket? Once a person has won the lottery you can't go back and say: a-ha, since they won they must have had particularly good chances of winning from the start.

(Regarding prior convictions: as I mentioned in my reply to Patterico, and also discussed by Laura S., if you search a database of rapists [perhaps CODIS is more general, I don't know] you should not be surprised that the matching person is a rapist, so this comes under the same heading: if CODIS consists of sexual offenders, then you were sure of finding a sexual offender matching the DNA profile).

3. This, of course, says nothing about the rest of your evidence: lying to police, presence in SF, etc. But you should have presented the evidence to the jury fairly by saying:

Assume that everyone in the CODIS database is innocent. It's true that we'd then have had a 30% chance of finding a person in the database who ``won the lottery'' by being a 1/million RMP match to the degraded DNA sample we had. So let's assume John Puckett is this innocent hit and see where it takes us. Do you think it's likely that a random person from the database would be of the right age? Would have lived in SF at the time? Would have lied to police? I don't think so.


The point is that since you were pretty certain from the start that there would be a ``lottery winner'' in your search, the fact that a particular person has ``won the lottery'' does not make them any different from all the other participants. It's just that somebody had to win. Here by ``winning the lottery'' I mean being a CODIS hit followed by a DNA match at RMP of 1/million. I know that I'm repeating myself, but this is a point worth repeating: you weren't shining the spotlight on John Puckett. You were going to shine the spotlight on whoever would ``win the lottery''. While John Puckett had an (a-priori) 1/million chance of winning, the chance that somebody would be a winner was 30%. Thus the DNA match you found was not really a great coincidence. It's the other evidence that was the significant coincidence that can convince you the guy is guilty.

4. A final aside: note that race is a genetic trait. I don't know how much the markers used for DNA matching correlate with race, but I would not be surprised if they do, at which point you should be careful about treating race and DNA match ``distance'' (RMP) as cumulative indicators.
5.6.2008 4:17am
Econometrician:
Hmm..It seems California lawyers (and judges) have short memory:
The California Supreme Court decided a similar question 40 years ago in People v. Collins: A classical example for a real life application of introductory probability.
The whole decision can be e.g. found at

link

In particular I would like to draw your attention to the appendix: The math is cute and quite similar to the one used in the comments here...
5.6.2008 9:17am
David Chesler (mail) (www):
Second, we should be concerned about learning from this case. This man is in the database because he's a prior rapist. Thus, the fact that he is a prior rapist is a piece of circular information.

This is beginning to sound like the philosophers and the sheep, but all we know is that he was convicted of rape.

(A number of philosophers from Oxford decide to take a vacation to Scotland. They travel by train. As the train enters Scotland they look out and see a black sheep. One remarks "Oh, I didn't know the sheep in Scotland are black." A second corrects him and says "You still don't know that, you only know that there is at least one black sheep in Scotland." The third says "No, we only know that at least one side of one sheep in Scotland is black." And the fourth says "No, all we know is that one side of one sheep in Scotland is black at least some of the time.")
5.6.2008 11:25am
byomtov (mail):
I think that if the court is allowed to say "The police identification of the person on the tape is accurate to one part in a million, so you are not allowed to know if anyone else looks like the person on the tape", then the analogy to this case would be closer.

On reflection, the difficulty with my analogy is that there is a difference between presenting the videotape and presenting the conclusion the police drew from it.

In the DNA case the size of the database is essentially irrelevant to the question of guilt. What DNA tells us is that there are 6000 people, worldwide, who might be guilty. That's true whether the database has 330,000 records or 330,000,000.

I think a lot of the confusion (including maybe mine) arises from failure to be clear about the difference between finding that someone's DNA matched the evidence vs finding that it did not.
5.6.2008 12:06pm
htom (mail):
In this case, it appears that all of the other evidence was lost, destroyed, or not gathered ... by the prosecution. So all there was was the cold case DNA hit. Now if the blood on the parking ticket had been tested that would have been more useful, to both prosecution and the defense, but Baker was not available to be tried. Why he couldn't be mentioned I do not understand. I would think that pointing to the case of Raymond Easton would be enough to get Puckett a new trial, at least.

---
Stealing an entire post from Tillers on Evidence

Jennifer Mnookin, "Fingerprint Evidence in an Age of DNA Profiling," 67 Brooklyn L. Rev. 13, 49-50 (2001)(footnotes omitted):

[I]n a 1999 case in England ... Raymond Easton was charged with burglary after authorities made a "cold hit" with his DNA in a DNA database. His DNA matched the crime scene DNA at six loci. Because there was only a one in thirty-seven million chance that a randomly selected person's DNA would match, Raymond Easton was charged with burgling a house 200 miles from where he lived. However, after Easton, who had advanced Parkinson's disease and was unable even to drive a car, offered an alibi for the night in question, the DNA was eventually tested at four more loci. This more sophisticated test showed there was no DNA match after all. All charges were dropped.
5.6.2008 12:32pm
Karl Lembke (mail) (www):
I've commented at length about the statistics involved here, here, and here.

If the DNA "match" is the only evidence in the case, it's not enough. If other evidence is compelling, the DNA "match" may constitute probable cause to search for that other evidence, but the existence of 6000 other people in the world who would also match the sample should constitute reasonable doubt. (IMO, YMMV)

But then, I know something about probability theory.
5.6.2008 1:31pm
FWB (mail):
Science lies through the manipulation of the scientist. All in all science is opinion built upon the opinions of others (sometimes called theories). Everything in all sciences (in fact all knowledge) is nothing more than human opinion.

The only valid answer is "two or more witnesses" to the crime.

Laura S: And Dubya along with the current Congress just signed a bill into law (April 24) making a DNA database a priority over the next 6 mos, albeit this database is for newborns.
5.6.2008 2:36pm
Eugene Volokh (www):
FWB: I'm puzzled -- what exactly do you mean by "'two or more witnesses' to the crime"?
5.6.2008 5:07pm
Lior:
The "two or more witnesses" is a biblical rule (Deuteronomy 19:15; translated here)
One witness shall not rise up against a man for any iniquity, or for any sin, in any sin that he sinneth; at the mouth of two witnesses, or at the mouth of three witnesses, shall a matter be established.
5.6.2008 6:20pm
Karl Lembke (mail) (www):
FWB wrote:
Everything in all sciences (in fact all knowledge) is nothing more than human opinion.

I guess that's your opinion.
5.7.2008 12:14am
byomtov (mail):
Isn't the proper thing to tell the jury simply the fact that there are 6000 people whose DNA matches the criminal's? Leave out the size of the database - it doesn't matter and when you start citing numbers like 1 in a million chance it overstates the strength of the evidence.

Is there a problem of some sort with even mentioning the database? After all, it's a felon database. Following up on Ryan Waxx's point, is telling the jury the defendant was even in a felon database prejudicial?
5.7.2008 10:50am
Lior:
Is there a problem of some sort with even mentioning the database? After all, it's a felon database. Following up on Ryan Waxx's point, is telling the jury the defendant was even in a felon database prejudicial?


If you ran a general investigation, and your suspect happens to be a felon, this would be an unexpected "coincidence", something that tends to show he is guilty. On the other hand, if you only investigate felons then the suspect being a felon does not tell you anything about him you didn't know before starting to investigate, so it cannot be evidence regarding his guilt. Same goes for the DNA match in this case.
5.7.2008 12:50pm
byomtov (mail):
Lior,

I understand your point. But I can't tell if you agree or disagree with mine.

Telling the jury the defendant has a prior criminal record is not allowed. But telling them he was in a felon database is the same thing. Why should that be allowed?
5.7.2008 1:25pm
Lior:
If you don't tell the jury that the person was a felon, then why does the LA Times story say that the guy's MO in previous rapes was presented to the jury? (This is also confirmed by Realist Liberal's comments, who clearly knows the details of this case.) Clearly that was not thought to be prejudicial.

The obvious problem is that you can't simply say "this man was a pretty good (1/million odds) match to the DNA we have" and leave it at that. Since you knew from the start (30% sure even if the rapist was not in the database) that someone in the database would be such a close match to the DNA, the fact that the defendant was the person that matched is not news. So perhaps telling the jury he was in a felon database is bad for him, but not telling the jury how the DNA match was made is even worse. Perhaps he should have opted for a bench trial.

The real problem (as I said in an earlier post) is that you are willing to admit a "trier of fact" who is officially considered liable to be prejudiced by evidence. That Americans (and others) can live with this cognitive dissonance boggles the mind. If you don't believe a person can correctly weigh all the evidence presented then this person should not be part of a court of law. If you are having problems where your courts are giving some kinds of evidence the wrong weight, then the solution should be to fix the courts and not exclude the evidence.
5.7.2008 4:20pm