Applied statistics/Tutorials

From Citizendium
Latest revision as of 22:00, 25 October 2013

This article is developing and not approved.
Tutorials relating to the topic of Applied statistics.

Rules of chance

The addition rule

For two mutually exclusive events, A and B,
the probability that either A or B will occur is equal to the probability that A will occur plus the probability that B will occur,

P(A or B) = P(A) + P(B).
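
As an illustration of the addition rule, the following minimal sketch counts outcomes for one throw of a fair six-sided die. The die, the events chosen and the helper name `prob` are assumptions made for this example only.

```python
from fractions import Fraction

# Sample space: the six equally likely faces of a fair die (illustrative assumption).
faces = range(1, 7)

def prob(event):
    """Probability of an event (a set of faces), found by counting outcomes."""
    favourable = [f for f in faces if f in event]
    return Fraction(len(favourable), len(faces))

a = {1}  # event A: the die shows 1
b = {2}  # event B: the die shows 2, which cannot happen together with A

p_either = prob(a | b)          # P(A or B), counted directly
p_sum = prob(a) + prob(b)       # P(A) + P(B), from the addition rule
print(p_either, p_sum)
```

The two printed values agree because A and B are mutually exclusive; for overlapping events the simple sum would count the shared outcomes twice.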

The multiplication rule

For two independent (unrelated) events, A and B,
the probability that A and B will both occur is equal to the probability that A will occur multiplied by the probability that B will occur,

P(A and B) = P(A) x P(B)
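
The multiplication rule can be checked in the same way. This sketch assumes two fair dice thrown independently and enumerates all 36 outcomes; the helper name `prob2` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two fair dice (illustrative assumption).
rolls = list(product(range(1, 7), repeat=2))

def prob2(predicate):
    """Probability that an outcome (first, second) satisfies the predicate."""
    favourable = sum(1 for r in rolls if predicate(r))
    return Fraction(favourable, len(rolls))

p_a = prob2(lambda r: r[0] == 6)                    # A: first die shows 6
p_b = prob2(lambda r: r[1] == 6)                    # B: second die shows 6
p_both = prob2(lambda r: r[0] == 6 and r[1] == 6)   # A and B together

print(p_both, p_a * p_b)
```

Counting the double six directly gives the same answer as multiplying the two separate probabilities, which is what the rule asserts for independent events.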

Bayes' theorem

The probability that event A will occur, given that event B has occurred, is equal to the probability that B will occur, given that A has occurred, multiplied by the probability that A will occur and divided by the probability that B will occur,

P(A/B) = P(B/A) x P(A)/P(B).
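
The formula translates directly into a small function. The sketch below is only an illustration: the function name `bayes` and the playing-card example (A is "the card is a king", B is "the card is a face card") are assumptions, not part of the tutorial.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A/B) = P(B/A) x P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_b_given_a * p_a / p_b

# One card drawn from a standard 52-card pack.
p_king = Fraction(4, 52)        # P(A): four kings
p_face = Fraction(12, 52)       # P(B): twelve face cards (jack, queen, king)
p_face_given_king = Fraction(1) # P(B/A): every king is a face card

p_king_given_face = bayes(p_face_given_king, p_king, p_face)
print(p_king_given_face)
```

The result is one third, which matches the direct count: four of the twelve face cards are kings.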

Common fallacies

The double birthday fallacy

The fallacy

That it is very unlikely that 2 people in a group of 24 have the same birthday.

The truth

That there is a better than 50 per cent probability that at least 2 people in any group of 23 or more will have the same birthday.

Proof

Step 1: the following chain of argument shows that the number of different pairs in a group of 23 people is 253
(a) the number of pairs that there would be if each of 23 people paired with one of 23 people is 23 x 23 = 529; (b) deducting the 23 cases in which a person would be paired with himself leaves 529 - 23 = 506; and, (c) deducting the duplicates that occur if a pair such as AB were counted as well as the pair BA leaves 506/2 = 253.
Step 2: the following argument shows that the probability that the two people making up one particular pair do not have the same birthday is 99.726 per cent
(a) of the 365 days in a year there are 364 days that are not A's birthday; (b) there is a one in 365 chance that B's birthday falls on any one of those days; (c) therefore the probability that B's birthday falls on a day that is not A's birthday is 364/365 = 0.99726 or 99.726 per cent.
Step 3: the following argument shows that the probability that none of the 253 different pairs have the same birthday is 49.95 per cent
(a) since the probability that one particular pair do not have the same birthday is 99.726 per cent, the probability that neither of two selected pairs have the same birthday must be 0.99726 x 0.99726 or (0.99726)^2, and that for none of three selected pairs it is (0.99726)^3, and so on; (b) so the probability that none of the 253 possible pairs of step 1 have a birthday in common is (0.99726)^253 = 0.4995 or 49.95 per cent.
Step 4: since the probability that none of the 253 pairs has the same birthday is 0.4995, the probability that at least one of the pairs does have the same birthday must be 1 - 0.4995 = 0.5005 or 50.05 per cent.
(Treating the 253 pairs as independent, as step 3 does, is an approximation. The exact probability, found by requiring each successive person's birthday to differ from all the earlier ones, is about 50.7 per cent.)
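
The four steps can be reproduced numerically. The sketch below computes the pair-based figure used in the proof and, for comparison, the exact probability obtained by requiring every birthday to differ from all earlier ones. The function names are hypothetical, and a 365-day year with equally likely birthdays is assumed, as in the proof.

```python
from math import comb, prod

def pair_approximation(n, days=365):
    """Steps 1 to 4: treat each of the n-choose-2 pairs as independent."""
    pairs = comb(n, 2)                        # step 1: 253 pairs when n = 23
    p_pair_differs = (days - 1) / days        # step 2: 364/365
    p_none_match = p_pair_differs ** pairs    # step 3
    return 1 - p_none_match                   # step 4

def exact(n, days=365):
    """Exact figure: person k must avoid the k birthdays already taken."""
    p_all_different = prod((days - k) / days for k in range(n))
    return 1 - p_all_different

print(comb(23, 2))
print(round(pair_approximation(23), 4))
print(round(exact(23), 4))
```

Both figures exceed one half for a group of 23 and fall below it for a group of 22, so the conclusion does not depend on the independence shortcut.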

The false positive fallacy

The fallacy

That if a test for a disease that has a prevalence rate of 1 in 1000 has a false positive rate of 5%, there is a 95 per cent probability that a person who has been given a positive result actually has the disease.

The truth

The true probability is 2 per cent.

Proof

Let A denote the event of having the disease and B the event of having been tested positive (for the purpose of applying Bayes' theorem).
Then P(B/A), which is the probability of having been tested positive when having the disease, can be taken as equal to 1;
And P(A) is the probability of having the disease, which with a prevalence of 1 in 1000 must be equal to 1/1000;
And P(B) is the probability of being tested positive, which can be arrived at by 3 steps:
Step 1 is to observe that since the prevalence of the disease is 1 in 1000, 999 persons out of every 1000 are healthy.
Step 2 is to recall that for each healthy person the probability of being tested positive is 5% or 1 in 20.
Step 3 is to apply the multiplication rule and get the answer:

P(B) = 999/1000 multiplied by 1/20 or, near enough, 1/20. (Strictly, the 1 in 1000 who have the disease and test positive should be added, giving 0.05095, but that is still near enough 1/20.)

So, applying Bayes' theorem, the probability of having the disease, given that you have been tested positive, is given by

P(A/B) = P(B/A) x P(A)/P(B)
       = 1 x (1/1000)/(1/20)
       = 0.02, or 2%.
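
The same calculation can be done without rounding P(B) to 1/20. The sketch below is an illustration only: the function name and the `sensitivity` parameter (the tutorial's P(B/A), taken as 1) are assumptions, and P(B) is built from both the true positives and the false positives.

```python
from fractions import Fraction

def p_disease_given_positive(prevalence, false_positive_rate, sensitivity=1):
    """P(A/B) by Bayes' theorem, with P(B) summed over the sick and the healthy."""
    true_positives = sensitivity * prevalence                 # P(B/A) x P(A)
    false_positives = false_positive_rate * (1 - prevalence)  # healthy but positive
    p_positive = true_positives + false_positives             # P(B)
    return true_positives / p_positive

result = p_disease_given_positive(Fraction(1, 1000), Fraction(1, 20))
print(result, float(result))
```

With the tutorial's figures the exact answer is 20/1019, just under 2 per cent, so the approximation P(B) = 1/20 changes the result very little.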

The prosecutor's fallacy

The fallacy (an example)

The fact that the accused's DNA matched that of the sperm found on the victim in a test which has a one in a thousand chance of giving a false positive result means that there is only a one in a thousand chance of the accused's innocence.

The truth

In fact it means nothing of the sort. One in a thousand of the rest of the population would give the same result, so if the accused is one of half a million people who could have committed the crime, there would be about 500 people (in addition to the real rapist) giving the same result. So, in the absence of other evidence, the positive result establishes only about a one in 500 probability of the accused's guilt. (DNA evidence can, of course, provide valid proof of guilt when it is used to establish who, among a restricted group of suspects, committed the crime.)
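
The arithmetic behind this argument can be sketched as follows. The function name is hypothetical, and the sketch assumes that, before the test, every member of the population is equally likely to be the culprit and that the culprit is certain to match.

```python
from fractions import Fraction

def p_guilty_given_match(population, false_match_rate):
    """Probability that a person who matches is the culprit, absent other evidence."""
    innocent = population - 1
    expected_false_matches = innocent * false_match_rate  # innocent people who match
    # The culprit is one matching person among (1 + expected_false_matches).
    return 1 / (1 + expected_false_matches)

wide = p_guilty_given_match(500_000, Fraction(1, 1000))
narrow = p_guilty_given_match(10, Fraction(1, 1000))
print(float(wide), float(narrow))
```

With half a million possible culprits the probability of guilt is roughly one in 500, whereas with only ten suspects the same test result makes guilt about 99 per cent probable, which is the contrast the closing remark draws.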