Probability: Introduction

A lot of mathematical reasoning is concerned with trying to predict the outcomes of events. If I create a sphere of radius 20 cm, how large will its surface area be? If I mix 20 ml of a 20% acid solution with 30 ml of a 50% acid solution, what kind of solution do I get? What will the speed of a falling apple be after 0.1 seconds? If the roulette ball has an initial speed of 3 meters per second and a given initial direction, and the roulette wheel is spinning at a certain speed, where will the ball end up? If the interest rate increases by 0.05 and productivity increases by 3%, how will that affect the unemployment rate? Unfortunately, in many situations, theories from the sciences, social sciences, or economics are not strong enough to predict these outcomes. Or the theories may be accurate, but the available data is just not sufficient, which is the case in the roulette example. In those cases, the outcomes seem random, or unpredictable, to us.

This is where Probability Theory and Statistics enter. There is usually a better prediction than "I don't know what the outcome will be, it is random anyway." Although you cannot predict the outcome in these cases, you may be able to describe the likelihood of the different possible outcomes, and this may tell you what to expect. Of course, what you expect is not always what you get, but we will see that in the long run it is.

Probability Theory uses some terminology which we have to get used to. An experiment has some unknown outcome. Simple examples are rolling a die, throwing a coin, or selecting a card from a shuffled 52-card deck. But the weather tomorrow, or the outcome of elections, are also experiments. We concentrate on one feature of the experiment, like which number the die shows, whether the coin shows head or tail, or what card is chosen, thereby neglecting other possible features (for instance, how long the die rolls). Therefore, when describing an experiment, you should always describe the procedure and what you are looking at. Very often there are only finitely many possible outcomes. These outcomes are called the simple events. We have six simple events in the die example, two simple events ("head" and "tail") for coins, and 52 simple events for the cards. General (nonsimple) events are groups of simple events. Examples are all aces (combining four simple events) or all diamonds (combining 13 simple events), but more complicated groups are also possible. A formal mathematician would say that every subset of the set of simple events forms an event.

Next we want to assign a value P(A) to each event A, the probability or likelihood that the outcome of the experiment belongs to A. A probability of 0 means that the event cannot occur, a probability of 1 means that the event will certainly occur, and every value between these two extremes is possible.

There are different ways to obtain these probabilities. One way is to perform the experiment often and measure how often the event occurs. If you perform the experiment n times, and if in k of the cases event A occurs, then the empirical probability p(A) of that event is defined to be the relative frequency k/n of the event. An obvious disadvantage is that we have to perform the experiment repeatedly, and also that the probabilities we get vary slightly.

However, we are considering so-called a priori models here, where we want to avoid having to perform the experiment at all. Rather, we are interested in obtaining values of theoretical probabilities P(A) using mathematical reasoning (theory). These values should predict the relative frequency (empirical probability) as described above in the following sense:

Law of Large Numbers: If an experiment is repeated more and more times, the empirical probability (relative frequency) of an event will converge (come closer and closer) to its theoretical probability. If not, the a priori model is not a good reflection of reality.

Assume that the theoretical probability for the event "head" in the experiment "flipping a coin" is 1/2. Assume that the frequencies of "head" after 100, 1000, and 10000 throws are 49, 550, and 5200. Then the corresponding relative frequencies are 49/100=0.49, 550/1000=0.55, and 5200/10000=0.52. The "closer and closer" part stated above does not say that these relative frequencies cannot temporarily move away from 0.5, as in the example from the close 0.49 to the somewhat disappointing 0.55. What it says is that, almost surely, the relative frequency will eventually come closer to 0.5 than it was before. We can also formulate it like this: a deviation of, say, 0.03 (a value below 0.47 or above 0.53) is always possible, but the probability of such a fixed deviation decreases as the number of trials increases. In fact, the probability of such a fixed deviation in the relative frequency approaches 0. But note also that the absolute deviation can always exceed a fixed value. In the example above, the absolute deviations between the number of heads and the predicted number of heads are -1, 50, and 200, and these numbers may increase slowly as the number of trials increases.
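
The behavior described above can be explored with a small simulation. The following Python sketch is an illustration only: it assumes a fair coin modeled by Python's pseudo-random number generator, and the function name head_counts and the seed are choices made here, not part of the text. At several checkpoints it reports the relative frequency of "head" and the absolute deviation from the predicted number of heads.

```python
import random

def head_counts(checkpoints, seed=1):
    """Flip a simulated fair coin and record the number of heads
    reached at each checkpoint (a number of throws). Hypothetical helper."""
    rng = random.Random(seed)      # fixed seed so that runs are repeatable
    results = {}
    heads = 0
    throws = 0
    for n in sorted(checkpoints):
        while throws < n:
            heads += rng.random() < 0.5   # True counts as 1, False as 0
            throws += 1
        results[n] = heads
    return results

if __name__ == "__main__":
    for n, k in head_counts([100, 1000, 10000, 100000]).items():
        relative = k / n           # empirical probability k/n
        absolute = k - n / 2       # deviation from the predicted count n/2
        print(f"n={n:6d}  heads={k:6d}  k/n={relative:.4f}  deviation={absolute:+.1f}")
```

Running it with different seeds typically shows the relative frequency settling near 0.5 while the absolute deviation keeps wandering, but any single run may look different.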

Let me mention a common misconception about the Law of Large Numbers. You throw a coin and get a sequence of "Head", "Tail", "Tail", "Head", "Tail", "Head", "Tail", "Tail", "Tail", "Tail", "Tail". How likely is "Head" next? Many people would say it is higher than 50%, since "Head" is "due". Others may question the fairness of the coin and rather predict another "Tail". However, if we know that we have a fair coin, so that the a priori probability of 50% is the right one, then the probability of "Head" is still exactly 50% for all future rounds. Probability Theory and the Law of Large Numbers are not "fair" in the sense of actively fighting against an existing deviation. Rather, the absolute deviation may persist, but the relative deviation will still go to 0.
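
The claim that a coin is never "due" can be checked empirically. The sketch below is a simulation under assumptions made here (a fair coin modeled by Python's pseudo-random generator; the helper name heads_after_tail_run, the run length of five, and the seed are illustrative). It looks only at flips that directly follow five or more tails in a row and counts how often they show "Head".

```python
import random

def heads_after_tail_run(num_flips, run_length=5, seed=2):
    """Among flips that directly follow at least `run_length` tails in a row,
    return (number of heads, number of such flips). Hypothetical helper."""
    rng = random.Random(seed)
    tails_in_a_row = 0
    heads_after = 0
    occasions = 0
    for _ in range(num_flips):
        is_head = rng.random() < 0.5
        if tails_in_a_row >= run_length:
            occasions += 1            # this flip follows a long run of tails
            heads_after += is_head
        tails_in_a_row = 0 if is_head else tails_in_a_row + 1
    return heads_after, occasions

if __name__ == "__main__":
    heads, occasions = heads_after_tail_run(1_000_000)
    print(f"{heads} heads in {occasions} flips that followed five tails")
    print(f"relative frequency: {heads / occasions:.4f}")
```

If the coin were "due" after a run of tails, this relative frequency would sit noticeably above 0.5; for a fair coin it should stay close to 0.5.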

Probabilities by Simple Counting

The simplest case of an a priori model is when it is known that all simple events are equally likely (as in all three of our examples discussed above). If there are n simple events, then the probability of each of them equals 1/n. In the case of the die, the probabilities are all 1/6, for the coin they are 1/2, and for the cards they are 1/52. In these cases the probability of other events can, in principle, also be computed, though this computation is sometimes tedious. All that is needed is to count the number k of simple events that are favorable to the event A, since then P(A)=k/n.
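
The rule P(A)=k/n translates directly into a few lines of Python. This is a minimal sketch, assuming equally likely simple events; the function name probability and the card labels are illustrative choices, not notation from the text. Exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def probability(event, simple_events):
    """P(A) = k/n for equally likely simple events.
    `event` is a predicate telling whether a simple event is favorable to A."""
    n = len(simple_events)
    k = sum(1 for outcome in simple_events if event(outcome))
    return Fraction(k, n)

# The 52-card deck as (rank, suit) pairs; the labels are illustrative.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

if __name__ == "__main__":
    # "All aces" combines 4 of the 52 simple events.
    print(probability(lambda card: card[0] == "A", DECK))
    # "All diamonds" combines 13 of the 52 simple events.
    print(probability(lambda card: card[1] == "diamonds", DECK))
```

Note that Fraction prints values in lowest terms, so 4/52 appears as 1/13.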

A container contains 5 red marbles, 4 blue marbles, and 3 green marbles. Closing your eyes, you remove one randomly. Assume each marble has the same chance of being picked. What is the probability of picking a red marble?
The trick consists in labeling the different marbles, and working with 12 different equally likely simple events: "red1", "red2", ... "green2", "green3". The event "red" consists of the five simple events "red1", ... "red5", therefore P(red)=5/12.
How likely is it to throw two fives with two dice?
Assume one die is red and the other blue. The simple events are the 36 pairs of numbers (1,1), (1,2), ... (6,5), (6,6), where the first entry indicates what the red die shows and the second entry what the blue die shows. Note that (3,5) and (5,3) are two different outcomes, whereas two fives is the single simple event (5,5). All these simple events are equally likely, therefore P(5,5)=1/36.
How likely is it to get a sum of four with two dice?
It would be possible to take the sums of the numbers shown by the two dice as the simple events. However, this model would have the big disadvantage that the probabilities of the simple events would not be identical (which may not even be obvious to you; think about how likely a sum of 2 is compared with a sum of 7). Therefore a better model is the one used in the previous example, with the 36 pairs of numbers as simple events. Then our event A, a sum of four, consists of three simple events, namely (1,3), (2,2), and (3,1). Note again that (1,3) and (3,1) are different simple events. Therefore P(A)=3/36.
Two 3-spinners (7,5,4) and (1,8,6) are given. You have to select one and play against the other. The player whose 3-spinner shows the larger number wins. Which 3-spinner would you select?
There are 9 possible outcomes, 7-1, 7-8, 7-6, 5-1, 5-8, 5-6, 4-1, 4-8, 4-6, and all of them are equally likely. In outcomes 1, 3, 4, and 7 the left player wins; in the other 5 outcomes the right player wins. Therefore the (1,8,6) spinner is better.
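
The counting in the dice and spinner examples above can be reproduced by listing all equally likely outcomes in Python. This is a sketch for illustration; the function names sum_probability and spinner_wins are made up here, and the spinner values are assumed to be all different so that no ties occur.

```python
from fractions import Fraction
from itertools import product

def sum_probability(target):
    """P(sum of two dice == target), counted over the 36 ordered pairs."""
    pairs = list(product(range(1, 7), repeat=2))      # (red, blue)
    favorable = [pair for pair in pairs if sum(pair) == target]
    return Fraction(len(favorable), len(pairs))

def spinner_wins(left, right):
    """Count the equally likely outcomes in which each spinner shows
    the larger number. Returns (left wins, right wins)."""
    left_wins = sum(1 for a, b in product(left, right) if a > b)
    right_wins = sum(1 for a, b in product(left, right) if b > a)
    return left_wins, right_wins

if __name__ == "__main__":
    print(sum_probability(4))                    # 3/36, shown in lowest terms
    print(spinner_wins((7, 5, 4), (1, 8, 6)))    # (left wins, right wins)
```

Calling spinner_wins with other triples of values is a quick way to compare different pairs of spinners.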



More difficult is the situation when the outcomes are not equally likely. Note that the probabilities of all simple events still add up to 1.



Example: Poker

There are 52·51·50·49·48/(5·4·3·2·1)=2,598,960 different Poker 5-card hands (why?). Show that

Note that some hands have several of the features above.
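
The number of 5-card hands stated above can be checked with a short Python sketch. It assumes that the order of the cards within a hand does not matter, which is why the 52·51·50·49·48 ordered draws are divided by the 5·4·3·2·1 possible orderings of each hand; the variable names are illustrative.

```python
import math

ordered_draws = 52 * 51 * 50 * 49 * 48   # ways to draw 5 cards one after another
orderings = math.factorial(5)            # 5*4*3*2*1 arrangements of the same hand
hands = ordered_draws // orderings

# math.comb computes the same binomial coefficient "52 choose 5" directly.
assert hands == math.comb(52, 5) == 2_598_960
print(hands)
```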

Laws for Probabilities

Probabilities obey certain laws. These laws allow us to compute probabilities for complex events if the probabilities of simple events are known (as in the case of equally likely simple events). ... Two events A and B are mutually exclusive if A and B can never occur at the same time. In other words, P(A and B) = 0 in such cases. .... The conditional probability P(A|B) is the probability of event A if we already know that event B is true. Assume you draw two cards and put them face down on the desk. The probability that the first card is an ace is 4/52=1/13. However, if someone turns over the second card, our evaluation of the situation will change in any case, no matter what this card shows. If it is an ace, then the probability that our first card is an ace has dropped to 3/51. If the second card is not an ace, the probability that the first card is an ace has increased to 4/51. If we define A as the event that the first card is an ace, and B as the event that the second card is an ace, then P(A)=1/13, P(A|B)=3/51, and P(A|not B)=4/51.

Now two events are said to be independent if knowledge about whether one of the events is true or not would not affect our estimation of the probability for the other event. In other words, A and B are independent if P(A)=P(A|B) (and also P(B)=P(B|A), by symmetry). In the example above, A and B are not independent, therefore they are dependent. On the other hand, if we define C to be the event that the second card is a diamond, are A and C now independent?
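
The conditional probabilities in the two-card example can be verified by listing all ordered pairs of different cards. The following Python sketch is one possible way to do this; the names prob and independent and the card labels are assumptions made for illustration. Conditioning on B simply means counting only those draws in which B is true.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

# All equally likely ordered draws (first card, second card) of two different cards.
DRAWS = list(permutations(DECK, 2))

def prob(event, given=lambda draw: True):
    """P(event | given), by counting within the draws where `given` is true."""
    conditioned = [draw for draw in DRAWS if given(draw)]
    favorable = [draw for draw in conditioned if event(draw)]
    return Fraction(len(favorable), len(conditioned))

def first_is_ace(draw):        # event A
    return draw[0][0] == "A"

def second_is_ace(draw):       # event B
    return draw[1][0] == "A"

def second_is_not_ace(draw):   # event "not B"
    return not second_is_ace(draw)

def independent(a, b):
    """A and B are independent if P(A) equals P(A|B)."""
    return prob(a) == prob(a, given=b)

if __name__ == "__main__":
    print(prob(first_is_ace))                            # P(A) = 4/52
    print(prob(first_is_ace, given=second_is_ace))       # P(A|B) = 3/51
    print(prob(first_is_ace, given=second_is_not_ace))   # P(A|not B) = 4/51
    print(independent(first_is_ace, second_is_ace))      # A and B are dependent
```

The same independent helper can be applied to the event C from the question above by writing a predicate for "the second card is a diamond".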

  1. 0 <= P(A) <= 1 for every event A
  2. P(not A) = 1-P(A)
  3. P(A and B) = P(A)·P(B|A) = P(B)·P(A|B),
    and in particular
    Multiplication Property: P(A and B) = P(A)·P(B) if events A and B are independent.
  4. P(A or B) = P(A) + P(B) - P(A and B),
    and in particular
    Addition Property: P(A or B) = P(A)+P(B) if events A and B are mutually exclusive.

The sieve formula P(A or B or C) = P(A)+P(B)+P(C)-P(A and B)-P(A and C)-P(B and C)+P(A and B and C) and its higher versions could also be treated here; however, they can be reduced to law 4 above by applying that law repeatedly.
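
The laws and the sieve formula can be checked on a concrete example. The sketch below uses the 36 equally likely results of two dice and represents events as Python sets, so that "and", "or", and "not" become intersection, union, and complement. The three events A, B, and C are chosen here purely for illustration and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Simple events: the 36 equally likely (red, blue) results of two dice.
OMEGA = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event given as a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))

A = {w for w in OMEGA if w[0] == 6}       # red die shows 6
B = {w for w in OMEGA if w[1] == 6}       # blue die shows 6
C = {w for w in OMEGA if sum(w) >= 10}    # sum is at least 10

# Law 2: complement.
assert P(OMEGA - A) == 1 - P(A)
# Law 3, multiplication property: these particular A and B are independent.
assert P(A & B) == P(A) * P(B)
# Law 4: the overlap of A and B is subtracted once.
assert P(A | B) == P(A) + P(B) - P(A & B)
# Sieve formula for three events.
assert P(A | B | C) == (P(A) + P(B) + P(C)
                        - P(A & B) - P(A & C) - P(B & C)
                        + P(A & B & C))
print("all checks passed")
```

Such a check on one example is of course not a proof, but it is a convenient way to catch a sign error when applying the formulas by hand.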

Probability Theory was invented by Pierre de Fermat and Blaise Pascal around 1654, by exchanging letters about two problems. The first problem is discussed here. The second problem can be stated as follows: Antoine Gombaud, the Chevalier de Méré, usually bet that four consecutive rolls of a die would produce at least one "6". He then changed to betting that he would produce a pair of "6"s in 24 consecutive rolls of two dice. Puzzled that he began to lose money, he wrote a letter to Pascal, who mentioned the problem to Fermat. They derived all the formulas above and settled the case.



  1. Last semester, Professor Moriarty gave 7 As out of 85 grades. If you randomly select one of these students, what is the probability that her grade was an A? If you randomly select one of the students in one of Professor Moriarty's classes this semester, what would be the probability that this student will get an A in this class? Explain!
  2. How likely is it to get a sum of five with two dice? What about a sum of six, or a sum of seven?
  3. A card is selected randomly from a 52-card deck.
    a) What is the probability that the card is a "K" or an "A"?
    b) What is the probability that the card is a "K" or a spade?
    c) What is the probability that the card is a "K" and a spade?
    d) What is the probability that the card is a "K" and an "A"?
  4. a) Two cards are selected randomly from a shuffled 52-card deck. Someone tells you that the first card is black. What is the probability that the second card is black too?
    b) Again two cards are selected randomly from a shuffled 52-card deck. Someone tells you that at least one of the two cards is black. What is the probability that both cards are black?
  5. a) What is the probability that the "6" shows at least once in 6 consecutive rolls of a die? (Use formulas (2) and (3) from the laws.)
    b) What is the probability that the "6" shows at least once in 4 consecutive rolls of a die?
    c) What is the probability that a double-6 occurs at least once in 24 consecutive rolls of two dice?
  6. The casino game "dice" consists of rolling two dice. The player wins in the first round if the sum of the values equals 7 or 11. The player loses in the first round if the sum of the values equals 2, 3, or 12. In all other cases, additional rounds have to be played.
    a) What is the probability for the player to win in the first round?
    b) What is the probability of the player losing in the first round?