A lot of mathematical reasoning is concerned with trying to predict the outcomes of events. If I create a sphere of radius 20 cm, how large will its surface area be? If I mix 20 ml of a 20% acid solution with 30 ml of a 50% acid solution, what kind of solution do I get? What will the speed of a falling apple be after 0.1 seconds? If the roulette ball has an initial speed of 3 meters per second and a given initial direction, and the roulette wheel is spinning at a certain speed, where will the ball land? If the interest rate increases by 0.05 and productivity increases by 3%, how will this affect the unemployment rate? Unfortunately, in many situations, theories from the sciences, social sciences, or economics are not strong enough to predict these outcomes. Or the theories may be accurate, but the available data are just not sufficient, as in the roulette example. In those cases, the outcomes seem random, or unpredictable, to us.
This is where Probability Theory and Statistics enter. There is usually a better prediction than "I don't know what the outcome will be; it is random anyway." Although you cannot predict the outcome in these cases, you may be able to describe the likelihood of the different possible outcomes. And this may tell you what to expect. Of course, what you expect is not always what you get, but we will see that in the long run it usually is.
Probability Theory uses some terminology that we have to get used to. An experiment has some unknown outcome. Simple examples are rolling a die, throwing a coin, or selecting a card from a shuffled 52-card deck. But the weather tomorrow, or the outcome of an election, are also experiments. We concentrate on one feature of the experiment, such as which number the die shows, whether the coin shows head or tail, or which card is chosen, thereby neglecting other possible features (as, for instance, how long the die rolls). Therefore, when describing an experiment, you should always describe both the procedure and what you are looking at. Very often there are only finitely many possible outcomes. These outcomes are called the simple events. We have six simple events in the die example, two simple events ("head" and "tail") for the coin, and 52 simple events for the cards. General (nonsimple) events are groups of simple events. Examples are all aces (combining four simple events) or all diamonds (combining 13 simple events), but more complicated groups are also possible. A formal mathematician would say that every subset of the set of simple events forms an event.
Next we want to assign a value P(A) to each event A: the probability, or likelihood, that the outcome of the experiment belongs to A. A probability of 0 means that the event cannot occur, a probability of 1 means that the event will certainly occur, and everything between these two extremes is possible.
There are different ways to obtain these probabilities. One way is to perform the experiment often and measure how often the event occurs. If you perform the experiment n times, and if in k of the cases event A occurs, then the empirical probability p(A) of that event is defined to be the relative frequency k/n of the event. An obvious disadvantage is that we have to perform the experiment repeatedly, and also that the probabilities we get vary slightly from one series of trials to the next.
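The definition above is easy to try out on a computer. The sketch below (my illustration; the function name `empirical_probability` is made up for this example) simulates die rolls and returns the relative frequency k/n:

```python
import random

def empirical_probability(event, experiment, n):
    """Run `experiment` n times and return the relative frequency k/n
    of trials in which `event` is true."""
    k = sum(1 for _ in range(n) if event(experiment()))
    return k / n

random.seed(0)  # fixed seed so the run is reproducible

# Estimate the probability of rolling a six with a fair die.
p = empirical_probability(lambda x: x == 6, lambda: random.randint(1, 6), 10_000)
print(p)  # a value close to 1/6 ≈ 0.1667
```

Running it with different seeds or values of n shows exactly the disadvantage mentioned above: each run gives a slightly different value.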
However, we are considering so-called a priori models here, where we want to avoid having to perform the experiment at all. Rather, we are interested in obtaining the values of theoretical probabilities p(A) using mathematical reasoning (theory). These values should predict the relative frequency (empirical probability) described above in the following sense: as the number of trials grows, the relative frequency of the event comes closer and closer to the theoretical probability. This is the Law of Large Numbers.
Assume that the theoretical probability of the event "head" in the experiment "flipping a coin" is 1/2. Assume that the frequencies of "head" after 100, 1000, and 10000 throws are 49, 550, and 5200. Then the corresponding relative frequencies are 49/100 = 0.49, 550/1000 = 0.55, and 5200/10000 = 0.52. The "closer and closer" part stated above does not say that these relative frequencies cannot temporarily move away from 0.5, as in the example from the close 0.49 to the somewhat disappointing 0.55. What it says is that eventually the relative frequency will almost surely come closer to 0.5 than it was before. We can also formulate it like this: a deviation of, say, 0.03 (a value below 0.47 or above 0.53) is always possible, but the probability of such a fixed deviation decreases as the number of trials increases. Actually, the probability of such a fixed deviation in the relative frequency approaches 0. But note also that the absolute deviation can always exceed any fixed value. In the example above, the absolute deviations between the number of heads and the predicted number of heads are -1, 50, and 200, and these numbers may increase slowly as the number of trials increases.
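This contrast between relative and absolute deviation is easy to watch in a simulation. A minimal sketch (my illustration, using a fixed seed so the run is reproducible):

```python
import random

random.seed(1)
heads, flips = 0, 0
for checkpoint in [100, 1_000, 10_000, 100_000]:
    while flips < checkpoint:
        heads += random.random() < 0.5  # True counts as 1
        flips += 1
    rel = heads / flips            # relative frequency: drifts toward 0.5
    abs_dev = heads - flips // 2   # absolute deviation: need not shrink
    print(checkpoint, round(rel, 4), abs_dev)
```

A typical run shows the relative frequency settling near 0.5 while the absolute deviation wanders and may well grow.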
Let me mention a common misconception about the Law of Large Numbers. You throw a coin and get the sequence "Head", "Tail", "Tail", "Head", "Tail", "Head", "Tail", "Tail", "Tail", "Tail", "Tail". How likely is "Head" next? Many people would say it is higher than 50%, since it is "due". Others may question the fairness of the coin and rather predict another "Tail". However, if we know that we have a fair coin, so that the a priori probability of 50% is the right one, then the probability of "Head" is still exactly 50% for all future rounds. Probability Theory and the Law of Large Numbers are not "fair" in the sense of actively fighting against an existing deviation. Rather, this deviation may well persist, but the relative deviation will still go to 0.
The simplest case of an a priori model is when it is known that all simple events are equally likely (as in all three of our examples discussed above). If there are n simple events, then the probability of each of them equals 1/n. In the case of the die, the probabilities are all 1/6; for the coin, 1/2; and for the cards, 1/52. In these cases, the probabilities of other events can in principle also be computed, though this computation is sometimes tedious. All that is needed is counting the number k of simple events that are favorable to the event A, since then we get P(A) = k/n.
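The counting rule P(A) = k/n can be written down directly. A small sketch (my illustration; the name `laplace_probability` is made up, and exact fractions are used instead of decimals):

```python
from fractions import Fraction

def laplace_probability(simple_events, is_favorable):
    """P(A) = k/n when all simple events are equally likely."""
    k = sum(1 for e in simple_events if is_favorable(e))
    return Fraction(k, len(simple_events))

# Die: probability of rolling an even number.
die = range(1, 7)
print(laplace_probability(die, lambda x: x % 2 == 0))  # 1/2

# Cards: each card is a (rank, suit) pair; probability of a diamond.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]
print(laplace_probability(deck, lambda c: c[1] == "diamonds"))  # 1/4
```

The tedious part in practice is not the division but the counting of k, as the poker exercise below illustrates.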
More difficult is the situation when the outcomes are not equally likely. Note that the probabilities of all simple events still add up to 1.
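A standard illustration of unequal likelihoods (my example, not part of the text above) is the sum of two fair dice: the simple events 2 through 12 have different probabilities, yet they still add up to 1.

```python
from fractions import Fraction

# Count, for each possible sum, how many of the 36 equally likely
# (first die, second die) pairs produce it.
counts = {}
for a in range(1, 7):
    for b in range(1, 7):
        counts[a + b] = counts.get(a + b, 0) + 1

probs = {s: Fraction(c, 36) for s, c in counts.items()}
print(probs[7])             # 1/6, the most likely sum
print(probs[2])             # 1/36, the least likely (alongside 12)
print(sum(probs.values()))  # 1
```

Note the trick used here: the eleven sums are not equally likely, but they are built from 36 equally likely pairs, so the counting rule still applies one level down.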
There are 52·51·50·49·48/(5·4·3·2·1)=2,598,960 different Poker 5-card hands (why?). Show that
Probabilities obey certain laws. These laws allow us to compute the probabilities of complex events if the probabilities of the simple events are known (as in the case of equally likely simple events). Two events A and B are mutually exclusive if A and B can never occur at the same time; in other words, P(A and B) = 0 in such cases. The conditional probability P(A|B) is the probability of event A if we already know that event B is true. Assume you draw two cards and put them face down on the desk. The probability that the first card is an ace is 4/52 = 1/13. However, if someone turns over the second card, our evaluation of the situation will change in any case, no matter what that card shows. If it is an ace, then the probability that our first card is an ace has sunk to 3/51. If the second card is not an ace, the probability that the first card is an ace has increased to 4/51. If we define A as the event that the first card is an ace, and B as the event that the second card is an ace, then P(A) = 1/13, P(A|B) = 3/51, and P(A|not B) = 4/51.
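These three values can be verified by exact counting over all ordered two-card draws. A sketch (my illustration; the helper names `prob` and `cond` are made up):

```python
from fractions import Fraction
from itertools import permutations

aces = set(range(4))  # label the cards 0..51; cards 0..3 are the aces
pairs = list(permutations(range(52), 2))  # all ordered (first, second) draws

def prob(event):
    return Fraction(sum(1 for p in pairs if event(p)), len(pairs))

def cond(event, given):
    """P(event | given): restrict to the draws where `given` holds."""
    favorable = [p for p in pairs if given(p)]
    return Fraction(sum(1 for p in favorable if event(p)), len(favorable))

A = lambda p: p[0] in aces  # first card is an ace
B = lambda p: p[1] in aces  # second card is an ace

print(prob(A))                       # 1/13
print(cond(A, B))                    # 1/17, i.e. 3/51
print(cond(A, lambda p: not B(p)))   # 4/51
```

Conditioning is literally just throwing away the draws that contradict what we now know and counting among the rest.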
Now two events are said to be independent if knowledge about whether one of them is true would not affect our estimate of the probability of the other. In other words, A and B are independent if P(A) = P(A|B) (and also P(B) = P(B|A), by symmetry). In the example above, A and B are not independent; therefore they are dependent. On the other hand, if we define C to be the event that the second card is a diamond, are A and C now independent?
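You can settle the question by the same exact enumeration as before, this time modeling each card as a (rank, suit) pair (my illustration; try to predict the output before running it):

```python
from fractions import Fraction

ranks = list(range(13))  # rank 0 stands for "ace"
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]
pairs = [(x, y) for x in deck for y in deck if x != y]  # ordered draws

def cond(event, given):
    favorable = [p for p in pairs if given(p)]
    return Fraction(sum(1 for p in favorable if event(p)), len(favorable))

A = lambda p: p[0][0] == 0            # first card is an ace
C = lambda p: p[1][1] == "diamonds"   # second card is a diamond

P_A = Fraction(sum(1 for p in pairs if A(p)), len(pairs))
print(P_A, cond(A, C))  # compare the two values
```

If the two printed fractions agree, then knowing C changes nothing about A, and by the definition above the events are independent.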