## Tuesday, December 27, 2016

### Bernoulli vs Binomial vs Multinoulli vs Multinomial distributions

Bernoulli distribution
Say you are flipping a coin that has a probability of 0.4 of turning up heads and 0.6 of turning up tails. The simplest probability distribution there is is the Bernoulli distribution which just asks for the probability of the coin turning up heads after one toss, that is, 0.4. That's all. The Bernoulli distribution is only about the probability of a random event with two possible outcomes occurring after one trial. It's used for success/fail type random variables where you want to reason about the probability that a random variable will succeed (such as a coin turning up heads).

The probability mass function for the Bernoulli distribution is:
f(x;p) = p^x * (1 - p)^(1-x)

where "*" is multiplication and "^" is power, "p" is the probability of a success and "x" is the number of successes desired, which can be either 1 or 0. When "x" is 1 then you find the probability of a success whilst when it is 0 then you find the probability of a fail. Since there are only two outcomes, if "p" is the probability of a success then the probability of a non-success is "1-p". Notice that this function is quite silly really as it is just a closed form expression for choosing either "p" or "1-p" depending on what "x" is. The number you're interested in is raised to the power of 1 whilst the number you're not interested in is raised to the power of 0, turning it into 1, and then the two results are multiplied together.

Binomial distribution

The probability mass function for the Binomial distribution is:
f(x;n,p) = n!/(x! * (n-x)!) * p^x * (1 - p)^(n-x)

where "*" is multiplication and "^" is power, "p" is the probability of a success, "n" is the number of trials to try out, and "x" is the number of desired successes out of the "n" trials.

Multinoulli distribution
Whereas the binomial distribution generalises the Bernoulli distribution across the number of trials, the multinoulli distribution generalises it across the number of outcomes, that is, rolling a dice instead of tossing a coin. The multinoulli distribution is also called the categorical distribution.

The probability mass function for the multinoulli distribution is:
f(xs;ps) = ps_1^xs_1 * ps_2^xs_2 * ... * ps_k^xs_k

where "*" is multiplication and "^" is power, "k" is the number of outcomes (6 in the case of a dice, 2 for a coin), "ps" is the list of "k" probabilities where "ps_i" is the probability of the ith outcome resulting in a success, "xs" is a list of numbers of successes where "xs_i" is the number of successes desired for the ith outcome, which can be either 1 or 0 and where there can only be exactly a single "1" in "xs".

Multinomial distribution
Finally the most general generalisation of the Bernoulli distribution is across both the number of trials and the number of outcomes, called the multinomial distribution. In order to be more general, this distribution also allows specifying the number of successes desired for each individual outcome, rather than just of one outcome like the multinoulli distribution. This lets us calculate the probability of rolling a dice 6 times and getting 3 "2"s, 2 "3"s and a "5". Since we want it to be compatible with the binomial distribution, these outcomes can arrival in any order, that is, it doesn't matter whether you get [5,3,2,3,2,2] or [2,2,2,3,3,5]. For this reason we need to use the combination function again.

The probability mass function for the multinomial distribution is:
f(xs;n,ps) = n!/(xs_1! * xs_2! * ... * xs_k!) * ps_1^xs_1 * ps_2^xs_2 * ... * ps_k^xs_k

where "*" is multiplication and "^" is power, "k" is the number of outcomes (6 in the case of a dice, 2 for a coin), "ps" is the list of "k" probabilities where "ps_i" is the probability of the ith outcome resulting in a success, "xs" is a list of numbers of successes where "xs_i" is the number of successes desired for the ith outcome, the total of which must be "n".