MLE for the coin toss example
In the typical coin-tossing example, where the probability of heads is $p$ and the coin is tossed $n$ times, let's calculate the Maximum Likelihood Estimate (MLE) of the probability of heads.
We know this is the typical case of the Binomial distribution, given by the formula:
$\operatorname{Bin}(k;n,p) = \binom{n}{k}p^k(1-p)^{n-k}$
(Read: the probability of observing $k$ heads, parametrized by $n$ and $p$.)
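As a quick sanity check, here is a minimal Python sketch that evaluates this formula directly (the numbers are purely illustrative):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k heads in n tosses, each with head probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. the probability of 5 heads in 11 tosses of a fair coin
print(binom_pmf(5, 11, 0.5))  # ~0.2256
```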
We have:
$n=H+T$ is the total number of tosses, and $H=k$ is the number of heads.
Leading to:
$\operatorname{Bin}(H;H+T,p) = \binom{H+T}{H}p^H(1-p)^{T}$
$p_{\text{MLE}} = \underset{p}{\operatorname{arg\,max}} \binom{H+T}{H}p^H(1-p)^{T}$
$=\underset{p}{\operatorname{arg\,max}} \operatorname{log} \big[ \binom{H+T}{H}p^H(1-p)^{T} \big]$
$=\underset{p}{\operatorname{arg\,max}} \big[ \operatorname{log} \binom{H+T}{H} + \operatorname{log} p^H + \operatorname{log}(1-p)^{T} \big]$
$=\underset{p}{\operatorname{arg\,max}} \big[ H \operatorname{log} p + T \operatorname{log}(1-p) \big]$
We used the log trick: the logarithm is strictly increasing, so it preserves the argmax while gaining numerical stability. We also dropped the constant term $\operatorname{log} \binom{H+T}{H}$, since it does not depend on $p$ and therefore does not affect the argmax.
To get the MLE, we find where the first derivative of the log-likelihood with respect to $p$ is equal to zero:
$\large \frac{\partial [ H \operatorname{log} p + T \operatorname{log}(1-p)]}{\partial p}=\small 0$
Computing the derivative term by term gives $\large \frac{H}{p} - \frac{T}{1-p} = \small 0$, and this is true for:
$\large \frac{H}{p} = \frac{T}{1-p}$
Cross-multiplying gives $H(1-p) = Tp$, so:
$\large p_{\small \text{MLE}} = \frac{H}{T+H}$
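We can double-check this result numerically: a minimal sketch that grid-searches the log-likelihood over $p$ (with hypothetical counts $H=5$, $T=6$) lands on $H/(H+T)$:

```python
import numpy as np

H, T = 5, 6  # hypothetical counts of heads and tails

# Log-likelihood up to the constant log C(H+T, H), which does not affect the argmax.
p_grid = np.linspace(1e-6, 1 - 1e-6, 100_001)
log_lik = H * np.log(p_grid) + T * np.log(1 - p_grid)

p_hat = p_grid[np.argmax(log_lik)]
print(p_hat, H / (H + T))  # both ~0.4545
```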
We could intuitively reach the same conclusion. Let's say we have some tossing events:
$\mathcal{T}=\{h, h, h, t, t, h, t, t, t, h, t \}$, where $\mathcal{T}$ is our sequence of tosses with $n = T+H = 11$ elements, and the number of heads is $H=5$. Just based on this example:
$\large p_{\small \text{MLE}}$ is ${H \over {T+H}} = {5 \over 11}$.
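The same count takes a couple of lines of Python, using the toss sequence above:

```python
tosses = ["h", "h", "h", "t", "t", "h", "t", "t", "t", "h", "t"]
H = tosses.count("h")
n = len(tosses)
print(f"p_MLE = {H}/{n} = {H / n:.4f}")  # p_MLE = 5/11 = 0.4545
```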
Addendum
Bernoulli distribution
The Bernoulli distribution is a distribution for a single binary random variable $X$ with states $x \in \{0,1\}$. It is governed by a single continuous parameter $\mu \in [0,1]$ that represents the probability of $X=1$. The Bernoulli distribution $\operatorname{Ber}(\mu)$ is defined as:
\[\begin{aligned} p(x \mid \mu) &=\mu^{x}(1-\mu)^{1-x}, \quad x \in\{0,1\}, \\ \mathbb{E}[x] &=\mu, \\ \mathbb{V}[x] &=\mu(1-\mu) \end{aligned}\]where $\mathbb{E}[x]$ and $\mathbb{V}[x]$ are the mean and variance of the binary random variable $X$.
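As a small sketch (assuming NumPy; the value of $\mu$ is just illustrative), we can check these moments by sampling. A Bernoulli variable is a Binomial with a single trial, so NumPy's binomial sampler with n=1 does the job:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 0.3  # illustrative parameter

# Bernoulli(mu) == Binomial(n=1, p=mu)
samples = rng.binomial(n=1, p=mu, size=1_000_000)
print(samples.mean())  # ~0.30, matching E[x] = mu
print(samples.var())   # ~0.21, matching V[x] = mu * (1 - mu)
```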
Binomial distribution
The Binomial distribution is a generalization of the Bernoulli distribution.
In particular, the Binomial can be used to describe the probability of observing $m$ occurrences of $X=1$ in a set of $N$ samples (number of trials) from a Bernoulli distribution where $p(X=1)=\mu \in [0,1]$. The Binomial distribution $\operatorname{Bin}(N, \mu)$ is defined as:
\[\begin{aligned} p(m \mid N, \mu) &=\left(\begin{array}{c} N \\ m \end{array}\right) \mu^{m}(1-\mu)^{N-m} \\ \mathbb{E}[m] &=N \mu \\ \mathbb{V}[m] &=N \mu(1-\mu) \end{aligned}\]where $\mathbb{E}[m]$ and $\mathbb{V}[m]$ are the mean and variance of $m$, respectively.
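A short sketch with scipy.stats.binom (assuming SciPy is available; $N$ and $\mu$ echo the coin example above) confirms the pmf and the moments:

```python
from scipy.stats import binom

N, mu = 11, 5 / 11  # values taken from the coin example
dist = binom(N, mu)

print(dist.pmf(5))   # P(m = 5) under the fitted mu
print(dist.mean())   # N * mu = 5.0
print(dist.var())    # N * mu * (1 - mu) ~ 2.727
```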