For distributions like the Normal, it is easy to see the probability that a r.v. falls in the “tail end” of the distribution via the 68-95-99.7 rule. However, how do we get an estimate of this tail probability for an arbitrary distribution?
There are two important inequalities that provide us with upper bounds on tail probabilities.
Markov’s inequality
Let $X$ be a r.v. with a finite expectation. For any constant $a > 0$, we have

$$P(|X| \geq a) \leq \frac{E|X|}{a}$$
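As a quick numerical sketch of Markov's inequality (assuming NumPy is available; the exponential distribution and the constant $a$ are arbitrary choices for illustration):

```python
import numpy as np

# Empirically check Markov's inequality on a nonnegative r.v.
# X ~ Exponential with mean 2 (an illustrative choice, not from the notes).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

a = 5.0
tail_prob = np.mean(x >= a)      # empirical P(X >= a)
markov_bound = x.mean() / a      # E|X| / a (here X >= 0, so E|X| = E(X))

print(f"P(X >= {a}) ~ {tail_prob:.4f}  <=  bound {markov_bound:.4f}")
```

The bound is quite loose here: the true tail probability $e^{-5/2} \approx 0.08$ sits well under the Markov bound of $0.4$, which is typical since Markov only uses the mean.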
Chebyshev’s inequality
Suppose $X$ is a r.v. with $E(X) = \mu$ and $\text{Var}(X) = \sigma^2$. For any constant $a > 0$, we have

$$P(|X - \mu| \geq a) \leq \frac{\sigma^2}{a^2}$$
To prove this, we first square both sides of $|X - \mu| \geq a$, giving us $(X - \mu)^2 \geq a^2$. These two events are equivalent, so their probabilities are equal. Plugging this into Markov's inequality, we get

$$P(|X - \mu| \geq a) = P\big((X - \mu)^2 \geq a^2\big) \leq \frac{E\big[(X - \mu)^2\big]}{a^2} = \frac{\sigma^2}{a^2}$$
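Chebyshev's inequality can be checked numerically the same way. A minimal sketch, assuming NumPy; the skewed Gamma distribution and the constant $a$ are arbitrary illustrative choices:

```python
import numpy as np

# Empirically check Chebyshev's inequality on a skewed distribution.
# X ~ Gamma(shape=2, scale=3): mean 6, variance 18 (illustrative choice).
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=100_000)

mu = x.mean()
sigma2 = x.var()

a = 10.0
tail_prob = np.mean(np.abs(x - mu) >= a)   # empirical P(|X - mu| >= a)
chebyshev_bound = sigma2 / a**2            # sigma^2 / a^2

print(f"P(|X - mu| >= {a}) ~ {tail_prob:.4f}  <=  bound {chebyshev_bound:.4f}")
```

Note that Chebyshev makes no assumption about the shape of the distribution beyond a finite mean and variance, which is why it applies even to this asymmetric example.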
Alternate form
Assume that $\sigma > 0$. Then for any constant $c > 0$, substituting $a = c\sigma$ into Chebyshev's inequality gives

$$P(|X - \mu| \geq c\sigma) \leq \frac{1}{c^2}$$
This is useful because it sets up an inequality similar to the 68-95-99.7 rule. For example, we have

$$P(|X - \mu| \geq 2\sigma) \leq \frac{1}{4} \qquad \text{and} \qquad P(|X - \mu| \geq 3\sigma) \leq \frac{1}{9}$$
If $X \sim \mathcal{N}(\mu, \sigma^2)$, then we can see that both of the above are true: $X$ is within 2 standard deviations of $\mu$ with probability about 0.95, so it falls outside with probability about 0.05, which is indeed less than $\frac{1}{4}$. Similarly, $X$ falls outside 3 standard deviations with probability about 0.003, which is less than $\frac{1}{9}$.
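We can verify this comparison exactly using the standard-normal tail formula $P(|Z| \geq c) = \text{erfc}(c/\sqrt{2})$, which only needs the standard library:

```python
import math

# Exact standard-normal two-sided tail probabilities vs. Chebyshev's 1/c^2 bound.
# For Z ~ N(0, 1): P(|Z| >= c) = erfc(c / sqrt(2)).
for c in (2, 3):
    exact = math.erfc(c / math.sqrt(2))
    bound = 1 / c**2
    print(f"c={c}: P(|Z| >= {c}) = {exact:.4f}  <=  Chebyshev bound {bound:.4f}")
```

The gap between the exact values ($\approx 0.046$ and $\approx 0.003$) and the bounds ($0.25$ and $0.111$) shows how conservative Chebyshev is for a well-behaved distribution like the Normal; its strength is that it holds for any distribution with finite variance.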