Here we illustrate three famous zero-one laws for convergent sequences of random variables: the Borel-Cantelli lemma, the second Borel-Cantelli lemma, and Kolmogorov's zero-one law. They will help us study the nuances in the relationship between almost sure convergence and complete convergence.


Suppose we are given an infinite sequence of random variables

$$ X_1, X_2, \dots \tag{1} $$

defined on some probability space $(\Omega, \mathcal A, \operatorname P)$. We list three different modes in which the sequence may converge to zero.

  • $(i)$ The sequence $(1)$ converges to zero in probability if for every $\epsilon>0$

$$ \lim_{n\rightarrow \infty} \operatorname P(|X_n|>\epsilon) = 0. $$

  • $(ii)$ The sequence $(1)$ converges to zero almost surely if for every $\epsilon>0$

$$ \lim_{n\rightarrow \infty} \operatorname P(\{|X_n|> \epsilon\} \cup \{|X_{n+1}|> \epsilon\} \cup \cdots) = 0. $$

Writing $E_n = \{|X_n| > \epsilon\}$, it is easily seen that this is equivalent to $\operatorname P \left(\limsup_{n\to \infty }E_{n}\right)=0$ for every $\epsilon>0$, which in turn is equivalent to the usual condition $\operatorname P(\lim_{n\rightarrow\infty} X_n = 0)=1$.

  • $(iii)$ The sequence $(1)$ converges to zero completely if for every $\epsilon>0$

$$ \lim_{n\rightarrow \infty} \Bigl[ \operatorname P(|X_n|> \epsilon) + \operatorname P(|X_{n+1}|> \epsilon) + \cdots \Bigr] = 0, $$

or, equivalently, if $\sum_{n=1}^{\infty} \operatorname P(|X_n|> \epsilon) < \infty$.

Clearly, $(iii)$ implies $(ii)$ and $(i)$, and $(ii)$ implies $(i)$. The example $\Omega = [0,1]$, $\operatorname P$ the Lebesgue measure, $X_n(\omega) = 1$ for $0<\omega < 1/n$ and $X_n(\omega)=0$ otherwise, shows that $(ii)$ does not imply $(iii)$: here $X_n(\omega) \to 0$ for every $\omega$, yet $\operatorname P(|X_n|>\epsilon) = 1/n$ for $\epsilon \in (0,1)$, whose sum diverges.
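Here is a minimal numerical sketch of this counterexample (assuming NumPy is available): the tail union probability appearing in $(ii)$ equals $1/n$ and shrinks to zero, whereas the tail sums of $\operatorname P(|X_k|>\epsilon) = 1/k$ appearing in $(iii)$ never become small.

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(0.0, 1.0, size=100_000)   # Omega = [0, 1] with Lebesgue measure
eps = 0.5                                     # any eps in (0, 1) gives the same picture

for n in (10, 100, 1_000, 10_000):
    # X_k(omega) = 1 iff omega < 1/k, so the union over k >= n of {|X_k| > eps}
    # is simply the event {omega < 1/n}; its probability tends to zero.
    tail_union = np.mean(omega < 1.0 / n)
    # A block of the series in (iii): sum of P(|X_k| > eps) = 1/k for k = n, ..., 10n.
    # Each block contributes roughly log(10), so the full series diverges.
    block_sum = sum(1.0 / k for k in range(n, 10 * n))
    print(f"n={n:6d}   P(tail union) ~ {tail_union:.4f}   block sum ~ {block_sum:.3f}")
```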

We are going to show that $(ii)$ and $(iii)$ are equivalent if the $X_n$ are independent. To that end, we first recall some facts.

Borel-Cantelli lemma


Let $E_1$, $E_2$,… be a sequence of events in some probability space. If the sum of the probabilities of $E_n$ is finite

$$ \sum_{n=1}^{\infty } \operatorname P(E_{n})<\infty , $$

then the probability that infinitely many of them occur is $0$, that is,

$$ \operatorname P \left(\limsup_{n\to \infty }E_{n}\right)=0. $$


The Borel-Cantelli lemma is an almost obvious fact which can be thought of as a more precise way of stating that complete convergence implies almost sure convergence, that is, that $(iii)$ implies $(ii)$. There is a partial converse to it, which is more useful for our purposes.
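As a quick sanity check of the lemma, here is a minimal simulation sketch (assuming NumPy and taking, purely for illustration, independent events with the summable choice $\operatorname P(E_n)=1/n^2$): in a typical realisation only finitely many of the events occur, all with small indices.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000                                   # truncation level for the infinite sequence
p = 1.0 / np.arange(1, N + 1) ** 2            # P(E_n) = 1/n^2, so sum P(E_n) < infinity

occurred = rng.uniform(size=N) < p            # one realisation of independent E_1, ..., E_N
indices = occurred.nonzero()[0] + 1
print("events that occurred:", indices.size)
print("largest index among them:", indices.max())
```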

Second Borel-Cantelli lemma


If the events $E_n$ are independent and the sum of the probabilities of $E_n$ diverges to infinity,

$$ \sum_{n=1}^{\infty} \operatorname P(E_{n})=\infty, $$

then the probability that infinitely many of them occur is $1$,

$$ \operatorname P \left(\limsup_{n\rightarrow \infty } E_{n} \right) = 1. $$
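By contrast, a minimal simulation sketch with the non-summable choice $\operatorname P(E_n)=1/n$ (again assuming NumPy and independence, as an illustration only) keeps producing occurrences arbitrarily far out in the sequence, in line with $\operatorname P\left(\limsup_{n} E_n\right)=1$:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000                                 # truncation of the infinite sequence
p = 1.0 / np.arange(1, N + 1)                 # P(E_n) = 1/n, so sum P(E_n) diverges

occurred = rng.uniform(size=N) < p            # one realisation of independent E_1, ..., E_N
indices = occurred.nonzero()[0] + 1
print("events that occurred:", indices.size)  # grows like log N as N increases
print("a few late occurrences:", indices[-5:])
```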


Denoting by $E_n$ the events $E_n = \{ |X_n| > \epsilon \}$, we have that they are independent under the assumption that the $X_n$ are independent. Suppose now that $(ii)$ holds but $(iii)$ does not. Then for some $\epsilon>0$ the series $\sum_{n} \operatorname P(E_n)$ diverges, so the second Borel-Cantelli lemma gives $\operatorname P\left(\limsup_{n} E_n\right) = 1$, contradicting $(ii)$. Thus, under independence the latter two modes of convergence are equivalent.

The last thing we would like to touch upon here is the limit of a sequence of independent random variables. Such a limit can only be a constant (almost surely), but in order to show this we need another fundamental result.

Kolmogorov’s zero–one law


Kolmogorov’s zero–one law specifies that a certain type of event, called a tail event, will either almost surely happen or almost surely not happen; that is, the probability of such an event occurring is zero or one. Tail events are defined in terms of infinite sequences of random variables. Suppose $X_1, X_2, \dots$ is an infinite sequence of independent random variables (not necessarily identically distributed). Let $\mathcal {F}$ be the $\sigma$-algebra generated by all $X_{i}$ in the sequence. Then, a tail event $F\in {\mathcal {F}}$ is an event which is probabilistically independent of each finite subset of these random variables. Note that $F$ belonging to $\mathcal {F}$ implies that whether $F$ occurs or not is uniquely determined by the values of all $X_{i}$.

For example, the event that the sequence converges and the event that its sum converges are both tail events. In an infinite sequence of coin-tosses, a sequence of 100 consecutive heads occurring infinitely many times is a tail event. Intuitively, tail events are precisely those events whose occurrence can still be determined if an arbitrarily large but finite initial segment of the $X_{i}$ are removed. In many situations, it can be easy to apply Kolmogorov’s zero–one law to show that some event has probability 0 or 1, but surprisingly hard to determine which of these two extreme values is the correct one.

Exact formulation of Kolmogorov’s zero–one law


Let $(\Omega, \mathcal A, \operatorname P)$ be a probability space and let $\mathcal F_1, \mathcal F_2, \dots$ be a sequence of mutually independent $\sigma$-algebras contained in $\mathcal A$. Let

$$ \mathcal G_{n}=\sigma \bigg( \bigcup_{k=n}^{\infty }\mathcal F_{k} \bigg) $$

be the smallest $\sigma$-algebra containing $\mathcal F_n, \mathcal F_{n+1}, \dots$. Then for any event

$$ F \in \bigcap_{n=1}^{\infty } \mathcal G_{n} $$

one has either $\operatorname P(F) = 0$ or $1$.
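To connect the exact formulation with the informal description above, take $\mathcal F_k = \sigma(X_k)$ for an independent sequence $X_1, X_2, \dots$; then $\mathcal G_n = \sigma(X_n, X_{n+1}, \dots)$, and the theorem says that every event in the tail $\sigma$-algebra

$$ \bigcap_{n=1}^{\infty} \sigma(X_n, X_{n+1}, \dots) $$

has probability zero or one. The events mentioned earlier, such as the convergence of the sequence or of its sum, all belong to this tail $\sigma$-algebra.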


We now apply Kolmogorov’s zero–one law to show that the limit $X$ of a sequence of independent random variables $X_n$ must be (almost surely) a constant. The argument is valid for multivariate random variables of arbitrary dimension $d$. We begin by noticing that any event of the form $\{ X \in A \}$, with $A \in \mathcal B(\mathbb R^d)$, is a tail event, because the limit $X$ is determined by $X_n, X_{n+1}, \dots$ for every $n$. Since $\operatorname P(X \in Q) \to 1$ as the hypercube $Q$ grows to fill $\mathbb R^d$, and each such probability is either $0$ or $1$, there must be a sufficiently large hypercube $Q_0$ with sides parallel to the coordinate axes and center at the origin such that $\operatorname P(X \in Q_0) = 1$. Halving each side, we partition $Q_0$ into $2^d$ smaller hypercubes; at least one of them, denoted by $Q_1$, must satisfy $\operatorname P(X \in Q_1) = 1$, since each piece has probability $0$ or $1$ and they cannot all have probability zero. We continue this process iteratively and obtain an infinite sequence $Q_0 \supset Q_1 \supset \cdots$ of nested hypercubes whose sides shrink to zero. Following Cantor’s argument, the intersection of all these hypercubes is a single point $c$, and by continuity of the measure $\operatorname P(X = c) = \lim_{m} \operatorname P(X \in Q_m) = 1$. This concludes the proof of the claim that $X$ must be a constant.
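Purely as an illustration of the geometry of this argument, here is a small Python sketch (the names `prob_X_in` and `c_true` are hypothetical stand-ins; in particular, the zero-or-one valued map $Q \mapsto \operatorname P(X \in Q)$ is faked by an oracle built around a hidden constant). Repeatedly selecting a sub-cube of probability one homes in on the single point $c$.

```python
import numpy as np

# Illustrative sketch of the nested-hypercube argument in dimension d = 2.
# Kolmogorov's zero-one law guarantees that Q -> P(X in Q) takes only the values
# 0 and 1; here that map is faked by an oracle built around a hidden constant
# c_true (a hypothetical stand-in, since the real X is not available to us).
d = 2
c_true = np.array([0.3, -1.2])

def prob_X_in(lo, hi):
    """Stand-in for P(X in [lo_1, hi_1] x ... x [lo_d, hi_d]); returns 0.0 or 1.0."""
    return 1.0 if np.all((lo <= c_true) & (c_true <= hi)) else 0.0

# Q_0: a large hypercube centred at the origin with P(X in Q_0) = 1.
lo, hi = np.full(d, -8.0), np.full(d, 8.0)

for _ in range(60):                            # halve the sides 60 times
    mid = (lo + hi) / 2.0
    for corner in range(2 ** d):               # try the 2^d sub-cubes
        bits = np.array([(corner >> j) & 1 for j in range(d)])
        new_lo = np.where(bits, mid, lo)
        new_hi = np.where(bits, hi, mid)
        if prob_X_in(new_lo, new_hi) == 1.0:   # keep a sub-cube of probability one
            lo, hi = new_lo, new_hi
            break

print("recovered constant c:", (lo + hi) / 2.0)   # approaches c_true
```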