Study notes for Lawler Stochastic Calculus (Chapter 1).
| Symbol | Meaning |
|---|---|
| \(\Omega\) | Sample space — the set of all possible outcomes |
| \(\mathcal{F}\) | \(\sigma\)-algebra — the collection of all observable events |
| \(P\) | Probability measure — assigns numbers in \([0,1]\) to events in \(\mathcal{F}\) |
| \((\Omega, \mathcal{F}, P)\) | Probability space — the full mathematical arena |
| \(A \in \mathcal{F}\) | A is a measurable event (we can assign it a probability) |
| \(\Omega \setminus A\) | Complement of A — all outcomes not in A |
| \(\mathcal{F}_n\) | \(\sigma\)-algebra generated by \(X_1,\ldots,X_n\) — information at time n |
| \(\mathcal{F}_m \subseteq \mathcal{F}_n\) | All events knowable at time m are also knowable at time n |
| \(E[Y]\) | Unconditional expectation — best guess for Y with no information |
| \(E[Y \mid \mathcal{F}_n]\) | Conditional expectation — best guess for Y given information \(\mathcal{F}_n\) |
| \(\mathbf{1}_A\) | Function equal to 1 on event A, 0 elsewhere |
| i.i.d. | Independent and identically distributed |
| \(\mu \ll P\) | \(\mu\) absolutely continuous w.r.t. \(P\) — every P-null set is also a \(\mu\)-null set |
| \(S_n\) | Partial sum \(X_1 + X_2 + \cdots + X_n\) |
CORE IDEAS
\(E[Y]\) is the best guess for \(Y\) given no information at all. The unconditional expectation is the baseline: if you know nothing about the outcome of an experiment, your single best guess (in the mean-squared-error sense) is \(E[Y]\).
\(E[Y \mid \mathcal{F}_n]\) is the best guess for Y given the information \(\mathcal{F}_n\). As data arrives one variable at a time — \(X_1, X_2, \ldots, X_n\) — we collect more information. The conditional expectation updates our best guess for Y using whatever is currently known.
\(E[Y \mid \mathcal{F}_n]\) is itself a random variable, not a fixed number. Because it depends on the observed values of \(X_1, \ldots, X_n\) — which are random — it is a function of those observations, hence random.
The formal definition bypasses explicit computation via one key property. \(E[Y \mid \mathcal{F}_n]\) is defined as the unique \(\mathcal{F}_n\)-measurable random variable satisfying \(E[E[Y \mid \mathcal{F}_n] \cdot \mathbf{1}_A] = E[Y \cdot \mathbf{1}_A]\) for all \(\mathcal{F}_n\)-measurable events A.
We assume that the random variables \(X_1, X_2, \ldots\) are defined on a probability space \((\Omega, \mathcal{F}, P)\). Here \(\mathcal{F}\) is a \(\sigma\)-algebra or \(\sigma\)-field of subsets of \(\Omega\), that is, a collection of subsets satisfying:
i. \(\emptyset \in \mathcal{F}\)
ii. \(A \in \mathcal{F}\) implies \(\Omega \setminus A \in \mathcal{F}\)
iii. \(A_1, A_2, \ldots \in \mathcal{F}\) implies \(\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}\).
This defines the three \(\sigma\)-algebra axioms. (i) The empty set — the impossible event — must be an event. (ii) If A is observable, so is its complement: “A did not happen” must also be observable. (iii) Countable unions of events are events: “at least one of \(A_1, A_2, \ldots\) happened” is observable. These three rules make \(\mathcal{F}\) a self-consistent collection of questions we can ask about the experiment.
Sample space: \(\Omega = \{HH, HT, TH, TT\}\)
The full \(\sigma\)-algebra \(\mathcal{F} = 2^\Omega\) contains all 16 subsets. This always satisfies all three axioms and encodes complete information.
A smaller valid \(\sigma\)-algebra encoding only the first flip:
\[\mathcal{F}_1 = \{ \emptyset, \{HH, HT\}, \{TH, TT\}, \Omega \}\]\(\mathcal{F}_1\) lets us distinguish “first flip = H” from “first flip = T” — nothing more.
\(\mathcal{F}_n\) is not a number or a random variable — it is a \(\sigma\)-algebra: a collection of events. “Information in \(X_1,\ldots,X_n\)” means all questions of the form “did the observations satisfy some condition?” that can be answered once \(X_1,\ldots,X_n\) are known.
“The information \(\mathcal{F}_n\) is the smallest sub \(\sigma\)-algebra \(G\) of \(\mathcal{F}\) such that \(X_1, \ldots, X_n\) are \(G\)-measurable. The latter statement means that for all \(t \in \mathbb{R}\), the event \(\{X_j \leq t\} \in \mathcal{F}_n\).”
”\(X_j\) is \(\mathcal{F}_n\)-measurable” means: for every threshold \(t\), the set of outcomes where \(X_j \leq t\) is an event in \(\mathcal{F}_n\). Knowing \(X_1,\ldots,X_n\) is enough to determine \(X_j\). The word “smallest” is important — we take only events genuinely required to describe \(X_1,\ldots,X_n\), not any extras.
Let \(\Omega=\{\omega_1,\omega_2,\omega_3\}\) and define the \(\sigma\)-algebra \(\mathcal{F}=\{\emptyset, \Omega, \{\omega_1,\omega_2\}, \{\omega_3\}\}\).
Define the random variable \(X(\omega_1)=0,\; X(\omega_2)=0,\; X(\omega_3)=1\).
To check measurability, verify that \(\{X\le t\}\in\mathcal{F}\quad \forall t\in\mathbb{R}\).
For example:
\(\{X\le 0\}=\{\omega_1,\omega_2\}\in\mathcal{F}\),
\(\{X\le 0.5\}=\{\omega_1,\omega_2\}\in\mathcal{F}\),
\(\{X\le 2\}=\Omega\in\mathcal{F}\).
Hence \(X\) is \(\mathcal{F}\)-measurable.
Let \(\Omega=\{\omega_1,\omega_2,\omega_3\}\) and consider the same \(\sigma\)-algebra \(\mathcal{F}=\{\emptyset,\Omega,\{\omega_1,\omega_2\},\{\omega_3\}\}\)
Define the function \(Y(\omega_1)=0,\; Y(\omega_2)=1,\; Y(\omega_3)=0\).
Now check measurability.
Consider the event \(\{Y\le 0\}=\{\omega_1,\omega_3\}\).
But \(\{\omega_1,\omega_3\}\notin\mathcal{F}\).
Hence \(Y\) is not \(\mathcal{F}\)-measurable.
\(f(x,y)\) is the joint probability density near \((x,y)\). \(f(x)\) is the marginal density of \(X\) — the probability near \(x\) regardless of \(Y\), obtained by integrating out all \(y\)-values. \(f(y\mid x)\) is the relative weight on each \(y\)-value given \(X = x\). Division by \(f(x)\) normalises so that \(f(y\mid x)\) integrates to 1 over \(y\).
This gives the familiar undergraduate formula:
\[E[Y \mid X = x] = \int_{-\infty}^{\infty} y\, f(y \mid x)\, dy = \frac{\int_{-\infty}^{\infty} y\, f(x,y)\, dy}{f(x)}\]Note that \(E[Y \mid X]\) is a random variable which is determined by the value of the random variable \(X\).
\(E[Y \mid X = x]\) for a fixed \(x\) is a number. But \(E[Y \mid X]\) — without fixing \(x\) — is a function of \(X\). Since \(X\) is random, this function is itself random. This is the key conceptual leap the formal definition must capture.
The tower property emerges naturally:
\[E\bigl[E[Y \mid X]\bigr] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f(x)\, dx = \iint y\, f(x,y)\, dy\, dx = E[Y]\]Averaging the conditional best-guess over all possible observations recovers the unconditional expectation. This is the tower property, generalised by the abstract definition below.
Condition (i): The output is computable from \(X_1,\ldots,X_n\) alone — it cannot use future information.
Condition (ii): On every slice of \(\Omega\) that can be identified from current information (any \(A \in \mathcal{F}_n\)), the average of \(E[Y \mid \mathcal{F}_n]\) over that slice equals the average of Y over the same slice.
These two conditions uniquely determine \(E[Y \mid \mathcal{F}_n]\) without requiring a closed-form formula.
What is \(\mathbf{1}_A\)?
\[\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{if } \omega \notin A \end{cases}\]So \(E[Z \cdot \mathbf{1}_A]\) = probability-weighted average of Z over outcomes where A occurs = \(\int_A Z \, dP\).
Why not give an explicit formula? In general probability spaces (uncountable \(\Omega\), continuous distributions), no single formula works universally. The two-condition characterisation is both sufficient and rigorous. Existence follows from the Radon-Nikodym theorem; uniqueness from: if \(Z_1\) and \(Z_2\) both satisfy the conditions then \(E[(Z_1 - Z_2)^2] = 0\), so \(Z_1 = Z_2\) almost surely.
Proposition 1.1.1: Suppose \(X_1, X_2, \ldots\) is a sequence of random variables and \(\mathcal{F}_n\) denotes the information at time n. The conditional expectation \(E[Y \mid \mathcal{F}_n]\) satisfies the following properties.
Property 1 — If Y is already known, conditioning changes nothing
\[\text{If } Y \text{ is } \mathcal{F}_n\text{-measurable}, \quad E[Y \mid \mathcal{F}_n] = Y.\]Why: Y is determined by \(X_1,\ldots,X_n\). No uncertainty remains — the best guess is the value itself.
Property 2 — Tower property (law of iterated expectations)
\[m < n \implies E\bigl[\,E[Y \mid \mathcal{F}_n]\;\big|\;\mathcal{F}_m\bigr] = E[Y \mid \mathcal{F}_m].\]Why: Compute the best guess for \(Y\) using information up to time \(n\), then downgrade it to only use information up to time \(m < n\). The result is identical to computing the best guess with time \(m\) information directly.
Special case \(m = 0\): \(E[E[Y \mid \mathcal{F}_n]] = E[Y]\).
Property 3 — Independence means conditioning is useless
\[X_1,\dots,X_n \perp Y \implies E[Y \mid \mathcal{F}_n] = E[Y].\]Why: If the observations carry zero information about \(Y\), the best guess remains the unconditional mean.
Property 4 — Linearity
\[E[aY + bZ \mid \mathcal{F}_n] = a\,E[Y \mid \mathcal{F}_n] + b\,E[Z \mid \mathcal{F}_n].\]Why: Conditional expectation is an integral, and integrals are linear.
Property 5 — Known factors pull out (constants rule)
\[Z \text{ is } \mathcal{F}_n\text{-measurable} \implies E[YZ \mid \mathcal{F}_n] = Z\cdot E[Y \mid \mathcal{F}_n].\]Why: Z is already determined by current information — it plays the role of a known constant. Only \(Y\) carries residual randomness to average over.
Setup: \(X_1, X_2, \ldots\) independent with \(E[X_j] = \mu\). Let \(S_n = X_1 + \cdots + X_n\), \(\mathcal{F}_m = \sigma(X_1,\ldots,X_m)\), \(m < n\).
\[E[S_n \mid \mathcal{F}_m] = E[S_m \mid \mathcal{F}_m] + E[S_n - S_m \mid \mathcal{F}_m]\]\(S_m = X_1+\cdots+X_m\) is fully determined by \(X_1,\ldots,X_m\), so it is \(\mathcal{F}_m\)-measurable:
\[E[S_m \mid \mathcal{F}_m] = S_m\] \[S_n - S_m = X_{m+1}+\cdots+X_n\] \[E[S_n - S_m \mid \mathcal{F}_m] = E[S_n - S_m] = (n-m)\mu\]\(E[S_n \mid \mathcal{F}_m] = S_m + (n-m)\mu\)
Setup: \(\mu = 0\), \(E[X_j^2] = \sigma^2 < \infty\). Same \(S_n\), \(\mathcal{F}_m\) as above.
Write \(S_n = S_m + (S_n - S_m)\) and expand the square:
\[E[S_n^2 \mid \mathcal{F}_m] = E[S_m^2 \mid \mathcal{F}_m] + 2 E[S_m(S_n-S_m) \mid \mathcal{F}_m] + E[(S_n-S_m)^2 \mid \mathcal{F}_m]\]Term 1: \(S_m^2\) is \(\\mathcal{F}_m\)-measurable so \(E[S_m^2 \mid \mathcal{F}_m] = S_m^2\)
Term 2: \(S_m\) pulls out (\(\mathcal{F}_m\)-measurable); \(S_n-S_m\) is independent of \(\mathcal{F}_m\) with mean 0:
\[2 E[S_m(S_n-S_m) \mid \mathcal{F}_m] = 2 S_m \cdot E[S_n-S_m] = 2 S_m \cdot 0 = 0\]Term 3: \((S_n-S_m)^2\) is independent of \(\mathcal{F}_m\); its expectation is the variance of \((n-m)\) increments:
\[E[(S_n-S_m)^2 \mid \mathcal{F}_m] = \text{Var}(S_n-S_m) = (n-m)\sigma^2\]Combine Terms 1, 2, 3:
\[E[S_n^2 \mid \mathcal{F}_m] = S_m^2 + (n-m)\sigma^2\]The key mathematical consequence of “information is never lost” is the set inclusion \(\mathcal{F}_m \subseteq \mathcal{F}_n\) for all \(m < n\). Every event in \(\mathcal{F}_m\) is also in \(\mathcal{F}_n\). This is not a philosophical claim — it is a precise constraint that the sequence of \(\sigma\)-algebras must satisfy to qualify as a filtration.
Formally: A filtration is an increasing sequence of \(\sigma\)-algebras:
\[\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{F}\]\(\mathcal{F}_0 = \{\emptyset, \Omega\}\) — the trivial \(\sigma\)-algebra, representing the state before any observation.
| Symbol | Meaning |
|---|---|
| \(M_n\) | The martingale process at time \(n\) |
| \(\mathcal{F}_n\) | Filtration at time \(n\) — information in \(M_0, M_1, \ldots, M_n\) |
| \(E[M_n \mid \mathcal{F}_m] = M_m\) | The defining martingale condition for \(m < n\) |
| \(\Delta M_n = M_n - M_{n-1}\) | Increment of the martingale at step \(n\) |
| \(B_n\) | The “bet” at time \(n\) — an \(\mathcal{F}_{n-1}\)-measurable process |
| \(W_n = \sum_{j=1}^n B_j \Delta M_j\) | Winnings under a betting strategy — the discrete stochastic integral |
| \(S_n = X_1 + \cdots + X_n\) | Partial sum of i.i.d. mean-zero random variables |
| \(A_n = \sigma_1^2 + \cdots + \sigma_n^2\) | Cumulative variance (predictable compensator) |
| \(E[M_n \mid \mathcal{F}_m] \geq M_m\) | Submartingale condition |
| \(E[M_n \mid \mathcal{F}_m] \leq M_m\) | Supermartingale condition |
| \(\mathcal{F}_{n-1}\)-measurable | \(B_n\) is known before time \(n\) — the “non-anticipating” condition |
CORE IDEAS
A martingale models a fair game. If \(M_n\) represents cumulative winnings, “fair” means: regardless of the history of play, the expected winnings at any future time equal the current winnings. No strategy can give you a systematic advantage over a martingale in finite time.
The martingale condition is a statement about conditional expectations. The defining equation \(E[M_n \mid \mathcal{F}_m] = M_m\) for \(m < n\) is a direct application of §1.1: given everything observed up to time \(m\), the best prediction of \(M_n\) is simply \(M_m\) itself.
To verify the martingale property it suffices to check one step at a time (The One-Step Criterion). Rather than checking \(E[M_n \mid \mathcal{F}_m] = M_m\) for all pairs \(m < n\), it is enough to verify \(E[M_{n+1} \mid \mathcal{F}_n] = M_n\) for every \(n\). The tower property of §1.1 propagates this to all future times.</div>
A martingale has constant expected value. Taking the full expectation: $\mathbb{E}[M_n] = \mathbb{E}[E[M_n \mid \mathcal{F}_0]] = \mathbb{E}[M_0]$ for all $n$. The mean is time-invariant — a necessary (but not sufficient) condition for fairness.
Suppose \(X_1, X_2, \ldots\) is a sequence of random variables to which we associate the filtration \(\{\mathcal{F}_n\}\) where \(\mathcal{F}_n\) is the information contained in \(X_1, \ldots, X_n\).
A sequence of random variables \(M_0, M_1, \ldots\) is called a martingale with respect to the filtration \(\{\mathcal{F}_n\}\) if:
(i) For each \(n\), \(M_n\) is an \(\mathcal{F}_n\)-measurable random variable with \(E[\lvert M_n \rvert] < \infty\).
(ii) If \(m < n\), then
\(E[M_n \mid \mathcal{F}_m] = M_m. \tag{1.4}\)
Unpacking condition (i):
Unpacking condition (ii): Given everything observed up to time \(m\), the best prediction of \(M_n\) at any later time \(n > m\) is simply the current value \(M_m\). The process has no predictable drift in either direction.
If the condition of martingale (\(E[M_n \mid \mathcal{F}_m] = M_m\)) is replaced with \(E[M_n \mid \mathcal{F}_m] \geq M_m\), then the process is called a submartingale. If it is replaced with \(E[M_n \mid \mathcal{F}_m] \leq M_m\), then it is called a supermartingale.
| Process | Condition | Interpretation |
|---|---|---|
| Martingale | \(E[M_n \mid \mathcal{F}_m] = M_m\) | Fair game — no systematic gain or loss |
| Submartingale | \(E[M_n \mid \mathcal{F}_m] \geq M_m\) | Favourable game — expected value grows over time |
| Supermartingale | \(E[M_n \mid \mathcal{F}_m] \leq M_m\) | Unfavourable game — expected value shrinks over time |
In other words, games that are always in one’s favour are submartingales and games that are always against one are supermartingales. (At most games in Las Vegas, one’s winnings give a supermartingale.)
Suppose that \(M_0, M_1, \ldots\) is a martingale with respect to the filtration \(\mathcal{F}_n\) . For \(n \geq 1\), let \(\Delta M_n = M_n - M_{n-1}\). Let \(B_j\) denote the ‘bet’ on the \(j\)th game. We allow negative values of \(B_j\) which indicate betting that the price will go down or the game will be lost. Let \(W_n\) denote the winnings in this strategy: \(W_0 = 0\) and for \(n \geq 1\),
\(W_n = \sum_{j=1}^n B_j [M_j - M_{j-1}] = \sum_{j=1}^n B_j \Delta M_j.\)”
Three conditions on the betting strategy \(B_n\):
\(W_n\) is a martingale — verification:
\[E[W_{n+1} \mid \mathcal{F}_n] = E[W_n + B_{n+1}(M_{n+1} - M_n) \mid \mathcal{F}_n].\]Since \(W_n\) is \(\mathcal{F}_n\)-measurable, \(E[W_n \mid \mathcal{F}_n] = W_n\). Also, since \(B_{n+1}\) is \(\mathcal{F}_n\)-measurable and \(M\) is a martingale,
\[E[B_{n+1}(M_{n+1} - M_n) \mid \mathcal{F}_n] = B_{n+1} E[M_{n+1} - M_n \mid \mathcal{F}_n] = 0.\]Therefore, \(E[W_{n+1} \mid \mathcal{F}_n] = W_n\).
\(W_n\) is known at time \(n\). \(B_{n+1}\) is known at time \(n\) and pulls out. The remaining factor \(E[M_{n+1} - M_n \mid \mathcal{F}_n] = 0\) by the martingale property of \(M\).
Suppose \(X_1, X_2, \ldots\) are independent random variables with \(E[X_j] = 0\) for each \(j\). Let \(S_0 = 0\) and \(S_n = X_1 + \cdots + X_n\). In the last section we showed that if \(m < n\), then \(E[S_n \mid \mathcal{F}_m] = S_m\). Hence, \(S_n\) is a martingale with respect to \(\mathcal{F}_n\), the information in \(X_1, \ldots, X_n\).
Verification using the one-step criterion:
Check \(E[S_{n+1} \mid \mathcal{F}_n] = S_n\):
\[E[S_{n+1} \mid \mathcal{F}_n] = E[S_n + X_{n+1} \mid \mathcal{F}_n] = E[S_n \mid \mathcal{F}_n] + E[X_{n+1} \mid \mathcal{F}_n].\]Suppose \(X_n, S_n, \mathcal{F}_n\) are as in Example 1.2.1 and also assume \(\text{Var}[X_j] = E[X_j^2] = \sigma_j^2 < \infty\). Let
\[A_n = \sigma_1^2 + \cdots + \sigma_n^2, \qquad M_n = S_n^2 - A_n,\]where \(M_0 = 0\). Then \(M_n\) is a martingale with respect to \(\mathcal{F}_n\).
Verification — one-step check:
\[E[S_{n+1}^2 \mid \mathcal{F}_n] = E[(S_n + X_{n+1})^2 \mid \mathcal{F}_n]\] \[= E[S_n^2 \mid \mathcal{F}_n] + 2E[S_n X_{n+1} \mid \mathcal{F}_n] + E[X_{n+1}^2 \mid \mathcal{F}_n].\]Therefore \(E[S_{n+1}^2 \mid \mathcal{F}_n] = S_n^2 + \sigma_{n+1}^2\), and:
\[\begin{aligned} E[M_{n+1} \mid \mathcal{F}_n] &= E[S_{n+1}^2 - A_{n+1} \mid \mathcal{F}_n] \\ &= S_n^2 + \sigma_{n+1}^2 - (A_n + \sigma_{n+1}^2) = S_n^2 - A_n = M_n. \checkmark \end{aligned}\]Martingale betting strategy. Let \(X_1, X_2, \ldots\) be independent random variables with
\[P\{X_j = 1\} = P\{X_j = -1\} = \tfrac{1}{2}\]… We will consider the following betting strategy. We start by betting $1. If we win, we quit; otherwise, we bet $2 on the next game. If we win the second game, we quit; otherwise we double our bet to $4 and play. Each time we lose, we double our bet. At the time that we win, we will be ahead $1d.
The bets are:
\(B_1 = 1\), \(B_j = 2^{j-1} \text{ if } X_1 = X_2 = \cdots = X_{j-1} = -1\), otherwise \(B_j = 0\).
The winnings \(W_n\): either \(+1\) (if we won at some point), or \(-(1 + 2 + 4 + \cdots + 2^{n-1}) = -(2^n - 1)\) (if we lost all \(n\) rounds).
Direct verification that \(E[W_n] = 0\):
\[E[W_n] = 1 \cdot \Bigl(1 - \tfrac{1}{2^n}\Bigr) + \bigl(-(2^n - 1)\bigr) \cdot \tfrac{1}{2^n} = 1 - \tfrac{1}{2^n} - 1 + \tfrac{1}{2^n} = 0.\]The paradox: With probability one we eventually win, so \(W_\infty = \lim_{n \to \infty} W_n = 1\), giving \(E[W_\infty] = 1 \neq 0 = E[W_0]\).
This does not contradict the martingale property of \(W_n\). The issue is that the strategy requires potentially infinite bets (\(B_n = 2^{n-1}\)) — the boundedness condition \(\lvert B_n \rvert \leq K_n\) is satisfied for each fixed \(n\), but the constants \(K_n\) grow without bound. In finite time \(W_n\) is always a martingale; the “beating” of the game only emerges after infinitely many steps.
Every step in §1.2 relies directly on the five properties of conditional expectation from §1.1. The table below makes these dependencies explicit.
| Martingale argument | §1.1 property used |
|---|---|
| \(E[S_n \mid \mathcal{F}_n] = S_n\) — current sum is known | Property 1 — Known \(Y\) |
| \(E[X_{n+1} \mid \mathcal{F}_n] = E[X_{n+1}] = 0\) — future increment is independent | Property 3 — Independence |
| \(E[S_n X_{n+1} \mid \mathcal{F}_n] = S_n E[X_{n+1} \mid \mathcal{F}_n]\) — \(S_n\) pulls out | Property 5 — Constants rule |
| \(E[W_n \mid \mathcal{F}_n] = W_n\) — current winnings are known | Property 1 — Known \(Y\) |
| \(E[B_{n+1}(M_{n+1} - M_n) \mid \mathcal{F}_n] = B_{n+1} \cdot 0\) — \(B_{n+1}\) pulls out | Property 5 — Constants rule |
| One-step check implies all-future check | Property 2 — Tower property |
| Symbol | Meaning |
|---|---|
| \(T\) | A stopping time — a random time that depends only on past and present observations |
| \(\{T = n\} \in \mathcal{F}_n\) | The defining measurability condition for a stopping time |
| \(n \wedge T\) | \(\min\{n, T\}\) — the process stopped at time \(T\) |
| \(M_{n \wedge T}\) | The stopped process — equals \(M_T\) once \(T\) is reached, stays there afterwards |
| \(E[M_T] = E[M_0]\) | The optional sampling conclusion — martingale mean preserved at stopping time \(T\) |
| \(P\{T \leq k\} = 1\) | \(T\) is bounded — the strongest assumption guaranteeing \(E[M_T] = E[M_0]\) |
| \(P\{T < \infty\} = 1\) | \(T\) is almost surely finite — weaker condition, needs extra hypotheses |
| \(\mathbf{1}_{\{T > n\}}\) | Indicator function equal 1 if \(T > n\) and 0 if \(T < n\) |
| \(E[\lvert M_n \rvert \mathbf{1}_{\{T>n\}}] \to 0\) | The uniform integrability condition in OST II |
| \(E[M_{n \wedge T}^2] \leq C\) | The \(L^2\) boundedness condition in OST III |
| \(S_n\) | Simple symmetric random walk \(X_1 + \cdots + X_n\) with \(P\{X_j = \pm 1\} = \tfrac{1}{2}\) |
CORE IDEAS
Stopping a martingale produces another martingale. The stopped process \(Y_n = M_{n \wedge T}\) is always a martingale, regardless of what the stopping time \(T\) is.
\(E[M_T] = E[M_0]\) requires extra conditions beyond just \(P\{T < \infty\} = 1\). The stopped process has constant mean \(E[M_{n \wedge T}] = E[M_0]\) for all finite \(n\). Passing this to the limit \(n \to \infty\) — to get \(E[M_T] = E[M_0]\) — requires uniform integrability or \(L^2\) boundedness.
The OST is a tool for computing probabilities and expectations. By choosing a clever stopping time $T$ and applying OST to two different martingales simultaneously — \(S_n\) and \(S_n^2 - n\) — one can extract exact formulas for hitting probabilities and expected hitting times. Examples 1.3.1 and 1.3.2 demonstrate this technique completely.
A nonnegative integer-valued random variable \(T\) is a stopping time with respect to the filtration \(\{\mathcal{F}_n\}\) if for each \(n\) the event \(\{T = n\}\) is \(\mathcal{F}_n\)-measurable.
What this means: The decision to stop at time \(n\) can only use information available up to and including time \(n\). You cannot decide to stop “because something will happen tomorrow.”
Equivalent condition: \(\{T \leq n\} \in \mathcal{F}_n\) for every \(n\), since
\[\{T \leq n\} = \{T = 0\} \cup \{T = 1\} \cup \cdots \cup \{T = n\},\]and each event \(\{T = k\} \in \mathcal{F}_k \subseteq \mathcal{F}_n\).
The betting interpretation: \(T\) is a stopping time if and only if the strategy “bet 1 on rounds \(1, 2, \ldots, T\) and bet 0 afterwards” is an allowable (predictable) betting strategy in the sense of §1.2. The bet \(B_j = \mathbf{1}_{\{j \leq T\}}\) is \(\mathcal{F}_{j-1}\)-measurable precisely because \(\{T \geq j\} = \{T \leq j-1\}^c \in \mathcal{F}_{j-1}\).
Let $T$ be the stopping time for the betting strategy in §1.2. Then the winnings at time $t$ is
\[M_0 + \sum_{j=1}^n B_j [M_j - M_{j-1}],\]where \(B_j = 1\) if \(j \leq T\) and \(B_j = 0\) if \(j > T\). We can write this as \(M_{n \wedge T}\), where \(n \wedge T\) is shorthand for \(\min\{n, T\}\).
The stopped process \(M_{n \wedge T}\): at time $n$, the process equals:
Since \(B_j = \mathbf{1}_{\{j \leq T\}}\) is predictable (it is \(\mathcal{F}_{j-1}\)-measurable because \(\{T \geq j\} \in \mathcal{F}_{j-1}\)), the discrete stochastic integral result from §1.2 immediately gives:
Suppose \(T\) is a stopping time and \(M_n\) is a martingale with respect to \(\{\mathcal{F}_n\}\). Then \(Y_n = M_{n \wedge T}\) is a martingale. In particular, for each \(n\),
\[E[M_{n \wedge T}] = E[M_0].\]If \(T\) is bounded, that is, if there exists \(k < \infty\) such that \(P\{T \leq k\} = 1\), then
\[E[M_T] = E[M_0]. \tag{1.7}\]Why: Since \(P\{T \leq k\} = 1\), we have \(n \wedge T = T\) for all \(n \geq k\). Therefore \(E[M_{n \wedge T}] = E[M_T]\) for \(n \geq k\). Combined with \(E[M_{n \wedge T}] = E[M_0]\) for all \(n\), we get \(E[M_T] = E[M_0]\).
No extra conditions are needed when \(T\) is bounded — the martingale property of the stopped process is all that is required.
Suppose \(T\) is a stopping time and \(M_n\) is a martingale with respect to \(\{\mathcal{F}_n\}\). Suppose that \(P\{T < \infty\} = 1\), \(E[\lvert M_T \rvert] < \infty\), and for each \(n\),
\[\lim_{n \to \infty} E[\lvert M_n \rvert \mathbf{1}_{\{T > n\}}] = 0. \tag{1.8}\]Then, \(E[M_T] = E[M_0]\).
Where the condition comes from: For every finite \(n\),
\[E[M_0] = E[M_{n \wedge T}] = E[M_T \mathbf{1}_{\{T \leq n\}}] + E[M_n \mathbf{1}_{\{T > n\}}].\]As \(n \to \infty\), the first term converges to \(E[M_T]\). The second term vanishes by condition (1.8). Hence \(E[M_0] = E[M_T]\).
Condition (1.8) in plain English: The contribution to the expected value from paths that have not yet stopped by time \(n\) must vanish as \(n \to \infty\). Paths that take very long to stop and reach very large values can violate this.
The doubling strategy fails here: In Example 1.2.4, if \(T = \min\{n : W_n = 1\}\), then \(P\{T < \infty\} = 1\) but \(E[\lvert W_n \rvert \mathbf{1}_{\{T > n\}}] = (2^n - 1) \cdot 2^{-n} \to 1 \neq 0\). Condition (1.8) is violated, consistently with \(E[W_T] = 1 \neq 0 = E[W_0]\).
Suppose \(T\) is a stopping time and \(M_n\) is a martingale with respect to \(\{\mathcal{F}_n\}\). Suppose that \(P\{T < \infty\} = 1\), \(E[\lvert M_T \rvert] < \infty\), and that there exists \(C < \infty\) such that for each \(n\),
\[E[M_{n \wedge T}^2] \leq C. \tag{1.9}\]Then, \(E[M_T] = E[M_0]\).
Why (1.9) implies (1.8) — the key argument:
For any \(b > 0\), split the expectation:
\[E[\lvert M_n \rvert \mathbf{1}_{\{T > n\}}] = E[\lvert M_n \rvert \mathbf{1}_{\{T > n,\, \lvert M_n \rvert \geq b\}}] + E[\lvert M_n \rvert \mathbf{1}_{\{T > n,\, \lvert M_n \rvert < b\}}].\]Therefore \(\limsup_{n \to \infty} E[\lvert M_n \rvert \mathbf{1}_{\{T > n\}}] \leq C/b\) for every \(b > 0\), so the limit is \(0\).
Practical value: Condition (1.9) is often easier to verify than (1.8) directly — it suffices to bound the second moment of the stopped process uniformly in \(n\).
| Version | Assumption on $T$ | Extra condition | Conclusion |
|---|---|---|---|
| OST I | \(P\{T \leq k\} = 1\) (bounded) | None | \(E[M_T] = E[M_0]\) |
| OST II | \(P\{T < \infty\} = 1\) | \(E[M_{n}\mathbf{1}_{\{T>n\}}] \to 0\) | \(E[M_T] = E[M_0]\) |
| OST III | \(P\{T < \infty\} = 1\) | \(E[M_{n \wedge T}^2] \leq C < \infty\) | \(E[M_T] = E[M_0]\) |
Gambler’s ruin for random walk. Let \(X_1, X_2, \ldots\) be independent, coin-tosses as in (1.6) and let \(S_n = 1 + X_1 + \cdots + X_n\). $S_n$ is called simple (symmetric) random walk starting at 1. … Let \(K > 1\) be a positive integer and let $T$ denote the first time $n$ such that \(S_n = 0\) or \(S_n = K\).
Setup:
Apply OST to \(S_n\):
\[1 = E[S_0] = E[S_T] = 0 \cdot P\{S_T = 0\} + K \cdot P\{S_T = K\}.\]Solving:
\[\boxed{P\{S_T = K\} = \frac{1}{K}, \qquad P\{S_T = 0\} = \frac{K-1}{K}.}\]Interpretation: Starting at 1, with a fair game, the probability of reaching \(K\) before 0 is \(1/K\). As \(K \to \infty\), \(P\{S_T = K\} \to 0\): a gambler with $1 playing against a casino with $K almost surely goes broke. This is the gambler’s ruin estimate.
Let \(S_n = X_1 + \cdots + X_n\) be simple random walk starting at 0. We have seen that \(M_n = S_n^2 - n\) is a martingale. Let \(J, K\) be positive integers and let \(T = \min\{n : S_n = -J \text{ or } S_n = K\}\).
Step 1 — Find \(P\{S_T = K\}\) using \(S_n\):
Apply OST to \(S_n\) (same argument as Example 1.3.1, starting at 0):
\[0 = E[S_T] = (-J) P\{S_T = -J\} + K \cdot P\{S_T = K\}.\] \[\boxed{P\{S_T = K\} = \frac{J}{J+K}, \qquad P\{S_T = -J\} = \frac{K}{J+K}.}\]Step 2 — Find \(E[T]\) using \(M_n = S_n^2 - n\):
Exercise 1.13 establishes that \(E[M_{n \wedge T}^2] \leq C < \infty\), so OST III applies to \(M_n\):
\[E[M_T] = E[M_0] \implies E[S_T^2] - E[T] = 0 \implies E[T] = E[S_T^2].\]Compute \(E[S_T^2]\) directly:
\[\begin{aligned} E[S_T^2] = J^2 P\{S_T = -J\} + K^2 P\{S_T = K\} &= J^2 \cdot \frac{K}{J+K} + K^2 \cdot \frac{J}{J+K} \\ &= \frac{JK(J+K)}{J+K} = JK. \end{aligned}\] \[\boxed{E[T] = JK.}\]Interpretation: The expected time for symmetric random walk to exit \((-J, K)\) starting from 0 is exactly \(JK\). In particular, to travel distance \($K\) from the origin in either direction, the expected time is \(K^2\) (set \(J = K\)).
As in Example 1.3.2, let \(S_n = X_1 + \cdots + X_n\) be simple random walk starting at 0. Let \(T = \min\{n : S_n = 1\}\).
\(P\{T < \infty\} = 1\): From Example 1.3.2, Probility random walk hits 1 before \(-J\) \(P\{S_T = 1\}= J/(J+1) \to 1\) as \(J \to \infty\). So with probability one the walk reaches 1.
But \(E[T] = \infty\): From Example 1.3.2, the expected time to exit \((-J, 1)\) starting at 0 is \(J \cdot 1 = J\). Since this holds for all \(J\) and \(T \geq T_J\) (where \(T_J\) is the exit time from \((-J,1)\)), we have \(E[T] \geq J\) for all \(J\), so \(E[T] = \infty\).
OST fails: \(S_T = 1\) almost surely, so \(E[S_T] = 1 \neq 0 = E[S_0]\).
Why: Neither condition (1.8) nor (1.9) holds for this \(T\). In particular, \(E[M_{n \wedge T}^2] = E[S_{n \wedge T}^2] \to \infty\) as \(n \to \infty\), violating (1.9).
The martingale betting strategy from §1.2 Example 1.2.4 has:
Checking that OST conditions fail:
\[E[\lvert W_n \rvert \mathbf{1}_{\{T > n\}}] = (2^n - 1) \cdot 2^{-n} \to 1 \neq 0,\]so condition (1.8) is violated. Since \(E[W_{n \wedge T}^2] \to \infty\) as well, condition (1.9) also fails. And \(T\) is clearly unbounded. All three OST conditions fail, consistently with \(E[W_T] \neq E[W_0]\).
| Symbol | Meaning |
|---|---|
| \(M_\infty\) | The almost-sure limit of \(M_n\) — guaranteed by the MCT when \(E[\lvert M_n \rvert] \leq C\) |
| \(\sup_n E[\lvert M_n \rvert] \leq C\) | Uniform \(L^1\) bound — the hypothesis of the MCT |
| \(\liminf_{n\to\infty} M_n\), \(\limsup_{n\to\infty} M_n\) | Smallest and largest cluster points of the sequence \((M_n)\) |
| \(U_n(a,b)\) | Number of upcrossings of \([a,b]\) by time \(n\) |
| \(U_\infty(a,b)\) | Total upcrossings of \([a,b]\); MCT proof shows \(E[U_\infty] < \infty\) |
| \(S_j,\, T_j\) | Buy and sell stopping times in the upcrossing strategy |
| \(W_n\) | Winnings from the buy-low-sell-high strategy used in the proof |
| \(R_n,\, G_n\) | Number of red and green balls in Pólya’s urn after \(n\) draws |
| \(M_n = R_n/(n+2)\) | Fraction of red balls — a bounded nonneg martingale converging a.s. |
| \(M_\infty \sim \mathrm{Uniform}[0,1]\) | The limiting distribution of Pólya’s urn |
| \(f_{n,k}(\theta)\) | Posterior Beta density for \(\theta\) after \(n\) Bernoulli trials with \(k\) successes |
CORE IDEAS
Uniform \(L^1\) boundedness forces almost-sure convergence. The condition \(\sup_n E[\lvert M_n \rvert] \leq C < \infty\) prevents the martingale from wandering to \(\pm\infty\), and the upcrossing argument shows it cannot oscillate indefinitely between any two levels \(a < b\). Together these force \(\lim_{n\to\infty} M_n\) to exist with probability one.
The proof is a financial argument via upcrossings. If \(M_n\) crossed between $a$ and $b$ infinitely often, buying every time \(M_n \leq a\) and selling every time \(M_n \geq b\) would produce infinite expected profit. Since \(M_n\) is a fair game, that is impossible. The upcrossing inequality makes this precise and quantitative.
\(E[M_\infty] = E[M_0]\) does NOT follow from the MCT. The theorem guarantees a.s. convergence but not convergence of expectations. The martingale doubling strategy from §1.2 illustrates this: \(E[\lvert W_n \rvert] \leq 2\) for all \(n\), so the MCT applies and \(W_\infty = 1\) a.s., yet \(E[W_\infty] = 1 \neq 0 = E[W_0]\). Preserving the mean requires the extra conditions of the OST.
Nonnegative martingales automatically satisfy the MCT hypothesis. If \(M_n \geq 0\) for all \(n\), then \(E[\lvert M_n \rvert] = E[M_n] = E[M_0]\), so the \(L^1\) bound holds with \(C = E[M_0]\). Pólya’s urn is the prime example.
Suppose \(M_n\) is a martingale with respect to \(\{\mathcal{F}_n\}\) and there exists \(C < \infty\) such that \(E[\lvert M_n \rvert] \leq C\) for all \(n\). Then there exists a random variable \(M_\infty\) such that with probability one
\(\lim_{n \to \infty} M_n = M_\infty.\)
Unpacking the hypothesis — \(E[\lvert M_n \rvert] \leq C\) uniformly in \(n\): The absolute mean of \(M_n\) stays bounded no matter how far the process runs. Since \(E[\lvert M_n \rvert] \geq \lvert E[M_n] \rvert = \lvert E[M_0] \rvert\), this implicitly requires \(E[M_0]\) to be finite, but it is strictly stronger — it controls the size of fluctuations, not just the level.
Unpacking the conclusion — “with probability one”: There exists a single event \(\Omega^* \subseteq \Omega\) with \(P(\Omega^*) = 1\) such that for every \(\omega \in \Omega^*\), the real sequence \((M_n(\omega))\) converges to a finite number \(M_\infty(\omega)\).
What the theorem does NOT say:
Fix \(a < b\). Define stopping times implementing “buy at \(a\), sell at \(b\)”:
\[S_1 = \min\{n : M_n \leq a\}, \qquad T_1 = \min\{n > S_1 : M_n \geq b\},\] \[S_j = \min\{n > T_{j-1} : M_n \leq a\}, \qquad T_j = \min\{n > S_j : M_n \geq b\}.\]The number of completed upcrossings by time \(n\) is:
\[U_n = U_n(a,b) = \max\{j : T_j \leq n\}.\]Each completed upcrossing — going from \(\leq a\) up to \(\geq b\) — produces a profit of at least \(b - a\).
Define bets \(B_k = \mathbf{1}_{\{S_j \leq k-1 < T_j \text{ for some } j\}}\), i.e., \(B_k = 1\) when a position is held (bought but not yet sold), \(B_k = 0\) otherwise. This is \(\mathcal{F}_{k-1}\)-measurable since whether we hold at step \(k\) depends only on \(M_0, \ldots, M_{k-1}\).
The winnings from this strategy satisfy:
\[W_n \geq U_n(b - a) - (a - M_n).\]Here \(U_n(b-a)\) is the profit from completed upcrossings, and \((a - M_n)\) is the potential loss on a position still open at time \(n\) (held below \(a\)).
Since \(B_k\) is predictable and bounded, \(W_n\) is a martingale (§1.2), so \(E[W_n] = 0\):
\[0 = E[W_n] \geq (b-a)\, E[U_n] - E[(a - M_n)].\]Using \((a - M_n) \leq \lvert a \rvert + \lvert M_n \rvert\):
\[\boxed{E[U_n(a,b)] \leq \frac{\lvert a \rvert + E[\lvert M_n \rvert]}{b - a} \leq \frac{\lvert a \rvert + C}{b - a} < \infty.}\]This is Doob’s Upcrossing Inequality.
Since \(E[U_n] \leq (\lvert a \rvert + C)/(b-a)\) for every \(n\), monotone convergence gives:
\[E[U_\infty(a,b)] \leq \frac{\lvert a \rvert + C}{b - a} < \infty \implies U_\infty(a,b) < \infty \quad \text{a.s.}\]Now let \(a, b\) range over all rational pairs \(a < b\). There are only countably many such pairs, so:
\[P\bigl(\exists\, a < b \text{ rational}: U_\infty(a,b) = \infty\bigr) = 0.\]On the complementary event (probability one), \(U_\infty(a,b) < \infty\) for every rational pair, which forces \(\liminf_{n \to \infty} M_n = \limsup_{n \to \infty} M_n\) a.s. Hence \(M_\infty = \lim_{n \to \infty} M_n\) exists and is finite a.s.
Start with one red and one green ball. At each step: draw a ball uniformly at random, observe its colour, return it plus one new ball of the same colour.
\(\begin{aligned}\)E[M_{n+1} \mid \mathcal{F}_n] &= M_n \cdot \frac{R_n + 1}{n+3} + (1 - M_n) \cdot \frac{R_n}{n+3}
&= \frac{R_n(n+3)}{(n+2)(n+3)}
&= \frac{R_n}{n+2}
&= M_n. \checkmark\(\end{aligned}\)
Since \(0 \leq M_n \leq 1\), we have \(E[\lvert M_n \rvert] = E[M_n] = E[M_0] = \tfrac{1}{2}\). The uniform \(L^1\) bound holds with \(C = \tfrac{1}{2}\), so the MCT gives:
\[M_\infty = \lim_{n \to \infty} M_n \quad \text{exists a.s.}\]Exercise 1.11 establishes that for each \(n\), \(M_n\) is uniform on the finite set \(\bigl\{\tfrac{1}{n+2}, \tfrac{2}{n+2}, \ldots, \tfrac{n+1}{n+2}\bigr\}\). As \(n \to \infty\) this discrete uniform converges in distribution to \(\mathrm{Uniform}[0,1]\), so \(M_\infty \sim \mathrm{Uniform}[0,1]\).
Setup: Run Bernoulli trials with unknown success probability \(\theta\), starting from a \(\mathrm{Uniform}[0,1]\) prior. After \(n\) trials with \(S_n = k\) successes, the Bayes update gives the posterior:
\[f_{n,k}(\theta) \propto \theta^k (1-\theta)^{n-k}, \quad \theta \in (0,1).\]This is the \(\mathrm{Beta}(k+1,\, n-k+1)\) density, with posterior mean:
\[E[\theta \mid S_n = k] = \frac{k+1}{n+2} = \frac{S_n + 1}{n+2}.\]This is identical to the Pólya urn fraction \(M_n = R_n/(n+2)\) when \(R_n = S_n + 1\) (successes plus the one initial red ball). The urn fraction equals the posterior mean at every step.
MCT as Bayesian law of large numbers: By the strong law, \(S_n/n \to \theta\) a.s. Combined with \(M_n \to M_\infty\) a.s., we get \(M_\infty = \theta\) a.s. — the posterior mean converges to the true parameter.
| MCT | OST | |
|---|---|---|
| Question | Does \(M_n\) converge as \(n \to \infty\)? | Does \(E[M_T] = E[M_0]\) at a stopping time? |
| Hypothesis | \(\sup_n E[\lvert M_n \rvert] \leq C\) | Bounded \(T\), or (1.8), or (1.9) |
| Conclusion | \(M_n \to M_\infty\) a.s. | \(E[M_T] = E[M_0]\) |
| Mean preserved? | Not guaranteed | Yes — that is the conclusion |
| Counterexample | Simple random walk: \(E[\lvert S_n \rvert] \to \infty\), no convergence | Doubling strategy: \(P\{T<\infty\}=1\) but \(E[W_T] \neq E[W_0]\) |
| Key tool | Upcrossing inequality | Stopped process is a martingale |
| Symbol | Meaning |
|---|---|
| \(E[M_n^2] < \infty\) | Square integrability — the hypothesis of §1.5 |
| \(L^2(\Omega, \mathcal{F}, P)\) | Hilbert space of square-integrable random variables with inner product \((X,Y) = E[XY]\) |
| \((X, Y) = E[XY]\) | Inner product on \(L^2\) — orthogonality means \((X,Y) = 0\) |
| \(\Delta M_n = M_n - M_{n-1}\) | Increment of \(M_n\) at step \(n\) |
| \(E[\Delta M_{n+1} \cdot \Delta M_{m+1}] = 0,\; m \neq n\) | Orthogonality of martingale increments |
| \(J_n\) | Predictable integrand — \(J_n\) is \(\mathcal{F}_{n-1}\)-measurable |
| \(Z_n = \sum_{j=1}^n J_j X_j\) | Discrete stochastic integral of \(J\) with respect to the random walk \(S_n\) |
| \(\sigma^2 = E[X_j^2]\) | Variance of each i.i.d. increment |
| \(\mathrm{Var}[Z_n] = \sigma^2 \sum_{j=1}^n E[J_j^2]\) | Variance rule for the discrete stochastic integral |
| \(\bar{Y}_n = \max\{Y_0, Y_1, \ldots, Y_n\}\) | Running maximum of a nonneg submartingale \(Y_n\) |
| \(\overline{M}_n = \max\{\lvert M_0 \rvert, \ldots, \lvert M_n \rvert\}\) | Running maximum of \(\lvert M_n \rvert\) |
| \(P\{\bar{Y}_n \geq a\} \leq a^{-1} E[Y_n]\) | Doob’s maximal inequality for submartingales |
| \(P\{\overline{M}_n \geq a\} \leq a^{-2} E[M_n^2]\) | Doob’s \(L^2\) maximal inequality for square integrable martingales |
A martingale \(M_n\) is called square integrable if for each \(n\), \(E[M_n^2] < \infty\).
This is the condition that \(M_n \in L^2(\Omega, \mathcal{F}_n, P)\) at every time \(n\).
Why stronger than integrability: Square integrability \(E[M_n^2] < \infty\) implies integrability \(E[\lvert M_n \rvert] < \infty\) by Jensen’s inequality, but not vice versa.
Why weaker than uniform \(L^2\) boundedness: The definition requires \(E[M_n^2] < \infty\) for each fixed \(n\), but the bound may grow with \(n\). The stronger condition \(\sup_n E[M_n^2] \leq C < \infty\) (used in OST III and the MCT) is a separate, stricter requirement.
Random variables \(X, Y\) are orthogonal if \(E[XY] = E[X]\, E[Y]\).
For zero-mean random variables, orthogonality reduces to \(E[XY] = 0\). Independent random variables are orthogonal, but orthogonal random variables need not be independent.
Suppose that \(M_n\) is a square integrable martingale with respect to \(\{\mathcal{F}_n\}\). Then if \(m < n\),
\[E[(\Delta M_{n+1})(\Delta M_{m+1})] = 0,\]where \(\Delta M_k = M_k - M_{k-1}\). Moreover, for all \(n\),
\[E[M_n^2] = E[M_0^2] + \sum_{j=1}^n E\bigl[(\Delta M_j)^2\bigr].\]Proof of orthogonality:
For \(m < n\), the increment \(\Delta M_{m+1} = M_{m+1} - M_m\) is \(\mathcal{F}_n\)-measurable (since \(m+1 \leq n\)). Therefore:
\[E[(\Delta M_{n+1})(\Delta M_{m+1}) \mid \mathcal{F}_n] = (\Delta M_{m+1})\, E[\Delta M_{n+1} \mid \mathcal{F}_n] = (\Delta M_{m+1}) \cdot 0 = 0.\]The second equality uses the martingale property: \(E[\Delta M_{n+1} \mid \mathcal{F}_n] = E[M_{n+1} - M_n \mid \mathcal{F}_n] = 0\). Taking full expectations gives \(E[(\Delta M_{n+1})(\Delta M_{m+1})] = 0\).
Proof of the Pythagorean identity:
Write \(M_n = M_0 + \sum_{j=1}^n \Delta M_j\) and expand the square:
\[M_n^2 = M_0^2 + \sum_{j=1}^n (\Delta M_j)^2 + \sum_{j \neq k} (\Delta M_j)(\Delta M_k).\]Taking expectations and using orthogonality (all cross terms vanish):
\[E[M_n^2] = E[M_0^2] + \sum_{j=1}^n E[(\Delta M_j)^2].\]Suppose that \(X_1, X_2, \ldots\) are independent, identically distributed random variables with mean zero and variance \(\sigma^2\).
The two main examples are:
Let \(S_n = X_1 + \cdots + X_n\) and let \(\{\mathcal{F}_n\}\) be the filtration generated by \(X_1, \ldots, X_n\).
A sequence of random variables \(J_1, J_2, \ldots\) is called predictable (with respect to \(\{\mathcal{F}_n\}\)) if for each \(n\), \(J_n\) is \(\mathcal{F}_{n-1}\)-measurable.
This is the non-anticipating condition from §1.2: The integrand \(J_n\) is determined by observations strictly before time \(n\).
The discrete stochastic integral is defined by:
\[Z_n = \sum_{j=1}^n J_j X_j = \sum_{j=1}^n J_j \,\Delta S_j.\]Property 1 — Martingale property
\[Z_n \text{ is a martingale with respect to } \{\mathcal{F}_n\}.\]Proof:
\[E[Z_{n+1} \mid \mathcal{F}_n] = E[Z_n + J_{n+1} X_{n+1} \mid \mathcal{F}_n] = Z_n + J_{n+1}\, E[X_{n+1} \mid \mathcal{F}_n] = Z_n + J_{n+1} \cdot 0 = Z_n.\]Here: \(Z_n\) is \(\mathcal{F}_n\)-measurable (Property 1 of §1.1); \(J_{n+1}\) is \(\mathcal{F}_n\)-measurable and pulls out (Property 5); \(X_{n+1}\) is independent of \(\mathcal{F}_n\) with \(E[X_{n+1}] = 0\) (Property 3).
Property 2 — Linearity
If \(J_n, K_n\) are predictable sequences and \(a, b\) constants, then \(aJ_n + bK_n\) is predictable and:
\[\sum_{j=1}^n (aJ_j + bK_j) X_j = a \sum_{j=1}^n J_j X_j + b \sum_{j=1}^n K_j X_j.\]Property 3 — Variance rule
\[\mathrm{Var}[Z_n] = E[Z_n^2] = \sigma^2 \sum_{j=1}^n E[J_j^2].\]Proof:
Using orthogonality of martingale increments (§1.5), the cross terms \(E[J_j X_j \cdot J_k X_k]\) vanish for \(j \neq k\):
\[E[Z_n^2] = \sum_{j=1}^n E[J_j^2 X_j^2].\]Since \(J_j\) is \(\mathcal{F}_{j-1}\)-measurable and \(X_j\) is independent of \(\mathcal{F}_{j-1}\):
\[E[J_j^2 X_j^2] = E\bigl[E[J_j^2 X_j^2 \mid \mathcal{F}_{j-1}]\bigr] = E\bigl[J_j^2\, E[X_j^2 \mid \mathcal{F}_{j-1}]\bigr] = E[J_j^2]\, \sigma^2.\]Summing over \(j\) gives \(E[Z_n^2] = \sigma^2 \sum_{j=1}^n E[J_j^2]\).
| Feature | Discrete: \(Z_n = \sum J_j X_j\) | Continuous: \(\int_0^t A_s\, dB_s\) |
|---|---|---|
| Integrand condition | \(J_n\) is \(\mathcal{F}_{n-1}\)-measurable (predictable) | \(A_s\) is adapted, square-integrable |
| Martingale property | \(Z_n\) is a martingale | \(\int_0^t A_s\, dB_s\) is a martingale |
| Variance rule | \(E[Z_n^2] = \sigma^2 \sum E[J_j^2]\) | \(E\!\left[\left(\int_0^t A_s\, dB_s\right)^2\right] = \int_0^t E[A_s^2]\, ds\) |
| Linearity | ✓ direct from summation | ✓ by construction |
Suppose \(Y_n\) is a nonneg submartingale with respect to \(\{\mathcal{F}_n\}\), and \(\bar{Y}_n = \max\{Y_0, Y_1, \ldots, Y_n\}\). Then for every \(a > 0\),
\[P\{\bar{Y}_n \geq a\} \leq \frac{1}{a}\, E[Y_n].\]Proof:
Let \(T = \min\{k \leq n : Y_k \geq a\}\) (with \(T = n+1\) if no such \(k\) exists). Then:
\[\{\bar{Y}_n \geq a\} = \bigsqcup_{k=0}^n A_k, \quad A_k = \{T = k\}.\]Each \(A_k \in \mathcal{F}_k\). Since \(Y_n\) is a submartingale, \(E[Y_n \mid \mathcal{F}_k] \geq Y_k\) for \(k \leq n\), so:
\[E[Y_n \mathbf{1}_{A_k}] = E\bigl[E[Y_n \mid \mathcal{F}_k]\, \mathbf{1}_{A_k}\bigr] \geq E[Y_k\, \mathbf{1}_{A_k}] \geq a\, P(A_k).\]Summing over \(k = 0, 1, \ldots, n\):
\[E[Y_n] \geq E\!\left[Y_n\, \mathbf{1}_{\{\bar{Y}_n \geq a\}}\right] = \sum_{k=0}^n E[Y_n\, \mathbf{1}_{A_k}] \geq a\, P\{\bar{Y}_n \geq a\}.\]Dividing by \(a\) gives the result.
If \(M_n\) is a square integrable martingale with respect to \(\{\mathcal{F}_n\}\) and \(\overline{M}_n = \max\{\lvert M_0 \rvert, \ldots, \lvert M_n \rvert\}\), then for every \(a > 0\),
\[P\{\overline{M}_n \geq a\} \leq \frac{E[M_n^2]}{a^2}.\]Proof: Exercise 1.15 shows that if \(M_n\) is a martingale and \(\varphi\) is a convex function, then \(\varphi(M_n)\) is a submartingale. Taking \(\varphi(x) = x^2\) gives that \(M_n^2\) is a nonneg submartingale. Apply Theorem 1.7.1 to \(Y_n = M_n^2\) with threshold \(a^2\):
\[P\{\bar{Y}_n \geq a^2\} \leq \frac{E[M_n^2]}{a^2}.\]Since \(\{\bar{Y}_n \geq a^2\} = \{\max_k M_k^2 \geq a^2\} = \{\overline{M}_n \geq a\}\), the result follows.
Setup: \(X_1, X_2, \ldots\) i.i.d. with \(P\{X_j = \pm 1\} = \tfrac{1}{2}\), so \(\sigma^2 = 1\). Let \(S_n = X_1 + \cdots + X_n\).
Integrand: \(J_j = S_{j-1}\) (the running sum just before step \(j\), which is \(\mathcal{F}_{j-1}\)-measurable ✓).
Integral: \(Z_n = \sum_{j=1}^n S_{j-1} X_j.\)
Apply the variance rule:
\[E[Z_n^2] = \sigma^2 \sum_{j=1}^n E[J_j^2] = \sum_{j=1}^n E[S_{j-1}^2] = \sum_{j=1}^n (j-1) = \frac{n(n-1)}{2}.\]Here we used \(E[S_{j-1}^2] = j - 1\) (since \(\mathrm{Var}[S_{j-1}] = j-1\) for zero-mean i.i.d. increments with \(\sigma^2 = 1\)).
Direct check with Itô’s formula analogy: The integral \(Z_n = \sum S_{j-1} X_j\) is the discrete analogue of \(\int_0^t B_s\, dB_s\), which by Itô’s formula equals \(\tfrac{1}{2}(B_t^2 - t)\). The variance of \(\tfrac{1}{2}(B_t^2 - t)\) at time \(t\) is \(\tfrac{t^2}{2}\), consistent with \(n(n-1)/2 \approx n^2/2\) for large \(n\).
All three sections prepare the same three-property package that will recur throughout Chapters 3 and 4:
| Property | §1.6 Discrete version | Chapter 3 Continuous version |
|---|---|---|
| Martingale | \(Z_n = \sum J_j X_j\) is a martingale | \(\int_0^t A_s\, dB_s\) is a martingale |
| Linearity | \(\sum (aJ_j + bK_j) X_j = a Z_n^J + b Z_n^K\) | \(\int (aA + bC)\, dB = a\int A\, dB + b \int C\, dB\) |
| Variance rule (Itô isometry) | \(E[Z_n^2] = \sigma^2 \sum E[J_j^2]\) | \(E\!\left[\left(\int_0^t A_s\, dB_s\right)^2\right] = \int_0^t E[A_s^2]\, ds\) |
| Maximal inequality | \(P\{\overline{M}_n \geq a\} \leq E[M_n^2]/a^2\) | \(P\{\sup_{s \leq t} \lvert M_s \rvert \geq a\} \leq E[M_t^2]/a^2\) |