Probability & Statistics in Quant Finance

TQT 2024

Siddharth Vishwanath

A little about me

  • Visiting Assistant Professor at the Department of Mathematics
  • PhD in Statistics from Penn State
  • Quant research at Goldman Sachs
  • Model validation Quant at Nomura

A little bit about my research

  • Broadly, I’m interested in uncertainty quantification & inference for complex systems

  • You discover circles? 🥱 😴 What does that have to do with finance?

ARIMA(1,1,2)

Cox-Ingersoll-Ross

Statistics & finance

Introduction

A Tale of Two Probabilities

\({\mathbb P}\) vs. \(\widetilde{\mathbb P}\)

The forward probability \({\mathbb P}\)

  • Goal: Model the future
  • Uses: Risk management, investing
  • Reference: the “real world” probability
  • Machinery: High-dimensional statistics, machine learning, etc.

The risk neutral probability \(\widetilde{\mathbb P}\)

  • Goal: Extrapolate the present
  • Uses: Pricing, hedging
  • Reference: the “risk neutral” probability
  • Machinery: Itô calculus, PDEs

What is the risk-neutral probability?

Consider the following setup:

  • For time intervals \[ 0=t_0 < t_1 < t_2 < \dots < t_n \]
  • At each time \(t_n\) you have access to:
    • A stock with price \(S_n\).
    • A bond (risk-free asset) with price \(B_n\)
  • From time \(t_n\) to \(t_{n+1}\) the bond always gives you the risk-free return, i.e., \[ B_{n+1} = (1+r)B_n \] where \(r\) is the interest rate (for example, the \(0.1\%\) your bank gives you 🤣).

What is the risk-neutral probability?

Portfolio

A portfolio is a collection of assets you own at any given time.

In this case, a portfolio is two numbers \(\alpha_n, \beta_n > 0\) with \(\alpha_n + \beta_n \equiv 1\), and its value is \[ V_n = \alpha_n S_n + \beta_n B_n \] where \(V_n\) is your net value at time \(t_n\).

Your investment strategy is simple.

  • At each time \(t_n\), choose a value \(\alpha_{n+1}, \beta_{n+1}\) such that
  • Your net value \(V_{n+1} = \alpha_{n+1} S_{n+1} + \beta_{n+1} B_{n+1}\) is maximized 💰🤑💰

What is the risk-neutral probability?

European Call Option

A European call option \(X\) is a derivative where the payoff at time \(t_n\) is \[ X_n = \max(S_n - K, 0) \] where \(K\) is called the strike price.

What is the risk-neutral probability?

Now consider the stock price \(S_n\) at time \(t_n\). At time \(t_{n+1}\) suppose one of two things can happen:

\[ S_{n+1} = \begin{cases} (1+u) \times S_n & \text{with probability } p\\ (1-d) \times S_n & \text{with probability } 1-p \end{cases} \]

Here \(p\) is the “real world” probability.

Since \(X_n\) depends on \(S_n\), let’s choose a portfolio which mimics the payoff of \(X_{n+1}\) at time \(t_{n+1}\), i.e.,

\[ \begin{aligned} X^u_{n+1} &= \Big(\alpha_{n+1} \times (1+u) S_n\Big) + \Big(\beta_{n+1} \times (1+r) B_n\Big)\\ X^d_{n+1} &= \Big(\alpha_{n+1} \times (1-d) S_n\Big) + \Big(\beta_{n+1} \times (1+r) B_n\Big) \end{aligned} \]

Here, we can solve for \(\alpha_{n+1}\) and \(\beta_{n+1}\)

\[ \begin{aligned} \alpha_{n+1} &= \frac{X^u_{n+1} - X^d_{n+1}}{(u+d)S_n}\\ \beta_{n+1} &= \frac{1}{(1+r)B_n}\Big(\frac{(1+u)X^d_{n+1} - (1-d)X^u_{n+1}}{u+d}\Big) \end{aligned} \]

Since you buy \(\alpha_{n+1}\) shares of the stock and \(\beta_{n+1}\) shares of the bond at time \(t_n\), your net value at time \(t_{n}\) needs to be \[ \begin{aligned} V_n &= \alpha_{n+1} S_n + \beta_{n+1} B_n\\ &= \dots\\ &= \frac{1}{1+r} \Big( \frac{r+d}{u+d} X^u_{n+1} + \frac{u-r}{u+d} X^d_{n+1} \Big )\\ &= \frac{1}{1+r} \Big( \tilde p X^u_{n+1} + (1-\tilde p) X^d_{n+1} \Big ) \end{aligned} \]

In other words, \[ \begin{aligned} &\overbrace{(1+r) V_n}^{\text{If you took the money and invested it all in bonds at time $t_n$}}\\ &= \underbrace{\tilde p X^u_{n+1} + (1-\tilde p) X^d_{n+1}={\mathbb E}_{\tilde p}(X_{n+1})}_{\text{expected returns from the call option at time $t_{n+1}$}} \end{aligned} \]
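As a quick numerical check of the replication argument above, here is a minimal sketch for a single step. The function name `replicate_one_step` and the numbers (stock at 100, bond at 1, strike 100, \(u = d = 10\%\), \(r = 2\%\)) are illustrative assumptions, not values from the talk. It solves the two replication equations for the holdings, prices the portfolio at time \(t_n\), and compares that with the discounted risk-neutral expectation.

```python
def replicate_one_step(S_n, B_n, K, u, d, r):
    """Replicate a call's next-period payoff with stock and bond (one binomial step)."""
    # Payoff of the call in the up and down states at time t_{n+1}
    X_u = max((1 + u) * S_n - K, 0.0)
    X_d = max((1 - d) * S_n - K, 0.0)

    # Solve the two replication equations for the holdings.
    # beta counts units of the bond, hence the division by B_n.
    alpha = (X_u - X_d) / ((u + d) * S_n)
    beta = ((1 + u) * X_d - (1 - d) * X_u) / ((u + d) * (1 + r) * B_n)

    # Cost of setting up the replicating portfolio at time t_n
    V_n = alpha * S_n + beta * B_n

    # The same number via the risk-neutral probability
    p_tilde = (r + d) / (u + d)
    V_rn = (p_tilde * X_u + (1 - p_tilde) * X_d) / (1 + r)
    return alpha, beta, V_n, V_rn, p_tilde


# Illustrative inputs (assumed, not from the slides)
alpha, beta, V_n, V_rn, p_tilde = replicate_one_step(
    S_n=100.0, B_n=1.0, K=100.0, u=0.10, d=0.10, r=0.02
)
print(alpha, beta, V_n, V_rn, p_tilde)
```

In this example the bond holding comes out negative, i.e., the replicating portfolio borrows at the risk-free rate to buy stock. Both ways of computing \(V_n\) agree, and the real-world probability \(p\) never enters the calculation.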

What is the risk-neutral probability?

  • Here \(\tilde p = \frac{r+d}{u+d}\) is the risk-neutral probability.
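  • When the stock price \(S_n\) doesn’t just go up/down but can take a range of values (like a normal distribution), e.g., \[ S_{n+1} \mid S_n \sim N(S_n, \sigma^2) \equiv {\mathbb P} \] then \(\widetilde{\mathbb P}\) is a different distribution for the same move, chosen so that the expected return on the stock equals the risk-free rate, e.g., \(S_{n+1} \mid S_n \sim N\big((1+r)S_n, \sigma^2\big)\).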
  • When \(t_{n+1} - t_n \approx dt\), then everything becomes continuous time
    • You have to model the assets using stochastic differential equations (SDEs)
    • Brownian Motion \(W_t\)
    • Girsanov’s theorem: There exists a risk-neutral Brownian Motion \(\tilde W_t\) associated with the real-world Brownian Motion \(W_t\)
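Before passing to continuous time, the one-step argument can simply be repeated over several periods: work backwards from the final payoffs, and at every node take the discounted average of the two children under \(\tilde p = \frac{r+d}{u+d}\). The sketch below does this for a European call. The function name `binomial_call_price` and the inputs are assumptions made for illustration.

```python
def binomial_call_price(S0, K, u, d, r, n_steps):
    """Price a European call by backward induction on a recombining binomial tree."""
    p_tilde = (r + d) / (u + d)  # risk-neutral up-probability per step

    # Terminal payoffs: index j means j up-moves and (n_steps - j) down-moves
    values = [
        max(S0 * (1 + u) ** j * (1 - d) ** (n_steps - j) - K, 0.0)
        for j in range(n_steps + 1)
    ]

    # Step backwards: each node is the discounted risk-neutral
    # average of its up-child (j + 1) and its down-child (j)
    for _ in range(n_steps):
        values = [
            (p_tilde * values[j + 1] + (1 - p_tilde) * values[j]) / (1 + r)
            for j in range(len(values) - 1)
        ]
    return values[0]


# Illustrative inputs (assumed, not from the slides)
price_1 = binomial_call_price(100.0, 100.0, 0.10, 0.10, 0.02, n_steps=1)
price_3 = binomial_call_price(100.0, 100.0, 0.10, 0.10, 0.02, n_steps=3)
print(price_1, price_3)
```

With a single step this reproduces the one-period formula. With strike \(K = 0\) the price collapses to \(S_0\), which is the statement that the discounted stock price has no drift under \(\tilde p\).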

Derivative Pricing

  • Use the real-world probability \({\mathbb P}\) to model the stock price \(S_t\).

  • Use the risk-neutral probability \(\widetilde{\mathbb P}\) to price the derivative \(X_t\).

  • The price of the derivative \(X_0\) at time \(t=0\) is the discounted expected value of its payoff \(X_T\) at time \(T\) under the risk-neutral probability \(\widetilde{\mathbb P}\), i.e.,

\[ X_0 = e^{-rT} \times {\mathbb E}_{\widetilde{\mathbb P}}(X_T) \]
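When the expectation has no convenient closed form, it can be approximated by simulation. The sketch below is one way to do that for a European call, under the assumption (not stated on this slide) that the stock follows geometric Brownian motion under \(\widetilde{\mathbb P}\), so that \(S_T = S_0 \exp\big((r - \sigma^2/2)T + \sigma\sqrt{T} Z\big)\) with \(Z \sim N(0,1)\). The function name `mc_call_price` and all numerical inputs are illustrative.

```python
import math
import random


def mc_call_price(S0, K, r, sigma, T, n_paths=100_000, seed=0):
    """Monte Carlo estimate of exp(-rT) * E[max(S_T - K, 0)] under the risk-neutral measure."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T   # risk-neutral drift of log S_T (assumed GBM)
    vol = sigma * math.sqrt(T)
    discount = math.exp(-r * T)

    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        S_T = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        x = discount * max(S_T - K, 0.0)  # discounted payoff on this path
        total += x
        total_sq += x * x

    mean = total / n_paths
    variance = (total_sq - n_paths * mean**2) / (n_paths - 1)
    std_err = math.sqrt(variance / n_paths)  # Monte Carlo standard error
    return mean, std_err


# Illustrative inputs (assumed, not from the slides)
mc_price, mc_std_err = mc_call_price(S0=100.0, K=100.0, r=0.02, sigma=0.2, T=1.0)
print(mc_price, mc_std_err)
```

The standard error shrinks like \(1/\sqrt{\text{n\_paths}}\), so the estimate comes with its own measure of uncertainty.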

Example: The Black-Scholes-Merton formula

\[ X_0 = S_0 \times \Phi(d_1) - K \times e^{-rT} \times \Phi(d_2) \] where \(\Phi\) is the standard normal CDF, \(\sigma\) is the volatility of the stock, and \[ \begin{aligned} d_1 &= \frac{\log(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}\\ d_2 &= d_1 - \sigma\sqrt{T} \end{aligned} \]
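The formula translates almost line by line into code. The following is a small sketch using only the standard library. The helper names `norm_cdf` and `black_scholes_call` and the example inputs are assumptions for illustration, and \(\Phi\) is computed from the error function.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    """Black-Scholes-Merton price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# Illustrative inputs (assumed, not from the slides)
bs_price = black_scholes_call(S0=100.0, K=100.0, r=0.02, sigma=0.2, T=1.0)
print(bs_price)  # roughly 8.92 for these inputs
```

Note that the real-world drift of the stock does not appear anywhere: only \(r\), \(\sigma\), \(T\), \(S_0\) and \(K\) matter.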

Statistical Modeling

What about \({\mathbb P}\)?

  • The whole discussion about \(\widetilde{\mathbb P}\) is based on the assumption that we know the real-world probability \({\mathbb P}\).

  • But how do we know \({\mathbb P}\)? 🤔

  • We don’t. We have to estimate it from data.

Estimating \({\mathbb P}\)

Let’s look at a simple example where we only have one asset \(r_t\) at time \(t\). Here:

  • \(r_t\) is a (stochastic) interest rate
  • It’s used to:
    • evaluate bond prices
    • create interest rate swaps, and
    • underlies almost every other financial derivative.
  • A common model for \(r_t\) is to assume it follows a Vašíček model, i.e., \[ dr_t = (\alpha - \beta r_t) dt + \sigma dW_t \] where \(W_t\) is a Brownian motion.

In other words, over a short time step \(\Delta t\), approximately

\[ r_{t+ \Delta t} \mid r_t \sim N\Big(r_t + (\alpha - \beta r_t) \cdot \Delta t, \ \ \ \sigma^2 \cdot \Delta t\Big) \]
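The one-step normal distribution above is all that is needed to simulate a path of the short rate: draw the next value from a normal distribution centred at \(r_t + (\alpha - \beta r_t)\Delta t\) with variance \(\sigma^2 \Delta t\), and repeat. A minimal sketch follows. The function name `simulate_vasicek` and the parameter values (daily steps, \(\alpha = 0.02\), \(\beta = 0.5\), \(\sigma = 0.01\)) are assumptions chosen for illustration.

```python
import math
import random


def simulate_vasicek(r0, alpha, beta, sigma, dt, n_steps, seed=0):
    """Simulate a short-rate path with the one-step normal (Euler) approximation."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r_t = path[-1]
        mean = r_t + (alpha - beta * r_t) * dt   # drift pulls r_t towards alpha / beta
        std = sigma * math.sqrt(dt)              # noise scales with sqrt(dt)
        path.append(rng.gauss(mean, std))
    return path


# Illustrative inputs (assumed): one year of daily steps
path = simulate_vasicek(r0=0.05, alpha=0.02, beta=0.5, sigma=0.01, dt=1 / 252, n_steps=252)
print(path[0], path[-1])
```

With these values the drift pulls the rate towards the long-run level \(\alpha/\beta = 0.04\), and nothing in the model prevents the rate from going negative.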

Estimating \({\mathbb P}\)

  • For times \(t_0 < t_1 < t_2 < \dots < t_n\)
  • You collect data \(r_{t_0}, r_{t_1}, r_{t_2}, \dots, r_{t_n}\)
  • How do you estimate the parameters \(\alpha, \beta, \sigma\)?

Let \(f(r_{t_i} \mid r_{t_{i-1}}; \alpha, \beta, \sigma)\) be the conditional probability density function of \(r_{t_i}\) given the previous observation \(r_{t_{i-1}}\)

Then, since the model is Markov, the likelihood of the data (given \(r_{t_0}\)) is \[ L(\alpha, \beta, \sigma) = \prod_{i=1}^n f(r_{t_i} \mid r_{t_{i-1}}; \alpha, \beta, \sigma) \]

  • The maximum likelihood estimate (MLE) of \(\alpha, \beta, \sigma\) is:

\[ \hat \alpha, \hat \beta, \hat \sigma = \arg\max_{\alpha, \beta, \sigma} L(\alpha, \beta, \sigma) \]

  • This is a standard optimization problem.
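For the Vašíček model the optimization is especially simple. Assuming equally spaced observations and a likelihood built from the one-step normal transition densities, maximizing it is the same as a least-squares regression of the increments \(r_{t_{i+1}} - r_{t_i}\) on \(r_{t_i}\). The sketch below simulates data with known parameters and then recovers them. The function names and parameter values are illustrative assumptions, and the estimator is for the discretized model, not the exact continuous-time likelihood.

```python
import math
import random


def simulate_vasicek(r0, alpha, beta, sigma, dt, n_steps, seed=0):
    """Simulate a short-rate path with the one-step normal (Euler) approximation."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r_t = path[-1]
        path.append(rng.gauss(r_t + (alpha - beta * r_t) * dt, sigma * math.sqrt(dt)))
    return path


def fit_vasicek_euler(rates, dt):
    """MLE of (alpha, beta, sigma) under the one-step normal approximation."""
    x = rates[:-1]                                        # r_{t_i}
    y = [b - a for a, b in zip(rates[:-1], rates[1:])]    # increments r_{t_{i+1}} - r_{t_i}
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Regression: increment = alpha*dt - beta*dt * r + noise
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    resid_sq = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

    alpha_hat = intercept / dt
    beta_hat = -slope / dt
    sigma_hat = math.sqrt(resid_sq / (n * dt))            # MLE divides by n, not n - 2
    return alpha_hat, beta_hat, sigma_hat


# Illustrative experiment (assumed values): 200 years of daily data
dt = 1 / 252
rates = simulate_vasicek(r0=0.04, alpha=0.08, beta=2.0, sigma=0.01, dt=dt, n_steps=252 * 200)
alpha_hat, beta_hat, sigma_hat = fit_vasicek_euler(rates, dt)
print(alpha_hat, beta_hat, sigma_hat)
```

In experiments like this, \(\sigma\) is typically pinned down very precisely, while the mean-reversion speed \(\beta\) needs a long history to estimate well.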

The statistical advantage

  • You have data \(x_1, x_2, \dots, x_t\)

  • You assume a model \(x_{t + \Delta t} = F(x_t \mid \theta)\)

    • Where \(\theta\) are some parameters with interpretation



  • If you are willing to dip your toes into the math, you can:
    • Estimate \(\hat\theta\) using a dinosaur computer
    • Use \(\hat\theta\) to make predictions about the future
    • Quantify how much uncertainty you have in your predictions
    • Quantify the effect that changing \(\hat\theta \mapsto \hat\theta + \Delta\theta\) has on your predictions
    • etc.
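To make the list above concrete, here is a toy sketch with the simplest possible choice of \(F\): a first-order autoregression \(x_{t+1} = \theta x_t + \varepsilon_t\) with Gaussian noise. This model, the function name `fit_ar1`, and the simulated data are all assumptions made for illustration. The code estimates \(\hat\theta\), attaches a standard error to it, and builds an approximate 95% interval for the next observation.

```python
import math
import random


def fit_ar1(xs):
    """Least-squares fit of x_{t+1} = theta * x_t + noise, with a standard error."""
    x, y = xs[:-1], xs[1:]
    s_xx = sum(a * a for a in x)
    theta_hat = sum(a * b for a, b in zip(x, y)) / s_xx
    resid_var = sum((b - theta_hat * a) ** 2 for a, b in zip(x, y)) / (len(x) - 1)
    std_err = math.sqrt(resid_var / s_xx)   # uncertainty in theta_hat
    return theta_hat, std_err, resid_var


# Simulated data with a known theta = 0.8 (assumed for the demo)
rng = random.Random(1)
xs = [0.0]
for _ in range(2000):
    xs.append(0.8 * xs[-1] + rng.gauss(0.0, 1.0))

theta_hat, std_err, resid_var = fit_ar1(xs)

# One-step-ahead prediction and an approximate 95% prediction interval:
# the variance combines future noise and the uncertainty in theta_hat
prediction = theta_hat * xs[-1]
half_width = 1.96 * math.sqrt(resid_var + (std_err * xs[-1]) ** 2)
interval = (prediction - half_width, prediction + half_width)
print(theta_hat, std_err, interval)
```

Because the prediction is \(\hat\theta\, x_t\), shifting \(\hat\theta\) by \(\Delta\theta\) moves it by exactly \(\Delta\theta \cdot x_t\), which is the kind of sensitivity statement an interpretable parameter makes possible.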

21st Century Forecasting

  • You have data \(x_1, x_2, \dots, x_t\)

  • You take a deep learning architecture \(x_t = F(t \mid \theta)\)

    • Where \(\theta\) are some parameters of the network



  • If you have enough compute, you can:
    • Estimate \(\hat\theta\) using state of the art GPUs
    • Use \(\hat\theta\) to make predictions about the future
    • But it comes at the price of uncertainty quantification 😭
    • But you don’t have to worry about the math 🙂👍

Which is better?

  • The answer is: it depends.

Philosophically:

  • The first method is based on assumptions.
    • Assumptions have consequences!
  • The second method is based on data.
    • Garbage in, garbage out!

Realistically:

  • If you are a mathematician, you might enjoy the first approach.
    • It’s like solving a puzzle.
  • If you are a computer scientist, you might enjoy the second approach.
    • It’s like playing a video game.

Questions?
