Topological Data Analysis

class: inverse, center, title-slide, middle
count: false
background-image: url(https://wallpapercave.com/wp/91vqmm7.jpg)
background-size: cover

# .white.massive[Topological Data Analysis ]

### .bolder[Siddharth Vishwanath] .small.black[28 August, 2020]

<a href="https://sidv23.github.io/smac-2020"> `$\bbox[2pt,grey]{\color{orange}{\texttt{sidv23.github.io/smac-2020}}}$` </a>

---
class: center

# .black[Topological Data Analysis]

### | TDA |

A collection of mathematical, statistical and algorithmic tools to analyze complex data

--
### .red[*Today's objective:* ] 
--
.orange[.small[A gentle introduction ... ]]
--
.orange[.small[with no topology ... ]]
--
.orange[.small[I promise]] .large[🤞]

---

# Motivation

---

# Sufficient Statistics may not suffice

<div class="centered">
<img src="images/Anscombes_quartet_3.svg" width="650">
.center[Anscombe's quartet]
</div>

---
layout: false
count: false

# Sufficient Statistics may not suffice

<div class="centered">
<img align="centered" class="animated-gif" src="images/DinoSequential.gif" width="120%">
.center[Anscombe's quartet on steroids]
</div>

---
layout: false
class: center

# .left[Quantum Mechanics]

![](slides_files/figure-html/unnamed-chunk-3-1.png)

Electron clouds for the `$2p_z$` and `$3d_{z^2}$` orbitals

---
class: center
count: false

# .left[Quantum Mechanics]

![](slides_files/figure-html/unnamed-chunk-4-1.png)

Electron clouds for the `$2p_z$` and `$3d_{z^2}$` orbitals

---

# .left[Sensor Network]
 
![](https://francis.naukas.com/files/2010/04/dibujo20100413_homological_sensor_networks_how_answer_global_questions_from_local_data_small_local_sensors_samples_-environment_at_set_nodes.png)

---

# .left[Sensor Network]
 
![](http://www.ams.org/images/sensors.jpg)

---
class: inverse, center, middle
count: false

# Introduction

---

# What is Topological Data Analysis?
* A methodology to extract shape from complex data 
--
 - **Setup:** Given `$\mathbb{X}_n = \{ \boldsymbol{x}_1, \boldsymbol{x}_2, \dots, \boldsymbol{x}_n \} \subset \mathbb{R}^d$` 
<img src="slides_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />
 
--
 - **Objective:** What is the "*shape*" of `$\mathbb{X}_n$`?

---

# What is the shape of `$\mathbb{X}_n$`?

---
class: center
# .left[Quintessential Example]

Coffee Mug `$\simeq$` Doughnut

---
layout: false
class: center

# .left[Topology in one slide]

---

# Topology in one slide

### Equivalence classes of spaces

* **Homeomorphism**
    * `$\mathcal{X} \cong \mathcal{Y}$` if and only if 
        * there exists `$f: \mathcal{X} \rightarrow \mathcal{Y}$` bijective
        * such that `$f$` and `$f^{-1}$` are continuous

* **Homotopy**
    * `$\mathcal{X} \simeq \mathcal{Y}$` if and only if 
        * there exists continuous `$f: \mathcal{X} \rightarrow \mathcal{Y}$` and `$g : \mathcal{Y} \rightarrow \mathcal{X}$` 
        * such that `$f\circ g \sim \text{id}_{_{\mathcal{Y}}}$` and `$g\circ f \sim \text{id}_{_{\mathcal{X}}}$`, where `$\sim$` denotes homotopy of maps

---

# Topology in ~~one~~ two slides

### What remains the same when `$\mathcal{X} \simeq \mathcal{Y}$` ?

* **Homology** 
 * The fundamental invariant 
 * Encodes topological information as a vector space 
* **Betti numbers** 
 * `$\beta_k$` is the dimension of the `$k$`-th homology group 
 * Counts the `$k$`-dimensional holes in the space 
 * `$\beta_0 = \#$` connected components 
 * `$\beta_1 = \#$` loops 
 * `$\beta_2 = \#$` voids (enclosed cavities) ... 
* **Euler Characteristic** 
 * A signature of each space 
 * The alternating sum of the Betti numbers 
$$
\chi = \beta_0 - \beta_1 + \beta_2 - \dots + (-1)^d \beta_d
$$
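As a quick sanity check of the alternating sum, the sketch below (plain Python; `euler_characteristic` is a hypothetical helper name chosen for illustration) evaluates `$\chi$` from the Betti numbers `$(\beta_0, \beta_1, \beta_2)$` of the circle, the sphere and the torus.

```python
def euler_characteristic(betti):
    """Alternating sum chi = beta_0 - beta_1 + beta_2 - ... of a list of Betti numbers."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

# (beta_0, beta_1, beta_2) of three standard spaces
spaces = {
    "circle": [1, 1, 0],
    "sphere": [1, 0, 1],
    "torus": [1, 2, 1],
}

for name, betti in spaces.items():
    print(name, euler_characteristic(betti))
```

With these Betti numbers the circle and the torus both have `$\chi = 0$`, while the sphere has `$\chi = 2$`.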

---

.column[.content[
.split-five[
.row.bg-main1[.content.center.vmiddle[
# Space
]]
.row.bg-main2[.content.center[
## Circle
]]
.row.bg-main3[.content.center[
## Sphere
]]
.row.bg-main4[.content.center[
## Torus
]]
.row.bg-main5[.content.center[
## `$3d_{z^2}$`
]]
]]]
.column[.content[
.split-five[
.row.bg-main1[.content.center.vmiddle[
# Shape
]]
.row.bg-main2[.content.center[
<img src="images/circle.svg" width="100" height="100">
]]
.row.bg-main3[.content.center[
<img src="images/sphere.svg" width="100" height="100">
]]
.row.bg-main4[.content.center[
<img src="images/torus.svg" width="100" height="100">
]]
.row.bg-main5[.content.center[
<img src="images/orbital.png" width="100" height="100">
]]
]]]
.column[.content[
.split-five[
.row.bg-main1[.content.center.vmiddle[
# `$\beta_0$` 
]]
.row.bg-main2[.content.center[
## 1
]]
.row.bg-main3[.content.center[
## 1
]]
.row.bg-main4[.content.center[
## 1
]]
.row.bg-main5[.content.center[
## 1
]]
]]]
.column[.content[
.split-five[
.row.bg-main1[.content.center.vmiddle[
# `$\beta_1$`
]]
.row.bg-main2[.content.center[
## 1
]]
.row.bg-main3[.content.center[
## 0
]]
.row.bg-main4[.content.center[
## 2
]]
.row.bg-main5[.content.center[
## 1
]]
]]]
.column[.content[
.split-five[
.row.bg-main1[.content.center.vmiddle[
# `$\beta_2$` 
]]
.row.bg-main2[.content.center[
## 0
]]
.row.bg-main3[.content.center[
## 1
]]
.row.bg-main4[.content.center[
## 1
]]
.row.bg-main5[.content.center[
## 3
]]
]]]

---
class: fade-row2 fade-row3 fade-row4 fade-row5 gray-row2 gray-row3 gray-row4 gray-row5

---
count: false
class: fade-row3 fade-row4 fade-row5 gray-row3 gray-row4 gray-row5

---
count: false
class: fade-row2 fade-row4 fade-row5 gray-row2 gray-row4 gray-row5

---
count: false
class: fade-row2 fade-row3 fade-row5 gray-row2 gray-row3 gray-row5

---
count: false
class: fade-row2 fade-row3 fade-row4 gray-row2 gray-row3 gray-row4

---
layout: false
class: inverse, center, middle
count: false

# TDA Pipeline
## .orange[Persistent Homology]

---
layout: true
class: split-two  with-border border-black

.column.bg-orange[.content[
 .split-five[
 .row[.content.left[
 
 .Large.bolder[Input:] `$\hspace{0.5cm} \mathbb{X}_n = \{ \boldsymbol{x}_1,\boldsymbol{x}_2, \dots ,\boldsymbol{x}_n \} \subset \mathbb{R}^d$`
 
 ]]
 .row[.content.left[
 
 `$\hspace{1cm}$` **1:** At resolution `$r>0$`, look at `$\bigcup\limits_{i=1}^n B_r(\mathbf{x}_i)$`
 
 ]]
 .row[.content.left[
 
 `$\hspace{1cm}$` **2:** Construct a **simplicial complex** `$\mathcal{K}(\mathbb{X}_n, r)$`
 
 ]]
 .row[.content.left[
 
 `$\hspace{1cm}$` **3:** Examine the **filtration** `$\{\mathcal{K}( \mathbb{X}_n, r )\}_{r>0}$`
 <br>
 ]]
 .row[.content.left[
 
 .Large.bolder[Output:] `$\hspace{0.5cm}\textbf{Bar}(\mathbb{X}_n)$` or `$\textbf{Dgm}(\mathbb{X}_n)$`
 
 ]]
 ]
]]

.column.bg[.content.center.vmiddle[
{{content}}
]
]

---

class: fade-row2-col1 fade-row3-col1 fade-row4-col1 fade-row5-col1 with-border  gray-row2-col1 gray-row3-col1 gray-row4-col1 gray-row5-col1

---

class: fade-row1-col1 fade-row3-col1 fade-row4-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row3-col1 gray-row4-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row4-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row4-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row3-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row3-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row3-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row3-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row3-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row3-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row3-col1 fade-row5-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row3-col1 gray-row5-col1
count: false

---

class: fade-row1-col1 fade-row2-col1 fade-row3-col1 fade-row4-col1 with-border  gray-row1-col1 gray-row2-col1 gray-row3-col1 gray-row4-col1
count: false

---
layout: false

# Persistent Homology
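For `$\beta_0$` alone, the pipeline can be traced without building the complex explicitly. Two closed balls `$B_r(\mathbf{x}_i)$` and `$B_r(\mathbf{x}_j)$` meet iff `$||\mathbf{x}_i - \mathbf{x}_j|| \le 2r$`, so every 0-dimensional bar starts at `$r = 0$` and dies at half the length of a single-linkage (Kruskal) merge edge. The sketch below is a minimal illustration assuming Euclidean points given as tuples; `h0_barcode` and `betti0` are hypothetical helper names, and bars in higher dimensions need a full simplicial-complex library.

```python
import math

def h0_barcode(points):
    """0-dimensional barcode of the union-of-balls filtration, indexed by radius r.

    Closed balls B_r(x_i), B_r(x_j) meet iff ||x_i - x_j|| <= 2r, so components
    merge at r = ||x_i - x_j|| / 2 (Kruskal / single linkage). Every bar starts at 0.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge: one bar dies at radius r
            parent[ri] = rj
            bars.append((0.0, r))
    if n:
        bars.append((0.0, math.inf))  # one component survives for every r
    return bars

def betti0(points, r):
    """beta_0 of the union of balls at radius r = number of bars alive at r."""
    return sum(1 for birth, death in h0_barcode(points) if birth <= r < death)

points = [(0, 0), (1, 0), (5, 0)]
print(h0_barcode(points))
print([betti0(points, r) for r in (0.25, 1.0, 3.0)])
```

Reading the barcode at a fixed `$r$` recovers the Betti number of a single complex. The whole barcode records how long each component persists across resolutions.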

---
class: center
# .left[Diagrams and Barcodes]

<img src="images/2d_surface2.svg" width="450" height="450">
 
`$\mathbb{X}_n$` is sampled from `$2p_z$` orbital

---
layout: false
class: center
count: false

# .left[Diagrams and Barcodes]

.pull-left[
<img src="images/2d_barcode.svg" width="420" height="420">
 
`$\textbf{Bar}(\mathbb{X}_n)$`
]

.pull-left[
<img src="images/2d_diagram.svg" width="400" height="400">
 
`$\textbf{Dgm}(\mathbb{X}_n)$`
]

---
class: inverse, center, middle
count: false

# Statistics + TDA

## .purple[Statistical Invariance of Betti Numbers] <a href="https://arxiv.org/abs/2001.00220">.half.purple[Vishwanath, Fukumizu, Kuriki, and Sriperumbudur (2020a)] </a>

---
# Random Topology

* Given a probability space `$(\Omega,\mathcal{F},\mathbb{P})$` and a metric space `$\mathcal{X}$`

* `$\mathbb{X}_n = \{ \boldsymbol{X}_1, \boldsymbol{X}_2, \dots, \boldsymbol{X}_n \} \sim \mathbb{P}$`
  - A fixed probability measure, i.e., observed i.i.d.
  - A random point process, e.g., a Poisson process

--
* A simplicial complex, `$\mathcal{K}(\mathbb{X}_n,r)$`, is a random variable, measurable w.r.t. `$\mathbb{P}^{\otimes n}$`

--
* `$\mathbf S : \mathcal X^n \rightarrow \mathcal S$` is a topological summary, e.g., `$\beta_k \left( \mathcal{K}(\mathbb{X}_n,r) \right) : \mathcal{X}^n \rightarrow \mathbb{N}$`

--

* What are the properties of these **.purple[random]** topological summaries?
`\begin{align}
\text{(LLN)} & & \lim\limits_{n\rightarrow \infty}\frac{1}{n}\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right) = \color{red}{\gamma_k(\mathbb{P})} \ \ \text{a.s.} \hspace{2cm}\\ \\
\text{(CLT)} & & \frac{\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right) - \mathbb{E}(\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right))}{\sqrt{n}} \xrightarrow{\ d\ } \color{red}{\mathcal{N}(0,\sigma^2)} \ \ \text{as } n \rightarrow \infty
\end{align}`
.tiny[.caption[Bobrowski and Kahle (2018); Kahle and Meckes (2013); Yogeshwaran, Subag, and Adler (2017)]]
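The LLN can be explored numerically for `$k = 0$`. The sketch below is illustrative only. It assumes `$\mathbb{P}$` is uniform on `$[0,1]^2$` and works in the thermodynamic regime `$n r_n^d = t$` with an arbitrarily chosen `$t = 0.1$`. It counts the connected components of the union of balls with a union-find, and the helper names are hypothetical. Up to Monte Carlo and boundary effects, the printed values of `$\beta_0/n$` should stabilize as `$n$` grows.

```python
import math
import random

def count_components(points, r):
    """beta_0 of the union of closed balls B_r(x_i), via union-find on the graph
    that joins x_i and x_j whenever ||x_i - x_j|| <= 2r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2 * r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components

def normalized_betti0(n, t, d=2, seed=0):
    """beta_0 / n for n uniform points in [0,1]^d, with r_n chosen so that n * r_n^d = t."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    r_n = (t / n) ** (1 / d)
    return count_components(points, r_n) / n

for n in (200, 400, 800):
    print(n, round(normalized_betti0(n, t=0.1), 3))
```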

---
layout: true
class: center,split-three
count: false

# .left[Random Topology]

In the simplicial complex `$\mathcal{K}\left( \mathbb{X}_n, {r_n} \right)$`, `$r_n$` depends on `$n$`

.column.bg-main1[.content[
 
.center[
Dense
<img src="images/dense.gif" width="300" height="300">]
`$nr_n^d \rightarrow \infty$`
]]

.column.bg-main1[.content[
 
.center[
Sparse
<img src="images/sparse.gif" width="300" height="300">]
`$nr_n^d \rightarrow 0$`
]]

.column.bg-main1[.content[
 
.center[
Thermodynamic
<img src="images/thermodynamic.gif" width="300" height="300">]
`$nr_n^d \rightarrow t \in (0,\infty)$`
]]

---

---
count: false
class: show-100

---

---
class: show-111
count: false

---
layout: false

# Statistical Invariance of Betti Numbers

* Consider a family of distributions `$\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}$`

* Given `$\mathbb{X}^\theta_n = \{\mathbf{X}^\theta_1,\mathbf{X}^\theta_2,\dots,\mathbf{X}^\theta_n\} \sim \mathbb{P}_\theta$`, for `$\theta \in \Theta$`

.content-box-purple[
`$\mathbf{S}(\mathbb{P}_{\theta}^{\otimes n})\!:= \mathbf{S}(\mathbb{X}^\theta_n)\!= \frac{1}{n} \Big( \beta_0\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big), \beta_1\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big), \dots , \beta_d\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big) \Big)$`
]

* **.purple[Invariance.]** 
.center[<body> `$\bbox[20px, border: 2px solid orange]{ \text{For } \theta_1,\theta_2 \in \Theta, \text{ can we have that } \lim\limits_{n\rightarrow \infty}\mathbf{S}(\mathbb{P}_{\theta_1}^{\otimes n}) {=} \lim\limits_{n\rightarrow \infty}\mathbf{S}(\mathbb{P}_{\theta_2}^{\otimes n}) \text{ ? } }$`</body>]

--
* **Example (1).**

Consider `$\color{red}{\mathcal{P} = \{ \mathcal{N}(\theta,\mathbf{I}_d) : \theta \in \mathbb{R}^d \}}$` and `$\color{green}{\mathbf{S}(\mathbb{X}_n) = \bar{\mathbf{X}}_n}$`
  
      `$\hspace{3cm} \lim\limits_{n\rightarrow \infty}\bar{\mathbf{X}}^{\theta_1}_n = \theta_1 \neq \theta_2 = \lim\limits_{n\rightarrow \infty}\bar{\mathbf{X}}^{\theta_2}_n$`

---
layout: false
count: false

# Statistical Invariance of Betti Numbers

* Consider a family of distributions `$\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}$`

* Given `$\mathbb{X}^\theta_n = \{\mathbf{X}^\theta_1,\mathbf{X}^\theta_2,\dots,\mathbf{X}^\theta_n\} \sim \mathbb{P}_\theta$`, for `$\theta \in \Theta$`

* **Example (2).**

Consider `$\color{red}{\mathcal{P} = \{ \mathcal{N}(\mathbf{0},\boldsymbol{\theta}) : \boldsymbol{\theta} \in \mathcal{S^d_{++}} \}}$` and `$\color{green}{\mathbf{S}(\mathbb{X}_n) = \bar{\mathbf{X}}_n}$`
  
      `$\hspace{3.5cm} \lim\limits_{n\rightarrow \infty}\bar{\mathbf{X}}^{\theta_1}_n = 0 = \lim\limits_{n\rightarrow \infty}\bar{\mathbf{X}}^{\theta_2}_n$`

---
layout: false
count: false

# Statistical Invariance of Betti Numbers

* Consider a family of distributions `$\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}$`

* Given `$\mathbb{X}^\theta_n = \{\mathbf{X}^\theta_1,\mathbf{X}^\theta_2,\dots,\mathbf{X}^\theta_n\} \sim \mathbb{P}_\theta$`, for `$\theta \in \Theta$`

* **Example (3).**

Consider `$\color{red}{\mathcal{P} = \{ \mathcal{N}(\mathbf{0},\boldsymbol{\theta}) : \boldsymbol{\theta} \in \mathcal{S^d_{++}} \}}$` and `$\color{green}{\mathbf{S}(\mathbb{X}_n) = \text{Cov}(\mathbb{X}_n)}$`
  
      `$\hspace{2cm} \lim\limits_{n\rightarrow \infty}\text{Cov}(\mathbb{X}^{\theta_1}_n) = \boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_2 = \lim\limits_{n\rightarrow \infty}\text{Cov}(\mathbb{X}^{\theta_2}_n)$`
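Examples (2) and (3) can be reproduced with a short simulation. For two covariance parameters `$\boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_2$` (picked arbitrarily here), the sample mean is close to `$\mathbf{0}$` for both, while the sample covariance recovers each `$\boldsymbol{\theta}_i$`. The sketch below is in plain Python for `$d = 2$` and uses a hand-written Cholesky factorization; the helper names are hypothetical.

```python
import random

def sample_gaussian_2d(cov, n, rng):
    """n draws from N(0, cov) in R^2, using the Cholesky factor of the 2x2 matrix cov."""
    (a, b), (_, c) = cov
    l11 = a ** 0.5
    l21 = b / l11
    l22 = (c - l21 ** 2) ** 0.5
    samples = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        samples.append((l11 * z1, l21 * z1 + l22 * z2))
    return samples

def sample_mean(xs):
    return tuple(sum(x[k] for x in xs) / len(xs) for k in range(2))

def sample_cov(xs):
    m = sample_mean(xs)
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / (len(xs) - 1)
             for j in range(2)] for i in range(2)]

theta1 = [[1.0, 0.0], [0.0, 1.0]]
theta2 = [[4.0, 1.5], [1.5, 1.0]]
rng = random.Random(1)
for theta in (theta1, theta2):
    xs = sample_gaussian_2d(theta, 20000, rng)
    print(sample_mean(xs), sample_cov(xs))
```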

---
layout: false
count: false

# Statistical Invariance of Betti Numbers

* Consider a family of distributions `$\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}$`

* Given `$\mathbb{X}^\theta_n = \{\mathbf{X}^\theta_1,\mathbf{X}^\theta_2,\dots,\mathbf{X}^\theta_n\} \sim \mathbb{P}_\theta$`, for `$\theta \in \Theta$`

As `$n \rightarrow \infty$` and `$nr_n^d \rightarrow t$`, the **.purple[thermodynamic limit]** is the functional 
`\begin{align}
\mathbf{S}(\mathbb{P}_\theta; t) = \lim_{n \rightarrow\infty}\mathbf{S}(\mathbb{P}_{\theta}^{\otimes n})
\end{align}`
.content-box-purple[
**.purple[Definition.]**  `$\mathcal{P}$` admits `$\beta$`-equivalence if `$\mathbf S(\mathbb{P}_\theta; t) = \eta(t)$` for all `$\theta \in \Theta$`
]

---
layout: false
class: left

# Invariance via Topological Groups

.small[
* `$\mathcal{G} = \{ g_\theta : \theta \in \Theta \}$` is a group acting bijectively
* `$T$` is `$\color{red}{\mathcal{G}}$`.red[-maximal invariant] if it is constant on each orbit and separates distinct orbits, i.e., `$T(\mathbf{x}) = T(\mathbf{y})$` **iff** `$\mathbf{y} \in \mathcal{G}\mathbf{x}$` 
]

--
.small[
Let `$\color{purple}{f_\theta(x)\!:=\!\xi\big( g_\theta\!\circ\!\Psi(x) \big)}$` where `$\Psi:\mathcal{X} \rightarrow \mathcal{Y}$` is a diffeomorphism and `$\xi$` ensures `$f_\theta$` is a pdf
]

--
.center[
`$\mathbf{X_\theta} \sim f_\theta \ \ \  \longrightarrow  \ \ \ Z_\theta = f_\theta(\mathbf X_\theta) \ \ \ \longrightarrow \ \ \ f_{Z_\theta}$` is `$\mathcal G$`-invariant
]

---
layout: false
class: left
count: false

# Invariance via Topological Groups

.small[
Let `$\color{purple}{f_\theta(x)\!:=\!\xi\big( g_\theta\!\circ\!\Psi(x) \big)}$` where `$\Psi:\mathcal{X} \rightarrow \mathcal{Y}$` is a diffeomorphism and `$\xi$` ensures `$f_\theta$` is a pdf
]
.content-box-purple[.small[
**.purple[Theorem.]** `$\mathcal{P}$` admits `$\beta$`-equivalence **IFF**  `$\exists \hspace{0.1cm}\zeta \text{  }$` such that `$\text{  det}(J_{\Psi^{-1}}(y)) = \zeta(T(y))$`
]]

---
layout: false
class: left
count: false

# Invariance via Topological Groups (.purple[Example])

.content-box-red[.small[
$$
f_\theta(x_1,x_2) = \big( \cos(\theta) \Phi^{-1}(x_1) + \sin(\theta )\Phi^{-1}(x_2) \big)^2 \hspace{0.5cm} \mathbf{1}(0 \le x_1,x_2 \le 1)
$$
]
]

--
* .footnotesize[ <body> `$\mathcal{X} = [0,1]^2, \hspace{0.2cm} \mathcal{Y} = \mathbb{R}^2,$` and `$\Psi : \mathcal{X} \rightarrow \mathcal{Y}$` such that `$(x_1,x_2) \mapsto (\Phi^{-1}(x_1),\Phi^{-1}(x_2))$` </body> ]

* .footnotesize[ <body> `$\mathcal G = S\mathcal{O}(2)$` for which `$g_\theta = {\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}}$` and `$T(\mathbf{y}) = ||\mathbf{y}||$` </body> ]

--
* .footnotesize[ <body> Then `$\bbox[2pt,#ff9166]{f_\theta(x_1,x_2) = \xi(g_\theta \circ \Psi(x_1,x_2))}$`, where `$\xi:\mathcal{Y} \rightarrow \mathbb{R}$` is given by `$\xi(\mathbf{y}) = \big( (1,0)^\top\mathbf{y} \big)^2$` </body> ]

--
* .footnotesize[ <body> **.red[Jacobian condition:]** For `$\mathbf{y} = (y_1,y_2) \in \mathcal{Y},$` observe `$\Psi^{-1}(y_1,y_2) = (\Phi(y_1),\Phi(y_2))$` .center[ <body> `$\text{det}(J_{\Psi^{-1}}(\mathbf{y})) = \phi(y_1)\cdot\phi(y_2) = \frac{1}{2\pi}\exp\Big( -\frac{y_1^2}{2} -\frac{y_2^2}{2} \Big) = \frac{1}{2\pi}\exp\Big( -\frac{1}{2}T(\mathbf{y})^2 \Big)$` </body> ] so the condition holds with `$\zeta(s) = \frac{1}{2\pi}e^{-s^2/2}$` </body> ]

.content-box-purple[.small[
**.purple[Theorem.]** `$\mathcal{P}$` admits `$\beta$`-equivalence **IFF**  `$\exists \hspace{0.1cm}\zeta \text{  }$` such that
`$\text{  det}(J_{\Psi^{-1}}(y)) = \zeta(T(y))$`
]]
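The invariance in this example can also be checked by simulation. `$W = g_\theta \Psi(\mathbf{X}_\theta)$` has density `$w_1^2\phi(w_1)\phi(w_2)$` for every `$\theta$`, because `$\phi(y_1)\phi(y_2)$` is rotation invariant. The sketch below therefore samples `$\mathbf{X}_\theta \sim f_\theta$` by drawing `$W$` and mapping back, and then evaluates `$Z_\theta = f_\theta(\mathbf{X}_\theta) = W_1^2$`. Since `$W_1^2 \sim \chi^2_3$`, all the printed averages should be close to 3 up to Monte Carlo error, whatever the value of `$\theta$`. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = STD_NORMAL.cdf, Phi^{-1} = STD_NORMAL.inv_cdf

def f_theta(x1, x2, theta):
    """The example density on (0,1)^2."""
    y1, y2 = STD_NORMAL.inv_cdf(x1), STD_NORMAL.inv_cdf(x2)
    return (math.cos(theta) * y1 + math.sin(theta) * y2) ** 2

def sample_f_theta(theta, n, rng):
    """Draw n points from f_theta.

    W = g_theta Psi(X) has density w1^2 phi(w1) phi(w2) for every theta, so draw W,
    set Y = g_theta^T W and return X = (Phi(Y1), Phi(Y2)).
    """
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for _ in range(n):
        # density w^2 phi(w): a chi(3) magnitude with a random sign
        w1 = math.sqrt(sum(rng.gauss(0, 1) ** 2 for _ in range(3))) * rng.choice((-1.0, 1.0))
        w2 = rng.gauss(0, 1)
        y1, y2 = c * w1 - s * w2, s * w1 + c * w2  # g_theta^T W, g_theta = [[c, s], [-s, c]]
        out.append((STD_NORMAL.cdf(y1), STD_NORMAL.cdf(y2)))
    return out

rng = random.Random(0)
for theta in (0.0, math.pi / 6, 1.0):
    xs = sample_f_theta(theta, 10000, rng)
    z = [f_theta(x1, x2, theta) for x1, x2 in xs]  # Z_theta = f_theta(X_theta)
    print(round(theta, 3), round(sum(z) / len(z), 2))
```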

---
layout: false
class: left
count: false

# Invariance via Topological Groups (.purple[Example])

.content-box-red[.small[
$$
f_\theta(x_1,x_2) = \big( \cos(\theta) \Phi^{-1}(x_1) + \sin(\theta )\Phi^{-1}(x_2) \big)^2 \hspace{0.5cm} \mathbf{1}(0 \le x_1,x_2 \le 1)
$$
]
]

---
layout: false

# Invariance via Excess Mass

- For `$\mathbb{P}$` with pdf `$f$`, the **.purple[excess mass function]** is given by 
`\begin{align}
\hat{f}(t) := \mathbb{P}\big( \{ \mathbf{x} \in \mathcal{X} : f(\mathbf{x}) \ge t \} \big)
\end{align}`
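When `$f$` is known and `$\mathbb{P}$` can be sampled, `$\hat{f}(t) = \mathbb{P}(f(\mathbf{X}) \ge t)$` has a direct Monte Carlo estimate. The sketch below takes `$\mathbb{P} = \mathcal{N}(0,1)$` on `$\mathbb{R}$` purely for illustration. In that case the superlevel set `$\{\phi \ge t\}$` is an interval, so a closed form is available for comparison. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def excess_mass_mc(pdf, sampler, t, n=50_000, seed=0):
    """Monte Carlo estimate of P({x : f(x) >= t}) = P(f(X) >= t) with X ~ P."""
    rng = random.Random(seed)
    return sum(pdf(sampler(rng)) >= t for _ in range(n)) / n

def excess_mass_std_normal(t):
    """Exact value for N(0,1): phi(x) >= t  iff  |x| <= sqrt(-2 log(t * sqrt(2 pi)))."""
    peak = 1 / math.sqrt(2 * math.pi)  # maximum of phi
    if t <= 0:
        return 1.0
    if t > peak:
        return 0.0
    c = math.sqrt(-2 * math.log(t / peak))
    return 2 * NormalDist().cdf(c) - 1

def draw_std_normal(rng):
    return rng.gauss(0, 1)

std = NormalDist()
for t in (0.05, 0.2, 0.35):
    print(t, excess_mass_mc(std.pdf, draw_std_normal, t), round(excess_mass_std_normal(t), 4))
```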

---
layout: false
count: false

# Invariance via Excess Mass

---
layout: false
count: false

# Invariance via Excess Mass

---
layout: false
count: false

# Invariance via Excess Mass (.purple[Example])

.small[
Given a density `$g$` on `$\mathbb{R}_+$` and `$\Theta = \{ (a,b) : a,b > 0, \ \frac{1}{a} + \frac{1}{b} = 1 \}$`, define `$f_\theta$` on `$\mathcal{X} = \mathbb{R}$` by
]

.content-box-red[.small.center[
`${f_\theta(x) = \color{red}{g(-bx) \mathbf{1}(x < 0)} + \color{blue}{g(ax) \mathbf{1}(x \ge 0)}}$`
]]
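A change of variables shows why this family shares one excess mass function. The set `$\{x<0 : g(-bx) \ge t\}$` contributes `$\frac{1}{b}\int_{\{g \ge t\}} g$` and the set `$\{x \ge 0 : g(ax) \ge t\}$` contributes `$\frac{1}{a}\int_{\{g \ge t\}} g$`, so `$\hat{f}_\theta(t) = \int_{\{g \ge t\}} g(u)\,du$` for every `$\theta \in \Theta$`. The numerical sketch below assumes `$g$` is the Exp(1) density (any density on `$\mathbb{R}_+$` would do). For this choice `$\hat{f}_\theta(t) = 1 - t$` on `$(0,1]$`, so each printed row should be close to `[0.8, 0.5, 0.2]`. The helper names are hypothetical.

```python
import functools
import math

def g(u):
    """Illustrative choice: the Exp(1) density on R_+."""
    return math.exp(-u) if u >= 0 else 0.0

def f_theta(x, a, b):
    """Two-sided density from the example, assuming a, b > 0 and 1/a + 1/b = 1."""
    return g(-b * x) if x < 0 else g(a * x)

def excess_mass(pdf, t, lo=-30.0, hi=30.0, steps=30_000):
    """Midpoint-rule approximation of the integral of pdf over {x : pdf(x) >= t}."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        v = pdf(lo + (k + 0.5) * h)
        if v >= t:
            total += v * h
    return total

for a in (2.0, 1.25, 5.0):
    b = a / (a - 1)  # enforces 1/a + 1/b = 1
    pdf = functools.partial(f_theta, a=a, b=b)
    print(a, [round(excess_mass(pdf, t), 3) for t in (0.2, 0.5, 0.8)])
```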

---
layout: false
count: false

# Invariance via Excess Mass (.purple[Nuts & Bolts])

`$\mathscr{X}=(\mathcal X, \pi, \mathcal Y, \mathcal Z)$` is a **smooth fiber bundle** with local trivialization `$\{ U_\alpha,\psi_\alpha \}$`

---
layout: false

# Verifying `$\beta$`-equivalence

* Consider a family of distributions `$\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}$`

* Can we check if a given `$\mathcal P$` admits `$\beta$`-equivalence?

.content-box-purple[
**.purple[Theorem.]**
If `$\Theta$` contains an open set in `$\mathbb R^p$`, then `$\mathcal P$` admits `$\beta$`-equivalence **if and only if** for all `$k \in \mathbb N$`
`\begin{align}
\big\langle S_\theta,f_\theta^k \big\rangle_{L_2(\mathcal X)} = \mathbf 0,
\end{align}`
where `$f_\theta$` is the density of `$\mathbb P_\theta$` and `$S_\theta=\nabla_\theta \log f_\theta$` is the **score function**
]

---
layout:false
class: inverse, center, middle
count: false

# Statistics + TDA

## .orange[Robust Topological Inference] <a href="https://arxiv.org/abs/2006.10012">.half.orange[Vishwanath, Fukumizu, Kuriki, and Sriperumbudur (2020b)] </a>

---
class: center
# .left[Persistence Diagram]

<img src="images/2d_surface2.svg" width="450" height="450">
 
`$\mathbb{X}_n$` is sampled from `$2p_z$` orbital

---
class: center

# .left[Underlying Philosophy]

Number of connected components

---
class: center
count: false

# .left[Underlying Philosophy]

Number of non-trivial loops

---
class: center
count: false

# .left[Underlying Philosophy]

Number of non-trivial voids

---
class: center
count: false

# .left[Underlying Philosophy]

* Points close to the diagonal (short lifetimes) typically correspond to topological noise

* Points far from the diagonal (long lifetimes) typically correspond to true topological features
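Distance to the diagonal has a simple closed form. In the `$\ell_\infty$` norm used by `$W_\infty$`, the closest diagonal point to `$(b,d)$` is `$\big(\frac{b+d}{2},\frac{b+d}{2}\big)$`, at distance `$\frac{d-b}{2}$`, i.e., half the bar length. The sketch below splits a diagram by that distance. The diagram, the threshold and the name `split_by_persistence` are made up for illustration; in practice the threshold would come from a confidence set.

```python
def split_by_persistence(diagram, threshold):
    """Split (birth, death) pairs by their l_inf distance (death - birth) / 2 to the diagonal."""
    signal, noise = [], []
    for birth, death in diagram:
        dist = (death - birth) / 2
        (signal if dist > threshold else noise).append((birth, death))
    return signal, noise

# A made-up diagram: one long-lived loop and a few short-lived ones
dgm = [(0.10, 1.20), (0.30, 0.35), (0.42, 0.50), (0.55, 0.58)]
signal, noise = split_by_persistence(dgm, threshold=0.1)
print(signal)
print(noise)
```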

---
class: center

# .left[Can you guess the shape?]

---
class: center
count: false

# .left[Can you guess the shape?]

---
class: center
count: false

# .left[Can you guess the shape?]

---
class: center
# .left[Robustness]

`$\mathbb{X}_n = \{\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_n\} \sim \mathbb{P}_{_{\mathcal{M}}}$`

.pull-left[
<img src="images/circ1.svg" width="420" height="420">
 
`$\mathbb{X}_n$` from .green[true] signal on `$\mathbb{S}^1$`
]

.pull-left[
<img src="images/c1_dgm.svg" width="400" height="400">
 
`$\textbf{Dgm}(\mathbb{X}_n)$`
]

---
class: center
count: false

# .left[Robustness]

`$\mathbb{X}_n = \{\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_n\} \sim \mathbb{P}_{_{\mathcal{M}}} \star \mathbb{N}_{\sigma}$`

.pull-left[
<img src="images/circ2.svg" width="420" height="420">
 
`$\mathbb{X}_n$` from .orange[perturbed] signal on `$\mathbb{S}^1$`
]

.pull-left[
<img src="images/c2_dgm.svg" width="400" height="400">
 
`$\textbf{Dgm}(\mathbb{X}_n)$`
]

---
class: center
count: false

# .left[Robustness]

`$\mathbb{X}_n = \{\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_n\} \sim (1-\pi)\cdot\left(\mathbb{P}_{_{\mathcal{M}}} \star \mathbb{N}_{\sigma}\right) + \pi \cdot \mathbb{Q}_{_{\mathcal{X}}}$`

.pull-left[
<img src="images/circ3.svg" width="420" height="420">
 
`$\mathbb{X}_n$` from .red[noisy] signal on `$\mathbb{S}^1$`
]

.pull-left[
<img src="images/c3_dgm.svg" width="400" height="400">
 
`$\textbf{Dgm}(\mathbb{X}_n)$`
]
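The three sampling models can be simulated directly. This sketch rests on several assumptions: `$\mathbb{P}_{\mathcal{M}}$` is taken to be uniform on the unit circle, `$\mathbb{N}_\sigma$` is isotropic Gaussian noise, and `$\mathbb{Q}_{\mathcal{X}}$` is uniform on the square `$[-2,2]^2$`. `sample_circle` and its parameter names are hypothetical.

```python
import math
import random

def sample_circle(n, sigma=0.0, outlier_frac=0.0, box=2.0, seed=0):
    """n draws from (1 - pi) * (Unif(S^1) convolved with N(0, sigma^2 I)) + pi * Unif([-box, box]^2),
    with pi = outlier_frac."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        if rng.random() < outlier_frac:
            # outlier from the background measure Q
            points.append((rng.uniform(-box, box), rng.uniform(-box, box)))
        else:
            # uniform point on the circle plus Gaussian perturbation
            angle = rng.uniform(0, 2 * math.pi)
            points.append((math.cos(angle) + rng.gauss(0, sigma),
                           math.sin(angle) + rng.gauss(0, sigma)))
    return points

true_signal = sample_circle(200)                               # sigma = pi = 0
perturbed = sample_circle(200, sigma=0.1)                      # pi = 0
noisy = sample_circle(200, sigma=0.1, outlier_frac=0.2)        # both
```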

---
class: center

# .left[Confidence Sets]

Given two point clouds `$\mathbb{X}_n$` and `$\mathbb{Y}_m$` and `$\textbf{D}_1 = \textbf{Dgm}(\mathbb{X}_n)$`, `$\textbf{D}_2 = \textbf{Dgm}(\mathbb{Y}_m)$`

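`$W_\infty(\textbf{D}_1,\textbf{D}_2) = \inf\limits_{\gamma: \textbf{D}_1 \rightarrow \textbf{D}_2}\sup\limits_{x \in \textbf{D}_1} || x - \gamma(x) ||_\infty$`

where `$\gamma$` ranges over bijections, and points of either diagram may be matched to the diagonal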
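For very small diagrams `$W_\infty$` can be computed by brute force. The sketch below (plain Python; `bottleneck` is a hypothetical helper) pads each diagram with one diagonal slot per point of the other diagram. A point matched to the diagonal costs its `$\ell_\infty$` distance `$\frac{d-b}{2}$` to it, and the largest cost is minimized over all bijections. The running time is factorial, so this only illustrates the definition; practical implementations solve a bottleneck matching problem instead.

```python
import itertools

def _diag_dist(p):
    """l_inf distance from (birth, death) to the diagonal."""
    return (p[1] - p[0]) / 2

def bottleneck(d1, d2):
    """Brute-force W_inf between two small finite diagrams of (birth, death) pairs."""
    n, m = len(d1), len(d2)
    left = list(d1) + [None] * m   # None marks a slot on the diagonal
    right = list(d2) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return _diag_dist(q)
        if q is None:
            return _diag_dist(p)
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = float("inf")
    for perm in itertools.permutations(range(n + m)):
        worst = max((cost(left[i], right[perm[i]]) for i in range(n + m)), default=0.0)
        best = min(best, worst)
    return best

print(bottleneck([(0, 2)], [(0, 2.5)]))
```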

---
class: center
count: false

# .left[Confidence Sets]

Find `$d_n$` such that `$\mathbb{P}^{\otimes n}\left( W_\infty(\hat{D}_n,\mathbb{E}(\hat{D}_n) ) \ge d_n \right) \le \alpha$`; the `$W_\infty$`-ball of radius `$d_n$` around `$\hat{D}_n$` is then a `$(1-\alpha)$` confidence set

.center[<img src="images/bootstrap2.svg" width="350">]
.center[.caption[Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, and Singh (2014)]]

---

# .left[Stable Topological Summaries]

* Given a sample `$\mathbb{X}_n \subset \mathbb{R}^d$` and a **filter** function `$f : \mathbb{R}^d \rightarrow \mathbb{R}$`

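* `$\textbf{Dgm}\left( \text{Sup}(\mathbb{X}_n,f) \right)$` constructed from **superlevel** sets of `$f$` are stable under perturbations of `$f$`: `$W_\infty\big(\textbf{Dgm}(f),\textbf{Dgm}(g)\big) \le ||f-g||_\infty$`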
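In one dimension, the superlevel filtration of `$f$` sampled on a grid can be processed with a union-find. A component is born at each local maximum. When two components meet, the one with the lower peak dies (the elder rule). The sketch below assumes the function values are already evaluated on a 1-D grid (for example, a KDE); `superlevel_h0` is a hypothetical helper. Pairs are returned as `$(\text{birth}, \text{death})$` with birth `$\ge$` death, because the filtration runs from high to low values.

```python
def superlevel_h0(values):
    """0-dim persistence of the superlevel sets {i : values[i] >= s}, s decreasing,
    for a function on a 1-D grid (neighbours i - 1 and i + 1)."""
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(len(values)), key=lambda k: -values[k]):
        parent[i], birth[i] = i, values[i]  # i enters the superlevel set
        for j in (i - 1, i + 1):
            if j not in parent:  # neighbour not yet in the superlevel set
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # elder rule: the component with the lower peak dies at level values[i]
            young, old = (ri, rj) if birth[ri] < birth[rj] else (rj, ri)
            if birth[young] > values[i]:  # skip zero-persistence pairs
                pairs.append((birth[young], values[i]))
            parent[young] = old
    # components that never merge (the global maximum) never die
    pairs.extend((birth[r], float("-inf")) for r in {find(i) for i in parent})
    return pairs

print(superlevel_h0([0, 3, 1, 2, 0]))
```

For `[0, 3, 1, 2, 0]` the lower peak yields the pair `(2, 1)` and the global maximum yields `(3, -inf)`. By the stability bound, changing every value by at most `$\epsilon$` moves the diagram by at most `$\epsilon$` in `$W_\infty$`.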

---

# .left[Robust Persistence Diagrams]

- Given a reproducing kernel `$K_\sigma$` with RKHS `$\mathcal{H}_\sigma$`, the **.blue[KDE]** is given by
`\begin{align}
f^n_{\sigma} := \frac{1}{n}\sum_{i=1}^{n}K_\sigma(\cdot,\mathbf{X}_i) = \mathop{\text{arginf}}_{g \in \mathcal{H}_{\sigma}}\int_{\mathbb R^d}|| g -  K_\sigma(\cdot,\boldsymbol{y})||_{\mathcal H_\sigma}^2 d\mathbb{P}_n(\boldsymbol y)
\end{align}`
--

- Given a robust loss `$\rho: \mathbb R_+ \rightarrow \mathbb R_+$` the **.blue[robust KDE]** is given by
`\begin{align}
f^n_{\rho,\sigma} :=  \mathop{\text{arginf}}_{g \in \mathcal{H}_{\sigma}}\int_{\mathbb R^d} \rho \big( || g -  K_\sigma(\cdot, \boldsymbol{y})||_{\mathcal H_\sigma} \big) d\mathbb{P}_n(\boldsymbol y)
\end{align}`
--

- The **.red[robust persistence diagram]** is then given by `$\textbf{Dgm}(f^n_{\rho,\sigma})$`
--

- Better .purple[influence function] `$\hspace{-0.3cm}$` `$^*$`, optimal rates and uniform confidence band:
 
`\begin{align}
\sup_{\mathbb P \in \mathcal{M}(\mathbb R^d)}\mathbb{P}^{\otimes n}\Big( W_\infty\big(\textbf{Dgm}(f^n_{\rho,\sigma}),\textbf{Dgm}(f_{\rho,\sigma}) \big) \ge d_n \Big) \le \alpha
\end{align}`
 
 `$\hspace{1.5cm}$` where `$d_n = O(n^{-1/2})$` if the kernel is sufficiently smooth.

.tiny.purple[<body> `$^*$` Influence function is generalized as the .bold[metric derivative] of the `$W_\infty(\cdot,\cdot)$` Wasserstein metric along the curve `$(1-\epsilon)\mathbb P + \epsilon \delta_{\boldsymbol x}$`</body>]
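The robust KDE has no closed form. One standard way to compute it is iteratively reweighted least squares in the RKHS. At a stationary point, `$f^n_{\rho,\sigma} = \sum_i w_i K_\sigma(\cdot,\mathbf{X}_i)$` with `$w_i \propto \rho'(d_i)/d_i$`, where `$d_i = ||f^n_{\rho,\sigma} - K_\sigma(\cdot,\mathbf{X}_i)||_{\mathcal{H}_\sigma}$` can be computed from kernel evaluations alone. The sketch below assumes a 1-D Gaussian kernel and the Huber loss with a median-based threshold. It is not necessarily the scheme used in the cited paper, and the helper names are hypothetical.

```python
import math
import statistics

def gauss_kernel(x, y, sigma):
    """Normalised 1-D Gaussian kernel K_sigma(x, y)."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def robust_kde_weights(xs, sigma, c=None, n_iter=50):
    """Weights w_i of f = sum_i w_i K_sigma(., x_i) for the Huber loss, by IRLS in the RKHS.

    Starts from the ordinary KDE (w_i = 1/n). If c is None, the Huber threshold is set
    to the median RKHS distance at the KDE (a heuristic, not a prescribed choice).
    """
    n = len(xs)
    K = [[gauss_kernel(a, b, sigma) for b in xs] for a in xs]
    w = [1.0 / n] * n
    for _ in range(n_iter):
        Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
        wKw = sum(w[i] * Kw[i] for i in range(n))
        # ||K(., x_i) - f||_H^2 = K(x_i, x_i) - 2 (K w)_i + w^T K w
        dist = [math.sqrt(max(K[i][i] - 2 * Kw[i] + wKw, 0.0)) for i in range(n)]
        if c is None:
            c = statistics.median(dist)
        # Huber loss: rho'(d) / d = 1 if d <= c, else c / d
        raw = [1.0 if d <= c else c / d for d in dist]
        total = sum(raw)
        w = [r / total for r in raw]
    return w

def weighted_kde(x, xs, w, sigma):
    """Evaluate sum_i w_i K_sigma(x, x_i)."""
    return sum(wi * gauss_kernel(x, xi, sigma) for wi, xi in zip(w, xs))

inliers = [i / 10 for i in range(-10, 11)]
outliers = [7.9, 8.0, 8.2]
xs = inliers + outliers
sigma = 0.5
w = robust_kde_weights(xs, sigma)
kde_w = [1 / len(xs)] * len(xs)
print(weighted_kde(8.0, xs, kde_w, sigma), weighted_kde(8.0, xs, w, sigma))
```

Points far from the bulk get weights below `$1/n$`, so near them the robust KDE has lower values than the ordinary KDE (uniform weights `$1/n$`). Their features therefore appear later in the superlevel filtration.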

---

# Summary

* TDA is a new and exciting area for statistical methodology

* TDA formalizes EDA using machinery from algebraic topology

* Applications include neuroscience, astrophysics, proteomics, etc.

### Open problems

* Clustering, dimension reduction and compressed sensing

* Robust **and efficient** topological inference

* Generalizations of nonparametric two-sample tests 
 * Multivariate distributions?
 * Distributions supported on manifolds?

---
layout: false
# References

[1] O. Bobrowski et al. "Topology of random geometric complexes: A
survey". In: _Journal of Applied and Computational Topology_ 1.3-4
(2018), pp. 331-364.

[2] V. De Silva et al. "Coverage in sensor networks via persistent
homology". In: _Algebraic & Geometric Topology_ 7.1 (2007), pp.
339-358.

[3] B. T. Fasy et al. "Confidence sets for persistence diagrams". In:
_The Annals of Statistics_ 42.6 (2014), pp. 2301-2339.

[4] M. Kahle et al. "Limit theorems for Betti numbers of random
simplicial complexes". In: _Homology, Homotopy and Applications_ 15.1 (2013), pp.
343-374.

[5] S. Vishwanath et al. "Robust Persistence Diagrams using Reproducing
Kernels". In: _arXiv preprint arXiv:2006.10012_ (2020).

[6] S. Vishwanath et al. "Statistical Invariance of Betti Numbers in
the Thermodynamic Regime". In: _arXiv preprint arXiv:2001.00220_
(2020).

[7] D. Yogeshwaran et al. "Random geometric complexes in the
thermodynamic regime". In: _Probability Theory and Related Fields_
167.1-2 (2017), pp. 107-142.

---
class: inverse, center, middle

# Thank you!

---
class: inverse, center, middle

# Appendix

---

# .left[Machine Learning]

* Deep learning:
 - **Learning with topological features** : Hofer, Kwitt, Niethammer, and Uhl (2017)
 - **Complexity Measure** : Rieck, Togninalli, Bock, Moor, Horn, Gumbsch, and Borgwardt (2018)
 - **Generative Adversarial Learning** : Khrulkov and Oseledets (2018)

* Linear machine learning models:
 - **RKHS embedding of diagrams** : Hiraoka, Shirai, and Trinh (2018); Kusano, Hiraoka, and Fukumizu (2016)
 - **Regularization for classification boundary**: Chen, Ni, Bai, and Wang (2019)
 - **Gradient Descent and Backpropagation**: Leygonie, Oudot, and Tillmann (2019)

* Applications:
  - **Neuroscience**: Nielson, Paquette, Liu, Guandique, Tovar, Inoue, Irvine, Gensel, Kloke, and Petrossian (2015)
  - **Astrophysics**: Adler, Agami, and Pranav (2017)
  - ...