class: inverse, left, title-slide, middle
count: false

# .small[Robust Persistence Diagrams .orange[Using Reproducing Kernels]]<br>.scriptsize[.scriptsize.black[Siddharth Vishwanath, Kenji Fukumizu, Satoshi Kuriki and Bharath Sriperumbudur]]

# <br><br><br><br><br><br><br>

<img style="position: absolute; bottom: 0; left: 0; border: 0;" src="images/other/psu-logo.png" height=25%>
<img style="position: absolute; bottom: 0; right: 0; border: 0;" src="images/other/ism-logo.png" height=20%>

---
class: inverse, left, title-slide, middle
count: false

## .center[.dblue[NeurIPS 2020]]

<br><br><br>

## .text-shadow.center[.black[Poster: .red[\# 18998]]]

<br>

.text-shadow.center[.black[Paper: .red[\# 11936]]]

<br><br><br><br><br><br><br><br><br>

---
count: true

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg11.svg">

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg12.svg">

* As `\(\epsilon \uparrow\)` new topological features are born

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg13.svg">

* As `\(\epsilon \uparrow\)` new topological features are born **and/or** existing topological features die

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

.left[<img src="images/examples/fg1.svg">]

* The **multiscale** information for the birth and death of topological features is captured in `\(\textbf{Dgm}\big({\mathcal{K}(\mathbb X_n)}\big)\)`
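---
count: false

# Introduction

In practice, `\(\textbf{Dgm}\big(\mathcal{K}(\mathbb X_n)\big)\)` can be computed with standard TDA software. Below is a minimal sketch, assuming the `gudhi` Python library is installed, that builds a Vietoris-Rips complex (a computable proxy for the nerve of the `\(\epsilon\)`-balls) on a noisy circle; the sample and parameters are illustrative, not values from the paper.

```python
import numpy as np
import gudhi

# illustrative data: n = 100 noisy samples from the unit circle
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
Xn = np.column_stack([np.cos(theta), np.sin(theta)])
Xn += 0.05 * rng.standard_normal(Xn.shape)

# Rips complex, indexed by the resolution epsilon (edge length)
rips = gudhi.RipsComplex(points=Xn, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)

# persistence diagram: a list of (dimension, (birth, death)) pairs
diagram = st.persistence()
```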
---
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig21.svg">

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig22.svg">

* As `\(\epsilon \downarrow\)` new features are born

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig23.svg">

* As `\(\epsilon \downarrow\)` new features are born **and/or** existing features die

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig2.svg">

* As `\(\epsilon \downarrow\)` new features are born **and/or** existing features die.

.content-box-red[<body> The resulting persistence diagram is `\(\mathbf{Dgm}(\phi_n) = \mathbf{Dgm}(\text{Sup}(\phi_n))\)` </body>]
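---
count: false

# Superlevel Persistent Homology

`\(\mathbf{Dgm}(\phi_n)\)` can be computed by evaluating the filter on a grid: superlevel sets of `\(\phi_n\)` are sublevel sets of `\(-\phi_n\)`, so a cubical complex built on `\(-\phi_n\)` does the job. A minimal sketch, assuming `gudhi`; the toy filter is an ordinary Gaussian KDE with an illustrative bandwidth:

```python
import numpy as np
import gudhi

# toy filter: Gaussian KDE of a point cloud, evaluated on a 128 x 128 grid
rng = np.random.default_rng(0)
Xn = 0.2 * rng.standard_normal((200, 2))
xs = np.linspace(-2, 2, 128)
xx, yy = np.meshgrid(xs, xs)
grid = np.column_stack([xx.ravel(), yy.ravel()])
sigma = 0.3
sqd = ((grid[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
phi = np.exp(-sqd / (2 * sigma**2)).mean(axis=1).reshape(128, 128)

# superlevel persistence of phi = sublevel persistence of -phi
cc = gudhi.CubicalComplex(top_dimensional_cells=-phi)
diagram = cc.persistence()  # (dim, (birth, death)) pairs, on the -phi scale
```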
---

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

--

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

.content-box-grey[<body>
$$ d_{n}(\boldsymbol x)= \mathop{\text{inf}}_{\boldsymbol X_i \in \mathbb X_n} || \boldsymbol{x -X_i} ||_2 $$
</body>]
]

.pull-right[
<img src="images/filtrations/distfct.svg">
]

---
count: false

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

#### `\(\hspace{2cm}\)` 2. Distance-to-Measure (DTM) <a name=cite-chazal2017robust></a><a name=cite-chazal2011geometric></a><a name=cite-biau2011weighted></a>[[1](#bib-chazal2017robust), [2](#bib-chazal2011geometric), [3](#bib-biau2011weighted)]

.content-box-grey[<body>
$$ d^2_{m,n}(\boldsymbol x) = \frac{1}{k}\sum\limits_{\boldsymbol{X_i} \in \text{NN}_k(\boldsymbol{x})} || \boldsymbol x - \boldsymbol X_i ||_2^2 $$
.center[ <body> where `\(0 < m < 1\)` and `\(k = \lfloor mn \rfloor\)` </body> ]
</body>]
]

.pull-right[
<img src="images/filtrations/dtm.svg">
]

---
count: false

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

#### `\(\hspace{2cm}\)` 2. Distance-to-Measure (DTM) [[1](#bib-chazal2017robust), [2](#bib-chazal2011geometric), [3](#bib-biau2011weighted)]

#### `\(\hspace{2cm}\)` 3. Kernel Density Estimator (KDE) <a name=cite-fasy2014confidence></a><a name=cite-phillips2015geometric></a>[[1](#bib-chazal2017robust), [4](#bib-fasy2014confidence), [5](#bib-phillips2015geometric)]

.content-box-grey[<body>
$$ \displaystyle{f_\sigma(\boldsymbol x) = \frac{1}{n}\sum\limits_{i=1}^{n}K_\sigma(\boldsymbol x, \boldsymbol X_i)} $$
.center[ <body> where `\(\sigma > 0\)` and `\(K_{\sigma}(\cdot, \cdot)\)` is a kernel </body> ]
</body>]
]

.pull-right[
<img src="images/filtrations/kde.svg">
]
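---
count: false

# Statistical Setting

All three filters are straightforward to evaluate at query points. A minimal `numpy`/`scipy` sketch; the Gaussian kernel, function names, and parameter values are illustrative choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def dist_fn(x, Xn):
    """Distance function d_n(x) = inf_i ||x - X_i||_2."""
    return cKDTree(Xn).query(x, k=1)[0]

def dtm(x, Xn, m=0.1):
    """Distance-to-measure d_{m,n}(x) with k = floor(m * n) neighbours."""
    k = max(1, int(np.floor(m * len(Xn))))
    d, _ = cKDTree(Xn).query(x, k=k)
    d = np.asarray(d).reshape(len(x), -1)
    return np.sqrt((d ** 2).mean(axis=1))

def kde(x, Xn, sigma=0.3):
    """Gaussian KDE f_sigma(x) = (1/n) sum_i K_sigma(x, X_i)."""
    sqd = ((x[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
    return np.exp(-sqd / (2 * sigma**2)).mean(axis=1)
```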
---

# Motivation: Stability `\(\neq\)` Robustness

.blockquote[<body> **.dred[Stability of persistence diagrams <a name=cite-cohen2007stability></a><a name=cite-chazal2016structure></a>[[6](#bib-cohen2007stability), [7](#bib-chazal2016structure)]:]** For two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \textbf{Dgm}(f), \textbf{Dgm}(g) \Big) \le ||f-g||_{\infty}\)` </body>]

--

.pull-left[
<img src="images/examples/signal.svg" height="250">
<img src="images/examples/signal-dgm.svg" height="250">
]

.pull-right[
]

.center[Persistence diagrams are stable w.r.t. small perturbations]

---
count: false

# Motivation: Stability `\(\neq\)` Robustness

.blockquote[<body> **.dred[Stability of persistence diagrams [[6](#bib-cohen2007stability), [7](#bib-chazal2016structure)]:]** For two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \textbf{Dgm}(f), \textbf{Dgm}(g) \Big) \le ||f-g||_{\infty}\)` </body>]

.pull-left[
<img src="images/examples/signal.svg" height="250">
<img src="images/examples/signal-dgm.svg" height="250">
]

.pull-right[
<img src="images/examples/X.svg" height="250">
<img src="images/examples/X-dgm.svg" height="250">
]

.center[Even a few **.red[outliers]** can dramatically change the inference]

--

.content-box-red.center[ Can we construct outlier-robust persistence diagrams without compromising on statistical efficiency? ]

---

# Robust Persistence Diagrams

* `\(K_\sigma\)` is a radial kernel and `\(\mathcal{H}_\sigma\)` is its reproducing kernel Hilbert space

* Given `\(\mathbb X_n\)` and a robust loss `\(\rho: \mathbb{R}_+ \rightarrow \mathbb R_+\)`, the robust KDE <a name=cite-kim2012robust></a>[[8](#bib-kim2012robust)] is given by

.content-box-grey[<body>
$$ f^n_{\rho,\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n w_\sigma(\boldsymbol{X_i}) K_\sigma(\cdot,\boldsymbol{X_i}) $$
</body>]

--

* .red.text-shadow[<body> `\(w_\sigma\)` is a weight function </body>] which down-weights extreme outliers. The optimization can be solved via iteratively reweighted least squares (a sketch follows on the next slide)

* The population counterpart is

.content-box-grey[<body>
$$ f_{\rho,\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}(\boldsymbol{y}) $$
</body>]

--

.center.content-box-red[<body> `\(\textbf{Dgm}(f^n_{\rho,\sigma}) = \textbf{Dgm}\big( \text{Sup}(f^n_{\rho,\sigma}) \big)\)` is the robust persistence diagram </body>]
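---
count: false

# Robust Persistence Diagrams

A minimal sketch of the iteratively reweighted least-squares step (after [[8](#bib-kim2012robust)]), here with a Huber loss; the tuning constant `c` and the iteration count are illustrative choices, not values from the paper:

```python
import numpy as np

def robust_kde_weights(Xn, sigma=0.3, c=1.0, n_iter=50):
    """IRLS for the weights w_sigma(X_i) of the robust KDE."""
    n = len(Xn)
    sqd = ((Xn[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
    G = np.exp(-sqd / (2 * sigma**2))   # Gram matrix K_sigma(X_i, X_j)
    w = np.full(n, 1.0 / n)             # initialise at the ordinary KDE
    for _ in range(n_iter):
        # RKHS distance of each atom to the current fit f = sum_j w_j K(., X_j):
        # ||K(., X_i) - f||^2 = K_sigma(X_i, X_i) - 2 (Gw)_i + w' G w
        fx = G @ w
        d = np.sqrt(np.maximum(G.diagonal() - 2 * fx + w @ fx, 0.0))
        # Huber psi(d)/d: constant near zero, decaying as c/d in the tails
        phi = np.where(d <= c, 1.0, c / np.maximum(d, 1e-12))
        w = phi / phi.sum()             # reweight and renormalise
    return w
```

The returned weights plug directly into the kernel sum, and the resulting filter feeds into the superlevel-set pipeline from earlier to obtain `\(\textbf{Dgm}(f^n_{\rho,\sigma})\)`.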
<i class="fas fa-hand-point-right faa-horizontal animated "></i>
`\(\hspace{1cm}\)` Generalize via metric derivative 🙂<br>

--

* Given a filter `\(\phi_{\mathbb P}\)` and `\(\boldsymbol{x} \in \mathbb R^d\)`, the **.dblue.text-shadow[persistence influence]** is given by

`\begin{align} \Psi\Big( \phi_{\mathbb P}; \mathbb{P}, \boldsymbol{x} \Big) \stackrel{\cdot}{=} \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} W_{\infty}\Big( \textbf{Dgm}\big(\phi_{\mathbb{P}^{\epsilon}_{\boldsymbol x}}\big), \textbf{Dgm}\big( \phi_{\mathbb P} \big) \Big) \end{align}`

--

.blockquote[**.dred[Theorem:]** For a robust loss `\(\rho\)` and `\(\sigma, m > 0\)`:

1. **Robust KDE:** `\(\Psi\Big( f_{\rho,\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} \color{purple}{w_{\sigma}(\boldsymbol x)} || K_\sigma(\cdot,\boldsymbol{x}) - f_{\rho,\sigma} ||_{\mathcal{H}_\sigma}\)`

1. **KDE:** `\(\hspace{1.5cm}\)` `\(\Psi\Big( f_{\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} || K_\sigma(\cdot,\boldsymbol{x}) - f_{\sigma} ||_{\mathcal{H}_\sigma}\)`

1. **Distance-to-Measure:** `\(\Psi\Big( d_{m}; \mathbb{P}, \boldsymbol{x} \Big) \le \frac{2}{\sqrt{m}} \sup \Big\{ \big\vert f(\boldsymbol x) - \int f(\boldsymbol y)d\mathbb{P}(\boldsymbol{y}) \big\vert : || \nabla f ||_{L_2(\mathbb P)} \le 1 \Big\}\)`
]

<br>

.tiny[<body> `\(\color{purple}{w_{\sigma}(\boldsymbol x)}\)` weighs down outliers </body>]
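---
count: false

# Sensitivity analysis in the space of persistence diagrams

The persistence influence can also be probed empirically: perturb `\(\mathbb{P}_n\)` by `\(\epsilon\)` mass at an outlier `\(\boldsymbol x\)` and track the bottleneck movement of the diagram. A sketch assuming `gudhi`, here for the ordinary KDE filter on a grid; the helper names, data, and parameters are illustrative:

```python
import numpy as np
import gudhi

def kde_grid(X, weights, xs, sigma=0.3):
    """Weighted Gaussian KDE evaluated on the grid xs x xs."""
    xx, yy = np.meshgrid(xs, xs)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    sqd = ((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return (np.exp(-sqd / (2 * sigma**2)) @ weights).reshape(len(xs), len(xs))

def dgm(phi, dim=1):
    """Superlevel persistence intervals of phi in a fixed dimension."""
    cc = gudhi.CubicalComplex(top_dimensional_cells=-phi)
    cc.persistence()
    return cc.persistence_intervals_in_dimension(dim)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)
Xn = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((300, 2))
xs, x_out, eps = np.linspace(-4, 4, 96), np.array([[3.0, 3.0]]), 0.05

base = kde_grid(Xn, np.full(300, 1 / 300), xs)                       # P_n
pert = kde_grid(np.vstack([Xn, x_out]),                              # P_n perturbed
                np.r_[np.full(300, (1 - eps) / 300), eps], xs)
psi_hat = gudhi.bottleneck_distance(dgm(base), dgm(pert)) / eps      # empirical influence
```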
---

# Is it Robust in practice?

.pull-left[
<img src="images/influence/bottleneck.svg" height="500">
]

.pull-right[
<img src="images/influence/dtm-bottleneck.svg" height="500">
]

---

# Statistical Efficiency: "Gain with no pain"

* **.text-shadow.dred[Consistency:]** Given `\(\mathbb X_n\)` from `\(\mathbb P\)` with density `\(f\)` and `\(\sigma > 0\)`, we have `\(W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \stackrel{p}{\longrightarrow} 0\)` `\(\hspace{0.5cm}\)` as `\(n \rightarrow \infty\)`

* Furthermore, `\(W_{\infty}\Big( \textbf{Dgm}\big(f_{\rho,\sigma}\big), \textbf{Dgm}\big( f \big) \Big) \longrightarrow 0\)` `\(\hspace{0.5cm}\)` as `\(\sigma \rightarrow 0\)`

.blockquote[**.dred[Theorem:]** For `\(a_\sigma > 1\)` and `\(p \in (0,1)\)`, if the entropy numbers of the kernel `\(K_\sigma\)` satisfy `\(e_n(\mathcal H_\sigma) \le a_\sigma n^{-{1}/{2p}}\)`, then with probability `\(\ge 1- \alpha\)` and uniformly over `\(\mathbb P\)`

`\begin{align} W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \le 2C \sigma^{-d/2} \Bigg( \xi(n,p) + \sqrt{\frac{2\log(1/\alpha)}{n}} \Bigg) \end{align}`

where

`\begin{align} \xi(n,p) = \begin{cases} \mathcal{O}(n^{-1/2}) & \text{ if } p \in (0,\frac{1}{2})\\ \mathcal{O}(n^{-1/2} \log n) & \text{ if } p = \frac{1}{2}\\ \mathcal{O}(n^{-1/4p}) & \text{ if } p \in (\frac{1}{2},1) \end{cases} \end{align}`
]

---
count: false

# Statistical Efficiency: "Gain with no pain"

<img src="images/other/band.svg">

---

# References

<a name=bib-chazal2017robust></a>[[1]](#cite-chazal2017robust) F. Chazal et al. "Robust topological inference: Distance to a measure and kernel distance". In: _Journal of Machine Learning Research_ 18.1 (2017), pp. 5845-5884.

<a name=bib-chazal2011geometric></a>[[2]](#cite-chazal2011geometric) F. Chazal et al. "Geometric inference for probability measures". In: _Foundations of Computational Mathematics_ 11.6 (2011), pp. 733-751.

<a name=bib-biau2011weighted></a>[[3]](#cite-biau2011weighted) G. Biau et al. "A weighted k-Nearest Neighbor density estimate for geometric inference". In: _Electronic Journal of Statistics_ 5 (2011), pp. 204-237.

<a name=bib-fasy2014confidence></a>[[4]](#cite-fasy2014confidence) B. T. Fasy et al. "Confidence sets for persistence diagrams". In: _The Annals of Statistics_ 42.6 (2014), pp. 2301-2339.

<a name=bib-phillips2015geometric></a>[[5]](#cite-phillips2015geometric) J. M. Phillips et al. "Geometric Inference on Kernel Density Estimates". In: _31st International Symposium on Computational Geometry_. 2015.

<a name=bib-cohen2007stability></a>[[6]](#cite-cohen2007stability) D. Cohen-Steiner et al. "Stability of persistence diagrams". In: _Discrete & Computational Geometry_ 37.1 (2007), pp. 103-120.

<a name=bib-chazal2016structure></a>[[7]](#cite-chazal2016structure) F. Chazal et al. _The Structure and Stability of Persistence Modules_. AG, CH: Springer, 2016.

<a name=bib-kim2012robust></a>[[8]](#cite-kim2012robust) J. Kim and C. D. Scott. "Robust kernel density estimation". In: _Journal of Machine Learning Research_ 13 (2012), pp. 2529-2565.

---
class: center, middle

## .dblue[Thank you!]

### .red.footnotesize[ .black.large.bold.text-shadow[Poster: ] .large[\# 18998 ]]

### .red.footnotesize[ <body> .black.large.bold.text-shadow[Code: ] ] <a href="https://github.com/sidv23/robust-PDs"> `\(\bbox[2pt,white]{\color{red}{\texttt{github.com/sidv23/robust-PDs}}}\)` </a>

### .red.footnotesize[ <body> .black.large.bold.text-shadow[Slides: ] ] <a href="https://sidv23.github.io/slides/neurips-2020"> `\(\bbox[2pt,white]{\color{red}{\texttt{sidv23.github.io/slides/neurips-2020}}}\)` </a>

### .black.footnotesize[Looking forward to discussions and questions during the poster session!]