class: inverse, left, title-slide, middle
count: false

# .small[Robust Persistence Diagrams .orange[Using Reproducing Kernels]]<br>.scriptsize[.scriptsize.black[Siddharth Vishwanath, Kenji Fukumizu, Satoshi Kuriki and Bharath Sriperumbudur]]

# <br><br><br><br><br><br><br>

<img style="position: absolute; bottom: 0; left: 0; border: 0;" src="images/other/psu-logo.png" height=25%>
<img style="position: absolute; bottom: 0; right: 0; border: 0;" src="images/other/ism-logo.png" height=20%>

---
class: inverse, left, title-slide, middle
count: false

## .center[.dblue[NeurIPS 2020]]

<br><br><br>

## .text-shadow.center[.black[Poster: .red[\# 18998]]]

<br>

.text-shadow.center[.black[Paper: .red[\# 11936]]]

<br><br><br><br><br><br><br><br><br>

---
count: true

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg11.svg">

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg12.svg">

* As `\(\epsilon \uparrow\)` new topological features are born

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

<img src="images/examples/fg13.svg">

* As `\(\epsilon \uparrow\)` new topological features are born **and/or** existing topological features die

---
count: false

# Introduction

* Given `\(\mathbb{X}_n = \{ \boldsymbol X_1,\boldsymbol X_2, \dots , \boldsymbol X_n \} \subset \mathbb R^d\)`, the shape of `\(\mathbb{X}_n\)` at resolution `\(\epsilon > 0\)` is encoded in .dblue[ <body> its nerve `\(\color{#1E90FF}{\mathcal{K}(\mathbb{X}_n,\epsilon)}\)` </body>]

.left[<img src="images/examples/fg1.svg">]

* The **multiscale** information for the birth and death of topological features is captured in `\(\textbf{Dgm}\big({\mathcal{K}(\mathbb X_n)}\big)\)`
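---
count: false

# Introduction

In practice, `\(\textbf{Dgm}\big(\mathcal{K}(\mathbb X_n)\big)\)` can be computed with standard TDA software. Below is a minimal sketch, assuming the `gudhi` Python library is installed, that builds a Vietoris-Rips complex (a computable proxy for the nerve of the `\(\epsilon\)`-balls) on a noisy circle; the sample and parameters are illustrative, not values from the paper.

```python
import numpy as np
import gudhi

# illustrative data: n = 100 noisy samples from the unit circle
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
Xn = np.column_stack([np.cos(theta), np.sin(theta)])
Xn += 0.05 * rng.standard_normal(Xn.shape)

# Rips complex, indexed by the resolution epsilon (edge length)
rips = gudhi.RipsComplex(points=Xn, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)

# persistence diagram: a list of (dimension, (birth, death)) pairs
diagram = st.persistence()
```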
---
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig21.svg">

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig22.svg">

* As `\(\epsilon \downarrow\)` new features are born

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig23.svg">

* As `\(\epsilon \downarrow\)` new features are born **and/or** existing features die

---
count: false
layout: false

# Superlevel Persistent Homology

* Given a filter `\(\phi_n: \mathbb R^d \rightarrow \mathbb R\)`, constructed using `\(\mathbb X_n\)`, the shape of `\(\mathbb X_n\)` is encoded in the *superlevel sets* `\(\phi_n^{-1}\Big( [\epsilon, \infty) \Big)\)`

<img src="images/examples/fig2.svg">

* As `\(\epsilon \downarrow\)` new features are born **and/or** existing features die.

.content-box-red[<body> The resulting persistence diagram is `\(\mathbf{Dgm}(\phi_n) = \mathbf{Dgm}(\text{Sup}(\phi_n))\)` </body>]
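---
count: false

# Superlevel Persistent Homology

`\(\mathbf{Dgm}(\phi_n)\)` can be computed by evaluating the filter on a grid: superlevel sets of `\(\phi_n\)` are sublevel sets of `\(-\phi_n\)`, so a cubical complex built on `\(-\phi_n\)` does the job. A minimal sketch, assuming `gudhi`; the toy filter is an ordinary Gaussian KDE with an illustrative bandwidth:

```python
import numpy as np
import gudhi

# toy filter: Gaussian KDE of a point cloud, evaluated on a 128 x 128 grid
rng = np.random.default_rng(0)
Xn = 0.2 * rng.standard_normal((200, 2))
xs = np.linspace(-2, 2, 128)
xx, yy = np.meshgrid(xs, xs)
grid = np.column_stack([xx.ravel(), yy.ravel()])
sigma = 0.3
sqd = ((grid[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
phi = np.exp(-sqd / (2 * sigma**2)).mean(axis=1).reshape(128, 128)

# superlevel persistence of phi = sublevel persistence of -phi
cc = gudhi.CubicalComplex(top_dimensional_cells=-phi)
diagram = cc.persistence()  # (dim, (birth, death)) pairs, on the -phi scale
```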
---

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

--

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

.content-box-grey[<body>
$$ d_{n}(\boldsymbol x)= \mathop{\text{inf}}_{\boldsymbol X_i \in \mathbb X_n} || \boldsymbol{x -X_i} ||_2 $$
</body>]
]

.pull-right[
<img src="images/filtrations/distfct.svg">
]

---
count: false

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

#### `\(\hspace{2cm}\)` 2. Distance-to-Measure (DTM) <a name=cite-chazal2017robust></a><a name=cite-chazal2011geometric></a><a name=cite-biau2011weighted></a>[[1](#bib-chazal2017robust), [2](#bib-chazal2011geometric), [3](#bib-biau2011weighted)]

.content-box-grey[<body>
$$ d^2_{m,n}(\boldsymbol x) = \frac{1}{k}\sum\limits_{\boldsymbol{X_i} \in \text{NN}_k(\boldsymbol{x})} || \boldsymbol x - \boldsymbol X_i ||_2^2 $$
.center[ <body> where `\(0 < m < 1\)` and `\(k = \lfloor mn \rfloor\)` </body> ]
</body>]
]

.pull-right[
<img src="images/filtrations/dtm.svg">
]

---
count: false

# Statistical Setting

* Given `\(\mathbb X_n\)` sampled i.i.d. from `\(\mathbb P\)`, the filter `\(\phi_n\)` is a **.red[.text-shadow[random]]** function. The population analogue `\(\phi_{\mathbb P}\)` encodes the topology of `\(\mathbb P\)`

* .content-box-red[<body> Consequently, `\(\mathbf{Dgm}(\phi_n) \in (\mathfrak{D}, W_{\infty})\)` is a .red[**random**] persistence diagram </body>]

### Some common examples include:

.pull-left[

#### `\(\hspace{2cm}\)` 1. Distance function

#### `\(\hspace{2cm}\)` 2. Distance-to-Measure (DTM) [[1](#bib-chazal2017robust), [2](#bib-chazal2011geometric), [3](#bib-biau2011weighted)]

#### `\(\hspace{2cm}\)` 3. Kernel Density Estimator (KDE) <a name=cite-fasy2014confidence></a><a name=cite-phillips2015geometric></a>[[1](#bib-chazal2017robust), [4](#bib-fasy2014confidence), [5](#bib-phillips2015geometric)]

.content-box-grey[<body>
$$ \displaystyle{f_\sigma(\boldsymbol x) = \frac{1}{n}\sum\limits_{i=1}^{n}K_\sigma(\boldsymbol x, \boldsymbol X_i)} $$
.center[ <body> where `\(\sigma > 0\)` and `\(K_{\sigma}(\cdot, \cdot)\)` is a kernel </body> ]
</body>]
]

.pull-right[
<img src="images/filtrations/kde.svg">
]
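---
count: false

# Statistical Setting

All three filters are straightforward to evaluate at query points. A minimal `numpy`/`scipy` sketch; the Gaussian kernel, function names, and parameter values are illustrative choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def dist_fn(x, Xn):
    """Distance function d_n(x) = inf_i ||x - X_i||_2."""
    return cKDTree(Xn).query(x, k=1)[0]

def dtm(x, Xn, m=0.1):
    """Distance-to-measure d_{m,n}(x) with k = floor(m * n) neighbours."""
    k = max(1, int(np.floor(m * len(Xn))))
    d, _ = cKDTree(Xn).query(x, k=k)
    d = np.asarray(d).reshape(len(x), -1)
    return np.sqrt((d ** 2).mean(axis=1))

def kde(x, Xn, sigma=0.3):
    """Gaussian KDE f_sigma(x) = (1/n) sum_i K_sigma(x, X_i)."""
    sqd = ((x[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
    return np.exp(-sqd / (2 * sigma**2)).mean(axis=1)
```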
---

# Motivation: Stability `\(\neq\)` Robustness

.blockquote[<body> **.dred[Stability of persistence diagrams <a name=cite-cohen2007stability></a><a name=cite-chazal2016structure></a>[[6](#bib-cohen2007stability), [7](#bib-chazal2016structure)]:]** For two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \textbf{Dgm}(f), \textbf{Dgm}(g) \Big) \le ||f-g||_{\infty}\)` </body>]

--

.pull-left[
<img src="images/examples/signal.svg" height="250">
<img src="images/examples/signal-dgm.svg" height="250">
]

.pull-right[
]

.center[Persistence diagrams are stable w.r.t. small perturbations]

---
count: false

# Motivation: Stability `\(\neq\)` Robustness

.blockquote[<body> **.dred[Stability of persistence diagrams [[6](#bib-cohen2007stability), [7](#bib-chazal2016structure)]:]** For two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \textbf{Dgm}(f), \textbf{Dgm}(g) \Big) \le ||f-g||_{\infty}\)` </body>]

.pull-left[
<img src="images/examples/signal.svg" height="250">
<img src="images/examples/signal-dgm.svg" height="250">
]

.pull-right[
<img src="images/examples/X.svg" height="250">
<img src="images/examples/X-dgm.svg" height="250">
]

.center[Even a few **.red[outliers]** can dramatically change the inference]

--

.content-box-red.center[ Can we construct outlier-robust persistence diagrams without compromising on statistical efficiency? ]

---

# Robust Persistence Diagrams

* `\(K_\sigma\)` is a radial kernel and `\(\mathcal{H}_\sigma\)` is its reproducing kernel Hilbert space

* Given `\(\mathbb X_n\)` and a robust loss `\(\rho: \mathbb{R}_+ \rightarrow \mathbb R_+\)`, the robust KDE <a name=cite-kim2012robust></a>[[8](#bib-kim2012robust)] is given by

.content-box-grey[<body>
$$ f^n_{\rho,\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n w_\sigma(\boldsymbol{X_i}) K_\sigma(\cdot,\boldsymbol{X_i}) $$
</body>]

--

* .red.text-shadow[<body> `\(w_\sigma\)` is a weight function </body>] which down-weights extreme outliers. The optimization can be solved via iteratively reweighted least squares (a sketch follows on the next slide)

* The population counterpart is

.content-box-grey[<body>
$$ f_{\rho,\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}(\boldsymbol{y}) $$
</body>]

--

.center.content-box-red[<body> `\(\textbf{Dgm}(f^n_{\rho,\sigma}) = \textbf{Dgm}\big( \text{Sup}(f^n_{\rho,\sigma}) \big)\)` is the robust persistence diagram </body>]
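---
count: false

# Robust Persistence Diagrams

A minimal sketch of the iteratively reweighted least-squares step (after [[8](#bib-kim2012robust)]), here with a Huber loss; the tuning constant `c` and the iteration count are illustrative choices, not values from the paper:

```python
import numpy as np

def robust_kde_weights(Xn, sigma=0.3, c=1.0, n_iter=50):
    """IRLS for the weights w_sigma(X_i) of the robust KDE."""
    n = len(Xn)
    sqd = ((Xn[:, None, :] - Xn[None, :, :]) ** 2).sum(-1)
    G = np.exp(-sqd / (2 * sigma**2))   # Gram matrix K_sigma(X_i, X_j)
    w = np.full(n, 1.0 / n)             # initialise at the ordinary KDE
    for _ in range(n_iter):
        # RKHS distance of each atom to the current fit f = sum_j w_j K(., X_j):
        # ||K(., X_i) - f||^2 = K_sigma(X_i, X_i) - 2 (Gw)_i + w' G w
        fx = G @ w
        d = np.sqrt(np.maximum(G.diagonal() - 2 * fx + w @ fx, 0.0))
        # Huber psi(d)/d: constant near zero, decaying as c/d in the tails
        phi = np.where(d <= c, 1.0, c / np.maximum(d, 1e-12))
        w = phi / phi.sum()             # reweight and renormalise
    return w
```

The returned weights plug directly into the kernel sum, and the resulting filter feeds into the superlevel-set pipeline from earlier to obtain `\(\textbf{Dgm}(f^n_{\rho,\sigma})\)`.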
<i class="fas fa-hand-point-right faa-horizontal animated "></i>
`\(\hspace{1cm}\)` Generalize via metric derivative 🙂<br>

--

* Given a filter `\(\phi_{\mathbb P}\)` and `\(\boldsymbol{x} \in \mathbb R^d\)`, the **.dblue.text-shadow[persistence influence]** is given by

`\begin{align} \Psi\Big( \phi_{\mathbb P}; \mathbb{P}, \boldsymbol{x} \Big) \stackrel{\cdot}{=} \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} W_{\infty}\Big( \textbf{Dgm}\big(\phi_{\mathbb{P}^{\epsilon}_{\boldsymbol x}}\big), \textbf{Dgm}\big( \phi_{\mathbb P} \big) \Big) \end{align}`

--

.blockquote[**.dred[Theorem:]** For a robust loss `\(\rho\)` and `\(\sigma, m > 0\)`:

1. **Robust KDE:** `\(\Psi\Big( f_{\rho,\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} \color{purple}{w_{\sigma}(\boldsymbol x)} || K_\sigma(\cdot,\boldsymbol{x}) - f_{\rho,\sigma} ||_{\mathcal{H}_\sigma}\)`

1. **KDE:** `\(\hspace{1.5cm}\)` `\(\Psi\Big( f_{\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} || K_\sigma(\cdot,\boldsymbol{x}) - f_{\sigma} ||_{\mathcal{H}_\sigma}\)`

1. **Distance-to-Measure:** `\(\Psi\Big( d_{m}; \mathbb{P}, \boldsymbol{x} \Big) \le \frac{2}{\sqrt{m}} \sup \Big\{ \big\vert f(\boldsymbol x) - \int f(\boldsymbol y)d\mathbb{P}(\boldsymbol{y}) \big\vert : || \nabla f ||_{L_2(\mathbb P)} \le 1 \Big\}\)`
]

<br>

.tiny[<body> `\(\color{purple}{w_{\sigma}(\boldsymbol x)}\)` weighs down outliers </body>]
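---
count: false

# Sensitivity analysis in the space of persistence diagrams

The persistence influence can also be probed empirically: perturb `\(\mathbb{P}_n\)` by `\(\epsilon\)` mass at an outlier `\(\boldsymbol x\)` and track the bottleneck movement of the diagram. A sketch assuming `gudhi`, here for the ordinary KDE filter on a grid; the helper names, data, and parameters are illustrative:

```python
import numpy as np
import gudhi

def kde_grid(X, weights, xs, sigma=0.3):
    """Weighted Gaussian KDE evaluated on the grid xs x xs."""
    xx, yy = np.meshgrid(xs, xs)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    sqd = ((grid[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return (np.exp(-sqd / (2 * sigma**2)) @ weights).reshape(len(xs), len(xs))

def dgm(phi, dim=1):
    """Superlevel persistence intervals of phi in a fixed dimension."""
    cc = gudhi.CubicalComplex(top_dimensional_cells=-phi)
    cc.persistence()
    return cc.persistence_intervals_in_dimension(dim)

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 300)
Xn = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((300, 2))
xs, x_out, eps = np.linspace(-4, 4, 96), np.array([[3.0, 3.0]]), 0.05

base = kde_grid(Xn, np.full(300, 1 / 300), xs)                       # P_n
pert = kde_grid(np.vstack([Xn, x_out]),                              # P_n perturbed
                np.r_[np.full(300, (1 - eps) / 300), eps], xs)
psi_hat = gudhi.bottleneck_distance(dgm(base), dgm(pert)) / eps      # empirical influence
```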
---

# Is it Robust in practice?

.pull-left[
<img src="images/influence/bottleneck.svg" height="500">
]

.pull-right[
<img src="images/influence/dtm-bottleneck.svg" height="500">
]

---

# Statistical Efficiency: "Gain with no pain"

* **.text-shadow.dred[Consistency:]** Given `\(\mathbb X_n\)` from `\(\mathbb P\)` with density `\(f\)` and `\(\sigma > 0\)`, we have `\(W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \stackrel{p}{\longrightarrow} 0\)` `\(\hspace{0.5cm}\)` as `\(n \rightarrow \infty\)`

* Furthermore, `\(W_{\infty}\Big( \textbf{Dgm}\big(f_{\rho,\sigma}\big), \textbf{Dgm}\big( f \big) \Big) \longrightarrow 0\)` `\(\hspace{0.5cm}\)` as `\(\sigma \rightarrow 0\)`

.blockquote[**.dred[Theorem:]** For `\(a_\sigma > 1\)` and `\(p \in (0,1)\)`, if the entropy numbers of the kernel `\(K_\sigma\)` satisfy `\(e_n(\mathcal H_\sigma) \le a_\sigma n^{-{1}/{2p}}\)`, then with probability `\(\ge 1- \alpha\)` and uniformly over `\(\mathbb P\)`

`\begin{align} W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \le 2C \sigma^{-d/2} \Bigg( \xi(n,p) + \sqrt{\frac{2\log(1/\alpha)}{n}} \Bigg) \end{align}`

where

`\begin{align} \xi(n,p) = \begin{cases} \mathcal{O}(n^{-1/2}) & \text{ if } p \in (0,\frac{1}{2})\\ \mathcal{O}(n^{-1/2} \log n) & \text{ if } p = \frac{1}{2}\\ \mathcal{O}(n^{-1/4p}) & \text{ if } p \in (\frac{1}{2},1) \end{cases} \end{align}`
]

---
count: false

# Statistical Efficiency: "Gain with no pain"

<img src="images/other/band.svg">

---

# References

<a name=bib-chazal2017robust></a>[[1]](#cite-chazal2017robust) F. Chazal et al. "Robust topological inference: Distance to a measure and kernel distance". In: _Journal of Machine Learning Research_ 18.1 (2017), pp. 5845-5884.

<a name=bib-chazal2011geometric></a>[[2]](#cite-chazal2011geometric) F. Chazal et al. "Geometric inference for probability measures". In: _Foundations of Computational Mathematics_ 11.6 (2011), pp. 733-751.

<a name=bib-biau2011weighted></a>[[3]](#cite-biau2011weighted) G. Biau et al. "A weighted k-Nearest Neighbor density estimate for geometric inference". In: _Electronic Journal of Statistics_ 5 (2011), pp. 204-237.

<a name=bib-fasy2014confidence></a>[[4]](#cite-fasy2014confidence) B. T. Fasy et al. "Confidence sets for persistence diagrams". In: _The Annals of Statistics_ 42.6 (2014), pp. 2301-2339.

<a name=bib-phillips2015geometric></a>[[5]](#cite-phillips2015geometric) J. M. Phillips et al. "Geometric Inference on Kernel Density Estimates". In: _31st International Symposium on Computational Geometry_. 2015.

<a name=bib-cohen2007stability></a>[[6]](#cite-cohen2007stability) D. Cohen-Steiner et al. "Stability of persistence diagrams". In: _Discrete & Computational Geometry_ 37.1 (2007), pp. 103-120.

<a name=bib-chazal2016structure></a>[[7]](#cite-chazal2016structure) F. Chazal et al. _The Structure and Stability of Persistence Modules_. AG, CH: Springer, 2016.

<a name=bib-kim2012robust></a>[[8]](#cite-kim2012robust) J. Kim and C. D. Scott. "Robust kernel density estimation". In: _Journal of Machine Learning Research_ 13 (2012), pp. 2529-2565.

---
class: center, middle

## .dblue[Thank you!]

### .red.footnotesize[ .black.large.bold.text-shadow[Poster: ] .large[\# 18998 ]]

### .red.footnotesize[ <body> .black.large.bold.text-shadow[Code: ] ] <a href="https://github.com/sidv23/robust-PDs"> `\(\bbox[2pt,white]{\color{red}{\texttt{github.com/sidv23/robust-PDs}}}\)` </a>

### .red.footnotesize[ <body> .black.large.bold.text-shadow[Slides: ] ] <a href="https://sidv23.github.io/slides/neurips-2020"> `\(\bbox[2pt,white]{\color{red}{\texttt{sidv23.github.io/slides/neurips-2020}}}\)` </a>

### .black.footnotesize[Looking forward to discussions and questions during the poster session!]