class: left,middle,title-slide count: false # Statistical Learning for Robust Topological Inference ## Ph.D. Comprehensive Examination ### .bolder[Siddharth Vishwanath] <br><br> #### 4<sup>th</sup> December, 2020 <br><br><br><br><br><br><br> <!-- <img style="position: absolute; bottom: 0; right: 0; border: 0;" src="images/other/psu-logo2.png" height=25%> --> <!-- <img style="position: absolute; bottom: 0; right: 0; border: 0;" src="images/other/ism-logo.png" height=20%> --> --- # Outline ### 1. Motivation ### 2. Introduction to Homology and Persistent Homology ### 3. Robust Persistence Diagrams using Reproducing Kernels <a name=cite-vishwanath2020robust></a>[[VFKS-2020b](#bib-vishwanath2020robust)] ### 4. Statistical Invariance of Betti Numbers <a name=cite-vishwanath2020statistical></a>[[VFKS-2020a](#bib-vishwanath2020statistical)] ### 5. Future Work <div style="display:none"> $$ \newcommand{\defeq}{\stackrel{\small\bullet}{=}} \newcommand{\ra}{\rangle} \newcommand{\la}{\langle} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\abs}[1]{\left\lvert#1\right\rvert} \newcommand{\Abs}[1]{\Bigl\lvert#1\Bigr\rvert} \newcommand{\pr}{{\mathbb P}} \newcommand{\qr}{{\mathbb Q}} \newcommand{\xv}{{\boldsymbol{x}}} \newcommand{\av}{{\boldsymbol{a}}} \newcommand{\bv}{{\boldsymbol{b}}} \newcommand{\cv}{{\boldsymbol{c}}} \newcommand{\dv}{{\boldsymbol{d}}} \newcommand{\ev}{{\boldsymbol{e}}} \newcommand{\fv}{{\boldsymbol{f}}} \newcommand{\gv}{{\boldsymbol{g}}} \newcommand{\hv}{{\boldsymbol{h}}} \newcommand{\nv}{{\boldsymbol{n}}} \newcommand{\sv}{{\boldsymbol{s}}} \newcommand{\tv}{{\boldsymbol{t}}} \newcommand{\uv}{{\boldsymbol{u}}} \newcommand{\vv}{{\boldsymbol{v}}} \newcommand{\wv}{{\boldsymbol{w}}} \newcommand{\zerov}{{\mathbf{0}}} \newcommand{\onev}{{\mathbf{0}}} \newcommand{\phiv}{{\boldsymbol{\phi}}} \newcommand{\cc}{{\check{C}}} \newcommand{\xv}{{\boldsymbol{x}}} \newcommand{\Xv}{{\boldsymbol{X}\!}} \newcommand{\yv}{{\boldsymbol{y}}} \newcommand{\Yv}{{\boldsymbol{Y}}} \newcommand{\zv}{{\boldsymbol{z}}} \newcommand{\Zv}{{\boldsymbol{Z}}} \newcommand{\Iv}{{\boldsymbol{I}}} \newcommand{\Jv}{{\boldsymbol{J}}} \newcommand{\Cv}{{\boldsymbol{C}}} \newcommand{\Ev}{{\boldsymbol{E}}} \newcommand{\Fv}{{\boldsymbol{F}}} \newcommand{\Gv}{{\boldsymbol{G}}} \newcommand{\Hv}{{\boldsymbol{H}}} \newcommand{\alphav}{{\boldsymbol{\alpha}}} \newcommand{\epsilonv}{{\boldsymbol{\epsilon}}} \newcommand{\betav}{{\boldsymbol{\beta}}} \newcommand{\deltav}{{\boldsymbol{\delta}}} \newcommand{\gammav}{{\boldsymbol{\gamma}}} \newcommand{\etav}{{\boldsymbol{\eta}}} \newcommand{\piv}{{\boldsymbol{\pi}}} \newcommand{\thetav}{{\boldsymbol{\theta}}} \newcommand{\tauv}{{\boldsymbol{\tau}}} \newcommand{\muv}{{\boldsymbol{\mu}}} \newcommand{\phiinv}{\Phi^{-1}} \newcommand{\Fiinv}{F^{-1}} \newcommand{\giinv}{g^{-1}} \newcommand{\fhat}{\hat{f}} \newcommand{\ghat}{\hat{g}} \newcommand{\ftheta}{f_\theta} \newcommand{\fthetav}{f_{\thetav}} \newcommand{\gtheta}{g_\theta} \newcommand{\gthetav}{g_{\thetav}} \newcommand{\ztheta}{Z_\theta} \newcommand{\xtheta}{\Xv_\theta} \newcommand{\ytheta}{\Yv_\theta} \newcommand{\p}{\partial} \newcommand{\f}{\frac} \newcommand{\cf}{\cfrac} \newcommand{\e}{\epsilon} \newcommand{\indep}{\perp\kern-5pt \perp} \newcommand{\inner}[1]{\langle#1\rangle} \newcommand{\pa}[1]{\left(#1\right)} \newcommand{\pb}[1]{\left\{#1\right\}} \newcommand{\pc}[1]{\left[#1\right]} \newcommand{\pA}[1]{\Big(#1\Big)} \newcommand{\pB}[1]{\Big\{#1\Big\}} \newcommand{\pC}[1]{\Big[#1\Big]} \newcommand{\ty}[1]{\texttt{#1}} 
\newcommand{\borel}[1]{\mathscr{B}\pa{#1}} \newcommand{\scr}{\mathcal} \newcommand{\scrb}{\mathscr} \newcommand{\argmin}{\mathop{\text{arg}\ \!\text{min}}} \newcommand{\arginf}{\mathop{\text{arg}\ \!\text{inf}}} \newcommand{\argmax}{\mathop{\text{arg}\ \!\text{max}}} \newcommand{\argsup}{\mathop{\text{arg}\ \!\text{sup}}} \newcommand{\bigo}[1]{\mathcal{O}_{p}\!\left(#1\right)} \newcommand{\f}{\frac} \newcommand{\e}{\epsilon} \newcommand{\inv}{^{-1}} \newcommand{\phiinv}{\Phi^{-1}} \newcommand{\Fiinv}{F^{-1}} \newcommand{\giinv}{g^{-1}} \newcommand{\fhat}{\hat{f}} \newcommand{\ghat}{\hat{g}} \newcommand{\ftheta}{f_\theta} \newcommand{\fthetav}{f_{\thetav}} \newcommand{\gtheta}{g_\theta} \newcommand{\gthetav}{g_{\thetav}} \newcommand{\ztheta}{Z_\theta} \newcommand{\xtheta}{\Xv_\theta} \newcommand{\ytheta}{\Yv_\theta} \newcommand{\absdet}[1]{\abs{\det\pa{#1}}} \newcommand{\jac}[1]{\Jv_{#1}} \newcommand{\absdetjx}[1]{\abs{\det\pa{\Jv_{#1}}}} \newcommand{\absdetj}[1]{\norm{\Jv_{#1}}} \newcommand{\sint}{sin(\theta)} \newcommand{\cost}{cos(\theta)} \newcommand{\sor}[1]{S\mathcal{O}(#1)} \newcommand{\ort}[1]{\mathcal{O}(#1)} \newcommand{\A}{{\mathbb A}} \newcommand{\C}{{\mathbb C}} \newcommand{\E}{{\mathbb E}} \newcommand{\F}{{\mathcal{F}}} \newcommand{\N}{{\mathbb N}} \newcommand{\R}{{\mathbb R}} \newcommand{\Q}{{\mathbb Q}} \newcommand{\Z}{{\mathbb Z}} \newcommand{\X}{{\scr{X}}} \newcommand{\Y}{{\mathbb{Y}}} \newcommand{\G}{{\mathcal{G}}} \newcommand{\M}{{\mathcal{M}}} \newcommand{\betaequivalent}{\beta\text{-equivalent}} \newcommand{\betaequivalence}{\beta\text{-equivalence}} \newcommand{\Mb}{{\boldsymbol{\mathsf{M}}}} \newcommand{\Br}{{\mathbf{\mathsf{Bar}}}} \newcommand{\dgm}{{\mathbf{\mathsf{Dgm}}}} \newcommand{\Db}{{\mathbf{\mathsf{D}}}} \newcommand{\Img}{{\mathbf{\mathsf{Img}}}} \newcommand{\mmd}{{\mathbf{\mathsf{MMD}}}} \newcommand{\Xn}{{\mathbb{X}_n}} \newcommand{\Yn}{{\mathbb{Y}_n}} \newcommand{\Xb}{{\mathbb{X}}} \newcommand{\Yb}{{\mathbb{Y}}} \newcommand{\s}{{{\sigma}}} \newcommand{\fnsbar}{{\bar{f}^n_\s}} \newcommand{\fns}{{f^n_\s}} \newcommand{\fs}{{f_\s}} \newcommand{\fsbar}{{\bar{f}_\s}} \newcommand{\barfn}{{{f}^n_\sigma}} \newcommand{\barfnm}{{{f}^{n+m}_\sigma}} \newcommand{\barfo}{{{f}_\sigma}} \newcommand{\fn}{{f^n_{\rho,\sigma}}} \newcommand{\fnm}{{f^{n+m}_{\rho,\sigma}}} \newcommand{\fo}{{f_{\rho,\sigma}}} \newcommand{\K}{{{K_{\sigma}}}} \newcommand{\barpn}{{\bar{p}^n_\sigma}} \newcommand{\barpo}{{\bar{p}_\sigma}} \newcommand{\pn}{{p^n_\sigma}} \newcommand{\po}{{p_\sigma}} \newcommand{\J}{{\mathcal{J}}} \newcommand{\B}{{\mathcal{B}}} \newcommand{\pt}{{\tilde{\mathbb{P}}}} \newcommand{\Winf}{{W_{\infty}}} \newcommand{\HH}{{{\scr{H}_{\sigma}}}} \newcommand{\D}{{{\scr{D}_{\sigma}}}} \newcommand{\Ts}{{T_{\sigma}}} \newcommand{\Phis}{{\Phi_{\sigma}}} \newcommand{\nus}{{\nu_{\sigma}}} \newcommand{\Qs}{{\mathcal{Q}_{\sigma}}} \newcommand{\ws}{{w_{\sigma}}} \newcommand{\vs}{{v_{\sigma}}} \newcommand{\ds}{{\delta_{\sigma}}} \newcommand{\fp}{{f_{\pr}}} \newcommand{\prs}{{\widetilde{\pr}_{\sigma}}} \newcommand{\qrs}{{\widetilde{\qr}_{\sigma}}} \newcommand{\Inner}[1]{\Bigl\langle#1\Bigr\rangle} \newcommand{\innerh}[1]{\langle#1\rangle_{\HH}} \newcommand{\Innerh}[1]{\Bigl\langle#1\Bigr\rangle_{\HH}} \newcommand{\normh}[1]{\norm{#1}_{\HH}} \newcommand{\norminf}[1]{\norm{#1}_{\infty}} \newcommand{\gdelta}{{\G_{\delta}}} \newcommand{\supgdelta}{{\sup\limits_{g\in\gdelta}\abs{\Delta_n(g)}}} \newcommand{\id}{\textup{id}} \newcommand{\supp}{\text{supp}} \newcommand{\cech}{\v{C}ech} 
\newcommand{\Zz}{{\scr{Z}}} \newcommand{\psis}{\psi_\s} \newcommand{\phigox}{\Phis(\xv)-g} \newcommand{\phigoy}{\Phis(\yv)-g} \newcommand{\fox}{{f^{\epsilon,{\xv}}_{\rho,\sigma}}} \newcommand{\prx}{{\pr^{\epsilon}_{\xv}}} \newcommand{\pro}{{\pr_0}} \newcommand{\dotfo}{\dot{f}_{\!\!\rho,\s}} \newcommand{\phifo}{{\Phis(\yv)-\fo}} \newcommand{\phifox}{{\Phis(\xv)-\fo}} \newcommand{\kinf}{{\norm{\K}_{\infty}}} \newcommand{\half}{{{\f{1}{2}}}} \newcommand{\Jx}{\J_{\epsilon,{\xv}}} $$ </div> --- class: inverse, center, middle # Motivation ## .small[Topological Data Analysis] --- class: center # .left[Detecting Cosmic Voids and Filamental Structures] .center[ <iframe src="https://rstudio.aws.science.psu.edu:3838/suv87/comps/voronoi/" width="500" height="400"></iframe> ] Cosmic voids, discovered in the 1980s, are posited to have been formed during the Big Bang `\(\pr\pa{\# \text{ cosmic voids} \ge n} = ?\)` --- # Understanding Nanoscopic Structures ![](index_files/figure-html/unnamed-chunk-3-1.png)<!-- --> .center[ Electron clouds for the `\(2p_z\)` and `\(3d_{z^2}\)` orbitals ] -- .center[ `\(\pr\pA{ 3d_{z^2} \simeq \mathbb{T}^2 \bigvee \mathbb S^2 \bigvee \mathbb S^2 } = ?\)` ] --- # Sufficient statistics may not suffice <div class="centered"> <img align="centered" class="animated-gif" src="images/animation/DinoSequential.gif" width="120%"> </div> --- class: inverse, center, middle count:false # Introduction ## .small[Persistent Homology] --- # .left[Defining Shape through Topology] .center[ <img src="code/images/lem-purple.svg" width=600> ] .center[ Given a compact set `\(\color{purple}{\Xb} \subseteq \pa{\X,d}\)`, what is the shape of `\(\color{purple}{\Xb}\)`? ] --- count:false class: left # .left[Homology] .center[ <img src="code/images/lem-different.svg" width=600> ] Topologically speaking, `\(\color{pink}{\mathbb{Y}} \simeq \color{purple}\Xb \simeq \color{red}{\mathbb{Z}}\)`. -- There exists a continuous deformation `\(\color{pink}{\mathbb{Y}} \rightarrow \color{purple}\Xb \rightarrow \color{red}{\mathbb{Z}}\)` --- count:false class: center # .left[Homology] .center[ <img src="code/images/lem-homotopy.svg" width=600> ] The fundamental invariant of the topological space is encoded in its Homology. `\(H_*\pa{\color{pink}{\mathbb{Y}}} \simeq H_*\pa{\color{purple}\Xb} \simeq H_*\pa{\color{red}{\mathbb{Z}}}\)` --- class: left # .left[.orange[Persistent] Homology] Given a collection of points `\(\Xn = \pb{\xv_1, \xv_2, \dots, \xv_n} \subseteq \R^d\)`, what is the shape of `\(\Xn\)`? .center[ <img src="code/images/lem-r0.svg" width=550> ] -- .center[ `\(\Xn\)` intrinsically has no shape ] --- count:false # .orange[Persistent] Homology Given a collection of points `\(\Xn = \pb{\xv_1, \xv_2, \dots, \xv_n} \subseteq \R^d\)`, what is the shape of `\(\Xn\)`? .center[ <img src="code/images/lem-r1.svg" width=550> ] .center[ The shape of `\(\Xn\)` at resolution `\(\e\)` is encoded in `\(\color{orange}{\bigcup\limits_{i=1}^n B(\xv_i,\e)}\)` ] --- count:false # .orange[Persistent] Homology Given a collection of points `\(\Xn = \pb{\xv_1, \xv_2, \dots, \xv_n} \subseteq \R^d\)`, what is the shape of `\(\Xn\)`? 
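A quick numerical companion to this picture (a minimal base-R sketch with an illustrative sample; parameters are placeholders): the balls `\(B(\xv_i,\e)\)` and `\(B(\xv_j,\e)\)` overlap exactly when `\(\norm{\xv_i-\xv_j} \le 2\e\)`, so the connected components of `\(\bigcup_{i=1}^n B(\xv_i,\e)\)` are single-linkage clusters cut at height `\(2\e\)`:

```r
set.seed(1)
theta <- runif(100, 0, 2 * pi)                 # illustrative sample: a noisy circle
X <- cbind(cos(theta), sin(theta)) + matrix(rnorm(200, sd = 0.05), ncol = 2)

n_components <- function(X, eps) {
  # B(x_i, eps) and B(x_j, eps) intersect iff ||x_i - x_j|| <= 2 * eps, so the
  # components of the union are single-linkage clusters cut at height 2 * eps
  hc <- hclust(dist(X), method = "single")
  length(unique(cutree(hc, h = 2 * eps)))
}

sapply(c(0.01, 0.05, 0.25), function(e) n_components(X, e))  # components merge as eps grows
```

As `\(\e\)` grows the component count drops toward `\(1\)`, matching the figure below.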
.center[ <img src="code/images/lem-r2.svg" width=550> ] .center[ As `\(\e \uparrow\)` new topological features are born / existing topological features die ] --- class: center count:false .center[ <iframe src="https://sidv23.shinyapps.io/filtration/" width="900" height="550"></iframe> ] The birth and death of the topological features is summarized in a .bolder[persistence diagram] `\(\dgm\pa{\Xn}\)` --- # Equivalent Formulation using the Distance Function Given `\(\Xn \subset \R^d\)` and `\(\yv \in \R^d\)`, the .bolder[distance function] to `\(\Xn\)` is given by `\({d_{\Xn}}(\color{red}{\yv}) := \inf\limits_{\color{green}{\xv \in \Xn}}\norm{\xv-\yv}\)` .center[ <img src="code/images/dist.svg" width=550> ] -- The .bolder.orange[sublevel] set at resolution `\(\e\)` is given by `\(d_{\Xn}\inv\Big([0,\e]\Big):= \pb{\yv \in \R^d: d_{\Xn}(\yv) \le \e }\)` -- `\(=\bigcup\limits_{i=1}^{n}B(\xv_i,\e)\)` --- # Functional Persistence For a .bolder.dblue[filter function] `\(\phi_n\)` constructed using `\(\Xn\)`, the shape is encoded in the superlevel set `\(\phi_n\inv\pA{[\e,\infty)}\)` <img src="images/examples/fig21.svg"> --- count: false # Functional Persistence For a .bolder.dblue[filter function] `\(\phi_n\)` constructed using `\(\Xn\)`, the shape is encoded in the superlevel set `\(\phi_n\inv\pA{[\e,\infty)}\)` <img src="images/examples/fig22.svg"> .center[ As `\(\e \downarrow\)` new features are born ] --- count: false # Functional Persistence For a .bolder.dblue[filter function] `\(\phi_n\)` constructed using `\(\Xn\)`, the shape is encoded in the superlevel set `\(\phi_n\inv\pA{[\e,\infty)}\)` <img src="images/examples/fig23.svg"> .center[ As `\(\e \downarrow\)` new features are born, or existing features die ] --- count: false # Functional Persistence For a .bolder.dblue[filter function] `\(\phi_n\)` constructed using `\(\Xn\)`, the shape is encoded in the superlevel set `\(\phi_n\inv\pA{[\e,\infty)}\)` <img src="images/examples/fig2.svg"> .center[ The resulting persistence diagram is given by `\(\dgm\pa{\phi_n} = \dgm\pA{\text{Sup}\pa{\phi_n}}\)` ] --- class: center count:true # .left[Examples of Filter Functions ] <img src="images/filtrations/distfct.svg" height=420> `\(d_{\Xn}(\boldsymbol y)= \mathop{\text{inf}}\limits_{\boldsymbol \xv_i \in \mathbb X_n} || \boldsymbol{y -\xv_i} ||_2\)` --- class: center count:false # .left[Examples of Filter Functions ] <img src="images/filtrations/dtm.svg" height=420> `\(d^2_{m,n}(\boldsymbol x) = \frac{1}{k}\sum\limits_{\boldsymbol{x_i} \in \text{NN}_k(\boldsymbol{x})} || \boldsymbol x - \boldsymbol x_i ||_2^2, \hspace{1cm}\)` where `\(m>0\)` and `\(k = \lfloor mn \rfloor\)` --- class: center count:false # .left[Examples of Filter Functions ] <img src="images/filtrations/kde.svg" height=420> `\(\fns(\yv) = \f{1}{n}\sum\limits_{i=1}^{n}\K(\yv,\xv_i), \hspace{1cm}\)` where `\(\sigma > 0\)` and `\(\K\)` is a kernel --- class: inverse, center, middle count:false # .small[Robust Persistence Diagrams using Reproducing Kernels] ## .tiny.tiny.tiny[S.V., Fukumizu, Kuriki, Sriperumbudur. *Advances in Neural Information Processing Systems 33* (2020)] --- # Statistical Setting * Given `\(\Xn = \pb{\Xv_1,\Xv_2,\dots,\Xv_n}\)` sampled i.i.d. 
from `\(\pr\)`, the filter function `\(\phi_n\)` is a .dblue[random] function<br> -- * The population analogue, `\(\phi_{\pr}\)`, encodes the topology underlying `\(\pr\)`<br><br> -- * As a result, `\(\dgm\pa{\phi_n} \in \pA{\mathfrak{D},\Winf}\)` is a .bolder.dblue[random persistence diagram] -- .pull-left[ <img src="images/intro/twocircles.svg" width="300 ",height="300"> <br/> ] .pull-right[ <img src="images/intro/wasserstein.svg" width="300",height="300"> <br/> ] .center[ `\(W_\infty(\color{#1E90FF}{\textbf{D}_1},\color{orange}{\textbf{D}_2}) = \inf\limits_{\gamma: \color{#1E90FF}{\textbf{D}_1} \rightarrow \color{orange}{\textbf{D}_2}}\sup\limits_{\color{#1E90FF}{x \in \textbf{D}_1}} || x - \gamma(x) ||_\infty\)` ] --- count: false # Stability `\(\neq\)` Robustness .content-box-psu3[<body> .bolder[Theorem.] <a name=cite-cohen2007stability></a><a name=cite-chazal2016structure></a>[[3](#bib-cohen2007stability); [4](#bib-chazal2016structure)]: Given two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \dgm(f), \dgm(g) \Big) \le ||f-g||_{\infty}\)` </body>] .pull-left[ <img src="code/images/signal.svg", height="350"> ] .pull-right[<img src="code/images/signal-dgm.svg", height="350"> ] .center[Persistence diagrams are .green[stable] w.r.t. small perturbations] --- count: false # Stability `\(\neq\)` Robustness .content-box-psu3[<body> .bolder[Theorem.] [[3](#bib-cohen2007stability); [4](#bib-chazal2016structure)]: Given two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \dgm(f), \dgm(g) \Big) \le ||f-g||_{\infty}\)` </body>] .pull-left[ <img src="code/images/signal1.svg", height="350"> ] .pull-right[<img src="code/images/signal1-dgm.svg", height="350"> ] .center[Persistence diagrams are .green[stable] w.r.t. small perturbations] --- count: false # Stability `\(\neq\)` Robustness .content-box-psu3[<body> .bolder[Theorem.] 
[[3](#bib-cohen2007stability); [4](#bib-chazal2016structure)]: Given two tame filter functions `\(f\)` and `\(g\)`, `\(W_{\infty}\Big( \dgm(f), \dgm(g) \Big) \le ||f-g||_{\infty}\)` </body>] .pull-left[ <img src="code/images/X.svg", height="350"> ] .pull-right[<img src="code/images/X-dgm.svg", height="350"> ] .center[Even a few .orchid[outliers] can dramatically change the inference] --- count: false # Stability `\(\neq\)` Robustness .center[.content-box-orange[Can we construct robust persistence diagrams without compromising on statistical efficiency?]] .pull-left[ <img src="code/images/X.svg", height="350"> ] .pull-right[<img src="code/images/X-dgm.svg", height="350"> ] .center[Even a few .orchid[outliers] can dramatically change the inference] --- # Robust Persistence Diagrams * `\(K_\sigma\)` is a radial kernel and `\(\mathcal{H}_\sigma\)` is its reproducing kernel Hilbert space * Given `\(\mathbb X_n\)` and a robust loss `\(\rho: \mathbb{R}_+ \rightarrow \mathbb R_+\)` the robust KDE <a name=cite-kim2012robust></a>[[5](#bib-kim2012robust)] is given by .content-box-grey[<body> $$ f^n_{\rho,\sigma} \stackrel{\small\bullet}{=} \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n w_\sigma(\boldsymbol{X_i}) K_\sigma(\cdot,\boldsymbol{X_i}) $$ </body>] -- * The .dblue.bolder.text-shadow[<body> weight function `\(w_\sigma\)` </body>] weighs-down outliers, and is the *fixed point* of the equation `\begin{align} w_\s(\Xv_i)= \f{\varphi(\Vert\Phis(\Xv_i)-f_{\rho,\s}^n\Vert_{\HH})}{\sum_{j=1}^{n}{\varphi(\Vert\Phis(\Xv_j)-f_{\rho,\s}^{n}\Vert_{\HH})}}, \ \ \ \ \ \text{ where } \varphi(z) = \f{\rho'(z)}{z} \end{align}` -- * `\(\fn\)` and `\(w_\sigma\)` .dorange[do not] have closed form solutions. 
* But they can be computed via iteratively reweighted least squares (a minimal sketch is given in the appendix) --- # Robust Persistence Diagrams * `\(K_\sigma\)` is a radial kernel and `\(\mathcal{H}_\sigma\)` is its reproducing kernel Hilbert space * Given `\(\mathbb X_n\)` and a robust loss `\(\rho: \mathbb{R}_+ \rightarrow \mathbb R_+\)`, the robust KDE [[5](#bib-kim2012robust)] is given by .content-box-grey[<body> $$ f^n_{\rho,\sigma} \stackrel{\small\bullet}{=} \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n w_\sigma(\boldsymbol{X_i}) K_\sigma(\cdot,\boldsymbol{X_i}) $$ </body>] * When `\(\rho(z) = z^2\)`, we recover the classic KDE -- .content-box-grey[<body> $$ f^n_{\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d} || g - K_\sigma(\cdot,\boldsymbol{y}) ||^2_{\mathcal{H}_\sigma} \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n \f{1}{n} K_\sigma(\cdot,\boldsymbol{X_i}) $$ </body>] --- count: false # Robust Persistence Diagrams * `\(K_\sigma\)` is a radial kernel and `\(\mathcal{H}_\sigma\)` is its reproducing kernel Hilbert space * Given `\(\mathbb X_n\)` and a robust loss `\(\rho: \mathbb{R}_+ \rightarrow \mathbb R_+\)`, the robust KDE [[5](#bib-kim2012robust)] is given by .content-box-grey[<body> $$ f^n_{\rho,\sigma} = \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int\limits_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}_n(\boldsymbol{y}) = \sum\limits_{i=1}^n w_\sigma(\boldsymbol{X_i}) K_\sigma(\cdot,\boldsymbol{X_i}) $$ </body>] * The population counterpart is .content-box-grey[<body> $$ f_{\rho,\sigma} \stackrel{\small\bullet}{=} \mathop{\text{arg inf}}\limits_{g \in \mathcal{H}_\sigma}\displaystyle\int_{\mathbb R^d}\rho\Big( || g - K_\sigma(\cdot,\boldsymbol{y}) ||_{\mathcal{H}_\sigma} \Big) \ d\mathbb{P}(\boldsymbol{y}) $$ </body>] -- .center[ .content-box-orange[<body> `\(\dgm(f^n_{\rho,\sigma}) = \dgm\big( \text{Sup}(f^n_{\rho,\sigma}) \big)\)` is the robust persistence diagram </body>] ] --- # Illustration `\(\dgm\pa{\fns}\)` | `\(\Xn\)` | `\(\dgm\pa{\fn}\)` --- | --- | ---- <img src="code/images/kde-dgm.svg", width="290"> | <iframe src="https://rstudio.aws.science.psu.edu:3838/suv87/comps/kde/" width="290" height="290"></iframe> | <img src="code/images/rkde-dgm.svg", width="290"> --- class: center, middle count: false # Sensitivity Analysis ## .orange[(Part I)] --- # Sensitivity Analysis using Influence Functions * Given a .bolder.dblue[statistical functional] `\(T: \mathcal{P}\pa{\X} \rightarrow \pa{V, \norm{\cdot}_V}\)`: - Takes a probability measure `\(\pr \in \mathcal{P}(\X)\)` on the input space `\(\X\)` - Produces a statistic `\(\pr \mapsto T(\pr)\)` in some normed space `\(\pa{V, \norm{\cdot}_V}\)` -- * The .bolder.dblue[influence function] associated with `\(\xv \in \X\)` is given by .content-box-blue.center[<body> $$ \text{IF}(T; \mathbb P, \boldsymbol x) = \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon}\Big( T(\mathbb{P}^{\epsilon}_{\boldsymbol x}) - T(\mathbb P) \Big) $$ .center[ where `\(\mathbb{P}^{\epsilon}_{\boldsymbol x} = (1-\e)\pr + \e \delta_{\xv}\)` is a perturbation of `\(\mathbb P\)` by adding `\(\epsilon\)` amount of mass at `\(\boldsymbol{x}\)` ] </body>] -- * `\(\text{IF}(T; \mathbb P, \boldsymbol x)\)` is the .orchid.bolder[Gâteaux derivative] along the perturbation
curve `\(\mathbb{P}^{\epsilon}_{\boldsymbol x}\)` <br><br> -- * A persistence diagram, `\(\dgm\pa{\phi_{\pr}}\)`, is a statistical functional taking values in the space `\((\mathfrak{D}, W_{\infty})\)` <br><br> -- * But `\((\mathfrak{D}, W_{\infty})\)` is not a normed space 🙁 `\(\hspace{.5cm}\)` --
<i class="fas fa-hand-point-right faa-horizontal animated "></i>
`\(\hspace{.5cm}\)` Generalize via metric derivative 🙂 --- # Sensitivity Analysis in .scriptsize[<p> `\(\pa{\mathfrak{D},\Winf}\)` </p>] * Given a filter function, `\(\phi_{\pr}\)`, and `\(\xv \in \R^d\)`, the .dblue.bolder[persistence influence] is given by<br> .center[ .content-box-blue[<body> $$ \Psi\Big( \phi_{\mathbb P}; \mathbb{P}, \boldsymbol{x} \Big) \stackrel{\small\bullet}{=} \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} W_{\infty}\Big( \dgm\big(\phi_{\mathbb{P}^{\epsilon}_{\boldsymbol x}}\big), \dgm\big( \phi_{\mathbb P} \big) \Big) $$ </body>] ] -- * Sensitivity analysis for `\(\dgm\pa{\fo}\)`, `\(\dgm\pa{\fs}\)` and `\(\dgm\pa{d_{\pr,m}}\)` .theorem1[For a robust loss `\(\rho\)`, bandwidth `\(\sigma > 0\)`, and smoothing parameter `\(m > 0\)`: 1. **RKDE:**<br> .center[<body> `\(\Psi\Big( f_{\rho,\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} \color{#1058ad}{\boldsymbol{w_{\sigma}(\boldsymbol x)}} || K(\cdot,\boldsymbol{x}) - f_{\rho,\sigma} ||_{\mathcal{H}_\sigma}\)` </body>] 1. **KDE:**<br> .center[<body> `\(\Psi\Big( f_{\sigma}; \mathbb{P}, \boldsymbol{x} \Big) \le \sigma^{-d/2} || K(\cdot,\boldsymbol{x}) - f_{\sigma} ||_{\mathcal{H}_\sigma}\)` </body>] 1. **DTM:**<br> .center[<body> `\(\Psi\Big( d_{m}; \mathbb{P}, \boldsymbol{x} \Big) \le \frac{2}{\sqrt{m}} \sup \Big\{ \big\vert f(\boldsymbol x) - \int f(\boldsymbol y)d\mathbb{P}(\boldsymbol{y}) \big\vert : || \nabla f ||_{L_2(\mathbb P)} \le 1 \Big\}\)` </body>] <br> </body>] .tiny[<body> `\(\color{#1058ad}{\boldsymbol{w_{\sigma}(\boldsymbol x)}}\)` weighs down outliers </body>] --- # Proof of Theorem 1 * For the KDE, `\(\fs = \int\K(\cdot,\yv) \ d\pr(\yv)\)`, the result follows from using <br> (I.) The stability of persistence diagrams [[3](#bib-cohen2007stability); [4](#bib-chazal2016structure)], and (II.) the dual formulation of `\(\norm{\cdot}_{\HH}\)`<br><br> -- * For the DTM, `\(d_{\pr,m}\)`, the stability also guarantees <a name=cite-chazal2011geometric></a>[[6](#bib-chazal2011geometric)] <br> .center[<body> `\(\Winf\pA{\dgm\pa{d_{\pr,m}},\dgm\pa{d_{\qr,m}}} \lesssim W_2(\pr,\qr)\)`. </body>] The result follows from examining the metric derivative of the perturbation curve, `\(\prx\)`, as a geodesic in Wasserstein space `\(\mathcal{W}_2(\R^d)\)` <br><br> -- * For the RKDE, `\(\fo\)` does not admit a closed form solution. We first show that the risk functional $$ \J(g) \defeq \displaystyle\int\rho\pa{\normh{\K(\cdot,\yv) - g}}d\pr(\yv) $$ is .red[equi-coercive] and .red[<body> `\(\Gamma\)`-convergent. </body>] -- From stability and the fundamental theorem of `\(\Gamma\)`-convergence `\begin{align} \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} W_{\infty}\Big( \dgm\big(\fox\big), \dgm\big( \fo \big) \Big) \le \lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} \norminf{\fox-\fo} \le \norminf{\lim\limits_{\epsilon \rightarrow 0}\frac{1}{\epsilon} \pa{\fox - \fo}} = \norminf{\dotfo} \end{align}` -- The result follows from a careful analysis of `\(\dotfo\)` --- # Is .footnotesize[<body> `\(\dgm\pa{\fn}\)` </body>] robust in practice? 
### .orange[Experiment] * `\(\color{green}{\Xn}\)` is sampled from a signal, and we construct the persistence diagram `\(\color{green}{\dgm\pa{\phi_n}}\)`<br> -- * Outliers `\(\color{#db43d5}{\mathbb{Y}_m(r)}\)` are added at a distance `\(r\)` away from the support of `\(\color{green}{\Xn}\)`<br> * The persistence diagram `\(\color{#db43d5}{\dgm\pa{\phi_{n+m}}}\)` is constructed on the composite sample `\(\color{green}{\Xn} \cup \color{orchid}{\Yb_m(r)}\)`<br> -- * The empirical .dblue[persistence influence] `\begin{align} \Winf\pA{\color{green}{\dgm\pa{\phi_n}},\color{#db43d5}{\dgm\pa{\phi_{n+m}}}} \end{align}` is computed for different values of `\(r\)`<br><br> --- count: false # Is .footnotesize[<body> `\(\dgm\pa{\fn}\)` </body>] robust in practice? .pull-left[ <img src="images/influence/bottleneck.svg", width="500"> ] -- .pull-right[ <img src="images/influence/dtm-bottleneck.svg", width="500"> ] --- class: center, middle count: false # Statistical Analysis ## .orange[(Part II)] --- # Consistency and Confidence Bands .theorem2[ Given `\(\mathbb X_n\)` observed i.i.d. from `\(\mathbb P\)` with density `\(f\)`, a bandwidth `\(\sigma > 0\)`, and convex loss `\(\rho\)` .center[ `\(W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \stackrel{p}{\longrightarrow} 0\)` `\(\hspace{0.5cm}\)` as `\(n \rightarrow \infty\)` ] Furthermore, .center[ `\(W_{\infty}\Big( \textbf{Dgm}\big(f_{\rho,\sigma}\big), \textbf{Dgm}\big( f \big) \Big) \longrightarrow 0\)` `\(\hspace{0.5cm}\)` as `\(\sigma \rightarrow 0\)` ] ] -- .bolder.text-shadow[.large[Confidence Band:]] .content-box-red[.bolder.text-shadow[Definition.] For `\(\Xn\)` sampled i.i.d. from `\(\pr\)`, a significance `\(\alpha \in (0,1)\)`, and a corresponding `\(\delta_n(\alpha)\)`, the set `\begin{align} \mathcal{C}_\alpha \defeq \pB{D \in \pa{\mathfrak{D},\Winf} : \Winf\pa{D, \dgm\pa{\phi_n}} \le \delta_n(\alpha)} \end{align}` is a `\((1-\alpha)\)`-.bolder.text-shadow[confidence set] in the space of persistence diagrams if `\begin{align} \pr^{\otimes n}\pB{\dgm\pa{\phi_{\pr}} \in \mathcal{C}_{\alpha}} \ge 1-\alpha \end{align}` `\(\boldsymbol{\delta_n(\alpha)}\)` determines a .bolder.text-shadow[confidence band] `\(\mathcal{B}_{\alpha} \defeq \Delta \oplus \delta_n(\alpha)\)` in the space `\(\pa{\mathfrak{D},\Winf}\)` ] --- count: false class: center # .left[Consistency and Confidence Bands] .pull-left[ `\(\mathcal{C}_\alpha = \pB{D : \Winf\pa{D, \dgm\pa{\phi_n}} \le \delta_n(\alpha)}\)` <img src="code/images/conf-ex.svg", width="500"> ] -- .pull-right[ `\(\mathcal{B}_{\alpha} = \Delta \oplus \delta_n(\alpha)\)` <img src="code/images/band-ex.svg", width="500"> ] --- # Statistical Efficiency .bolder.text-shadow[Assumption:] For `\(a_\sigma > 1\)` and `\(p \in (0,1)\)`, the entropy numbers of `\(K_\sigma\)` satisfy `\(e_n(\mathcal H_\sigma) \le a_\sigma n^{-{1}/{2p}}\)` -- * `\(e_n(\mathcal H_\sigma)\)` provides a handle for the covering numbers of the unit ball `\(\mathfrak{B}_\HH(\zerov,1)\)` in the RKHS -- <br> .content-box-blue[.bolder.text-shadow[Example.] If `\(\supp\pa{\pr} \subset \R^d\)` with `\(\text{diam}\pa{\supp\pa{\pr}} = r\)`, and `\(\K\)` is `\(m\)`-times differentiable, then `\begin{align} e_n(\mathcal H_\sigma) \le c r^m n^{-m/d}, \end{align}` with `\(p = \f{d}{2m}\)`. This leads to the following observations: 1. In order to ensure statistical efficiency, `\(\K\)` has to be sufficiently smooth 1. If `\(d\!\uparrow \ \Longrightarrow \ p \uparrow\)` 1.
If `\(m\!\uparrow \ \Longrightarrow \ p \downarrow\)` ] --- count: false # Statistical Efficiency .bolder.text-shadow[Assumption:] For `\(a_\sigma > 1\)` and `\(p \in (0,1)\)`, the entropy numbers of `\(K_\sigma\)` satisfy `\(e_n(\mathcal H_\sigma) \le a_\sigma n^{-{1}/{2p}}\)` .theorem3[ Given `\(\mathbb X_n\)` observed i.i.d. from `\(\mathbb P\)` with density `\(f\)`, a bandwidth `\(\sigma > 0\)`, and a convex loss `\(\rho\)`:<br><br> With probability greater than `\(1-\alpha\)`, and uniformly over `\(\mathbb P\)` `\begin{align} W_{\infty}\Big( \textbf{Dgm}\big(f^n_{\rho,\sigma}\big), \textbf{Dgm}\big( f_{\rho,\sigma} \big) \Big) \le 2C \sigma^{-d/2} \Bigg( \xi(n,p) + \sqrt{\frac{2\log(1/\alpha)}{n}} \Bigg) \end{align}` where `\begin{align} \xi(n,p) = \begin{cases} \mathcal{O}(n^{-1/2}) & \text{ if } p \in (0,\frac{1}{2})\\ \mathcal{O}(n^{-1/2} \log n) & \text{ if } p = \frac{1}{2}\\ \mathcal{O}(n^{-1/4p}) & \text{ if } p \in (\frac{1}{2},1) \end{cases} \end{align}` ] <br> .footnotesize[ Recall that `\(d\!\uparrow \Longrightarrow p\!\uparrow\)` and `\(m\!\uparrow \Longrightarrow p\!\downarrow\)` ] --- count: true # Proof of Theorem 3 ### Part 1: Strong convexity of the risk functional * Let the loss functional be denoted by `\(\ell_g(\xv) \defeq \rho\pa{\normh{\K(\cdot,\xv) - g}}\)` * Recall the risk functional `\(\J(g)\)` given by $$ \J(g) \defeq \displaystyle\int\ell_g(\yv)d\pr(\yv) $$ -- * If the robust loss `\(\rho(\cdot)\)` is convex and `\(M\)`-Lipschitz (and some other technical assumptions) 1. `\(\J(g)\)` is strictly convex and `\(M\)`-Lipschitz 1. `\(\J(g)\)` is **strongly convex** around its minimizer `\(\fo \defeq \arginf_{g \in \HH}\J(g)\)`, .content-box-blue[ `\begin{align} \J(g) - \J(\fo) \ge \f{\mu}{2}\normh{g-\fo}^2 \end{align}` ] --- count: false # Proof of Theorem 3 ### Part 2: Analysis of the Empirical Process * The empirical risk functional `\(\J_n(g)\)` is given by `\(\J_n(g) = \f{1}{n}\sum_{i=1}^{n}\ell_g(\Xv_i)\)` * `\(\J(g)\)` and `\(\J_n(g)\)` can be written as the empirical processes `\(\pr\ell_g\)` and `\(\pr_n\ell_g\)` respectively * We define the .bolder[fluctuation process] as the empirical process given by `\begin{align} \Delta_n(g) \defeq \pr_n\pa{\ell_g - \ell_{\fo}} - \pr\pa{\ell_g - \ell_{\fo}} \end{align}` -- * We show that the concentration of `\(\fn\)` near `\(\fo\)` in `\(\normh{\cdot}\)` is controlled by the **tail behaviour** of the fluctuation process `\(\Delta_n(g)\)`, i.e., .content-box-blue[ `\begin{align} \pr^{\otimes n}\pB{\Xv_{1:n} : \normh{\fn - \fo} > \delta} \le \pr^{\otimes n}\pB{\Xv_{1:n} : \sup_{g \in \gdelta}\abs{\Delta_n(g)} > \f{\mu}{2}\delta^2} \end{align}` ] where `\(\gdelta = \pb{g \in \HH : \normh{g - \fo} \le \delta}\)` --- count: false # Proof of Theorem 3 ### Part 3: Controlling the Fluctuation Process * `\(\sup_{g \in \gdelta}\abs{\Delta_n(g)}\)` is the supremum of an empirical process<br><br> -- * Using McDiarmid's inequality and the generalized entropy bound, with probability `\(\ge 1 - e^{-t}\)` `\begin{align} \sup_{g \in \gdelta}\abs{\Delta_n(g)} \le M\delta \xi(n,p) + M\delta\sqrt{\f{2t}{n}} \end{align}` and from Part 2, `\begin{align} \normh{\fn - \fo} \le \f{2M}{\mu}\pA{\xi(n,p) + \sqrt{\f{2t}{n}}} \end{align}` -- * Using the fact that `\(\norminf{h} \le \sigma^{-d/2}\normh{h}\)` and the stability of persistence diagrams, we obtain the final result `\(\blacksquare\)` --- class: center # .left[Uniform Confidence Band] .pull-left[ `\(\Xn\)` <img src="code/images/points-extreme.svg", width="500"> ] .pull-right[
`\(\dgm\pa{\fn}\)` <img src="code/images/dgm-extreme.svg", width="500"> ] --- class: center, middle count: false # Experiments ## .orange[(Part III)] --- class: center # .left[Signal from Noise] * .orchid[<body> `\(\Xn\)` ( `\(\large\bullet\)` ) </body>] is sampled i.i.d. from .orchid[<body> `\(\pr_{\text{signal}}\)` </body>] and `\(\Yb_m\)` ( `\(\large\circ\)` ) is sampled i.i.d. from `\(\pr_{\text{noise}}\)`, with `\(\f{m}{n} = \pi \in (0,\half)\)`<br> * `\(\color{red}{\Db_\s}\)` & `\(\color{#3e9cfa}{\Db_{\rho,\s}}\)` are the diagrams of `\(\Xn \cup \Yb_m\)` using `\(\fns\)` & `\(\fn\)`, and `\(\Db^\#\)` is the diagram from the KDE on `\(\Xn\)` .pull-left[ <img src="images/expts/cloud.svg", width="320"> <br> `\(\color{#db43d5}\Xn \cup \Yb_m\)` ] -- .pull-right[ <img src="images/expts/box2.svg", width="320"> <br> `\(\color{#3e9cfa}{\Winf\pa{\Db_{\rho,\s},D^\#}}\)` and `\(\color{red}{\Winf\pa{\Db_\s,D^\#}}\)` ] .center[ `\(\pi = 30\%\)` and `\(\boldsymbol{p} = 4 \times 10^{-60}\)` ] --- count: false class: center # .left[Signal from Noise] * .orchid[<body> `\(\Xn\)` ( `\(\large\bullet\)` ) </body>] is sampled i.i.d. from .orchid[<body> `\(\pr_{\text{signal}}\)` </body>] and `\(\Yb_m\)` ( `\(\large\circ\)` ) is sampled i.i.d. from `\(\pr_{\text{noise}}\)`, with `\(\f{m}{n} = \pi \in (0,\half)\)`<br> * `\(\color{red}{\Db_\s}\)` & `\(\color{#3e9cfa}{\Db_{\rho,\s}}\)` are the diagrams of `\(\Xn \cup \Yb_m\)` using `\(\fns\)` & `\(\fn\)`, and `\(\Db^\#\)` is the diagram from the KDE on `\(\Xn\)` .pull-left[ <img src="images/expts/cloud.svg", width="320"> <br> `\(\color{#db43d5}\Xn \cup \Yb_m\)` ] .pull-right[ <img src="images/expts/box3.svg", width="320"> <br> `\(\color{#3e9cfa}{\Winf\pa{\Db_{\rho,\s},D^\#}}\)` and `\(\color{red}{\Winf\pa{\Db_\s,D^\#}}\)` ] .center[ `\(\pi = 40\%\)` and `\(\boldsymbol{p} = 2 \times 10^{-72}\)` ] --- count: false class: center # .left[Signal from Noise] * .orchid[<body> `\(\Xn\)` ( `\(\large\bullet\)` ) </body>] is sampled i.i.d. from .orchid[<body> `\(\pr_{\text{signal}}\)` </body>] and `\(\Yb_m\)` ( `\(\large\circ\)` ) is sampled i.i.d. 
from `\(\pr_{\text{noise}}\)`, with `\(\f{m}{n} = \pi \in (0,\half)\)`<br> * `\(\color{red}{\Db_\s}\)` & `\(\color{#3e9cfa}{\Db_{\rho,\s}}\)` are the diagrams of `\(\Xn \cup \Yb_m\)` using `\(\fns\)` & `\(\fn\)`, and `\(\Db^\#\)` is the diagram from the KDE on `\(\Xn\)` .pull-left[ <img src="images/expts/cloud.svg", width="320"> <br> `\(\color{#db43d5}\Xn \cup \Yb_m\)` ] .pull-right[ <img src="images/expts/box4.svg", width="320"> <br> `\(\color{#3e9cfa}{\Winf\pa{\Db_{\rho,\s},D^\#}}\)` and `\(\color{red}{\Winf\pa{\Db_\s,D^\#}}\)` ] .center[ `\(\pi = 50\%\)` and `\(\boldsymbol{p} = 2.5 \times 10^{-75}\)` ] --- # Unsupervised Learning * `\(25\)` point-clouds are sampled from each of `\(6\)` objects: * Classes: `cube, circle, sphere, torus, 3clust, 3clustIn3clust` * Points are sampled with additive Gaussian noise (SD=0.1) and ambient Matérn cluster noise<br><br> -- * For each point-cloud `\(\mathbb X_n^{(i)}, i = 1, \dots, 150\)` we compute * Robust persistence diagram: `\(\Db^{(i)} = \dgm\pa{\fn}\)` * Stable vectorization of non-robust diagram: `\(\Img^{(i)} = \Img\pA{\dgm\pa{d_n}; h}\)`, using the distance function `\(d_n\)` for `\(h=0.1\)` <a name=cite-adams2017persistence></a>[[7](#bib-adams2017persistence)]<br><br> -- * Pairwise distance matrices `\(\Delta_1,\Delta_2\)` are computed on `\(\Db^{(i)}\)` and `\(\Img^{(i)}\)` for `\(i = 1,\dots,150\)`, i.e., * `\(\Delta^{ij}_1 = \Winf\pa{\Db^{(i)},\Db^{(j)}}\)` and `\(\Delta^{ij}_2 = \norminf{\Img^{(i)} - \Img^{(j)}}\)`<br><br> -- * Spectral clustering is performed on `\(\Delta_1\)` and `\(\Delta_2\)`, and the quality is assessed using the Rand index .content-box-blue[Distance Matrix | `\(\Delta_1\)` | `\(\Delta_2\)` --- | --- | ---- Rand Index | `\(94.44\%\)` | `\(79.78\%\)` ] --- # Supervised Learning: Classification * For each image `\(\mathcal{I}\)` from the `mpeg7` dataset, the boundary `\(\partial\mathcal{I}\)` is extracted using Laplace convolution * `\(\Xn\)` is sampled from the distribution `\(0.85\times\text{Unif}\pa{\p\mathcal I} + 0.15 \times \text{Unif}(\mathcal I)\)` * For fixed `\(\s, m > 0\)` we compute `\(\dgm\pa{\fn}\)`, `\(\dgm\pa{\fns}\)` and `\(\dgm\pa{d_{n,m}}\)` using `\(\Xn\)` * SVM is trained on persistence images `\(\Img\pa{\dgm;h}\)` for varying `\(h\)` so as to classify the images `\(\mathcal I\)` `\(\Xn\)` when `\(\mathcal I = \texttt{bone}\)` | `\(3\)`-class SVM | `\(5\)`-class SVM --- | --- | ---- <img src="images/expts/bone.svg", width="290"> | <img src="images/expts/3-class-lines.svg", width="290"> | <img src="images/expts/5-class-lines.svg", width="290"> --- # Supervised Learning: Prediction * For `\(\boldsymbol{N} \sim \text{Unif}\pb{1,2,\dots,5}\)`, we generate random circles `\(\mathbb S_1 \dots \mathbb S_{\boldsymbol N} \subset [0,2]^2\)` * `\(\Xn\)` is sampled i.i.d.
from `\(\half\text{Unif}\pb{\mathbb S_1 \cup \dots \cup \mathbb S_{\boldsymbol N}} + \half \text{Unif}\pa{[0,2]^2}\)` * `\(\Db_\s\)` and `\(\Db_{\rho,\s}\)` are computed from `\(\Xn\)` using `\(\fns\)` and `\(\fn\)` * SVM is trained on vectorizations `\(\Img\pa{\Db_\s;h}\)` and `\(\Img\pa{\Db_{\rho,\s};h}\)` for varying `\(h\)` to predict `\(\boldsymbol N\)` `\(\Xn\)` when `\(\boldsymbol{N}=5\)` | Bandwidth `\(\s_1\)` | Bandwidth `\(\s_2\)` --- | --- | ---- <img src="images/expts/n23.svg", width="290"> | <img src="images/expts/lines-k5.svg", width="290"> | <img src="images/expts/lines-k7.svg", width="290"> --- class: inverse, center, middle count:false # .small[Future Work] --- # Efficient and Robust Persistence Diagrams * In practice, `\(\dgm\pa{\phi}\)` is computed using .bolder[cubical homology] on a grid `\(G(r)\)` of resolution `\(r\)` <img src="code/images/all.svg" width="950"> * The topological accuracy becomes sensitive to the choice of grid resolution `\(r\)` * Computing `\(\dgm\pa{\phi}\)` is prohibitively expensive in high-dimensions. Space complexity is `\(\mathcal O(r^{-d})\)` --- count: true # Efficient and Robust Persistence Diagrams ### .orange.bolder[Solution \#1:] * Compute the sublevel/superlevel filtration using the .bolder[nerve of the cover] for `\(\Xn\)` directly <a name=cite-bobrowski2017topological></a>[[8](#bib-bobrowski2017topological)] `\begin{align} \pB{\yv \in G(r) : \phi_n(\yv) \ge \epsilon } \rightarrow \mathcal{K}\pA{\pb{\Xv_i \in \Xn : \phi_n(\Xv_i) \ge \epsilon}, \delta} \end{align}` -- .center[ <img src="code/images/wrips3.svg" width="650"> ] --- count: false # Efficient and Robust Persistence Diagrams ### .orange.bolder[Solution \#1:] * Compute the sublevel/superlevel filtration using the .bolder[nerve of the cover] for `\(\Xn\)` directly [[8](#bib-bobrowski2017topological)] `\begin{align} \pB{\yv \in G(r) : \phi_n(\yv) \ge \epsilon } \rightarrow \mathcal{K}\pA{\pb{\Xv_i \in \Xn : \phi_n(\Xv_i) \ge \epsilon}, \delta} \end{align}` .center[ <img src="code/images/wrips4.svg" width="650"> ] --- count: true # Efficient and Robust Persistence Diagrams ### .orange.bolder[Solution \#2:] * Use weighted-Rips filtrations <a name=cite-anai2019dtm></a>[[9](#bib-anai2019dtm)] induced by `\(\fn\)` to construct a RKDE-filtration `\(\dgm\pa{\Xn,\fn}\)` * For `\(\epsilon > 0\)`, fixed `\(p \ge 1\)`, and a function `\(\phi: \R^d \rightarrow \R_+\)`, the .bold[<body> `\(\boldsymbol\phi\)`-power-radius </body>] of `\(\xv\)` is given by `\begin{align} r_{\xv,\phi,p}(\e) = \pa{\e^p - \phi(\xv)^p}^\f{1}{p} \end{align}` * The .bolder.text-shadow[resulting weighted filtration] is given by `\begin{align} \mathbb F\text{ilt}(\Xn,\phi) = \pB{\bigcup_{i=1}^nB(\xv_i,r_{\xv_i,\phi,p}(\e))}_{0 < \e < \infty} \end{align}` * The .dblue.bolder.text-shadow[weighted Rips-filtration] is the .bolder.text-shadow[nerve] of the weighted filtration `\(\mathbb F\text{ilt}(\Xn,\phi)\)` --- count: false # Efficient and Robust Persistence Diagrams .center[ <img src="images/animation/filt.gif" width="500"> ] --- count: false # Efficient and Robust Persistence Diagrams .center[ <img src="images/animation/rkde-filt.gif" width="500"> ] --- count: true # Efficient and Robust Persistence Diagrams ### .orange.bolder[Solution \#3:] * Construct a robust persistence diagram using a robust Median-of-Means (MoM) filtration * Suppose `\(\Xn\)` is partitioned into `\(\pb{S_q}_{q=1}^Q\)` disjoint blocks, and `\(\pr_q\)` is the empirical measure for `\(S_q\)` * We propose using a .dblue.bolder.text-shadow[MoM kernel 
distance] as the filter function, given by .content-box-blue[ `\begin{align} D^n_{Q,\s}(\xv) = \text{median}\pB{\normh{\mu_{\delta_\xv} - \mu_{\pr_q}}}_{q=1}^{Q} \end{align}` ] * .bold[Advantage:] `\(D^n_{Q,\s}\)` has `\(\mathcal{O}(n)\)` computational cost as opposed to `\(\mathcal{O}(n\ell)\)` for `\(\fn\)`<br><br> .center[ .content-box-orange[<body> `\(\dgm\pa{D^n_{Q,\s}} = \dgm\pa{\text{Sub}\pa{D^n_{Q,\s}}}\)` is the proposed robust MoM persistence diagram </body>] ] --- class: inverse, center, middle count:false # .small[Statistical Invariance of Betti Numbers] ## .tiny.tiny.tiny[<body> S.V., Fukumizu, Kuriki, Sriperumbudur. arXiv:2001.00220 (2020) </body>] --- count: false layout: true class: split-five with-border border-black .column[.content[ .split-five[ .row.bg-main1[.content.center.vmiddle[ # Space ]] .row.bg-main2[.content.center[ ## Circle ]] .row.bg-main3[.content.center[ ## Sphere ]] .row.bg-main4[.content.center[ ## Torus ]] .row.bg-main5[.content.center[ ## `\(3d_{z^2}\)` ]] ]]] .column[.content[ .split-five[ .row.bg-main1[.content.center.vmiddle[ # Shape ]] .row.bg-main2[.content.center[ <img src="images/regimes/circle.svg" width="100",height="100"> ]] .row.bg-main3[.content.center[ <img src="images/regimes/sphere.svg" width="100",height="100"> ]] .row.bg-main4[.content.center[ <img src="images/regimes/torus.svg" width="100",height="100"> ]] .row.bg-main5[.content.center[ <img src="images/regimes/orbital.png" width="100",height="100"> ]] ]]] .column[.content[ .split-five[ .row.bg-main1[.content.center.vmiddle[ # `\(\beta_0\)` ]] .row.bg-main2[.content.center[ ## 1 ]] .row.bg-main3[.content.center[ ## 1 ]] .row.bg-main4[.content.center[ ## 1 ]] .row.bg-main5[.content.center[ ## 1 ]] ]]] .column[.content[ .split-five[ .row.bg-main1[.content.center.vmiddle[ # `\(\beta_1\)` ]] .row.bg-main2[.content.center[ ## 1 ]] .row.bg-main3[.content.center[ ## 0 ]] .row.bg-main4[.content.center[ ## 2 ]] .row.bg-main5[.content.center[ ## 1 ]] ]]] .column[.content[ .split-five[ .row.bg-main1[.content.center.vmiddle[ # `\(\beta_2\)` ]] .row.bg-main2[.content.center[ ## 0 ]] .row.bg-main3[.content.center[ ## 1 ]] .row.bg-main4[.content.center[ ## 1 ]] .row.bg-main5[.content.center[ ## 3 ]] ]]] --- count: false class: fade-row2 fade-row3 fade-row4 fade-row5 gray-row2 gray-row3 gray-row4 gray-row5 --- count: false class: fade-row3 fade-row4 fade-row5 gray-row3 gray-row4 gray-row5 --- count: false class: fade-row2 fade-row4 fade-row5 gray-row2 gray-row4 gray-row5 --- count: false class: fade-row2 fade-row3 fade-row5 gray-row2 gray-row3 gray-row5 --- count: false class: fade-row2 fade-row3 fade-row4 gray-row2 gray-row3 gray-row4 --- layout: false # Random Topology * For `\(\mathbb{X}_n\)` observed iid from `\(\mathbb{P}\)`, a simplicial complex, `\(\mathcal{K}(\mathbb{X}_n,r)\)`, is a random-variable measurable w.r.t. `\(\mathbb{P}^{\otimes n}\)` * The Betti numbers, `\(\beta_k \left( \mathcal{K}(\mathbb{X}_n,r) \right) : \mathcal{X}^n \rightarrow \mathbb{N}\)`, are topological summaries<br><br> -- * What are the properties of these **.dblue[random]** topological summaries? 
`\begin{align} \text{(LLN)} & & \lim\limits_{n\rightarrow \infty}\frac{1}{n}\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right) = \color{#db43d5}{\gamma_k(\mathbb{P})} \ \ \text{a.s.} \hspace{2cm}\\ \\ \text{(CLT)} & & \lim\limits_{n\rightarrow \infty}\frac{\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right) - \mathbb{E}(\beta_k\left( \mathcal{K}(\mathbb{X}_n,r) \right))}{\sqrt{n}} \sim \color{#db43d5}{\mathcal{N}(0,\sigma^2)} \end{align}` --- layout: true class: left,split-three # Random Topology * For `\(\mathbb{X}_n\)` observed iid from `\(\mathbb{P}\)`, a simplicial complex, `\(\mathcal{K}(\mathbb{X}_n,r)\)`, is a random-variable measurable w.r.t. `\(\mathbb{P}^{\otimes n}\)`<br> * The Betti numbers, `\(\beta_k \left( \mathcal{K}(\mathbb{X}_n,r) \right) : \mathcal{X}^n \rightarrow \mathbb{N}\)`, are topological summaries<br><br> * What are the properties of these **.dblue[random]** topological summaries? * .orchid[<body> The asymptotic behaviour of `\(\mathcal{K}(\mathbb{X}_n,r_n)\)` depends on `\(r_n = r(n)\)` </body>] .column[.content[ <br><br><br><br><br><br><br><br><br><br> .center[ Dense <img src="images/regimes/dense.gif" width="300",height="300"> `\(nr_n^d \rightarrow \infty\)` ] ]] .column[.content[ <br><br><br><br><br><br><br><br><br><br> .center[ Sparse <img src="images/regimes/sparse.gif" width="300",height="300"> `\(nr_n^d \rightarrow 0\)` ] ]] .column[.content[ <br><br><br><br><br><br><br><br><br><br> .center[ Thermodynamic <img src="images/regimes/thermodynamic.gif" width="300",height="300"> `\(nr_n^d \rightarrow t \in (0,\infty)\)` ] ]] --- class: show-000 count: false <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> --- class: show-100 count: false <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> --- class: show-110 count: false <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> --- class: show-111 count: false <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> --- layout: false count: false # Statistical Invariance of Betti Numbers * Consider a family of distributions* `\(\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}\)` and `\(\mathbb{X}^\theta_n = \{\mathbf{X}^\theta_1,\mathbf{X}^\theta_2,\dots,\mathbf{X}^\theta_n\} \sim \mathbb{P}_\theta\)` for `\(\theta \in \Theta\)` .content-box-blue[ `\begin{align} \mathbf{S}(\mathbb{P}_{\theta}^{\otimes n})\!:= \mathbf{S}(\mathbb{X}^\theta_n)\!= \frac{1}{n} \Big( \beta_0\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big), \beta_1\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big), \dots , \beta_d\big( \mathcal{K}(\mathbb{X}^\theta_n,r_n) \big) \Big) \end{align}` ] -- As `\(n \rightarrow \infty\)` and `\(nr_n^d \rightarrow t\)`, the **.dblue[thermodynamic limit]** is the statistical functional `\begin{align} \mathbf{S}(\mathbb{P}_\theta; t) = \lim_{n \rightarrow\infty}\mathbf{S}(\mathbb{P}_{\theta}^{\otimes n}) \end{align}` -- * **.dblue.bolder.text-shadow[Statistical invariance.]** .center[<body> `\(\bbox[20px, border: 2px solid orange]{ \text{For } \theta_1,\theta_2 \in \Theta, \text{ can we have that } \lim\limits_{n\rightarrow \infty}\mathbf{S}(\mathbb{P}_{\theta_1}^{\otimes n}) {=} \lim\limits_{n\rightarrow \infty}\mathbf{S}(\mathbb{P}_{\theta_2}^{\otimes n}) \text{ ? 
} }\)`</body>] -- .content-box-blue[ **.bolder.text-shadow[Definition.]** `\(\mathcal{P} = \{\mathbb{P}_\theta : \theta \in \Theta \}\)` is said to admit .dorange.bolder[<body> `\(\beta\)`-equivalence </body>] if `\(\mathbf S(\mathbb{P}_\theta; t) = \eta(t)\)` for some function `\(\eta\)` and all `\(\theta \in \Theta\)` ] --- count: true # Statistical Invariance of Betti Numbers 1. .bolder[<body> Can we characterize .dblue[necessary and sufficient] conditions for invariance à la `\(\beta\)`-equivalence? </body>] -- <br> * Yes. When `\(\mathcal{P}\)` has some algebraic structure, i.e., <br> .center[<body> `\(\mathcal{P}\)` is generated by a parametrized group `\(\G = \pb{g_\theta: \theta \in \Theta}\)` </body>] `\(\mathcal{P}\)` admits `\(\beta\)`-equivalence .bolder.text-shadow[iff] the PDFs satisfy a Jacobian constraint w.r.t. the `\(\G\)`-maximal invariant -- <br><br><br> 1. .bolder[<body> If we relax the algebraic constraint, can we characterize .dblue[sufficient] conditions for `\(\beta\)`-equivalence? </body>] -- <br> * Yes. When `\(\X\)` admits a smooth fiber bundle structure, i.e., <br> .center[<body> `\(\mathfrak{X} = \mathcal{Y} \rightarrow \X \stackrel{\pi}{\hookrightarrow} \scr{Z}\)`, with base manifold `\(\scr{Z}\)` and canonical fiber space `\(\mathcal{Y}\)` </body>] We can transport mass along the fibers `\(\mathcal{Y}_{\zv}\)` using the local trivializations such that it preserves the excess mass function. This, in turn, guarantees `\(\beta\)`-equivalence. -- <br><br><br> 1. .bolder[<body> Given `\(\mathcal P = \{\mathbb{P}_\theta : \theta \in \Theta \}\)`, is there a simple way to .dblue[verify] if `\(\mathcal P\)` admits `\(\beta\)`-equivalence? </body>] -- <br> * Yes. If `\(\mathcal P\)` satisfies standard *stochastic regularity assumptions*, it suffices to check the following orthogonality condition for the score function<br> `\begin{align} \Inner{p_{\theta}^k, \f{\partial}{\partial \theta}\log p_\theta}_{L^2(\X)} = 0, \ \ \ \forall k \in \N_0 \end{align}` --- # References <a name=bib-vishwanath2020robust></a>[[1]](#cite-vishwanath2020robust) S. Vishwanath, K. Fukumizu, S. Kuriki, and B. K. Sriperumbudur. "Robust Persistence Diagrams using Reproducing Kernels". In: _Advances in Neural Information Processing Systems_ 33 (2020). <a name=bib-vishwanath2020statistical></a>[[2]](#cite-vishwanath2020statistical) S. Vishwanath, K. Fukumizu, S. Kuriki, and B. Sriperumbudur. "Statistical Invariance of Betti Numbers in the Thermodynamic Regime". In: _arXiv preprint arXiv:2001.00220_ (2020). <a name=bib-cohen2007stability></a>[[3]](#cite-cohen2007stability) D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. "Stability of persistence diagrams". In: _Discrete & Computational Geometry_ 37.1 (2007), pp. 103-120. <a name=bib-chazal2016structure></a>[[4]](#cite-chazal2016structure) F. Chazal, V. De Silva, M. Glisse, and S. Oudot. _The Structure and Stability of Persistence Modules_. AG, CH: Springer, 2016. <a name=bib-kim2012robust></a>[[5]](#cite-kim2012robust) J. Kim and C. D. Scott. "Robust kernel density estimation". In: _Journal of Machine Learning Research_ 13.Sep (2012), pp. 2529-2565. <a name=bib-chazal2011geometric></a>[[6]](#cite-chazal2011geometric) F. Chazal, D. Cohen-Steiner, and Q. Mérigot. "Geometric inference for probability measures". In: _Foundations of Computational Mathematics_ 11.6 (2011), pp. 733-751. <a name=bib-adams2017persistence></a>[[7]](#cite-adams2017persistence) H. Adams, T. Emerson, M. Kirby, R. Neville, et al.
"Persistence Images: A stable vector representation of persistent homology". In: _Journal of Machine Learning Research_ 18.1 (2017), pp. 218-252. --- class: inverse, center, middle count: false # Thank you! ## Questions?