RTG Seminars on Data Science

We invite speakers to present original research in Data Science.

2024 - 2025 Academic Year

Organized by: Wuchen Li (wuchen@mailbox.sc.edu)

This page will be updated as new seminars are scheduled. Make sure to check back each week for information on upcoming seminars.

We will try to offer a virtual option via Zoom, as well as the regular in-person option. The Zoom details are listed below:

Join Zoom Meeting

Meeting ID: 942 9769 4178

Passcode: 488494

In addition to the seminars listed here, the department also hosts seminars on Applied and Computational Mathematics.

3/28 - Exploiting time-domain parallelism to accelerate neural network training and PDE constrained optimization

When: March 28th from 2:30 p.m. to 3:30 p.m.

Where: LeConte College 440

Speaker: Eric C. Cyr, Sandia National Lab

Title: Exploiting time-domain parallelism to accelerate neural network training and PDE constrained optimization

Abstract: This talk will explore methods for accelerating numerical optimization constrained by transient problems using parallelism. Two types of transient problems will be considered. In the first case, training algorithms for Neural ODEs will be discussed. Neural ODEs are a class of neural network architecture where the depth of the neural network (the layers) is modeled as a continuous time domain. For the second case, transient PDE-constrained optimization problems will be described. In either case, simulation-based optimization requires repeated executions of the simulator’s forward and backward (adjoint) time integration schemes. Consequently, the arrow of time creates a major sequential bottleneck in the optimization process. Moreover, for performance, these methods rely strongly on the available parallelization for the forward and adjoint solves. Thus, when forward and adjoint solvers are already operating at the limit of strong scaling and hardware utilization, the arrow-of-time bottleneck cannot be overcome by additional parallelization across the spatial grid or network layers.

In the first half of this talk we consider deep neural network models. Deep neural networks are a powerful machine learning tool with the capacity to learn complex nonlinear relationships described by large data sets. Despite their success, training these models remains a challenging and computationally intensive undertaking. We will present a layer-parallel training algorithm that exploits a multigrid scheme to accelerate both forward and backward propagation. Introducing a parallel decomposition between layers requires inexact propagation of the neural network. The multigrid method used in this approach stitches these subdomains together with sufficient accuracy to ensure rapid convergence. We demonstrate an order of magnitude wall-clock time speedup over the serial approach, opening a new avenue for parallelism that is complementary to existing approaches. We also discuss applying the layer-parallel methodology to recurrent neural networks. We study the gated recurrent unit (GRU) architecture and demonstrate its relation to a simple ODE formulation that facilitates application of the layer-parallel approach. Results demonstrating performance improvements on a human activity recognition (HAR) data set are presented.

The second half of this talk focuses on PDE-constrained optimization formulations.

Solving optimization problems with transient PDE-constraints is computationally costly due to the number of nonlinear iterations and the cost of solving large-scale KKT matrices. These matrices scale with the size of the spatial discretization times the number of time steps. We propose a new 2-level domain decomposition preconditioner to solve these linear systems when constrained by the heat equation. Our approach leverages the observation that the Schur-complement is elliptic in time, and thus amenable to classical domain decomposition methods. Further, the application of the preconditioner uses existing time integration routines to facilitate implementation and maximize software reuse. The performance of the preconditioner is examined in an empirical study demonstrating the approach is scalable with respect to the number of time steps and subdomains.

8/30 - Wasserstein Gradient Flows of MMD Functionals with Distance Kernel and Cauchy Problems on Quantile Functions - Viktor Stein

When: August 30th 2024 from 3:40 - 4:30 p.m.

Where: LeConte 440

Speaker: Viktor Stein (TU Berlin)

Abstract: We give a comprehensive description of Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals $F_\nu := \text{MMD}_K^2(\cdot, \nu)$ towards given target measures $\nu$ on the real line, where we focus on the negative distance kernel $K(x,y) := -|x-y|$. In one dimension, the Wasserstein-2 space can be isometrically embedded into the cone $C(0,1) \subset L_2(0,1)$ of quantile functions, leading to a characterization of Wasserstein gradient flows via the solution of an associated Cauchy problem on $L_2(0,1)$. Based on the construction of an appropriate counterpart of $F_\nu$ on $L_2(0,1)$ and its subdifferential, we provide a solution of the Cauchy problem. For discrete target measures $\nu$, this results in a piecewise linear solution formula. We prove invariance and smoothing properties of the flow on subsets of $C(0,1)$. For certain $F_\nu$-flows this implies that initial point measures instantly become absolutely continuous, and stay so over time. Finally, we illustrate the behavior of the flow by various numerical examples using an implicit Euler scheme and demonstrate differences to the explicit Euler scheme, which is easier to compute, but comes with limited convergence guarantees. This is joint work with Richard Duong (TU Berlin), Robert Beinert (TU Berlin), Johannes Hertrich (UCL) and Gabriele Steidl (TU Berlin).
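As a small illustration of the functional in this abstract, the sketch below evaluates the squared MMD with the negative distance kernel $K(x,y) = -|x-y|$ between two empirical measures on the real line. It is not taken from the talk: the function names are hypothetical, and it assumes the convention $\text{MMD}_K^2(\mu,\nu) = \mathbb{E}K(X,X') + \mathbb{E}K(Y,Y') - 2\,\mathbb{E}K(X,Y)$ with no extra constant factor (some references include a factor of 1/2).

```python
from itertools import product


def mean_abs_diff(xs, ys):
    """Average of |x - y| over all pairs (x, y) drawn from the two samples."""
    return sum(abs(x - y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def mmd_squared_distance_kernel(xs, ys):
    """Squared MMD between two empirical measures on the real line for the
    kernel K(x, y) = -|x - y|, under the assumed convention
    MMD^2 = E K(X, X') + E K(Y, Y') - 2 E K(X, Y)."""
    return (2.0 * mean_abs_diff(xs, ys)
            - mean_abs_diff(xs, xs)
            - mean_abs_diff(ys, ys))
```

Under this convention the value between point masses at 0 and 1 is 2, which in one dimension agrees with twice the integral of the squared difference of the two cumulative distribution functions.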

Previous Seminars

2023-2024 Academic Year

Organized by: Wuchen Li (wuchen@mailbox.sc.edu)

We will try to offer a virtual option via Zoom, as well as the regular in person option. The Zoom details are listed below:

Zoom Link: https://zoom.us/j/94297694178?pwd=cUs0dTZDeXhjVnN3S1ZIcVJ1RU1sUT09

Meeting ID: 942 9769 4178

Passcode: 488494

4/19 - A First-order computational algorithm for reaction-diffusion type equations via primal-dual hybrid gradient method

When: April 19th 2024 from 3:40 p.m. to 4:30 p.m.

Where: LeConte 444

Speaker: Shu Liu (UCLA)

Abstract: We propose an easy-to-implement iterative method for resolving the implicit (or semi-implicit) schemes arising in reaction-diffusion (RD) type equations. In our treatment, we formulate the nonlinear time implicit scheme on the space-time domain as a min-max saddle point problem and then apply the primal-dual hybrid gradient (PDHG) method. Suitable precondition matrices are applied to accelerate the convergence of our algorithm under different circumstances. Furthermore, we provide conditions that guarantee the convergence of our method for various types of RD-type equations. Several numerical examples as well as comparisons with commonly used numerical methods will also be demonstrated to verify the effectiveness and the accuracy of our method.

4/12 - Sampling in Infinite-Dimensions with Score-Based Diffusion Models

When: April 12th 2024 from 3:40 p.m. to 4:30 p.m.

Where: LeConte 440

Speaker: Ricardo S. Baptista (California Institute of Technology)

Abstract: Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional Euclidean spaces, excluding their application to domains such as scientific computing where the data consist of functions. In this presentation, we introduce a framework called Denoising Diffusion Operators (DDOs) for sampling distributions in function space with probabilistic diffusion models. In DDOs, the forward process perturbs input functions using a Gaussian process and learns an appropriate score function in infinite dimensions. We show that our discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We numerically verify that our improvements capture the statistics of high-resolution fluid dynamics problems and popular imaging datasets.

2/28 - Variational Physics-Informed Neural Networks Optimized with Least Squares and Adaptivity in the Test Space

When: February 28th 2024 from 2:30 p.m. to 3:30 p.m.

Where: Virtual via Zoom

Speaker: David Pardo (University of the Basque Country, Spain)

Abstract: Download here [PDF]

Join Zoom Meeting
Meeting ID: 982 8541 4873
Passcode: 839056

2/2 - Randomized tensor-network algorithms for random data in high-dimensions

When: February 2nd 2024 from 2:30 p.m. to 3:30 p.m.

Where: LeConte 440

Speaker: Yuehaw Khoo (University of Chicago)

Abstract: Tensor-network ansatz has long been employed to solve the high-dimensional Schrödinger equation, demonstrating linear complexity scaling with respect to dimensionality. Recently, this ansatz has found applications in various machine learning scenarios, including supervised learning and generative modeling, where the data originates from a random process. In this talk, we present a new perspective on randomized linear algebra, showcasing its usage in estimating a density as a tensor-network from i.i.d. samples of a distribution, without the curse of dimensionality, and without the use of optimization techniques. Moreover, we illustrate how this concept can combine the strengths of particle and tensor-network methods for solving high-dimensional PDEs, resulting in enhanced flexibility for both approaches.

11/10 - Geometry of the sliced Wasserstein space

When: November 10th from 3:40 p.m. to 4:40 p.m.

Where: LeConte 440

Speaker: Sangmin Park (Carnegie Mellon University)

Abstract: We study the space of probability measures equipped with the 2-sliced Wasserstein distance SW2, a projection-based variant of the Wasserstein distance with increasing popularity in statistics and machine learning due to computational efficiency especially in high dimensions. Using the language of the Radon transform, we examine the metric differential structure of the sliced Wasserstein space and the induced length space, and deduce that SW2 (and the associated length metric) behave very differently near absolutely continuous and discrete measures. We apply this discrepancy to demonstrate the lack of stability of gradient flows in the sliced Wasserstein (length) space. If time permits, we will also discuss the empirical estimation rate of absolutely continuous measures in the sliced Wasserstein length. This is a joint work with Dejan Slepcev.
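The projection-based construction mentioned in this abstract can be sketched for two planar point clouds of equal size. This is only an illustrative approximation, not the speaker's code: the function names are hypothetical, the directions are taken equally spaced on the half circle instead of being integrated exactly, and the average over directions is assumed to be normalized to total mass one (normalization conventions vary).

```python
import math


def wasserstein2_squared_1d(xs, ys):
    """Squared W2 distance between two equal-size empirical measures on the
    line: sort both samples and match them in order."""
    assert len(xs) == len(ys)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein2_squared(points_a, points_b, num_directions=64):
    """Average the squared 1D W2 distance of the projected clouds over
    equally spaced directions on the half circle (2D point clouds only)."""
    total = 0.0
    for k in range(num_directions):
        angle = math.pi * k / num_directions
        c, s = math.cos(angle), math.sin(angle)
        # Project every point onto the direction (cos(angle), sin(angle)).
        proj_a = [c * x + s * y for x, y in points_a]
        proj_b = [c * x + s * y for x, y in points_b]
        total += wasserstein2_squared_1d(proj_a, proj_b)
    return total / num_directions
```

For a cloud translated by a vector $v$ in the plane, this returns $|v|^2/2$, since each projection is shifted by the component of $v$ along the direction; the per-slice cost is one sort, which is the computational advantage the abstract refers to.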

10/6 - A bilevel optimization approach for inverse mean-field games

When: October 6th from 3:40 p.m. to 4:40 p.m.

Where: LeConte 440

Speaker: Jiajia Yu (Duke University)

Abstract: Mean-field games study the Nash Equilibrium in a non-cooperative game with infinitely many agents. Most existing works study solving the Nash Equilibrium with given cost functions. However, it is not always straightforward to obtain these cost functions. On the contrary, it is often possible to observe the Nash Equilibrium in real-world scenarios. In this talk, I will discuss a bilevel optimization approach for solving inverse mean-field game problems, i.e., identifying the cost functions that drive the observed Nash Equilibrium. With the bilevel formulation, we retain the essential characteristics of convex objective and linear constraint in the forward problem. This formulation permits us to solve the problem using a gradient-based optimization algorithm with a nice convergence guarantee. We focus on inverse mean-field games with unknown obstacles and unknown metrics and establish the numerical stability of these two inverse problems. In addition, we prove and numerically verify the unique identifiability for the inverse problem with unknown obstacles. This is a joint work with Quan Xiao (RPI), Rongjie Lai (Purdue) and Tianyi Chen (RPI).

9/29 - Entropy dissipation for general Langevin dynamics and its application

When: September 29th from 3:40 p.m. to 4:40 p.m.

Where: LeConte 440

Speaker: Qi Feng (Florida State University)

Abstract: In this talk, I will discuss long-time dynamical behaviors of Langevin dynamics, including Langevin dynamics on Lie groups and mean-field underdamped Langevin dynamics. We provide unified Hessian matrix conditions for different drift and diffusion coefficients. This matrix condition is derived from the dissipation of a selected Lyapunov functional, namely the auxiliary Fisher information functional. We verify the proposed matrix conditions in various examples. I will also talk about the application in distribution sampling and optimization. This talk is based on several joint works with Erhan Bayraktar and Wuchen Li.

9/22 - High order spatial discretization for variational time implicit schemes: Wasserstein gradient flows and reaction-diffusion system

When: September 22nd from 3:40 p.m. to 4:40 p.m.

Where: LeConte 440 & Zoom (if possible, see link above)

Speaker: Guosheng Fu (University of Notre Dame)

Abstract: We design and compute first-order implicit-in-time variational schemes with high-order spatial discretization for initial value gradient flows in generalized optimal transport metric spaces. We first review some examples of gradient flows in generalized optimal transport spaces from the Onsager principle. We then use a one-step time relaxation optimization problem for time-implicit schemes, namely generalized Jordan-Kinderlehrer-Otto schemes. Their minimizing systems satisfy implicit-in-time schemes for initial value gradient flows with first-order time accuracy. We adopt the first-order optimization scheme ALG2 (an augmented Lagrangian method) and high-order finite element methods in spatial discretization to compute the one-step optimization problem. This allows us to derive the implicit-in-time update of initial value gradient flows iteratively. We remark that the iteration in ALG2 has a simple-to-implement point-wise update based on optimal transport and Onsager's activation functions. The proposed method is unconditionally stable for convex cases. Numerical examples are presented to demonstrate the effectiveness of the methods in two-dimensional PDEs, including Wasserstein gradient flows, the Fisher-Kolmogorov-Petrovskii-Piskunov equation, and two- and four-species reversible reaction-diffusion systems. This is joint work with Stanley Osher from UCLA and Wuchen Li from the University of South Carolina.

Slides

9/1 - Structure-driven algorithm design in reliable and multi-agent machine learning

When: September 1st from 2:30 p.m. to 3:30 p.m.

Where: LeConte 440

Speaker: Tianyi Lin (MIT)

Abstract: Reliable and multi-agent machine learning has seen tremendous achievements in recent years; yet, the translation from minimization models to min-max optimization models and/or variational inequality models --- two of the basic formulations for reliable and multi-agent machine learning --- is not straightforward. In fact, finding an optimal solution of either nonconvex-nonconcave min-max optimization models or nonmonotone variational inequality models is computationally intractable in general. Fortunately, there exist special structures in many application problems, allowing us to define reasonable optimality criterion and develop simple and provably efficient algorithmic schemes. In this talk, I will present the results on structure-driven algorithm design in reliable and multi-agent machine learning. More specifically, I explain why the nonconvex-concave min-max formulations make sense for reliable machine learning and show how to analyze the simple and widely used two-timescale gradient descent ascent by exploiting such special structure. I also show how a simple and intuitive adaptive scheme leads to a class of optimal second-order variational inequality methods. Finally, I discuss two future research directions for reliable and multi-agent machine learning with potential for significant practical impacts: reliable multi-agent learning and reliable topic modeling.

Notes: This is a joint talk with the ACM seminar.

2022-2023 Academic Year

March 3, 2023, Wuchen Li, Optimal Ricci curvature Markov chain Monte Carlo methods on finite states

Abstract: In this talk, we construct a new Markov chain Monte Carlo method on finite states with optimal choices of acceptance-rejection ratio functions. We prove that the constructed continuous time Markov jumping process has a global in-time convergence rate in L1 distance. The convergence rate is no less than one-half and is independent of the target distribution. For example, our method recovers the Metropolis-Hastings (MH) algorithm on a two-point state. And it forms a new algorithm for sampling general target distributions. Numerical examples are presented to demonstrate the effectiveness of the proposed algorithm. This is based on a joint work with Linyuan Lu.
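For readers unfamiliar with the baseline mentioned in this abstract, the following sketch builds the transition matrix of the classical discrete-time Metropolis-Hastings chain on a finite state space. It is not the optimal method of the talk: the uniform proposal over the other states and the function names are assumptions made purely for illustration.

```python
def metropolis_hastings_matrix(target):
    """Transition matrix of discrete-time Metropolis-Hastings on states
    0..n-1, assuming a uniform proposal over the other n-1 states."""
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Accept the proposed move i -> j with probability min(1, pi_j / pi_i).
                accept = min(1.0, target[j] / target[i])
                matrix[i][j] = accept / (n - 1)
        # Rejected proposals keep the chain at state i.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def push_forward(distribution, matrix):
    """One step of the chain: row vector times transition matrix."""
    n = len(distribution)
    return [sum(distribution[i] * matrix[i][j] for i in range(n))
            for j in range(n)]
```

Because the acceptance rule enforces detailed balance, applying `push_forward` to the target distribution returns the target itself, which is the defining property any such sampler must satisfy.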

February 17, 2023, Changhui Tan, Nonlocal traffic flow models

Abstract: In this talk, I will discuss a family of traffic flow models. The classical Lighthill-Whitham-Richards model is known to have a finite time shock formation for all generic initial data, which represents the creation of traffic jams. I will introduce a family of nonlocal traffic flow models, with look-ahead interactions. These models can be derived from discrete cellular automata models. We show an intriguing phenomenon that the nonlocal slowdown interactions prevent traffic jams, under suitable settings. This talk is based on joint works with Thomas Hamori, Yongki Lee and Yi Sun.

January 20, 2023, Zhu Wang, Level Set Learning for Dimensionality Reduction

Abstract: Approximating high-dimensional functions is challenging due to the curse of dimensionality. In this talk, we will discuss the Dimension Reduction via Learning Level Sets for function approximations. The approach contains two major components: one is the pseudo-reversible neural network module that effectively transforms high-dimensional input variables to low-dimensional active variables, the other is the synthesized regression module for approximating function values based on the transformed data in the low-dimensional space. This is a joint work with Prof. Lili Ju and our graduate student Mr. Yuankai Teng, and Dr. Anthony Gruber (Sandia) and Dr. Guannan Zhang (ORNL).

November 4, 2022, Hong Wang, Fractional Calculus for Power-Law Dynamics

Abstract: Anomalously diffusive transport, which exhibits power-law decaying behavior, occurs in many applications along with many other power-law processes. In this talk we will go over related modeling and analysis issues in comparison to normal Fickian diffusive transport, which exhibits exponentially decaying behavior. We will show why fractional calculus, in which the order of differentiation may be a function of space, time, the unknown variable, or even a distribution, provides a more appropriate modeling tool for these problems than conventional integer-order models do.

October 21, 2022, Tad Dallas, The challenges and frontiers in the analysis of ecological networks

Abstract: Networks in ecology can take many forms, describing interactions between species, dispersal pathways between different habitat patches in space, or associations between different classes of species (e.g., host and parasite species). In this talk, we will explore the different uses and issues present in the analysis of ecological networks and the prediction of potentially missing links in networks. In doing so, we will identify some frontiers in which graph theory may be applied to ecological networks using existing data, model simulations, and laboratory experiments.

September 23, 2022, Wolfgang Dahmen, Compositional Sparsity, Approximation Classes, DNNs, and Parametric Transport Equations

Abstract: This talk is about the intrinsic obstructions encountered when approximating or recovering functions of a large number of variables, commonly subsumed under the term “Curse of Dimensionality”. Problems of this type are ubiquitous in Uncertainty Quantification and machine learning. In particular, we highlight the role of deep neural networks (DNNs) in this context. A new sparsity notion, namely compositional dimension sparsity, is introduced, which is shown to favor efficient approximation by DNNs. It is also indicated that this notion is suited for function classes comprised of solutions to operator equations. This is quantified for solution manifolds of parametric families of transport equations. We focus on this scenario because (i) it cannot be treated well by currently known concepts and (ii) it has interesting ramifications for related more general settings.

September 9, 2022, Xinfeng Liu, Data-driven mathematical modeling, computation and experimental investigation of dynamical heterogeneity in breast cancer

Abstract: Solid tumors are heterogeneous in composition. Cancer stem cells (CSCs) are a highly tumorigenic cell type found in developmentally diverse tumors that are believed to be resistant to standard chemotherapeutic drugs and responsible for tumor recurrence. Thus, understanding the tumor growth kinetics is critical for the development of novel strategies for cancer treatment. In this talk, I shall introduce mathematical modeling to study HER2 signaling for the dynamical interaction between CSCs and non-stem cancer cells. Our findings reveal that two negative feedback loops are critical in controlling the balance between the population of CSCs and that of non-stem cancer cells. Furthermore, the model with negative feedback suggests that over-expression of the oncogene HER2 leads to an increase of CSCs by regulating the division mode or proliferation rate of CSCs.

August 26, 2022, Linyuan Lu, Mean field information Hessian matrices on graphs

Abstract: We derive mean-field information Hessian matrices on finite graphs. Here "information" refers to entropy functions on the probability simplex, and "mean-field" means nonlinear weight functions of probabilities supported on graphs. These two concepts define a mean-field optimal transport type metric. In this metric space, we first derive Hessian matrices of energies on graphs, including linear energies, interaction energies, and entropies. We name their smallest eigenvalues mean-field Ricci curvature bounds on graphs. We next provide examples on two-point spaces and graph products. Finally, we present several applications of the proposed matrices. For example, we prove discrete Costa's entropy power inequalities on a two-point space.

2021-2022 Academic Year

April 1, 2022, Peter Binev, Optimal Learning

Abstract:

This talk is about the problem of learning an unknown function $f$ from given data about $f$. The learning problem is to give an approximation $\hat{f}$ to $f$ that predicts the values of $f$ away from the data. There are numerous settings for this learning problem depending on:

(i) what additional information we have about $f$ (known as a model class assumption);

(ii) how we measure the accuracy of how well $\hat{f}$ predicts $f$;

(iii) what is known about the data and data sites;

(iv) whether the data observations are polluted by noise.

A mathematical description of the optimal performance possible (the smallest possible error of recovery) is known in the presence of a model class assumption. Under standard model class assumptions, we show that a near optimal $\hat{f}$ can be found by solving a certain discrete over-parameterized optimization problem with a penalty term. Here, near optimal means that the error is bounded by a fixed constant times the optimal error. This explains the advantage of over-parameterization, which is commonly used in modern machine learning. The main results of this talk prove that over-parameterized learning with an appropriate loss function gives a near optimal approximation $\hat{f}$ of the function $f$ from which the data is collected. Quantitative bounds are given for how much over-parameterization needs to be employed and how the penalization needs to be scaled in order to guarantee a near optimal recovery of $f$. An extension of these results to the case where the data is polluted by additive deterministic noise is also given.

This is a joint research project with Andrea Bonito, Ronald DeVore, and Guergana Petrova from Texas A&M University.

Mar. 18, 2022, Wolfgang Dahmen, Accuracy Controlled Data Assimilation for Parabolic Problems

Abstract:
State Estimation or Data Assimilation are about estimating "physical states" of interest from two sources of partial information: data produced by external sensors and a (typically incomplete or uncalibrated) background model, given in terms of a partial differential equation. In this talk we focus on states that ideally satisfy a parabolic equation with known right hand side but unknown initial values. Additional partial information is given in terms of data that represent the unknown state in a subdomain of the whole space-time cylinder up to a fixed time horizon. Recovering the state from this information is known to be a (mildly) ill-posed problem. Earlier contributions employ mesh-dependent regularizations in a fully discrete setting, bypassing a continuous problem formulation. Other contributions, closer to the approach discussed in this talk, consider a regularized least squares formulation first on an infinite-dimensional level. The essential difference in the present talk is that the least squares formulation exploits the “natural mapping properties” of the underlying forward problem. The main consequences delineating our results from previous work are:

(i) no excess regularity assumptions are needed, thereby mitigating the level of ill-posedness;

(ii) one obtains stronger a priori estimates that are uniform with respect to the regularization parameter;

(iii) error estimates no longer require consistent data;

(iv) one obtains rigorous computable a posteriori bounds that provide stopping criteria for iterative solvers and allow one to estimate data inconsistency and model bias.

The price is to deal with dual norms and their efficient evaluation. We sketch the main concepts and illustrate the results by numerical experiments.

Mar. 4, 2022, Wuchen Li, Transport optimization methods in Bayesian sampling problems

Abstract:

We present a systematic framework for Nesterov's accelerated gradient flows and Newton flows in the spaces of probabilities embedded with general information metrics. Two metrics are considered: the Fisher-Rao metric and the Wasserstein-2 metric. For the Wasserstein-2 metric case, we prove the convergence properties of the accelerated gradient flows and introduce their formulations in Gaussian families. Furthermore, we propose a practical discrete-time algorithm in particle implementations with an adaptive restart technique. We also formulate a novel bandwidth selection method, which learns the Wasserstein-2 gradient direction from Brownian-motion samples. Experimental results, including Bayesian inference, show the strength of the current approach compared with the state-of-the-art. Finally, we discuss some further connections between inverse problems and data/neural network optimization techniques.

Feb. 18, 2022, Wolfgang Dahmen, Some Thoughts on PINN - Prediction Capability?

Abstract:

We present some concepts for the construction of nonlinear reduced deep neural network models for parameter dependent families of PDEs. The proposed methodology is based on combining stable variational formulations for the PDE models and regression concepts in machine learning. Central objectives concern:

- avoiding the Curse of Dimensionality in high parameter dimensionality regimes;

- rigorous accuracy quantification of resulting estimators, based on contriving variationally correct training risks that avoid variational crimes, often encountered with Physics Informed Neural Network (PINN) formulations.

We highlight the role of optimization strategies involving dynamic network expansion, currently in progress in our group at UofSC, and high-dimensional sparsity concepts.

Feb. 4, 2022, Linyuan Lu, Probabilistic Method for Complex Graph

Abstract: It has been observed that many real-world networks, such as the Internet, social networks, biological networks, and collaboration graphs, have so-called power law degree distributions. A graph is called a power law graph if the fraction of vertices with degree $k$ is approximately proportional to $k^{-b}$ for some constant $b$. The classical Erdős-Rényi random graph model $G(n,p)$ is not suitable for modeling these power law graphs. Many random graph models have been developed. Among these models, we directly generalize $G(n,p)$ into "random graphs with given expected degree sequences". We consider several graph properties, such as the size and volume of the giant component, the average distance and the diameter, and the spectra. Some theoretical results will be compared to real data.
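The generalization of G(n,p) to given expected degrees can be illustrated with a minimal sketch. It assumes the common formulation in which vertices i and j are joined independently with probability w_i w_j / sum(w), capped at 1, with self-loops allowed; the function names and the power-law weight helper are hypothetical and not taken from the talk.

```python
import random


def edge_probability(weights, i, j):
    """Probability of the edge {i, j} in the expected-degree model
    (assumed form: w_i * w_j / sum(w), capped at 1)."""
    return min(1.0, weights[i] * weights[j] / sum(weights))


def expected_degree(weights, i):
    """Expected degree of vertex i, with a self-loop contributing one."""
    return sum(edge_probability(weights, i, j) for j in range(len(weights)))


def sample_graph(weights, seed=0):
    """Sample one graph as a set of pairs (i, j) with i <= j."""
    rng = random.Random(seed)
    n = len(weights)
    return {(i, j) for i in range(n) for j in range(i, n)
            if rng.random() < edge_probability(weights, i, j)}


def power_law_weights(n, b, scale=1.0):
    """Hypothetical helper: weights proportional to (i + 1)^(-1/(b - 1)),
    intended to give roughly a power-law degree distribution with exponent b."""
    return [scale * (i + 1) ** (-1.0 / (b - 1.0)) for i in range(n)]
```

With all weights equal to np, every edge probability reduces to p, so the model contains G(n,p) as a special case (up to self-loops); as long as no product w_i w_j exceeds sum(w), the expected degree of vertex i is exactly w_i.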

Department of Mathematics