International Conference:
Differential Equations for Data Science 2023 (DEDS2023)
Date: February 20 (Mon)–22 (Wed), 2023
Place: Online (Zoom)
Registration: Registration link for Zoom meeting
Links: DEDS2022, DEDS2021
Aim:
Keywords:
Speakers:
Giovanni Ballarin (University of Mannheim, Germany)
Program: PDF
 Monday, February 20
 16:55–17:00: Opening
 Session 1: JST 17:00–20:05
(=UTC 08:00–11:05 =CET 09:00–12:05 =PST 00:00–03:05)
 On solving/learning nonlinear PDEs with GPs
 Learning and forecasting the effective dynamics of complex systems across scales
 Forecasting Hamiltonian dynamics with computational graph completion
 Forecasting dynamical systems from irregularly-sampled data with kernel methods
 Machine-learning-based spectral methods for partial differential equations
 Tuesday, February 21
 Session 2: JST 17:00–20:10
(=UTC 08:00–11:10 =CET 09:00–12:10 =PST 00:00–03:10)
 Kernel Flows and Kernel Mode Decomposition for learning dynamical systems from data
 Nonlocal Kramers–Moyal formulas and data science
 Structural inference of networked dynamical systems
 On a singular limit of the Kobayashi–Warren–Carter energy
 Automatically identifying dynamical systems from data
 Momentum Stiefel optimizer, with applications to suitably-orthogonal attention, and optimal transport
 Wednesday, February 22
 Session 3: JST 17:00–20:10
(=UTC 08:00–11:10 =CET 09:00–12:10 =PST 00:00–03:10)
 Reservoir kernels and Volterra series
 Learnability of linear port-Hamiltonian systems
 Memory of recurrent networks: Do we compute it right?
 Neural delay differential equations and their physical implementations
 From seeing double to modelling seizure dynamics with multifunctional reservoir computers
 Prediction of numerical upscaling for Richards equation using deep learning
Abstracts:
T. = Title, A. = Abstract.
 Houman Owhadi (California Institute of Technology, US)
T. On solving/learning nonlinear PDEs with GPs A. We present a simple, rigorous, and unified framework for solving and learning arbitrary nonlinear PDEs with Gaussian Processes (GPs). The proposed approach: (1) provides a natural generalization of collocation kernel methods to nonlinear PDEs and Inverse Problems, (2) has guaranteed convergence for a very general class of PDEs, and (3) comes with a Bayesian interpretation compatible with a UQ pipeline. It inherits (1) the a priori error bounds of kernel interpolation methods and (2) the (near-linear) state-of-the-art computational complexity of linear solvers for dense kernel matrices. Its generalization to high-dimensional and parametric PDEs comes with error bounds exhibiting a trade-off between dimensionality and regularity (the curse of dimensionality disappears when the problem is sufficiently regular). Its formulation can be interpreted and generalized as an extension of Gaussian Process Regression from the approximation of input/output functions to the completion of arbitrary computational graphs representing dependencies between multiple known and unknown functions and variables.
 Pantelis R. Vlachas (ETH Zurich/AI2C Technologies, Switzerland)
T. Learning and forecasting the effective dynamics of complex systems across scales A. Predictive simulations of complex systems are essential for applications ranging from weather forecasting to drug design. The veracity of these predictions hinges on their capacity to capture the effective system dynamics. Massively parallel simulations predict the system dynamics by resolving all spatiotemporal scales, often at a cost that prevents experimentation, while their findings may not allow for generalisation. On the other hand, reduced-order models are fast but limited by the frequently adopted linearization of the system dynamics and/or the use of heuristic closures. Here we present a novel systematic framework that bridges large-scale simulations and reduced-order models to Learn the Effective Dynamics (LED) of diverse complex systems. The framework forms algorithmic alloys between nonlinear machine learning algorithms and the Equation-Free approach for modeling complex systems. LED deploys autoencoders to formulate a mapping between fine and coarse-grained representations and evolves the latent space dynamics using recurrent neural networks. The algorithm is validated on benchmark problems, and we find that it outperforms state-of-the-art reduced-order models in terms of predictability, and large-scale simulations in terms of cost. LED is applicable to systems ranging from chemistry to fluid mechanics and reduces the computational effort by up to two orders of magnitude while maintaining the prediction accuracy of the full system dynamics. We argue that LED provides a novel potent modality for the accurate prediction of complex systems.
 Yasamin Jalalian (California Institute of Technology, US)
T. Forecasting Hamiltonian dynamics with computational graph completion A. Hamiltonian dynamics describe many different physical systems and have a wide range of applications, from classical to statistical and quantum mechanics. As such, data-driven simulations of Hamiltonian systems are important tools for solving many scientific and engineering problems and have thus been more widely explored during recent years. In this work, we combine the newly developed framework for computational graph completion (CGC) with numerical techniques for data-adaptive kernel regression to interpolate and forecast Hamiltonian systems in a data-driven way. The CGC framework allows us to characterize the dependencies between the unknowns of the system and approximate them by imposing Gaussian priors and computing MAP estimators given the available data. We demonstrate that our method is both accurate and data-efficient on a variety of physical problems including mass-spring systems, a nonlinear pendulum, and the Hénon–Heiles system.
 Jonghyeon Lee (California Institute of Technology, US)
T. Forecasting dynamical systems from irregularly-sampled data with kernel methods A. A highly efficient way to predict the future of a dynamical system is to interpolate its vector field with a kernel, where the kernel parameters are learned with an algorithm called Kernel Flows (KF), which uses gradient-based optimization to learn a kernel. However, the classical KF algorithm fails if the observed time series is not regularly sampled in time. In our paper, we solve this problem with a generalization of the flow map of the dynamical system by incorporating time differences between observations in the KF data-adapted kernels; this simple modification leads to greater forecasting accuracy compared with the original KF algorithm.
 Saad Qadeer (Pacific Northwest National Laboratory, US)
T. Machine-learning-based spectral methods for partial differential equations A. A major obstacle in the deployment of spectral methods is the choice of appropriate bases for trial and test spaces. If chosen suitably, these basis functions lead invariably to well-posed discretized problems and well-conditioned linear systems, while the resulting approximate solutions are provably high-order accurate. However, barring domain decomposition approaches, devising such functions for arbitrary geometries from scratch is a hugely challenging task. Fortunately, recently developed operator learning approaches for approximating solution operators, e.g., DeepONets, Fourier Neural Operators, etc., suggest a highly promising route for generating machine-learned basis functions. In this talk, we propose a Galerkin approach for time-dependent PDEs that is powered by basis functions gleaned from the DeepONet architecture. We shall outline our procedure for obtaining these basis functions and detail their many favourable properties. Next, we shall present the results of numerical tests for various problems, including advection, advection–diffusion, viscous and inviscid Burgers’, Korteweg–de Vries, and Kuramoto–Sivashinsky equations. Finally, we will identify potential obstacles in the course of generalization to higher dimensions and suggest possible remedies.
 Boumediene Hamzi (California Institute of Technology, US)
T. Kernel Flows and Kernel Mode Decomposition for learning dynamical systems from data A. Regressing the vector field of a dynamical system from a finite number of observed states is a natural way to learn surrogate models for such systems. We present variants of the method of Kernel Flows as simple approaches for learning the kernels that appear in the emulators we use in our work. First, we will talk about the method of parametric and nonparametric kernel flows for learning chaotic dynamical systems. We'll also talk about learning dynamical systems from irregularly-sampled time series as well as from partial observations. We will also introduce the method of Sparse Kernel Flows and apply it to learn 132 chaotic dynamical systems. Finally, we extend the method of Kernel Mode Decomposition to design kernels in view of detecting critical transitions in some fast–slow random dynamical systems.
 Jinqiao Duan (Illinois Institute of Technology, US)
T. Nonlocal Kramers–Moyal formulas and data science A. Dynamical systems in engineering and science are usually under random fluctuations (either Gaussian or non-Gaussian noise). Observational, experimental, and simulation data for such systems are noisy and abundant. The governing laws for complex dynamical systems are sometimes not known or not completely known.
This presentation is about extracting stochastic governing laws from noisy data for dynamical systems under non-Gaussian fluctuations, by nonlocal Kramers–Moyal formulas. I will also compare this approach with the (local) Kramers–Moyal formulas in the classical case, when the noisy fluctuations are Gaussian.
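The contrast above can be made concrete with a toy sketch of the classical (local) Kramers–Moyal estimation for Gaussian noise. Everything below (the simulated Ornstein–Uhlenbeck process, its parameter values, and the binning) is an illustrative assumption, not the speaker's nonlocal method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an Ornstein-Uhlenbeck path dX = -theta*X dt + sigma dW (Gaussian noise).
theta, sigma, dt, n = 1.0, 0.5, 1e-3, 400_000
x = np.empty(n)
x[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), n - 1)
for i in range(n - 1):
    x[i + 1] = x[i] - theta * x[i] * dt + sigma * noise[i]

# Local Kramers-Moyal estimates: bin the state space and use conditional
# increment moments, a(x) ~ E[dX | X=x]/dt and b(x)^2 ~ E[dX^2 | X=x]/dt.
dx = np.diff(x)
edges = np.linspace(-0.8, 0.8, 17)
which = np.digitize(x[:-1], edges)
centers, drift, diff2 = [], [], []
for b in range(1, len(edges)):
    mask = which == b
    if mask.sum() > 500:
        centers.append(0.5 * (edges[b - 1] + edges[b]))
        drift.append(dx[mask].mean() / dt)          # should track a(x) = -theta*x
        diff2.append((dx[mask] ** 2).mean() / dt)   # should track b(x)^2 = sigma**2
```

A linear fit of `drift` against `centers` approximately recovers `-theta`, and `diff2` stays close to `sigma**2`; the talk's nonlocal formulas generalize exactly this estimation step to non-Gaussian (e.g. Lévy-type) fluctuations.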
This is a joint work with Yang Li and Yubin Lu.
 James V. Koch (Old Dominion University, US)
T. Structural inference of networked dynamical systems A. Data-driven modeling of dynamical systems has experienced a surge in method development, in concert with advances in machine learning and artificial intelligence. In this work, we restrict our scope to the problem of data-driven modeling of networked systems: a challenging problem in which one must elicit not only the intrinsic physics of individual nodes of the network, but also which nodes talk to whom and how that communication influences nodal behaviors. The implications of such methodology are far-reaching: one can begin to infer properties of networked systems with respect to network topology and/or external perturbations, answering hypotheticals such as “what would happen if we remove this node?”
 Jun Okamoto (Kyoto University, Japan)
T. On a singular limit of the Kobayashi–Warren–Carter energy A. We consider the singular limit problem of a single-well Modica–Mortola energy and the Kobayashi–Warren–Carter energy. In this study, we introduce a finer topology of sliced graph convergence of functions into the function space and derive the singular limits of the single-well Modica–Mortola and Kobayashi–Warren–Carter energies in the sense of Gamma-convergence. The energy functional obtained as this singular limit is also shown to have the remarkable property of being concave with respect to the strength of jumps of a function.
 Rui Carvalho (Durham University, UK)
T. Automatically identifying dynamical systems from data A. Discovering governing equations from data provides a clearer understanding of the world around us. Scientists have recently deployed machine learning to develop prediction models representing the evolution of many natural phenomena over time. Here, we present our automatic regression for governing equations (ARGOS) method to extract dynamical systems from noisy data. We use several linear and nonlinear examples to develop a systematic comparison between the identification performance of ARGOS and that of the recently proposed SINDy with AIC. Our results show that ARGOS demonstrates a higher identification probability for systems of ordinary differential equations contaminated with state measurement noise.
 Lingkai Kong (Georgia Institute of Technology, US)
T. Momentum Stiefel optimizer, with applications to suitably-orthogonal attention, and optimal transport A. This talk will report a construction of momentum-accelerated gradient descent algorithms on Riemannian manifolds, focusing on a particular case known as the Stiefel manifold. The treatment will be based on, firstly, the design of continuous-time optimization dynamics on the manifold, and then a thoughtful time-discretization that preserves all geometric structures. Since the Stiefel manifold corresponds to matrices that satisfy an orthogonality constraint, two practical applications will also be described: (1) we markedly improved the performance of a trained-from-scratch Vision Transformer by appropriately placing orthogonality into its self-attention mechanism, and (2) our optimizer also makes the useful notion of Projection Robust Wasserstein Distance for high-dimensional optimal transport even more effective.
 Lyudmila Grigoryeva (University of St. Gallen, Switzerland, and University of Warwick, UK)
T. Reservoir kernels and Volterra series A. A universal kernel is constructed whose sections approximate any causal and time-invariant filter in the fading memory category with inputs and outputs in a finite-dimensional Euclidean space. This kernel is built using the reservoir functional associated with a state-space representation of the Volterra series expansion available for any analytic fading memory filter. It is hence called the Volterra reservoir kernel. Even though the state-space representation and the corresponding reservoir feature map are defined on an infinite-dimensional tensor algebra space, the kernel map is characterized by explicit recursions that are readily computable for specific data sets when employed in estimation problems using the representer theorem. We showcase the performance of the Volterra reservoir kernel in a popular data science application in relation to bitcoin price prediction.
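For background intuition only: the sketch below is a generic echo-state reservoir kernel, not the Volterra reservoir kernel of the talk, and all names and parameter values are hypothetical. It illustrates the underlying idea that a frozen random state-space map sends input sequences to reservoir states, and the inner product of those states defines a kernel:

```python
import numpy as np

def reservoir_states(u, W, win, leak=1.0):
    """Drive a fixed echo-state-style reservoir with the input sequence u
    and return the final state; the pair (W, win) is drawn once and frozen."""
    x = np.zeros(W.shape[0])
    for ut in u:
        x = (1 - leak) * x + leak * np.tanh(W @ x + win * ut)
    return x

def reservoir_kernel(u, v, W, win):
    """Kernel between two input sequences: inner product of reservoir states."""
    return reservoir_states(u, W, win) @ reservoir_states(v, W, win)

rng = np.random.default_rng(1)
n = 50
# Scale the recurrent matrix below spectral radius 1 for a fading-memory reservoir.
W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n)) * 0.9
win = rng.normal(0.0, 0.5, n)

u = np.sin(np.linspace(0, 4, 30))
v = np.cos(np.linspace(0, 4, 30))
K_uu = reservoir_kernel(u, u, W, win)
K_uv = reservoir_kernel(u, v, W, win)
```

A Gram matrix assembled from such kernel evaluations can then be plugged into kernel ridge regression via the representer theorem, which is how the Volterra reservoir kernel is deployed in the talk's estimation problems.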
Paper: https://arxiv.org/abs/2212.14641
Joint work with Lukas Gonon (Imperial College London) and Juan-Pablo Ortega (NTU, Singapore)
 Daiying Yin (Nanyang Technological University, Singapore)
T. Learnability of linear port-Hamiltonian systems A. A well-specified parametrization for single-input/single-output (SISO) linear port-Hamiltonian systems amenable to structure-preserving supervised learning is provided. The construction is based on normal form controllable and observable Hamiltonian representations for those systems, which reveal fundamental relationships between classical notions in control theory and crucial properties in the machine learning context, like structure preservation and expressive power. More explicitly, it is shown that the equivalence classes of system automorphisms of linear port-Hamiltonian systems can be explicitly identified with a smooth manifold endowed with global Euclidean coordinates, which allows concluding that the parameter complexity necessary for the replication of the dynamics is only $\mathcal{O}(n)$ and not $\mathcal{O}(n^2)$, as suggested by the standard parametrization of these systems. Furthermore, we show that linear port-Hamiltonian systems can be learned while remaining agnostic about the dimension of the underlying data-generating system. Numerical experiments show that this methodology can be used to efficiently estimate linear port-Hamiltonian systems out of input-output realizations, making the contributions in this paper the first example of a structure-preserving machine learning paradigm for linear port-Hamiltonian systems based on explicit representations of this model category.
 Giovanni Ballarin (University of Mannheim, Germany)
T. Memory of recurrent networks: Do we compute it right? A. Numerical evaluations of the memory capacity (MC) of linear recurrent networks reported in the literature often contradict known theoretical bounds. In this paper, we study the case of linear echo state networks, for which the total memory capacity is proven to be equal to the rank of the Kalman controllability matrix. We shed light on various issues that lead to inaccurate numerical estimations of the memory. We investigate and explain in detail the consequences of neglecting them, and we prove that ignoring the Krylov structure of linear MC introduces a “memory gap” between its theoretical and empirical values. As a solution, we develop robust numerical approaches by exploiting a neutrality result of MC with respect to the input mask matrix. Simulations show that memory curves which fully agree with the theory are recovered using the proposed methods. (This is a joint work with Lyudmila Grigoryeva and Juan-Pablo Ortega.)
 Satoshi Sunada (Kanazawa University/JST PRESTO, Japan)
T. Neural delay differential equations and their physical implementations A. Recent work has revealed an interesting connection between deep neural networks and dynamical systems. In this context, the layer-to-layer information propagation in neural nets can be expressed as the time evolution of dynamical systems, and the training of deep neural nets is associated with optimal control of dynamical systems. Here, based on our previous work [1], we introduce a new class of neural ordinary differential equations with time delay and a training scheme based on optimal control theory. We show that an optimally controlled delay system can perform pattern recognition with only a few control signals and a single node [1], in contrast to standard deep neural nets with a huge number of weight parameters and neurons. This feature of controlled delay equations allows for simple physical implementations. In this talk, we will introduce an optoelectronic neural delay system [2] and a new class of training strategies without backpropagation, based on direct feedback alignment (DFA) with correlated randomness [3].
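As a minimal sketch of how delay systems of the kind discussed above can be simulated, here is an Euler scheme with a circular delay buffer; the feedback nonlinearity and all parameter values are toy assumptions, not the optoelectronic system of [2]:

```python
import numpy as np

def integrate_dde(f, x0, tau, t_end, dt):
    """Euler integration of x'(t) = f(x(t), x(t - tau)), with the
    constant history x(t) = x0 for t <= 0."""
    d = int(round(tau / dt))            # delay measured in time steps
    n = int(round(t_end / dt))
    x = np.empty(n + 1)
    x[0] = x0
    buf = np.full(d, x0)                # circular buffer holding x(t - tau)
    for k in range(n):
        x_delayed = buf[k % d]          # this slot was written d steps ago
        x[k + 1] = x[k] + dt * f(x[k], x_delayed)
        buf[k % d] = x[k]               # store x(t_k) for reuse at step k + d
    return x

# Toy delayed nonlinear feedback: x'(t) = -x(t) + tanh(2 * x(t - tau)).
traj = integrate_dde(lambda xc, xd: -xc + np.tanh(2.0 * xd),
                     x0=0.1, tau=1.0, t_end=50.0, dt=0.01)
```

The trajectory settles near the fixed point solving x = tanh(2x); in the talk's setting, training amounts to optimally choosing control signals that steer such a delay system.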
[1] G. Furuhata, T. Niiyama, and S. Sunada, “Physical deep learning based on optimally controlled dynamical systems,” Phys. Rev. Applied 15, 034092 (2021).
[2] R. Nogami, K. Kanno, S. Sunada, and A. Uchida, “Experimental demonstration of physical deep learning based on optimal control using optoelectronic delay system,” Proc of NOLTA 2022, B3LC02 (2022).
[3] In preparation.
 Andrew Flynn (University College Cork, Ireland)
T. From seeing double to modelling seizure dynamics with multifunctional reservoir computers A. In the pursuit of developing artificially intelligent systems, there is much to be gained from dually integrating further physiological features of biological neural networks and knowledge of dynamical systems into machine learning environments. In this talk, such a two-armed approach is employed to translate ‘multifunctionality’ from biological to artificial neural networks via the reservoir computing machine learning paradigm. Multifunctionality describes the ability of a single neural network to perform a multitude of mutually exclusive tasks by exploiting a form of multistability. The dynamics of how a reservoir computer achieves multifunctionality when tasked with solving the ‘seeing double’ problem are presented. These results help to identify many new application areas for reservoir computers, which are also explored in this talk, including data-driven modelling of multistability, generating chaotic itinerancy for memory recall, and reconstructing dynamical transitions present in the epileptic brain.
 Tina Mai (Duy Tan University, Vietnam, and Texas A&M University, US)
T. Prediction of numerical upscaling for Richards equation using deep learning A. In [Sergei Stepanov, Denis Spiridonov, and Tina Mai. Prediction of numerical homogenization using deep learning for the Richards equation. Journal of Computational and Applied Mathematics, 424:114980, 2023. https://doi.org/10.1016/j.cam.2022.114980], for an unsaturated flow in the form of the nonlinear Richards equation over heterogeneous media, we build a new coarse-scale approximation scheme based on numerical homogenization. Using deep neural networks (DNNs), this strategy provides frequent and rapid estimates of macroscopic parameters. More precisely, when training a neural network, we employ a training set of random permeability realizations and correspondingly computed macroscopic targets (effective permeability tensor, homogenized stiffness matrix, and right-hand side vector). Our proposed deep learning approach, which constructs nonlinear maps between such permeability fields and macroscopic properties, is novel in that it treats the nonlinearity of the Richards equation in the predicted coarse-scale homogenized stiffness matrix. Numerous numerical experiments on two-dimensional model problems show how well this method predicts the macroscopic features and hence the solutions.
Supports:
MIRS, Kanazawa University Link
Organizers:
Hayato Chiba (Tohoku University, Japan)
After registering, you will receive a confirmation email containing information about joining the meeting.
(Once you register from the link above, you are able to join all the sessions.)
Note that the Zoom meeting will be available from 30 minutes before each session start time and that the maximum number of registrants is 500.
#registrants: 198, as of February 20.
If the link does not work, copy and paste the following entire URL into your internet browser:
https://us06web.zoom.us/meeting/register/tZ0pcu6trz4uGtMPKKIqFYduL2dbN1SmwJU
This conference is mainly devoted to new mathematical aspects of machine learning algorithms, big data analysis, and other topics in data science, from the viewpoint of differential equations. In recent years, several interesting connections between differential equations and data science have been found and have attracted attention from researchers in differential equations. In this conference, we will gather researchers in differential equations who are interested in data science and try to shed new light on the mathematical foundations of topics in machine learning and data science.
ODE, PDE, Delay DE, Neural ODE, Machine learning, Deep learning, Data science, Big data, Reservoir computing (RC), Physical RC, Graph Laplacian, Universal approximation theory, Edge of chaos, Echo state property, Graphon, Dynamical system, Singular value decomposition, Variational autoencoder
Rui Carvalho (Durham University, UK)
Jinqiao Duan (Illinois Institute of Technology, US)
Andrew Flynn (University College Cork, Ireland)
Lyudmila Grigoryeva (University of St. Gallen, Switzerland, and University of Warwick, UK)
Boumediene Hamzi (California Institute of Technology, US)
Yasamin Jalalian (California Institute of Technology, US)
James V. Koch (Pacific Northwest National Laboratory, US)
Lingkai Kong (Georgia Institute of Technology, US)
Jonghyeon Lee (California Institute of Technology, US)
Tina Mai (Duy Tan University, Vietnam, and Texas A&M University, US)
Jun Okamoto (Kyoto University, Japan)
Houman Owhadi (California Institute of Technology, US)
Saad Qadeer (Pacific Northwest National Laboratory, US)
Satoshi Sunada (Kanazawa University/JST PRESTO, Japan)
Pantelis R. Vlachas (ETH Zurich/AI2C Technologies, Switzerland)
Daiying Yin (Nanyang Technological University, Singapore)
*This is a version as of February 14, 2023.
*All lectures will be given by invited speakers.
Chair: Hirofumi Notsu
17:00–17:45  Houman Owhadi

17:45–17:55  Break 
17:55–18:40  Pantelis R. Vlachas

18:40–18:50  Break 
18:50–19:15  Yasamin Jalalian

19:15–19:40  Jonghyeon Lee

19:40–20:05  Saad Qadeer

Chair: TBD
17:00–17:45  Boumediene Hamzi

17:45–17:55  Break 
17:55–18:20  Jinqiao Duan

18:20–18:45  James V. Koch

18:45–18:55  Break 
18:55–19:20  Jun Okamoto

19:20–19:45  Rui Carvalho

19:45–20:10  Lingkai Kong

Chair: TBD
17:00–17:45  Lyudmila Grigoryeva

17:45–17:55  Break 
17:55–18:20  Daiying Yin

18:20–18:45  Giovanni Ballarin

18:45–18:55  Break 
18:55–19:20  Satoshi Sunada

19:20–19:45  Andrew Flynn

19:45–20:10  Tina Mai

20:10–20:15  Closing 
JST, CREST, JPMJCR2014 Link
Yoshikazu Giga (The University of Tokyo, Japan)
Lyudmila Grigoryeva (University of St. Gallen, Switzerland)
Boumediene Hamzi (California Institute of Technology, US)
Masato Kimura (Kanazawa University, Japan)
Hiroshi Kokubu (Kyoto University, Japan)
Kohei Nakajima (The University of Tokyo, Japan)
Hirofumi Notsu (Kanazawa University, Japan, Chair)
JuanPablo Ortega (Nanyang Technological University, Singapore)