Advanced Studies Institute in Mathematics of Data Science & Machine Learning

Date: January 5-13, 2024
Venue: Urgench State University (Uzbekistan)
Contact: Zair Ibragimov (California State University, Fullerton)
E-mail: zibragimov@fullerton.edu

Key Lecturer:

Guido Montufar

University of California, Los Angeles

Overview

This series of lectures will cover introductory and advanced topics in data science and machine learning, with a focus on mathematical and statistical aspects. Knowledge outcomes:

  • Understanding of the mathematical and statistical foundations of data science and machine learning
  • Trade-offs in machine learning among approximation, estimation, and optimization
  • Contemporary views of machine learning with overparametrized models, learning regimes, and algorithmic regularization
  • Parameter space and function space perspectives on learning
  • Quantitative analysis of parameter optimization and statistical generalization in overparametrized learning models, covering methods such as the neural tangent kernel (sketched below) and neural network Gaussian processes
  • Overview of learning modalities and architectures, such as graph neural networks, generative models, and transformers
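As a pointer to the neural tangent kernel mentioned above, here is a minimal sketch (an illustration added for this program, not course material) of the empirical NTK of a one-hidden-layer ReLU network: each kernel entry is the inner product of the parameter gradients of the network output at two inputs. The network, its width m, and the random inputs are all illustrative assumptions.

    # Empirical neural tangent kernel of f(x) = a^T relu(W x) / sqrt(m):
    # K(x, x') = <grad_theta f(x), grad_theta f(x')> at the current parameters.
    import numpy as np

    rng = np.random.default_rng(0)
    m, d = 512, 3                     # hidden width and input dimension (illustrative)
    W = rng.normal(size=(m, d))
    a = rng.normal(size=m)

    def param_grad(x):
        z = W @ x
        da = np.maximum(z, 0.0) / np.sqrt(m)                       # df/da_j
        dW = ((a * (z > 0.0)) / np.sqrt(m))[:, None] * x[None, :]  # df/dW_jk
        return np.concatenate([da, dW.ravel()])

    x1, x2 = rng.normal(size=d), rng.normal(size=d)
    print(param_grad(x1) @ param_grad(x2))   # empirical NTK entry K(x1, x2)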

Topics

  • Learning from data
  • Statistical learning theory
  • Models, architectures, regimes
  • Geometric techniques
  • Optimization and algorithmic biases

 

Key Lecturer:

Volodymyr Melnykov

University of Alabama

Overview

Cluster analysis is one of the fundamental unsupervised machine learning problems; it aims at constructing data groups in such a way that observations within each group are similar but data points in different groups are relatively distinct. Applications of cluster analysis can be found in image analysis, pattern recognition, and social network analysis. Model-based clustering is a popular clustering technique relying on the notion of finite mixture models. It assumes a one-to-one correspondence between data groups and mixture components and provides a highly interpretable and remarkably flexible approach to data partitioning. We will consider several recent developments in model-based clustering, supporting the discussion with various illustrative real-life applications.
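As a minimal illustration of the finite-mixture idea (a sketch on synthetic data, not material from the lectures), the snippet below fits a two-component Gaussian mixture with scikit-learn and reads off the implied partition; each mixture component plays the role of one cluster.

    # Model-based clustering: fit a Gaussian mixture, one component per cluster.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),    # synthetic group 1
                   rng.normal(4.0, 1.0, size=(100, 2))])   # synthetic group 2

    gm = GaussianMixture(n_components=2, random_state=0).fit(X)
    labels = gm.predict(X)            # hard partition: argmax posterior component
    posteriors = gm.predict_proba(X)  # soft memberships, one column per component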

Topics

  • Introduction to unsupervised machine learning and model-based clustering
  • Modeling matrix- and tensor-variate data
  • Semi-supervised machine learning and model-based clustering
  • Modeling time-dependent categorical sequences
  • Finite mixture modeling in stylometry

Invited Speakers

Jamolbek Mattiev

Urgench State University, Uzbekistan

Title: Associative classification model based on clustering

Abstract: The size of collected data is increasing, and the number of rules generated from those datasets is growing with it. Producing compact and accurate models has become one of the most important tasks in data mining. In this talk, we present a new associative classifier that utilizes agglomerative hierarchical clustering. Experimental evaluations show that the proposed method achieves significantly better results than classical rule learning algorithms in terms of the number of rules on larger datasets, while maintaining classification accuracy on those datasets.
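The clustering step of such a classifier can be sketched generically (this is not the talk's actual algorithm; the rule "profiles" and cluster count below are hypothetical placeholders): agglomeratively merge similar rules, then keep one representative per cluster to compact the model.

    # Generic agglomerative compaction sketch with SciPy (illustrative only).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    rule_profiles = rng.random((40, 8))   # hypothetical feature vector per rule

    Z = linkage(rule_profiles, method="average")       # bottom-up merge tree
    labels = fcluster(Z, t=5, criterion="maxclust")    # cut into 5 rule clusters

    # Keep one representative rule per cluster -> a much smaller rule set.
    representatives = [int(np.where(labels == k)[0][0]) for k in np.unique(labels)]
    print(representatives)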


Michael Murray

University of California, Los Angeles

Title: An Introduction to Benign Overfitting

Abstract: Conventional machine learning wisdom suggests that the generalization error of a complex model will typically be worse than that of a simpler model when both are trained to interpolate data. Indeed, the bias-variance trade-off implies that although choosing a complex model is advantageous in terms of approximation error, it comes at the price of an increased risk of overfitting. The traditional solution to managing this trade-off is to use some form of regularization, allowing the optimizer to select a predictor from a rich class of functions while at the same time encouraging it to choose one that is in some sense simple. However, in recent years it has been observed that many models, including deep neural networks, trained with minimal if any explicit regularization, can almost perfectly interpolate noisy data with nominal cost to their generalization performance. This phenomenon is referred to as benign or tempered overfitting, and there is now great interest in characterizing it mathematically. In this talk I’ll give an introduction and motivation for the topic, describe some of the key results derived so far, and highlight open questions.
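A toy version of the phenomenon can already be seen in overparameterized linear regression (a sketch under assumed Gaussian data, not one of the talk's results): the minimum-norm interpolator fits the noisy training labels exactly, yet its test error can remain moderate.

    # Benign overfitting toy example: minimum-norm interpolation with d >> N.
    import numpy as np

    rng = np.random.default_rng(0)
    N, d = 50, 500
    X = rng.normal(size=(N, d))
    w_star = rng.normal(size=d) / np.sqrt(d)
    y = X @ w_star + 0.1 * rng.normal(size=N)   # noisy labels

    w_hat = np.linalg.pinv(X) @ y               # minimum-norm least squares
    print(np.abs(X @ w_hat - y).max())          # ~0: training data interpolated

    X_te = rng.normal(size=(1000, d))
    print(np.mean((X_te @ w_hat - X_te @ w_star) ** 2))  # test MSE stays modest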

 

Michael Porter

University of Virginia

Title: Modeling contagion, excitation, and social influence with Hawkes point processes

Abstract: Many social and physical processes (e.g., crime, conflict, social media activity, financial markets, new product adoption, social network communication, earthquakes, neural spiking, disease spread) produce event point patterns that exhibit clustering. Hawkes, or self-exciting, point process models are a popular choice for modeling these clustering patterns, which can be driven by both exogenous influences and endogenous forces like contagion/self-excitement. These models stipulate that each event can be triggered by past events, creating a branching structure that produces the endogenous clustering. The contagion effects are modeled by a shot-noise term which aggregates the influence of past events to temporarily increase the event rate following each event. This talk will introduce the Hawkes process and illustrate some extensions and uses of Hawkes models in three areas: modeling contagion in terrorist attacks, forecasting crime hotspots, and identifying social influence in Yelp restaurant reviews.
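To make the shot-noise term concrete, here is a minimal sketch of the conditional intensity of a Hawkes process with an exponential kernel (the baseline mu and the parameters alpha and beta are illustrative values, not ones from the talk): each past event adds a bump alpha * exp(-beta * (t - t_i)) to the baseline rate.

    # Hawkes intensity: lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    import numpy as np

    def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.2):
        past = events[events < t]
        return mu + alpha * np.exp(-beta * (t - past)).sum()

    events = np.array([1.0, 1.3, 1.4, 5.0])
    print(hawkes_intensity(2.0, events))   # elevated shortly after the burst near t = 1
    print(hawkes_intensity(4.9, events))   # decayed back toward the baseline mu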

Rishi Sonthalia

University of California, Los Angeles

Title: From Classical Regression to the Modern Regime: Surprises for Linear Least Squares Problems

Abstract: Linear regression is a problem that has been extensively studied. However, modern machine learning has brought to light many new and exciting phenomena due to overparameterization. In this talk, I briefly introduce the new phenomena observed in recent years. Then, building on this, I present recent theoretical work on linear denoising. Despite the importance of denoising in modern machine learning and ample empirical work on supervised denoising, its theoretical understanding is still relatively scarce. One concern about studying supervised denoising is that one might not always have noiseless training data from the test distribution. It is more reasonable to assume access to noiseless training data from a dataset different from the test dataset. Motivated by this, we study supervised denoising and noisy-input regression under distribution shift. We add three considerations to increase the applicability of our theoretical insights to real-life data and modern machine learning. First, we assume that our data matrices are low-rank. Second, we drop independence assumptions on our data. Third, the rise in computational power and the dimensionality of data have made it essential to study non-classical learning regimes. Thus, we work in the non-classical proportional regime, where the data dimension $d$ and the number of samples $N$ grow as $d/N = c + o(1)$. For this setting, we derive general test error expressions for both denoising and noisy-input regression and study when overfitting the noise is benign, tempered, or catastrophic. We show that the test error exhibits double descent under general distribution shifts, providing insights for data augmentation and the role of noise as an implicit regularizer. We also perform experiments using real-life data, matching the theoretical predictions with under 1% MSE error for low-rank data.
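The double descent shape mentioned above can be reproduced in miniature with the same minimum-norm estimator as in the earlier sketch, now swept across the aspect ratio c = d/N (Gaussian data and the chosen noise level are illustrative assumptions, not the paper's setting): test error peaks near the interpolation threshold c = 1 and descends again as c grows.

    # Double descent sweep for minimum-norm least squares across c = d/N.
    import numpy as np

    rng = np.random.default_rng(0)

    def test_mse(d, N=100, noise=0.5, n_test=2000):
        X = rng.normal(size=(N, d))
        w_star = rng.normal(size=d) / np.sqrt(d)
        y = X @ w_star + noise * rng.normal(size=N)
        w_hat = np.linalg.pinv(X) @ y          # min-norm solution in both regimes
        X_te = rng.normal(size=(n_test, d))
        return np.mean((X_te @ w_hat - X_te @ w_star) ** 2)

    for c in [0.2, 0.5, 0.9, 1.0, 1.1, 2.0, 5.0]:
        print(c, round(test_mse(int(c * 100)), 3))  # error spikes near c = 1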

 

Angelica Torres

Max Planck Institute, Germany

Title: Algebraic Geometry meets Structure from Motion

Abstract: The Structure from Motion (SfM) pipeline aims to create a 3D model of a scene using two-dimensional images as input. The process has four main stages: feature detection, matching, camera pose estimation, and triangulation. In the first stage, features such as points and lines are detected in the images; they are then matched to features appearing in other images. After the matching stage, the actual images are discarded and the data that remain are tuples of points or lines believed to come from the same world object. This is geometric data, so the toolbox of Algebraic Geometry can be used to estimate the camera positions and to triangulate the objects in the scene. In this talk I will introduce the SfM pipeline and present some of the algebraic varieties that arise when the pinhole camera model is assumed. During the talk we will highlight how some properties of these varieties translate into properties of the data and how they can affect the image reconstruction process.
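As a small illustration of the pinhole camera model underlying these varieties (the intrinsics and pose below are made-up numbers): a 3x4 camera matrix P = K [R | t] sends a homogeneous world point X to a homogeneous image point x = P X, and dehomogenizing gives pixel coordinates.

    # Pinhole projection: x = P X with P = K [R | t] (illustrative values).
    import numpy as np

    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])        # intrinsics (assumed)
    R, t = np.eye(3), np.array([[0.0], [0.0], [5.0]])
    P = K @ np.hstack([R, t])                    # 3x4 camera matrix

    X_world = np.array([0.5, -0.2, 2.0, 1.0])    # homogeneous 3D point
    x = P @ X_world
    u, v = x[0] / x[2], x[1] / x[2]              # pixel coordinates
    print(u, v)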

Program Schedule

 

January 5, 2024

8:30 – 10:00                     Arrival/Hotel Check-in

10:00 – 12:30                   Rest 

12:30 – 14:00                   Lunch

14:00 – 14:20                   Registration

14:20 – 14:50                   Opening Remarks

                                    Bakhrom Abdullaev - Rector, Urgench State University

                                    Guido Montufar - University of California, Los Angeles

                                    Volodymyr Melnykov - University of Alabama

                                    Zair Ibragimov - California State University, Fullerton

15:00 – 15:50                    Speaker: Guido Montufar         

                                              Title: Learning from data, I

16:00 – 16:50                   Speaker: Volodymyr Melnykov

                                    Title: Introduction to unsupervised machine learning and model-based clustering, I

17:00 – 17:50                   Speaker: Jamolbek Mattiev

                                    Title: Associative classification model based on clustering

18:30 – 21:30                   Welcome Reception and Dinner

 

January 6, 2024

09:00 – 09:50                  Speaker: Guido Montufar         

                                              Title: Learning from data, II

10:00 – 10:50                   Speaker: Guido Montufar

                                    Title: Statistical learning theory, I

11:00 – 11:30                     Coffee Break

11:30 – 12:20                    Recitation Session (Moderator: Kedar Karhadkar)

12:30 – 14:00                   Lunch

14:00 – 16:00                   Free Time/Rest (Faculty Housing)

16:00 – 16:50                   Speaker: Volodymyr Melnykov

                                   Title: Introduction to unsupervised machine learning and model-based clustering, II

17:00 – 17:50                  Speaker: Volodymyr Melnykov

                                  Title: Advances in model-based clustering

18:30 – 21:30                  Dinner

 

January 7, 2024

09:00 – 09:50                Speaker: Guido Montufar         

                                            Title: Statistical learning theory, II

10:00 – 10:50                 Speaker: Guido Montufar

                                  Title: Models, architectures, regimes

11:00 – 11:30                   Coffee Break

11:30 – 12:20                  Recitation Session (Moderator: Kedar Karhadkar)

12:30 – 14:00                 Lunch

14:00 – 16:00                Free Time/Rest (Faculty Housing)

16:00 – 16:50                 Speaker: Volodymyr Melnykov

                                 Title: Modeling matrix- and tensor variate data

17:00 – 17:50                 Speaker: Volodymyr Melnykov

                                 Title: Semi-supervised machine learning and model-based clustering

18:30 – 21:30                 Dinner

 

January 8, 2024 (Khorazm Mamun Academy in Khiva)

09:00 – 09:50               Speaker: Rishi Sonthalia         

                                           Title: From Classical Regression to the Modern Regime: Surprises for Linear Least Squares Problems

10:00 – 10:50                Speaker: Michael Murray

                                Title: An Introduction to Benign Overfitting

11:00 – 11:50                 Guided Tour of Mamun Academy

12:00 – 12:50                Speaker: Michael Porter

                                Title: Modeling contagion, excitation, and social influence with Hawkes point processes

13:30 – 14:30                Lunch (Ichan Kala)

14:30 – 18:00                Guided Tour/Free time/shopping in Ichan Kala

18:30 – 21:30                Dinner (Ichan Kala)

 

January 9, 2024

09:00 – 09:50              Speaker: Guido Montufar         

                                          Title: Geometric techniques

10:00 – 10:50               Speaker: Guido Montufar

                               Title: Optimization and algorithmic biases

11:00 – 11:50                Speaker: Angelica Torres

                                         Title: Algebraic Geometry meets Structure from Motion

13:00 – 14:00              Lunch

14:00 – 16:00              Free Time/Rest (Faculty Housing)

16:20 – 17:10              Speaker: Volodymyr Melnykov

                              Title: Modeling time-dependent categorical sequences

17:30 – 18:20              Speaker: Volodymyr Melnykov

                              Title: Finite mixture modeling in stylometry         

19:00 – 22:00              Banquet in Urgench

List of Student Participants

 

  • Ryan Anderson (University of California, Los Angeles)
  • Navya Annapareddy (University of Virginia)
  • Shoira Atanazarova (Romanovski Institute of Mathematics)
  • Oygul Babajanova (Romanovski Institute of Mathematics)
  • Sardor Bekchanov (Urgench State University)
  • Joshua Berlinski (Iowa State University)
  • Hao Duan (University of California, Los Angeles)
  • Adriana Duncan (University of Texas at Austin)
  • Isabella Foes (University of Alabama)
  • Juliann Geraci (University of Nebraska-Lincoln)
  • Chase Holcombe (University of Alabama)
  • Sarvar Iskandarov (Urgench State University)
  • Kedar Karhadkar (University of California, Los Angeles)
  • Adrienne Kinney (University of Arizona)
  • Elmurod Kuriyozov (Urgench State University)
  • Sarah McWaid (Sonoma State University)
  • Sean Mulherin (University of California, Los Angeles)
  • Abhijeet Mulgund (University of Illinois at Chicago)
  • Alexander Myers (University of Nebraska-Lincoln)
  • Klara Olloberganova (Novosibirsk State University)
  • Mahliyo Qodirova (Novosibirsk State University)
  • Ilhom Rahimov (Urgench State University)
  • Ulugbek Salaev (Urgench State University)
  • Raj Sawhney (Claremont Graduate University)
  • Nodirbek Shavkatov (Urgench State University)
  • Jonah Smith (University of Kentucky)
  • Ogabek Sobirov (Urgench State University)
  • Shakhnoza Takhirova (Bowling Green State University)
  • Spencer Wadsworth (Iowa State University)
  • Sheila Whitman (University of Arizona)