Advanced Studies Institute in Mathematics of Data Science & Machine Learning
Date: January 5-13, 2024
Venue: Urgench State University (Uzbekistan)
Contact: Zair Ibragimov (California State University, Fullerton)
E-mail: zibragimov@fullerton.edu
Key Lecturer:
Guido Montufar
University of California, Los Angeles
Overview
This series of lectures will cover introductory and advanced topics in
data science and machine learning with a focus on mathematical and
statistical aspects. Knowledge outcomes: Understanding of the
mathematical and statistical foundations of data science and machine
learning. Tradeoffs of machine learning in regard to approximation,
estimation, and optimization. Contemporary views of machine learning
with overparametrized models, learning regimes, and algorithmic
regularization. Parameter space and function space perspectives in
learning. Quantitative analysis of parameter optimization and
statistical generalization in overparametrized learning models
covering methods such as the neural tangent kernel and neural network
Gaussian processes. Overview of learning modalities and architectures,
such as graph neural networks, generative models, and transformers.
Topics
- Learning from data
- Statistical learning theory
- Models, architectures, regimes
- Geometric techniques
- Optimization and algorithmic biases
Key Lecturer:
Volodymyr Melnykov
University of Alabama
Overview
Cluster analysis is one of the fundamental unsupervised machine learning problems. It aims to construct data groups in such a way that observations within each group are similar while data points in different groups are relatively distinct. Applications of cluster analysis can be found in image analysis, pattern recognition, and social network analysis. Model-based clustering is a popular clustering technique that relies on the notion of finite mixture models. It assumes a one-to-one correspondence between data groups and mixture components and provides a highly interpretable and remarkably flexible approach to data partitioning. We will consider several recent developments in model-based clustering, supporting the discussion with various illustrative real-life applications.
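As a rough illustration of the one-to-one correspondence between groups and mixture components (not taken from the lectures), the sketch below assigns each observation to the component with the highest posterior probability under a one-dimensional Gaussian mixture. The function names, the data, and the mixture parameters are all hypothetical and fixed by hand; in practice the parameters are typically estimated from data, for example with the EM algorithm.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def assign_clusters(data, weights, means, sds):
    """Label each point with the index of its most probable mixture component."""
    labels = []
    for x in data:
        # Joint weight of (component k, observation x): pi_k * f_k(x).
        joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(joint)
        # Posterior probability that x was generated by each component.
        posterior = [j / total for j in joint]
        labels.append(max(range(len(posterior)), key=posterior.__getitem__))
    return labels

# Hypothetical data and hand-picked (not estimated) mixture parameters.
data = [-2.1, -1.8, 0.2, 3.9, 4.3]
labels = assign_clusters(data, weights=[0.5, 0.5], means=[-2.0, 4.0], sds=[1.0, 1.0])
print(labels)
```

Each label is the index of one mixture component, so the fitted components themselves define the clusters.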
Topics
- Introduction to unsupervised machine learning and model-based clustering
- Modeling matrix- and tensor-variate data
- Semi-supervised machine learning and model-based clustering
- Modeling time-dependent categorical sequences
- Finite mixture modeling in stylometry
Invited Speakers
Jamolbek Mattiev
Urgench State University, Uzbekistan
Title: Associative classification model based on clustering
Abstract:
The size of collected data is increasing, and the number of
rules generated on those datasets is growing with it. Producing compact
and accurate models is becoming the most important task of data mining. In
this talk, we present a new associative classifier that utilizes
agglomerative hierarchical clustering. Experimental evaluations show
that the proposed method achieves significantly better results than
classical rule learning algorithms in terms of the number of rules on bigger
datasets while maintaining classification accuracy on those datasets.
Michael Murray
University of California, Los Angeles
Title: An Introduction to Benign Overfitting
Abstract: Conventional machine learning wisdom suggests that the generalization error of a complex model will typically be worse than that of a simpler model when both are trained to interpolate data. Indeed, the bias-variance trade-off implies that although choosing a complex model is advantageous in terms of approximation error, it comes at the price of an increased risk of overfitting. The traditional solution to managing this trade-off is to use some form of regularization, allowing the optimizer to select a predictor from a rich class of functions while at the same time encouraging it to choose one that is in some sense simple. However, in recent years it has been observed that many models, including deep neural networks, trained with minimal if any form of explicit regularization, can almost perfectly interpolate noisy data with nominal cost to their generalization performance. This phenomenon is referred to as benign or tempered overfitting, and there is now great interest in characterizing it mathematically. In this talk I’ll give an introduction to and motivation for the topic, describe some of the key results derived so far, and highlight open questions.
Michael Porter
University of Virginia
Title: Modeling contagion, excitation, and social influence with Hawkes point processes
Abstract:
Many social and physical processes (e.g., crime, conflict, social media activity, financial markets, new product adoption, social network communication, earthquakes, neural spiking, disease spread) produce event point patterns that exhibit clustering. Hawkes (self-exciting) point process models are a popular choice for modeling these clustering patterns, which can be driven by both exogenous influences and endogenous forces like contagion/self-excitement. These models stipulate that each event can be triggered by past events, creating a branching structure that produces the endogenous clustering. The contagion effects are modeled by a shot-noise term that aggregates the influence of past events to temporarily increase the event rate following each event. This talk will introduce the Hawkes process and illustrate some extensions and uses of Hawkes models in three areas: modeling contagion in terrorist attacks, forecasting crime hotspots, and identifying social influence in Yelp restaurant reviews.
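The shot-noise mechanism described in this abstract can be sketched as a conditional intensity: a constant background rate plus a decaying contribution from every past event. The sketch below is illustrative only and is not taken from the talk; the exponential kernel, the function name, and the values of mu, alpha, and beta are assumptions.

```python
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.2):
    """Event rate at time t: mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    mu is the exogenous (background) rate. The exponential kernel is an
    assumed choice; with it, alpha / beta is the expected number of events
    directly triggered by one event.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )
    return mu + excitation

# Hypothetical event times: two events close together, then a later one.
events = [1.0, 1.3, 4.0]
for t in (0.5, 1.4, 3.0, 4.1):
    print(f"t={t:.1f}  intensity={hawkes_intensity(t, events):.3f}")
```

Before the first event the rate equals the background rate; it jumps after each event and then decays back toward the background level, which is what produces the clustered event patterns.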
Rishi Sonthalia
University of California, Los Angeles
Title: From Classical Regression to the Modern Regime: Surprises for Linear Least Squares Problems
Abstract:
Angelica Torres
Max Planck Institute, Germany
Title: Algebraic Geometry meets Structure from Motion
Abstract: The Structure from Motion (SfM) pipeline aims to create a 3D model of a scene using two-dimensional images as input. The process has four main stages: feature detection, matching, camera pose estimation, and triangulation. In the first step, features such as points and lines are detected in the images; they are then matched to features appearing in other images. After the matching stage, the actual images are forgotten, and the data that remain are tuples of points or lines that are believed to come from the same world object. This is geometric data, hence the toolbox of Algebraic Geometry can be used to estimate the camera positions and to triangulate the objects in the scene. In this talk I will introduce the SfM pipeline and present some of the algebraic varieties that arise when the pinhole camera model is assumed. During the talk we will highlight how some properties of the varieties translate into properties of the data and how they can affect the image reconstruction process.
Program Schedule
January 5, 2024
11:00 – 11:20 Registration
11:20 – 11:40 Opening Remarks
Bakhrom Abdullaev - Rector, Urgench State University
Guido Montufar - University of California, Los Angeles
Volodymyr Melnykov - University of Alabama
11:40 – 12:30 Speaker: Zair Ibragimov
Title: Algorithm in the 21st Century
12:30 – 13:30 Lunch
13:30 – 16:00 Free Time/Rest (Faculty Housing)
16:00 – 16:50 Speaker: Guido Montufar
Title: Learning from data, I
17:00 – 17:50 Speaker: Volodymyr Melnykov
Title: Introduction to unsupervised machine learning and model-based clustering, I
18:30 – 21:00 Welcome Reception and Dinner
January 6, 2024
09:00 – 09:50 Speaker: Guido Montufar
Title: Learning from data, II
10:00 – 10:50 Speaker: Guido Montufar
Title: Statistical learning theory, I
11:00 – 11:30 Coffee Break
11:30 – 12:20 Recitation Session (Moderator: Kedar Karhadkar)
12:30 – 13:30 Lunch
13:30 – 16:00 Rest (Faculty Housing)
16:00 – 16:50 Speaker: Volodymyr Melnykov
Title: Introduction to unsupervised machine learning and model-based clustering, II
17:00 – 17:50 Speaker: Volodymyr Melnykov
Title: Advances in model-based clustering
18:30 – 20:30 Dinner
January 7, 2024
09:00 – 09:50 Speaker: Guido Montufar
Title: Statistical learning theory, II
10:00 – 10:50 Speaker: Guido Montufar
Title: Models, architectures, regimes
11:00 – 11:30 Coffee Break
11:30 – 12:20 Recitation Session (Moderator: Kedar Karhadkar)
12:30 – 13:30 Lunch
13:30 – 16:00 Rest (Faculty Housing)
16:00 – 16:50 Speaker: Volodymyr Melnykov
Title: Modeling matrix- and tensor-variate data
17:00 – 17:50 Speaker: Volodymyr Melnykov
Title: Semi-supervised machine learning and model-based clustering
18:30 – 20:30 Dinner
January 8, 2024 (KHORAZM MAMUN ACADEMY)
09:00 – 09:50 Speaker: Rishi Sonthalia
Title: From Classical Regression to the Modern Regime: Surprises for Linear Least Squares Problems
10:00 – 10:50 Speaker: Michael Murray
Title: An Introduction to Benign Overfitting
11:00 – 11:50 Guided Tour of Mamun Academy
12:00 – 12:50 Speaker: Michael Porter
Title: Modeling contagion, excitation, and social influence with Hawkes point processes
13:30 – 15:00 Lunch (Ichan Kala)
15:00 – 19:00 Guided Tour of Ichan Kala
19:00 – 21:00 Dinner (Ichan Kala)
January 9, 2024
09:00 – 09:50 Speaker: Guido Montufar
Title: Geometric techniques
10:00 – 10:50 Speaker: Guido Montufar
Title: Optimization and algorithmic biases
11:00 – 11:30 Coffee Break
11:30 – 12:20 Speaker: Angelica Torres
Title: Algebraic Geometry meets Structure from Motion
12:30 – 13:30 Lunch
13:30 – 15:00 Free Time/Rest (Faculty Housing)
15:00 – 15:50 Speaker: Jamolbek Mattiev
Title: Associative classification model based on clustering
16:00 – 16:50 Speaker: Volodymyr Melnykov
Title: Modeling time-dependent categorical sequences
17:00 – 17:50 Speaker: Volodymyr Melnykov
Title: Finite mixture modeling in stylometry
18:30 – 21:30 Banquet
Travel dates and cultural program
January 5: Arrival in Urgench via Tashkent (HY53) or via Istanbul (TK262)
January 5-9: Workshop
January 10-11: Khiva-Bukhara (Charter Bus), local sightseeing in Bukhara
January 11-12: Bukhara-Samarkand (Charter Bus), local sightseeing in Samarkand
January 12 (evening): Samarkand-Tashkent (Charter Bus)
January 13: Sightseeing in Tashkent
January 14: Departure from Tashkent
Travel logistics
Uzbekistan Airways has non-stop flights to Tashkent from New York (JFK), Frankfurt (FRA), Munich (MUC), London (LHR), Milan (MXP), Rome (FCO), Paris (CDG), Istanbul (IST), Dubai (DXB), Delhi (DEL) and Beijing (PEK) as well as non-stop flights to/from Urgench (2-3 flights/day).
Turkish Airlines has non-stop flights to Istanbul from major U.S. and European cities as well as non-stop flights from Istanbul to Tashkent (3-4 flights/day) and Urgench (Mondays and Fridays).
Visa to Uzbekistan
U.S. passport holders need a visa to travel to Uzbekistan; electronic visas are available (cost: $20; validity: 30 days; issued in 2-3 business days). Visa applications are processed through the official website of the Ministry of Foreign Affairs of Uzbekistan at https://e-visa.gov.uz/main. See https://uzbekistan.org/visa/ for visa requirements for U.S. citizens and https://mfa.uz/en/pages/visa-republic-uzb for more information, including the list of countries whose citizens can travel to Uzbekistan visa-free.
List of Student Participants
- Coming soon.