PhD thesis
Team: Apprentissage et Optimisation
Generative modeling: statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows
Started on 01/10/2017
Supervisor: FURTLEHNER, Cyril
Doctoral school: ED STIC 580
Institution of registration: Université Paris-Saclay
Host laboratory: LRI - AO
Defended on 09/03/2022 before a jury composed of:
Thesis advisor:
- Cyril Furtlehner, Inria Saclay
Co-advisor:
- Aurélien Decelle, Universidad Complutense de Madrid
Reviewers:
- Alexandre Allauzen, École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris
- Carlo Baldassi, Bocconi University
- Andrew Saxe, University of Oxford
Examiners:
- Muneki Yasuda, Yamagata University
- Martin Weigt, Sorbonne Université
Invited member:
- Pierfrancesco Urbani, CNRS, IPhT
Abstract:
Neural network models able to approximate and sample high dimensional probability distributions are known as generative models.
In recent years, this class of models has received tremendous attention due to its potential for automatically learning meaningful representations of the vast amounts of data that we produce and consume daily.
This thesis presents theoretical and algorithmic results pertaining to generative models and is divided into two parts.
In the first part, we focus on the Restricted Boltzmann Machine (RBM) and its statistical physics formulation.
Historically, statistical physics has played a central role in studying the theoretical foundations and providing inspiration for neural network models.
The first neural implementation of an associative memory (Hopfield, 1982) is a seminal work in this context.
The RBM can be regarded as a development of the Hopfield model, and it is of particular interest due to its role at the forefront of the deep learning revolution (Hinton et al. 2006).
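For reference, a binary RBM with visible units v, hidden units h, weights W and biases a, b is conventionally defined by the Gibbs distribution (standard notation, given here only for orientation, not taken verbatim from the thesis):

    E(v, h) = - \sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j,
    p(v, h) = e^{-E(v, h)} / Z,   with   Z = \sum_{v, h} e^{-E(v, h)}.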
Exploiting its statistical physics formulation, we derive a mean-field theory of the RBM that allows us to characterize both its functioning as a generative model and the dynamics of its training procedure.
This analysis proves useful for deriving a robust mean-field imputation strategy, which makes it possible to use the RBM to learn empirical distributions in the challenging case where the dataset to be modeled is only partially observed and contains a high percentage of missing information.
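To give a rough idea of what a mean-field treatment of missing entries can look like (a generic illustration; the precise strategy is developed in the thesis), the bipartite structure of the RBM yields factorized conditionals, and unobserved visible units can be imputed by magnetizations solving the naive mean-field fixed-point equations with the observed units clamped:

    m_j^h = \sigma(b_j + \sum_i W_{ij} m_i^v),    m_i^v = \sigma(a_i + \sum_j W_{ij} m_j^h)   for missing i,

where \sigma is the logistic function and m_i^v is held at its observed value whenever v_i is known.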
In the second part, we consider a class of generative models known as Normalizing Flows (NF), whose distinguishing feature is the ability to model complex, high-dimensional distributions through invertible transformations of a simple, tractable base distribution.
The invertibility of the transformation makes it possible to express the probability density through a change of variables; optimizing this density by Maximum Likelihood (ML) is conceptually straightforward but computationally expensive.
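Concretely, for an invertible map f sending data x to a latent variable z = f(x) with tractable density p_Z, the change of variables gives

    \log p_X(x) = \log p_Z(f(x)) + \log |\det J_f(x)|,

and ML training maximizes this quantity over the dataset; the log-determinant of the Jacobian is the costly term, scaling as O(D^3) per evaluation for an unconstrained dense map in dimension D.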
The common practice is to impose architectural constraints on the class of transformations used for NF, in order to make the ML optimization efficient.
Proceeding from geometrical considerations, we propose a stochastic gradient descent optimization algorithm that exploits the matrix structure of fully connected neural networks, without imposing any constraint other than the fixed dimensionality required by invertibility.
This algorithm is computationally efficient and can scale to very high dimensional datasets.
We demonstrate its effectiveness in training a multilayer nonlinear architecture employing fully connected layers.
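As a minimal, hypothetical sketch of how the matrix structure of a fully connected layer can be exploited, the snippet below trains a single linear flow y = W x with a standard normal base density using a relative-gradient-style update (an assumption about the flavor of the algorithm; the thesis handles deep nonlinear architectures and the details may differ): right-multiplying the Euclidean gradient by W^T W turns the W^{-T} term arising from log|det W| into W itself, so no matrix inversion is needed.

import numpy as np

# Hypothetical sketch (not the thesis code): maximum-likelihood training of a
# single linear flow y = W x with a standard normal base density.
# Euclidean gradient of the mean log-likelihood over a batch of size B:
#     -(1/B) * sum_n (W x_n) x_n^T  +  W^{-T}
# Right-multiplying by W^T W gives the relative gradient (I - Y^T Y / B) W,
# which involves only matrix products, no inversion.

rng = np.random.default_rng(0)
D, N, lr, steps = 10, 5000, 1e-2, 1000

A = rng.normal(size=(D, D)) / np.sqrt(D)   # toy data: linear mixture of
X = rng.laplace(size=(N, D)) @ A.T         # independent Laplace sources

W = np.eye(D)                              # invertible initialization
for _ in range(steps):
    batch = X[rng.choice(N, size=128, replace=False)]
    Y = batch @ W.T                        # forward pass y = W x
    grad_rel = (np.eye(D) - (Y.T @ Y) / len(batch)) @ W
    W += lr * grad_rel                     # relative-gradient ascent step

Y = X @ W.T
loglik = -0.5 * np.mean(np.sum(Y ** 2, axis=1)) + np.linalg.slogdet(W)[1]
print("mean log-likelihood (up to an additive constant):", loglik)

Each update costs only dense matrix multiplications, which is what allows this style of training to scale to very high dimensional data.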