Séminaire d'équipe(s) Large-scale Heterogeneous DAta and Knowledge
Distributing Frank-Wolfe via Map-Reduce
Stratis Ioannidis
12 October 2017, 10:30 - 12 October 2017, 12:00 Salle/Bat : 455/PCRI-N
Contact :
Activités de recherche : Web data management
Résumé :
Large-scale optimization problems abound in data mining and
machine learning applications, and the computational challenges they pose are often addressed through parallelization. We identify structural properties under which a convex optimization problem can be massively parallelized via map-reduce operations using the Frank-Wolfe (FW) algorithm. The class of problems that can be tackled this way is quite broad and includes experimental design, AdaBoost, and projection
to a convex hull. Implementing FW via map-reduce eases parallelization and deployment via commercial distributed computing frameworks. We demonstrate this by implementing FW over Spark, an engine for parallel data processing, and establish that parallelization through map-reduce yields significant performance improvements: we solve problems with 10 million variables using 350 cores in 44 minutes; the same operation takes 133 hours when executed serially.