"A New Platform for Cloud-based Distributed Machine Learning on

Friday, April 3rd, 2015 / 2:30 PM, Doherty Hall 2210
"A New Platform for Cloud-based
Distributed Machine Learning on Big Data"
ABSTRACT
In many modern applications such as web-scale content extraction via topic models, genome-wide association mapping via sparse regression, and image understanding via deep neural
networks, one needs to handle BIG machine learning (ML) problems that threaten to exceed the
limit of current architectures and algorithms. While several new system frameworks beyond
Hadoop, notably Spark and GraphLab, have emerged for parallelizing ML programs, good dialog
between systems and ML remains difficult: most system designs are agnostic to the distinctive
characteristics of ML programs, treating them literally as operation sets as in traditional
programs instead of iterative convergent procedures for optimizing a function, and hence ignore
important properties thereof, such as error tolerance, non-uniform convergence, and structural
coupling, which can fundamentally influence the priorities and goals for system design and open
up new opportunities for improving efficiency. In this talk, I will discuss these opportunities and
present a new framework, Petuum, for distributed machine learning that leverages these
opportunities, and demonstrate how system innovations in light of ML-first principles lead to
multiple orders of magnitude of scalability on a modest lab cluster for a wide range of large-scale
problems in text modeling (topic model with 1M topics), social network analysis (mixed-membership
inference on 100M nodes), personalized genome medicine (sparse regression on 100M
dimensions), and computer vision (deep neural network with billions of parameters), with
provable guarantees on the correctness of distributed inference.
Bio
Eric Xing
CMU-LTI
Dr. Eric Xing is a Professor of Machine Learning in the School of Computer
Science at Carnegie Mellon University, and the director of the CMU Center
for Machine Learning and Health under the Pittsburgh Health Data Alliance.
His principal research interests lie in the development of machine learning
and statistical methodology, especially for solving problems involving
automated learning, reasoning, and decision-making in high-dimensional,
multimodal, and dynamic possible worlds in social and biological systems.
Professor Xing received his Ph.D. in Computer Science from UC Berkeley. He
is an associate editor of the Annals of Applied Statistics (AOAS), the Journal
of the American Statistical Association (JASA), the IEEE Transactions on Pattern
Analysis and Machine Intelligence (PAMI), and PLoS Computational Biology,
and an Action Editor of the Machine Learning Journal
(MLJ) and the Journal of Machine Learning Research (JMLR). He is a member of
the DARPA Information Science and Technology (ISAT) Advisory Group, and
a Program Chair of ICML 2014.
* LTI colloquium: http://colloquium.lti.cs.cmu.edu
Speaker webpage: http://www.cs.cmu.edu/~epxing
Instructor: Alon Lavie
Administrator: Benjamin Cook