
Massive Computational Experiments, Painlessly

Goal: Our goal is to reduce the burden of managing massive computational experiments while conducting them in a reproducible way. To this end, we are working on the open-source project ClusterJob, which handles massive computations and makes it painless to track, harvest, and analyze millions of computational jobs. More information is available in our paper https://web.stanford.edu/~vcs/papers/osbg-MDS2016.pdf
and at http://clusterjob.org

We are also teaching a course, STATS 285, in which students will learn state-of-the-art techniques and tools for painless massive computing. http://explorecourses.stanford.edu/search?view=catalog&filter-coursestatus-Active=on&page=0&catalog=&q=STATS285


Project log

Hatef Monajemi
added an update
Watch Ali Zaidi describe distributed tools on Azure for data scientists.
 
Hatef Monajemi
added an update
Watch Riccardo Murri from the University of Zurich address the challenges of taking scientific computing to the cloud:
Lecture 08 video:
 
Hatef Monajemi
added an update
Watch Greg Kurtzer (CEO of Sylabs.io) give an in-depth explanation of container technologies, in particular Singularity.
 
Hatef Monajemi
added an update
Lecture 6: “Some reflections about data science” by John Chambers
Full lecture video:
 
Hatef Monajemi
added an update
Part 1) XYZ Studies, A paradigm for research in data science:
Part 2) Science in the cloud:
 
Hatef Monajemi
added an update
Watch Lecture 3 as Hatef Monajemi discusses automation in data science and the necessity of Experiment Management Systems (EMS).
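The core idea behind an Experiment Management System can be sketched in a few lines of Python. This is purely illustrative and under stated assumptions — the names and structure here are my own, not ClusterJob's actual API: each job gets a reproducible ID derived from its script and parameters, and its metadata is logged so runs can later be tracked and harvested.

```python
# Illustrative EMS sketch (NOT ClusterJob's API): reproducible job IDs
# plus a simple submission log, so every run can be tracked and harvested.
import hashlib
import json
import time


def job_id(script: str, params: dict) -> str:
    """Derive a stable ID from the script name and its parameters.

    sort_keys makes the ID independent of parameter ordering, so the
    same experiment always maps to the same identifier.
    """
    payload = json.dumps({"script": script, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


class ExperimentLog:
    """Record submitted jobs so they can be tracked later."""

    def __init__(self):
        self.records = []

    def submit(self, script: str, params: dict) -> str:
        jid = job_id(script, params)
        self.records.append({
            "id": jid,
            "script": script,
            "params": params,
            "submitted": time.time(),
        })
        return jid


log = ExperimentLog()
jid = log.submit("train.py", {"lr": 0.01, "depth": 4})
```

The key design point is that the ID is a pure function of the experiment's inputs, so re-running the same script with the same parameters reliably points back at the same record.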
 
Hatef Monajemi
added an update
Watch Mark Piercy from the Stanford Research Computing Center (SRCC) talk about cluster-computing basics and the features of Sherlock, Stanford's medium-risk research cluster.
 
Hatef Monajemi
added an update
Abstract:
With increasing computational demands due to ambitious data science studies, and the scarcity of computational resources (e.g., GPUs) on university campuses, researchers are increasingly forced to adopt cloud-based solutions for their computing needs.
In this lecture, we lay out the foundation of a new computing model in which researchers build their own ephemeral personal clusters on the cloud, conduct their experiments, and destroy their clusters when they are no longer needed. This is a departure from the traditional research computing model, in which many researchers share an in-house HPC cluster (e.g., Sherlock) governed by a fixed set of policies. The new model will seamlessly integrate building personal clusters with experiment design, job management, data harvesting, and data analysis.
Deep learning (DL) research is a prime example that requires massive computational resources, and in particular access to many GPUs. In this lecture, we will review deep learning and explain the computations involved in a DL experiment. We will then teach how to run these experiments at scale on Google Cloud, push-button: with one push of a button, the researcher builds her own personal cluster, and with another she fires up thousands of jobs on her cloud cluster.
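The "push-button" fan-out described above — one action turning into thousands of jobs — can be illustrated with a plain-Python sketch. This is an assumption-laden illustration, not ClusterJob's or Google Cloud's actual interface: a declarative parameter grid is expanded into one job specification per parameter combination.

```python
# Illustrative sketch (NOT ClusterJob's interface): expand a declarative
# hyperparameter grid into one job spec per combination, so a single
# "push" can fan out many jobs.
import itertools


def expand_grid(grid: dict):
    """Yield one parameter dict per point in the Cartesian product."""
    keys = sorted(grid)  # fixed ordering keeps the expansion deterministic
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


# A small hypothetical grid: 3 learning rates x 2 widths x 10 seeds.
grid = {"lr": [0.1, 0.01, 0.001], "width": [64, 128], "seed": range(10)}
jobs = list(expand_grid(grid))  # 3 * 2 * 10 = 60 job specs
```

In this style the researcher edits only the grid; scaling from 60 jobs to thousands is just a larger product, with submission and harvesting handled by the experiment-management layer.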
 
Hatef Monajemi
added an update
Venue: Thornt110
Time: 3:00 PM on Monday, Nov 27
Title : Push-button Deep Learning on the Cloud
 
Hatef Monajemi
added an update
Lecture 03: Occupy The Cloud (Eric Jonas):
Lecture 04: Reproducibility in Computational Science (Victoria Stodden):
 
Hatef Monajemi
added an update
The slides of the first lecture are now available on the Stats285 website:
 
Hatef Monajemi
added an update
Follow course updates on https://twitter.com/stats285 starting next week.
 
Hatef Monajemi
added an update
Great guest speakers this quarter for Stats285. It is going to be a fun quarter at Stanford: https://stats285.github.io
 