Conference Paper

Second Workshop on Educational A/B Testing at Scale

Authors:

Article
In this issue, Cantor and colleagues synthesize a broad representation of the literature on the science of learning, and how learning changes over the course of development. Their perspective highlights three important factors about the emerging field of science of learning and development: (1) that it draws insights from increasingly diverse fields of research inquiry, from neuroscience and social science to computer science and adversity science; (2) that it provides a means to understand principles that generalize across learners, and yet also allow individual differences in learning to emerge and inform; and (3) that it recognizes that learning occurs in context, and is thus a shared responsibility between the learner, the instructor, and the environment. Here I discuss how this complex systems dynamical perspective can be integrated with the emerging framework of ‘learning engineering’ to provide a blueprint for significant innovations in education.
Conference Paper
Web-facing companies, including Amazon, eBay, Etsy, Facebook, Google, Groupon, Intuit, LinkedIn, Microsoft, Netflix, Shop Direct, StumbleUpon, Yahoo, and Zynga use online controlled experiments to guide product development and accelerate innovation. At Microsoft’s Bing, the use of controlled experiments has grown exponentially over time, with over 200 concurrent experiments now running on any given day. Running experiments at large scale requires addressing multiple challenges in three areas: cultural/organizational, engineering, and trustworthiness. On the cultural and organizational front, the larger organization needs to learn the reasons for running controlled experiments and the tradeoffs between controlled experiments and other methods of evaluating ideas. We discuss why negative experiments, which degrade the user experience short term, should be run, given the learning value and long-term benefits. On the engineering side, we architected a highly scalable system, able to handle data at massive scale: hundreds of concurrent experiments, each containing millions of users. Classical testing and debugging techniques no longer apply when there are millions of live variants of the site, so alerts are used to identify issues rather than relying on heavy up-front testing. On the trustworthiness front, we have a high occurrence of false positives that we address, and we alert experimenters to statistical interactions between experiments. The Bing Experimentation System is credited with having accelerated innovation and increased annual revenues by hundreds of millions of dollars, by allowing us to find and focus on key ideas evaluated through thousands of controlled experiments. A 1% improvement to revenue equals $10M annually in the US, yet many ideas impact key metrics by 1% and are not well estimated a-priori. The system has also identified many negative features that we avoided deploying, despite key stakeholders’ early excitement, saving us similar large amounts.