Article

Experimentation growth: Evolving trustworthy A/B testing capabilities in online software companies


Abstract

Companies need to know how much value their ideas deliver to customers. One of the most powerful ways to measure this accurately is by conducting online controlled experiments (OCEs). To run experiments, however, companies need to develop strong experimentation practices and align their organization and culture to experimentation. The main objective of this paper is to demonstrate how to run OCEs at large scale, drawing on the experience of companies that have succeeded in scaling. Based on case study research at Microsoft, Booking.com, Skyscanner, and Intuit, we present our main contribution, the Experimentation Growth Model. This four‐stage model addresses the seven critical aspects of experimentation and can help companies transform their organizations into learning laboratories where new ideas can be tested with scientific accuracy. Ultimately, this should lead to better products and services.


... INTRODUCTION A/B testing enables companies to make trustworthy data-driven decisions at scale and has been a research area in the software industry for many years [1], [2]. Companies run A/B tests to assess ideas and to safely validate [3] what delivers value to their customers. ...
... Two companies participated in our study: Microsoft and Outreach. Microsoft is a large-scale software company with many diverse products that have been running A/B tests [2] for many years. At Microsoft, tens of thousands of A/B tests are run every year across products on the web, client applications, infrastructure, user experience, etc., using the experimentation platform ExP, where the authors of this paper work, taking the perspective of the A/B testing platform team. ...
... However, in the case of B2B partnerships there is an alternative: integrators can sometimes reuse the A/B platform's UI, and there are good reasons to do this. A/B testing platform teams have been publishing research for many years on the importance of an intuitive and comprehensive user interface (UI) for running and operating an A/B testing platform [1], [2]. The A/B testing UI can be as simple as a notebook with sample code showing how to make an API call to start an A/B test, for teams just starting to run A/B tests, or a well-designed and comprehensive user experience. ...
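As a concrete illustration of the notebook-style entry point mentioned above, here is a minimal sketch of what an API call that starts an A/B test might look like. The endpoint, payload schema, and token are hypothetical, invented for illustration; every experimentation platform defines its own API.

```python
# Hypothetical sketch: starting an A/B test with a single API call.
# Endpoint URL and payload fields are invented for illustration and do
# not correspond to any specific experimentation platform.
import requests

payload = {
    "name": "checkout-button-color",
    "hypothesis": "A green checkout button increases completion rate.",
    "variants": [
        {"id": "control", "traffic": 0.5},    # existing experience
        {"id": "treatment", "traffic": 0.5},  # proposed change
    ],
    "primary_metric": "checkout_completion_rate",
}

resp = requests.post(
    "https://exp.example.com/api/v1/experiments",  # hypothetical endpoint
    json=payload,
    headers={"Authorization": "Bearer <token>"},
    timeout=10,
)
resp.raise_for_status()
print("Started experiment:", resp.json().get("experiment_id"))
```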
Conference Paper
Full-text available
A/B tests are the gold standard for evaluating product changes. At Microsoft, for example, we run tens of thousands of A/B tests every year to understand how users respond to new designs, new features, bug fixes, or any other ideas we might have on what will deliver value to users. In addition to testing product changes, however, A/B testing is starting to gain momentum as a differentiating feature of platforms or products whose primary purpose may not be A/B testing. As we describe in this paper, organizations such as Azure PlayFab and Outreach have integrated experimentation platforms and offer A/B testing to their customers as one of the many features in their product portfolio. In this paper, based on multiple case studies, we present the lessons learned from enabling A/B integrations: integrating A/B testing into software products. We enrich each of the learnings with a motivating example, share the trade-offs made along this journey, and provide recommendations for practitioners. Our learnings are most applicable for engineering teams developing experimentation platforms, integrators considering embedding A/B testing into their products, and researchers working in the A/B testing domain.
... EEM outlines (Fig. 2) that a company typically undergoes four phases, namely Crawl, Walk, Run, and Fly, and has to evolve the technical, organizational, and business aspects of its operations [12]. Interestingly, the same group of scholars proposed the Experimentation Growth Model [15] in an attempt to define what a company that is mature in controlled experimentation does. The model argues that a mature company should run experiments across most of its functionality and that all proposed software changes should be subject to experimentation. ...
... Evidently, numerous works explore the challenges of A/B testing at scale for large organizations [16], [15], [12], [17], as software giants such as Google and Microsoft utilize A/B testing [20], [21]. Research on the adoption of A/B testing by smaller companies and development teams remains scarce, despite its clear benefits [7]. ...
Conference Paper
Full-text available
Online controlled experimentation, and more specifically A/B testing, is an effective method for assessing the impact of software changes. However, when adopting A/B testing, a development team faces various organizational and technical challenges. In this paper, we propose a new notion of reusable controlled experiments (RCE) to simplify and accelerate the adoption of A/B testing by software teams. In its essence, an RCE is a reusable software component supplied with built-in A/B testing functionality. We provide a proof-of-concept implementation of an RCE, integrate it into a mobile application in the field of educational technology, and run an experiment to validate the proposed solution. We conclude by checking the resulting integration against the six criteria categories of the Experimentation Evolution Model (EEM) to identify the maturity phase for each category. The resulting RCE is found to correspond to the Experimentation Evolution Model's Walk maturity phase in three out of six categories, and to the Crawl phase in the other three categories.
... A/B testing is also actively applied in E-commerce (27 occurrences), with examples from the retail giant Amazon [52], the fashion industry [26], and C2C (consumer-to-consumer) businesses, such as Etsy [83] and Facebook Marketplace [77]. Next, we observe the application of A/B testing in what we group under "interaction" (22 occurrences), with digital communication software such as Snap [167] and Skype [60], user-operating system interaction [74,56], and application software such as an app store [33] and mobile games [173]. Lastly, we note the financial application domain (16 occurrences), including studies at Yahoo Finance [179] and Alipay [24], and transportation (4 occurrences), for instance at Didi Chuxing [66]. ...
... Application of A/B testing (51 studies): [175,123,95,102,16,15,121,171,99,148,33,70,66,52,20,174,107,63,155,150,65,163,170,27,143,2,5,135,149,7,98,147,141,173,19,26,8,114,6,122,50,97,136,125,22,124,128,159,67,3,176]
Improving efficiency of A/B testing (20): [1,28,127,164,23,85,47,39,44,86,40,45,46,109,100,83,18,78,37,64]
Beyond standard A/B testing (18): [166,38,82,48,77,79,112,134,72,139,126,29,118,75,49,117,30,151]
Concrete A/B testing problems (17): [138,73,168,105,146,43,162,14,103,111,71,24,153,137,96,101,25]
Pitfalls and challenges of A/B testing (13): [91,88,54,60,167,42,169,120,11,41,110,140,90]
Experimentation frameworks and platforms (13): [144,154,106,156,108,9,131,74,21,36,179,177,152]
A/B testing at scale (9): [89,160,58,165,81,157,76,57,56] ...
Preprint
In A/B testing, two variants of a piece of software are compared in the field from an end user's point of view, enabling data-driven decision making. While widely used in practice, no comprehensive study has been conducted on the state of the art in A/B testing. This paper reports the results of a systematic literature review that analyzed 141 primary studies. The results show that the main targets of A/B testing are algorithms and visual elements. Single classic A/B tests are the dominant type of test. Stakeholders have three main roles in the design of A/B tests: concept designer, experiment architect, and setup technician. The primary types of data collected during the execution of A/B tests are product/system data and user-centric data. The dominant uses of the test results are feature selection, feature rollout, and continued feature development. Stakeholders have two main roles during A/B test execution: experiment coordinator and experiment assessor. The main reported open problems are the enhancement of proposed approaches and their usability. Interesting lines for future research include strengthening the adoption of statistical methods in A/B testing, improving the process of A/B testing, and enhancing the automation of A/B testing.
... This approach is a process of continuously validating product assumptions, transforming them into hypotheses, prioritizing them, and applying the scientific method to test these hypotheses, supporting or refuting them [2]. In this context, practitioners can employ several techniques, like iterations with prototypes, gradual rollouts, and controlled experiments [4], but also problem and solution interviews [2]. ...
... Recently, the term has been used to describe the process of continuously validating product assumptions, transforming them into hypotheses, prioritizing them, and applying the scientific method to test these hypotheses, supporting or refuting them [2]. This concept encompasses several techniques, like iterations with prototypes, gradual rollouts, and controlled experiments [4], but also problem and solution interviews [2]. We adopted this second connotation in this paper. ...
Article
Context: Software startups develop innovative, software-intensive products. Given the uncertainty associated with such an innovative context, experimentation, an approach based on validating assumptions about the software product through data obtained from diverse techniques, like A/B tests or interviews, is valuable for these companies. Relying on data rather than opinions reduces the chance of developing unnecessary products or features, improving the likelihood of success, especially in early development stages, when implementing unnecessary features represents a higher risk to a company's survival. Nevertheless, researchers have argued that the lack of clearly defined practices has led to limited adoption of experimentation. Since the first step of the approach is to define hypotheses, testable statements about the software product's features based on which software development teams will create experiments, eliciting hypotheses is a natural first step towards developing practices. Objective: We aim to develop a systematic technique for identifying hypotheses in early-stage software startups to support experimentation in these companies and, consequently, improve their software products. Methods: We followed a Design Science approach consisting of an artifact construction process, divided into three phases, and an evaluation within three startups. Results: We developed HyMap, a hypothesis elicitation technique based on cognitive mapping. It consists of a process conducted by a facilitator using pre-defined questions, supported by a visual language to depict a cognitive map representing the founder's understanding of the product. Our evaluation showed that founders perceived the artifacts as clear, easy to use, and useful, leading to hypotheses and facilitating the visualization of their ideas. Conclusion: From a theoretical perspective, our study provides a better understanding of the guidance founders use to develop their startups and, from a practical point of view, a technique to identify hypotheses in early-stage software startups.
... Conducting an analysis of the trial is the third step in A/B testing. Following completion of the A/B test, the original hypothesis is tested, usually using a statistical test such as Student's t-test or Welch's t-test [13,14]. Depending on the test's outcome, the designer can then proceed with the feature rollout or the creation of new A/B variations to evaluate in future A/B tests. ...
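To make the analysis step above concrete, the sketch below tests a hypothesis with Welch's t-test via SciPy. The metric values are fabricated, and a real analysis would add safeguards (sample-ratio-mismatch checks, multiple-testing corrections) that this sketch omits.

```python
# Minimal sketch: Welch's t-test on a per-user metric from an A/B test.
# The two samples below are fabricated illustration data.
from scipy import stats

control = [0.12, 0.08, 0.15, 0.11, 0.09, 0.14, 0.10, 0.13]
treatment = [0.16, 0.12, 0.18, 0.11, 0.17, 0.15, 0.14, 0.19]

# equal_var=False selects Welch's t-test, which does not assume equal
# variances in the two groups.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: the variants appear to differ on this metric.")
else:
    print("Fail to reject H0: no significant difference detected.")
```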
Article
Full-text available
As society progresses, a high-quality product experience is becoming more important to consumers. Competitors are constantly raising the bar on product details in their pursuit of a high profit conversion rate. Rapid, high-quality product iteration is essential for product providers looking to increase user stickiness and activity, which in turn increases the profit conversion rate. By inserting logs and analysing statistical data, A/B testing can determine which iterative strategy is more effective by conducting experiments on target users. This paper introduces GirGut, an innovative, open-source A/B testing platform designed specifically for web developers. GirGut leverages artificial intelligence to generate and evolve test variants, mimicking the process of natural selection to optimize user engagement and conversion rates. By combining ease of use with powerful AI capabilities, GirGut aims to revolutionize the way developers approach experimentation and optimization in web applications.
... Therefore, Claps et al. [4] suggested using blogs to advertise new content. In addition, applying deployment techniques such as canary and blue-green deployments, or A/B testing, allows the deploying organization to validate the value of new hypotheses at scale [15]. ...
Conference Paper
Full-text available
DevOps has become a widely adopted approach in the software industry, especially among companies developing web-based applications. The main focus of DevOps is to address social and technical bottlenecks along the software flow, from the developers' code changes to delivering these changes to the production environments used by customers. However, DevOps does not consider the software flow's content, e.g., new features, bug fixes, or security patches, and the customer value of each content. In addition, DevOps assumes that a streamlined software flow leads to a continuous value flow, as customers use the new software and extract value-adding content intuitively. However, in a Software-intensive System of Systems (SiSoS), customers need to understand the content of the software flow to validate, test, and adapt their operation procedures before using the new software. Thus, while DevOps has been extensively studied in the context of web-based applications, its adoption in SiSoS is a relatively unexplored area. Therefore, we conducted a case study at a multinational telecommunications provider focusing on 5G systems. Our findings reveal that DevOps has three sub-flows: legacy, feature, and solution. Each sub-flow has distinct content and customer value, requiring a unique approach to extracting it. Our findings highlight the importance of understanding the software flow's content and how each content's value can be extracted when adopting DevOps in SiSoS.
... Rather than separated concerns as generally described, business and technological decisions are mingled, requiring specific practices, as proposed in the Entrepreneurial Software Engineering Model [58]. This aspect is related to the increasing research on experimentation in software engineering [59][60][61][62], especially in software startups [8,63] given the uncertainty regarding users' needs and market [4][5][6]. ...
Article
Full-text available
Context: Defining and designing a software product is not merely a technical endeavor but a socio-technical journey. As such, its success is associated with human-related aspects, such as the value users perceive. To handle this issue, the product manager role has become more evident in software-intensive companies. A unique, challenging context for these professionals is constituted by software startups: emerging companies developing novel solutions while looking for sustainable and scalable business models. Objective: This study aims to describe the role of product managers in the context of software startups. Method: We performed a Socio-Technical Grounded Theory study using data from blog posts and interviews. Results: The results describe the product manager as a multidisciplinary, general role: not only guiding the product by developing its vision but also acting as a connector that emerges in a growing company, enabling communication between software development and other areas, mainly business and user experience. The professional performing this role has a background in one of these areas, but broad knowledge and understanding of key concepts from the other areas is needed. We also describe how differences between this role and other lead roles are perceived in practice. Conclusions: Our findings have several implications for research, such as a better understanding of role transformation in growing software startups; for practice, e.g., identifying the points a professional migrating to this role should pay attention to; and for the education of future software developers, by suggesting the inclusion of related topics in the education and training of future software engineers.
... [35] We obtained feedback from the end users with evaluative methods that included qualitative app assessments. In the development phase, we adopted a hybrid mixed-method technique, usability tests [36][37][38][39], and qualitative methods, such as user interviews [40] and A/B testing [41][42][43][44]. ...
Article
Full-text available
Background Hypertension affects 28.5% of Indians aged 18–69. Real-time registration and follow-up of persons with hypertension are possible with point-of-care digital information systems. We describe herein the experiences of discovering, developing, and deploying a point-of-care digital information system for public health facilities under the India Hypertension Control Initiative. Methods We have adopted an agile and user-centered approach in each phase in selected states of India since 2017. A multidisciplinary team adopted a hybrid approach with quantitative and qualitative methods, such as contextual inquiries, usability testing, and semi-structured interviews with healthcare workers, to document and monitor utility and usability. Results During the discovery phase, we adopted a storyboard technique to understand the requirements for a digital information system. The participatory approach in the discovery phase co-designed the information system with the nurses and doctors of the Punjab state of India. Simple, the resulting information system, has a front-end Android mobile application for healthcare workers and a back-end dashboard for program managers. As of October 2022, over 2,431,962 patients with hypertension and 899,829 with diabetes were registered in the information system across 10,017 health facilities. The median duration of registering a new patient was 50 seconds, and of recording a follow-up visit, 14 seconds, in the app. High satisfaction was reported in quarterly interviews with 100 app users. Conclusion Simple was implemented by administering a user-centered approach and agile techniques. It demonstrated high utility and usability among users, highlighting the benefits of a user-centered approach for effective digital health solutions.
... The final MuMIC model supports multiple downstream use cases. For each application, an online A/B test (Fabijan et al. 2018) must be conducted on real Booking.com traffic and typically lasts several weeks. Below are examples of applications that can benefit from MuMIC: ...
Article
Multi-label image classification is a foundational topic in various domains. Multimodal learning approaches have recently achieved outstanding results in image representation and single-label image classification. For instance, Contrastive Language-Image Pretraining (CLIP) demonstrates impressive image-text representation learning abilities and is robust to natural distribution shifts. This success inspires us to leverage multimodal learning for multi-label classification tasks, and benefit from contrastively learnt pretrained models. We propose the Multimodal Multi-label Image Classification (MuMIC) framework, which utilizes a hardness-aware tempered sigmoid based Binary Cross Entropy loss function, thus enabling optimization on multi-label objectives and transfer learning on CLIP. MuMIC is capable of providing high classification performance, handling real-world noisy data, supporting zero-shot predictions, and producing domain-specific image embeddings. In this study, a total of 120 image classes are defined, and more than 140K positive annotations are collected on approximately 60K Booking.com images. The final MuMIC model is deployed on the Booking.com Content Intelligence Platform, and it outperforms other state-of-the-art models with 85.6% GAP@10 and 83.8% GAP on all 120 classes, as well as a 90.1% macro mAP score across 32 majority classes. We summarize the modelling choices, which are extensively tested through ablation studies. To the best of our knowledge, we are the first to adapt contrastively learnt multimodal pretraining for real-world multi-label image classification problems, and the innovation can be transferred to other domains.
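The "hardness-aware tempered sigmoid based Binary Cross Entropy loss" named in this abstract suggests a BCE computed on temperature-scaled logits with harder examples weighted up. The NumPy sketch below is a guess at that idea under those assumptions; MuMIC's actual formulation may differ.

```python
# Illustrative sketch of a tempered-sigmoid BCE for multi-label
# classification. The temperature and hardness weighting are assumptions
# inferred from the abstract's wording, not MuMIC's published formulation.
import numpy as np

def tempered_sigmoid_bce(logits, targets, temperature=2.0, eps=1e-7):
    """logits, targets: arrays of shape (batch, num_labels); targets in {0, 1}."""
    probs = 1.0 / (1.0 + np.exp(-logits / temperature))  # tempered sigmoid
    probs = np.clip(probs, eps, 1.0 - eps)
    bce = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # "Hardness-aware": up-weight examples the model currently gets wrong.
    hardness = np.abs(targets - probs)
    return float(np.mean(hardness * bce))

logits = np.array([[2.0, -1.0, 0.5], [-0.5, 3.0, -2.0]])
targets = np.array([[1, 0, 1], [0, 1, 0]])
print(tempered_sigmoid_bce(logits, targets))
```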
... Experimentation maturity models (Fabijan et al. 2017, 2018; Optimizely 2018; Wider Funnel 2018; Brooks Bell 2015) consist of the phases organizations are likely to go through on the way to being data-driven and running every change through A/B experiments: Crawl, Walk, Run, and Fly. ...
Chapter
Full-text available
Many good resources are available with motivation and explanations about online controlled experiments (Kohavi et al. 2009a, 2020; Thomke 2020; Luca and Bazerman 2020; Georgiev 2018, 2019; Kohavi and Thomke 2017; Siroker and Koomen 2013; Goward 2012; Schrage 2014; King et al. 2017; McFarland 2012; Manzi 2012; Tang et al. 2010). For organizations running online controlled experiments at scale, Gupta et al. (2019) provide an advanced set of challenges. We provide a motivating visual example of a controlled experiment that ran at Microsoft's Bing. The team wanted to add a feature allowing advertisers to provide links to the target site. The rationale is that this will improve ad quality by giving users more information about what the advertiser's site provides and allow users to navigate directly to the sub-category matching their intent. Visuals of the existing ads layout (Control) and the new ads layout (Treatment) with site links added are shown in Fig. 1.
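The Control/Treatment comparison described above depends on assigning each user deterministically to one variant. A common, generic way to do this, sketched below for a simple two-variant split (not necessarily Bing's actual mechanism), is to hash the user ID together with an experiment ID:

```python
# Generic sketch of deterministic variant assignment for a controlled
# experiment: hashing user ID + experiment ID yields a stable bucket per
# user while keeping assignments independent across experiments.
import hashlib

def assign_variant(user_id: str, experiment_id: str,
                   treatment_share: float = 0.5) -> str:
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

print(assign_variant("user-42", "ads-sitelinks"))  # stable across calls
```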
... In addition to the adoption of cross-functional teams, sprints, and iterative development, innovation initiatives that aim to generate new and recurring revenue streams require a shift towards customer-driven innovation and lean start-up ways of working [21], [22], [23], [24]. Moreover, companies need experimentation practices and mechanisms that help them continuously deploy, measure, and evaluate what constitutes customer value [25], [26], [27], [28]. As recognized in [24], agile development methods help answer 'how' to build products and how to increase speed in development. ...
... • Time- and resource-heavy: the traditional approaches are a two-step process. Through a test campaign, they explore the opportunities and then, based on the comparative analysis, exploit the winning digital advertisement or creative to reap the monetary benefit (Fabijan et al., 2018; Kohavi & Longbotham, 2017). As a result, the whole test is very slow and expensive. ...
Article
Full-text available
One of the core challenges in digital marketing is that business conditions continuously change, which impacts the reception of campaigns. A winning campaign strategy can become unfavored over time, while an old strategy can gain new traction. In data-driven digital marketing and web analytics, A/B testing is the prevalent method of comparing digital campaigns, choosing the winning ad, and deciding targeting strategy. A/B testing is suitable when testing variations on similar solutions and having one or more metrics that are clear indicators of success or failure. However, when faced with a complex problem or working on future topics, A/B testing fails to deliver, and achieving long-term impact from experimentation is demanding and resource intensive. This study proposes a reinforcement learning based model and demonstrates its application to digital marketing campaigns. We argue and validate with real-world data that reinforcement learning can help overcome some of the critical challenges that A/B testing, and the popular machine learning methods currently used in digital marketing campaigns, face. We demonstrate the effectiveness of the proposed technique on real data from a digital marketing campaign collected from a firm.
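The reinforcement-learning framing in this abstract is close in spirit to a multi-armed bandit that keeps reallocating traffic as campaign conditions drift. The epsilon-greedy sketch below is a generic illustration of that explore/exploit idea, not the paper's proposed model; the ad names and click-through rates are fabricated.

```python
# Generic epsilon-greedy bandit over ad creatives: mostly exploit the
# best-performing ad, occasionally explore the others. Illustrative only;
# the paper's RL model is not reproduced here.
import random

ads = ["creative_a", "creative_b", "creative_c"]
true_ctr = {"creative_a": 0.04, "creative_b": 0.06, "creative_c": 0.05}  # fabricated
counts = {ad: 0 for ad in ads}
clicks = {ad: 0 for ad in ads}

epsilon = 0.1
for _ in range(10_000):
    if random.random() < epsilon:  # explore
        ad = random.choice(ads)
    else:                          # exploit current best estimate
        ad = max(ads, key=lambda a: clicks[a] / counts[a] if counts[a] else 0.0)
    counts[ad] += 1
    clicks[ad] += random.random() < true_ctr[ad]  # simulated user response

print({ad: round(clicks[ad] / counts[ad], 4) for ad in ads if counts[ad]})
```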
... This means that new features are added only after being tested with real users in live operation, unbeknownst to those users, to see whether they lead to a favorable response. This practice is widely used in many, if not all, web-based companies [33], [32], [20]. However, it introduces two ethical difficulties. ...
Preprint
Full-text available
A tenet of open source software development is to accept contributions from users-developers (typically after appropriate vetting). But should this include interventions done as part of research on open source development? Following an incident in which buggy code was submitted to the Linux kernel to see whether it would be caught, we conduct a survey among open source developers and empirical software engineering researchers to see what behaviors they think are acceptable. This covers two main issues: the use of publicly accessible information, and conducting active experimentation. The survey had 224 respondents. The results indicate that open-source developers are largely open to research, provided it is done transparently. In other words, many would agree to experiments on open-source projects if the subjects were notified and provided informed consent, and in special cases also if only the project leaders agree. While researchers generally hold similar opinions, they sometimes fail to appreciate certain nuances in the stand of developers. Examples include observing license restrictions on publishing open-source code and safeguarding the code. Conversely, researchers seem to be more concerned than developers about privacy issues. Based on these results, it is recommended that open source repositories and projects include research considerations in their access guidelines, and that researchers take care to ask permission also when not formally required to do so. We note too that the open source community wants to be heard, so professional societies and IRBs should consult with them when formulating ethics codes.
... Additionally, CE is discussed mainly through randomized controlled experiments (A/B testing). However, CE constitutes a group of techniques that goes beyond randomized controlled experiments [16,17], and that can encompass many other activities and techniques. ...
Article
Full-text available
Continuous experimentation (CE) refers to a set of practices used by software companies to rapidly assess the usage, value, and performance of deployed software using data collected from customers and systems in the field, following an experimental methodology. However, despite its increasing popularity in the development of web‐facing applications, CE has not been studied in the development process of business‐to‐business (B2B) mission‐critical systems. By observing the CE practices of different teams in a case study inside Ericsson, we identified the practices and techniques used in B2B mission‐critical systems and derived a description and classification of four possible types of experiments. We present and analyze each of the four types of experiments with examples in the context of the mission‐critical long‐term evolution (4G) product. These examples show the general experimentation process followed by the teams and the use of the different CE practices and techniques. Based on these examples and the empirical data, we derived the HURRIER process to deliver high‐quality solutions that the customers value. Finally, we discuss the challenges, opportunities, and lessons learned from applying CE and the HURRIER process in B2B mission‐critical systems. The HURRIER process combines existing validation techniques with experimentation practices to deliver high‐quality software that customers value.
Article
Full-text available
Machine learning (ML) models have gained significant attention in a variety of applications, from computer vision to natural language processing, and are almost always based on big data. There is a growing number of applications and products with built-in machine learning models; this is the area where software engineering, artificial intelligence, and data science meet. The requirement for a system to operate in a real-world environment poses many challenges: how to design for wrong predictions the model may make; how to assure safety and security despite possible mistakes; which qualities matter beyond a model's prediction accuracy; and how to identify and measure important quality requirements, including learning and inference latency, scalability, explainability, fairness, privacy, robustness, and safety. It has become crucial to test these models thoroughly to assess their capabilities and potential errors. Existing software testing methods have been adapted and refined to discover faults in machine learning and deep learning models. This paper covers a taxonomy, a methodologically uniform presentation of the solutions to the aforementioned issues, and conclusions about possible future development trends. The main contributions of this paper are a classification that closely follows the structure of the ML pipeline, a precisely defined role for each team member within that pipeline, and an overview of trends and challenges in the combination of ML and big data analytics, with uses in the domains of industry and education.
Article
Innovations in software no longer originate in academic research but rather in industry. This changes the role of academic research from being the wellspring of new innovations to identifying, generalizing, and accelerating the adoption of innovations that originate in industry. In this paper we argue that the traditional view of the role of academic research in software is outdated and present an alternative approach to academic research. Finally, we exemplify our approach through a concrete case, Software Center.
Article
Technological advancements have made it possible to deliver mobile health interventions to individuals. A novel framework that has emerged from such advancements is the just-in-time adaptive intervention, which aims to suggest the right support to individuals when their needs arise. The micro-randomized trial design has been proposed recently to test the proximal effects of the components of these just-in-time adaptive interventions. However, the extant micro-randomized trial framework only considers components with a fixed number of categories added at the beginning of the study. We propose a more flexible micro-randomized trial design which allows the addition of more categories to the components during the study. Note that the number and timing of the categories added during the study need to be fixed initially. The proposed design is motivated by collaboration on the Diabetes and Mental Health Adaptive Notification Tracking and Evaluation study, which learns to deliver effective text messages to encourage physical activity among patients with diabetes and depression. We developed a new test statistic and the corresponding sample size calculator for the flexible micro-randomized trial using an approach similar to the generalized estimating equation for longitudinal data. Simulation studies were conducted to evaluate the sample size calculators, and an R Shiny application for the calculators was developed.
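As a rough illustration of the flexible design described above (a toy sketch of the randomization scheme only, not the authors' test statistic or sample-size machinery), micro-randomization draws a fresh treatment category at every decision point, and new categories can join at times fixed in advance:

```python
# Toy sketch of flexible micro-randomization: each decision point draws a
# message category at random, and extra categories become available at
# pre-specified days. Category names and days are invented.
import random

schedule = {
    0: ["no_message", "activity_tip"],  # categories available from day 0
    30: ["motivational_text"],          # category added on day 30 (fixed in advance)
}

def categories_at(day: int) -> list[str]:
    return [cat for start, cats in schedule.items() if day >= start for cat in cats]

for day in (0, 15, 45):
    arm = random.choice(categories_at(day))  # micro-randomization at this decision point
    print(f"day {day}: options={categories_at(day)} -> assigned={arm}")
```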
Chapter
Recent advancements and trends in data engineering, data analytics, entrepreneurship, and the business and societal context in which all this happens will usher in a new wave of data entrepreneurship. This calls for new theories, approaches, methods, and techniques and opens up new possibilities for companies to find a competitive edge and, hopefully, to reap the associated benefits. This chapter concludes the book with a kaleidoscopic overview of several important developments, exploring their implications for areas where data science and entrepreneurship meet. Next to a number of implications for practice, this chapter ends with a brief discussion of interesting avenues for future research. Keywords: Data entrepreneurship; Practical implications; Future research avenues; AI software: MLOps; Edge computing; Digital twins; Large-scale experimentation; Government regulation
Chapter
Context: Exploratory testing plays an important role in the continuous integration and delivery pipelines of large-scale software systems, but a holistic and structured approach is needed to realize efficient and effective exploratory testing. Objective: This paper seeks to address the need for a structured and reliable approach by providing a tangible model, supporting practitioners in the industry to optimize exploratory testing in each individual case. Method: The reported study includes interviews, group interviews and workshops with representatives from six companies, all multi-national organizations with more than 2,000 employees. Results: The ExET model (Excellence in Exploratory Testing) is presented. It is shown that the ExET model allows companies to identify and visualize strengths and improvement areas. The model is based on a set of key factors that have been shown to enable efficient and effective exploratory testing of large-scale software systems, grouped into four themes: “The testers’ knowledge, experience and personality”, “Purpose and scope”, “Ways of working” and “Recording and reporting”. Conclusions: The validation of the ExET model showed that the model is novel, actionable and useful in practice, showing companies what they should prioritize in order to enable efficient and effective exploratory testing in their organization.
Chapter
Continuous experimentation (CE) refers to a group of practices used by software companies to rapidly assess the usage, value and performance of deployed software using data collected from customers and the deployed system. Despite its increasing popularity in the development of web-facing applications, CE has not been discussed in the development process of business-to-business (B2B) mission-critical systems. We investigated in a case study the use of CE practices within several products, teams and areas inside Ericsson. By observing the CE practices of different teams, we were able to identify the key activities in four main areas and inductively derive an experimentation process, the HURRIER process, that addresses the deployment of experiments with customers in the B2B and with mission-critical systems. We illustrate this process with a case study in the development of a large mission-critical functionality in the Long Term Evolution (4G) product. In this case study, the HURRIER process is not only used to validate the value delivered by the solution but to increase the quality and the confidence from both the customers and the R&D organization in the deployed solution. Additionally, we discuss the challenges, opportunities and lessons learned from applying CE and the HURRIER process in B2B mission-critical systems.
Chapter
Measuring properties of software systems, organizations, and processes has much more to it than meets the eye. Numbers and quantities are at the center of it, but that is far from everything. Software measures (or metrics, as some call them) exist in the context of a measurement program, which involves the technology used to measure, store, process, and visualize data, as well as the people who make decisions based on the data and the software engineers who ensure that the data can be trusted.
Chapter
Software developers in big and medium-size companies work with millions of lines of code in their codebases. Assuring the quality of this code has shifted from simple defect management to proactive assurance of internal code quality. Although static code analysis and code reviews have been at the forefront of research and practice in this area, code reviews are still an effort-intensive and interpretation-prone activity. The aim of this research is to support code reviews by automatically recognizing company-specific code guideline violations in large-scale, industrial source code. In our action research project, we constructed a machine-learning-based tool for code analysis with which software developers and architects in big and medium-sized companies can use a few examples of source code lines violating code/design guidelines (up to 700 lines of code) to train decision-tree classifiers to find similar violations in their codebases (up to 3 million lines of code). Our action research project consisted of (i) understanding the challenges of two large software development companies, (ii) applying the machine-learning-based tool to detect violations of Sun's and Google's coding conventions in the code of three large open source projects implemented in Java, (iii) evaluating the tool on an evolving industrial codebase, and (iv) finding the best learning strategies to reduce the cost of training the classifiers. We were able to achieve an average accuracy of over 99% and an average F-score of 0.80 for open source projects when using ca. 40K lines for training the tool. We obtained a similar average F-score of 0.78 for the industrial code, but this time using only up to 700 lines of code as a training dataset. Finally, we observed that the tool performed visibly better for rules that require understanding a single line of code or the context of a few lines (often reaching an F-score of 0.90 or higher). Based on these results, we observed that this approach can provide modern software development companies with the ability to use examples to teach an algorithm to recognize violations of code/design guidelines and thus increase the number of reviews conducted before the product release. This, in turn, leads to increased quality of the final software.
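A minimal sketch of the example-driven idea described above, using scikit-learn: character n-gram features from a handful of labeled code lines train a decision tree that flags similar lines. The guideline (no wildcard imports), the example lines, and the feature choice are invented for illustration and are far simpler than the industrial tool.

```python
# Minimal sketch: learn to flag guideline-violating code lines from a few
# labeled examples. The "no wildcard imports" guideline and the character
# n-gram features are illustrative inventions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

lines = [
    "import java.util.*;",          # violation: wildcard import
    "import com.example.io.*;",     # violation
    "import java.util.List;",       # compliant
    "import com.example.io.File;",  # compliant
]
labels = [1, 1, 0, 0]  # 1 = violates the invented guideline

vec = CountVectorizer(analyzer="char", ngram_range=(1, 3))
X = vec.fit_transform(lines)
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

new_lines = ["import org.acme.*;", "import org.acme.Service;"]
print(clf.predict(vec.transform(new_lines)))  # expected: [1 0]
```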
Chapter
Continuous Integration is a software practice where developers integrate frequently, at least daily. While this is an ostensibly simple concept, it does leave ample room for interpretation: what is it the developers integrate with, what happens when they do, and what happens before they do? These are all open questions with regards to the details of how one implements the practice of continuous integration, and it is conceivable that not all such implementations in the industry are alike. In this paper we show through a literature review that there are differences in how the practice of continuous integration is interpreted and implemented from case to case. Based on these findings we propose a descriptive model for documenting and thereby better understanding implementations of the continuous integration practice and their differences. The application of the model to an industry software development project is then described in an illustrative case study.
Chapter
Measurement programs in large software development organizations contain a large number of indicators, base and derived measures to monitor products, processes and projects. The diversity and the number of these measures causes the measurement programs to become large, combining multiple needs, measurement tools and organizational goals. For the measurement program to effectively support organization’s goals, it should be scalable, automated, standardized and flexible – i.e. robust. In this paper we present a method for assessing the robustness of measurement programs. The method is based on the robustness model which has been developed in collaboration between seven companies and a university. The purpose of the method is to support the companies to optimize the value obtained from the measurement programs and their cost. We evaluated the method at the seven companies and the results from applying the method to each company quantified the robustness of their programs, reflecting the real-world status of the programs and pinpointed strengths and improvements of the programs.
Chapter
In many ways, digitalization has confirmed that the success of new technologies and innovations is fully realized only when these are effectively adopted and integrated into the daily practices of a company. During the last decade, we have seen how the speed of technology developments only accelerates, and there are numerous examples of innovations that have fundamentally changed businesses as well as everyday life for the customers they serve. In the manufacturing industry, automation is key for improving efficiency as well as for increasing safety. In the automotive domain, electrification of cars and autonomous drive technologies are replacing mechanical power and human intervention. In the telecom domain, seamless connectivity and digital infrastructures allow systems to adapt and respond within the blink of an eye. In the security and surveillance domain, intelligent technologies provide organizations with the ability to detect, respond, and mitigate potential risks and threats with an accuracy and preciseness we could only dream about a few decades ago. While these are only a few examples, they reflect how digital technologies, and the ever-increasing access to data, are transforming businesses to an extent that we have only seen the beginnings of.
Chapter
Context: Agile methods have become mainstream even in large-scale systems engineering companies that need to accommodate different development cycles of hardware and software. For such companies, requirements engineering is an essential activity that involves upfront and detailed analysis which can be at odds with agile development methods. Objective: This paper presents a multiple case study with seven large-scale systems companies, reporting their challenges, together with best practices from industry. We also analyse literature about two popular large-scale agile frameworks, SAFe® and LeSS, to derive potential solutions for the challenges. Method: Our results are based on 20 qualitative interviews, five focus groups, and eight cross-company workshops which we used to both collect and validate our results. Results: We found 24 challenges which we grouped in six themes, then mapped to solutions from SAFe®, LeSS, and our companies, when available. Conclusion: In this way, we contribute a comprehensive overview of RE challenges in relation to large-scale agile system development, evaluate the degree to which they have been addressed, and outline research gaps. We expect these results to be useful for practitioners who are responsible for designing processes, methods, or tools for large scale agile development as well as guidance for researchers.
Chapter
Large software companies need to support continuous and fast delivery of customer value both in the short and long term. However, this can be hindered if both the evolution and maintenance of existing systems are hampered by Technical Debt. Although a lot of theoretical work on Technical Debt has been produced recently, its practical management lacks empirical studies. In this paper, we investigate the state of practice in several companies to understand what the cost of managing TD is, what tools are used to track TD, and how a tracking process is introduced in practice. We combined two phases: a survey involving 226 respondents from 15 organizations and an in-depth multiple case study in three organizations including 13 interviews and 79 Technical Debt issues. We selected the organizations where Technical Debt was better tracked in order to distill best practices. We found that the development time dedicated to managing Technical Debt is substantial (an average of 25% of the overall development), but mostly not systematic: only a few participants (26%) use a tool, and only 7.2% methodically track Technical Debt. We found that the most used and effective tools are currently backlogs and static analyzers. By studying the approaches in the companies participating in the case study, we report how companies start tracking Technical Debt and what the initial benefits and challenges are. Finally, we propose a Strategic Adoption Model for the introduction of tracking Technical Debt in software organizations.
Chapter
Even though there were many forerunners, the most widespread reference to Continuous Integration as a method was put forward by Grady Booch in 1990 [44] (page 209). In the 1995 book Microsoft Secrets, Cusumano and Selby interviewed 38 Microsoft employees to document how the world’s leading software provider was managing its own software development [80]. One of the key practices found was the Daily Build concept. In the popular literature, this was described as everyone needed to check in their code and build the product at the end of the workday. In tight connection to the build, some smoke tests were run to ensure that the individual contributions could be integrated. If the build was broken, the person who broke it had to fix his/her code before going home. There was also modern folklore mentioning that the breaker of the build had to wear a funny hat for the remainder of the day.
Chapter
The term artificial intelligence (AI) triggers many things in terms of its inherent meaning and potential. The notion of a machine with the same level of intellect as a human or even far exceeding it is enthralling and scary at the same time. Several science fiction movies build on the HAL 9000 or Terminator theme of artificial intelligence bent on controlling or even exterminating humankind.
Chapter
Development of high-quality complex software, in particular in modern embedded and cyber-physical systems, requires careful attention to the software architecture and design in order to achieve the desired quality attributes. Generally speaking, the evolution in software development methods during the last decade, towards more agile practices with short iterations and early feedback, has focused more on implementation and validation activities than architectural design. It is sometimes argued, even, that the concept of architecture is obsolete in modern software development. However, architectural decisions still have a significant impact on software quality, including crucial aspects like performance, safety, and security. Moreover, although architecture can, and should, evolve over time, it does so at a slow pace compared to implementation changes, meaning that the architecture impacts how quickly new functionality can be implemented in response to changed market needs. Thus, for any long-lived systems, but in particular for systems where for example safety assurance is critical, there is definitely a need to document and reason about architecture. Architectural documentation no longer plays the role of a static, a priori, specification for developers to follow but should rather be viewed as an artifact that continuously evolves together with the implementation.
Chapter
Agile software development is increasingly adopted by companies evolving and maintaining software products to support better planning and tracking the realization of user stories and features. While convincing success stories help to further spread the adoption of Agile, mechatronics-driven companies need guidance to implement Agile for non-software teams. In this comparative case study of three companies from the Nordic region, we systematically investigate expectations and challenges from scaling Agile in organizations dealing with mechatronics development by conducting on-site workshops and surveys. Our findings show that all companies have already successfully implemented Agile in their software teams. The expected main benefit of successfully scaling agile development is a faster time-to-market product development; however, the two main challenges are: (a) An inflexible test environment that inhibits fast feedback to changed or added features, and (b) the existing organizational structure including the company’s mind-set that needs to be opened-up for agile principles.
Chapter
Full-text available
Artificial intelligence (AI) and machine learning (ML) are increasingly being broadly adopted in industry. However, based on well over a dozen case studies, we have learned that deploying industry-strength, production-quality ML models in systems proves to be challenging. Companies experience challenges related to data quality, design methods and processes, performance of models, as well as deployment and compliance. We learned that a new, structured engineering approach is required to construct and evolve systems that contain ML/DL components. In this chapter, we provide a conceptualization of the typical evolution patterns that companies experience when employing ML as well as an overview of the key problems experienced by the companies that we have studied. The main contribution of the chapter is a research agenda for AI engineering that provides an overview of the key engineering challenges surrounding ML solutions and an overview of open items that need to be addressed by the research community at large.
Chapter
Background: Profiling software development projects, in order to compare them or find similar sub-projects or sets of activities, helps to monitor changes in software processes. Since we lack objective measures for profiling or hashing, researchers often fall back on manual assessments. Objective: The goal of our study is to define an objective and intuitive measure of similarity between software development projects based on software defect-inflow profiles. Method: We defined a measure of project similarity called SimSAX, which is based on segmenting defect-inflow profiles, coding them into strings (sequences of symbols), and comparing these strings to find so-called motifs. We use simulations to find and calibrate the parameters of the measure. The objects in the simulations are two different large industry projects for which we know the similarity a priori, based on input from industry experts. Finally, we apply the measure to find similarities between five industrial and six open source projects. Results: Our results show that the measure provides the most accurate simulated results when the compared motifs are long (32 or more weeks) and we use an alphabet of 5 or more symbols. The measure can be calibrated for each industrial case, thus allowing the method to be optimized for finding specific patterns in project similarity. Conclusions: We conclude that our proposed measure provides a good approximation of project similarity. The industrial evaluation showed that it can provide a good starting point for finding similar periods in software development projects.
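SimSAX builds on SAX (Symbolic Aggregate approXimation) coding of a time series. The sketch below shows plain SAX under common assumptions (z-normalization, piecewise aggregate approximation, Gaussian breakpoints for a 4-symbol alphabet); the defect-inflow numbers are fabricated, and the motif comparison that SimSAX adds on top is omitted.

```python
# Plain SAX coding of a defect-inflow profile: z-normalize, average over
# fixed segments (PAA), then map each segment mean to a letter using the
# Gaussian breakpoints for four equiprobable regions.
import numpy as np

def sax(series, n_segments=8, breakpoints=(-0.6745, 0.0, 0.6745)):
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()              # z-normalize
    segments = np.array_split(x, n_segments)  # PAA segmentation
    alphabet = "abcd"                         # 4-symbol alphabet
    return "".join(alphabet[np.searchsorted(breakpoints, seg.mean())]
                   for seg in segments)

weekly_defect_inflow = [3, 5, 4, 9, 12, 15, 11, 7, 6, 4, 2, 1, 2, 3, 8, 10]  # fabricated
print(sax(weekly_defect_inflow))
```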
Chapter
In model-based development projects, models at different abstraction levels capture different aspects of a software system, e.g., specification or design. Inconsistencies between these models can cause inefficient and incorrect development. A tool-based framework to assist developers creating and maintaining models conforming to different languages (i.e. heterogeneous models) and consistency between them is not only important but also much needed in practice. In this work, we focus on assisting developers bringing about multi-view consistency in the context of agile model-based development, through frequent, lightweight consistency checks across views and between heterogeneous models. The checks are lightweight in the sense that they are easy to create, edit, use and maintain, and since they find inconsistencies but do not attempt to automatically resolve them. With respect to ease of use, we explicitly separate the two main concerns in defining consistency checks, being (i) which modelling elements across heterogeneous models should be consistent with each other and (ii) what constitutes consistency between them. We assess the feasibility and illustrate the potential usefulness of our consistency checking approach, from an industrial agile model-based development point-of-view, through a proof-of-concept implementation on a sample project leveraging models expressed in SysML and Simulink. A continuous integration pipeline hosts the initial definition and subsequent execution of consistency checks, it is also the place where the user can view results of consistency checks and reconfigure them.
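The two concerns that this chapter separates (which elements across models should correspond, and what consistency between them means) can be made concrete with a toy check. The models below are hypothetical name lists, far simpler than real SysML or Simulink models, and the check only reports inconsistencies, mirroring the chapter's choice not to auto-resolve them.

```python
# Toy sketch of a lightweight cross-view consistency check: every signal
# used in the (hypothetical) design model must exist in the specification
# model. Findings are reported, not automatically resolved.
spec_model = {"signals": ["speed", "brake_pressure", "steering_angle"]}
design_model = {"signals": ["speed", "brake_pressure", "steering_angel"]}  # typo on purpose

def check_consistency(spec: dict, design: dict) -> list[str]:
    known = set(spec["signals"])
    return [s for s in design["signals"] if s not in known]

for signal in check_consistency(spec_model, design_model):
    print(f"INCONSISTENT: design signal '{signal}' not found in specification")
```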
Chapter
Agile software development is well-known for its focus on close customer collaboration and customer feedback. In emphasizing flexibility, efficiency and speed, agile practices have led to a paradigm shift in how software is developed. However, while agile practices have succeeded in involving the customer in the development cycle, there is an urgent need to learn from customer usage of software also after delivering and deployment of the software product. The concept of continuous deployment, i.e. the ability to deliver software functionality frequently to customers and subsequently, the ability to continuously learn from real-time customer usage of software, has become attractive to companies realizing the potential in having even shorter feedback loops. However, the transition towards continuous deployment involves a number of barriers. This paper presents a multiple-case study in which we explore barriers associated with the transition towards continuous deployment. Based on interviews at four different software development companies we present key barriers in this transition as well as actions that need to be taken to address these.
Chapter
Software development companies increasingly aim to become data-driven by continuously experimenting with the products used by their customers. Although familiar with the competitive edge that A/B testing technology delivers, they seldom succeed in evolving and adopting the methodology. In this paper, based on exhaustive and collaborative case study research in a large software-intensive company with a highly developed experimentation culture, we present the evolution process of moving from ad-hoc customer data analysis towards continuous controlled experimentation at scale. Our main contribution is the "Experimentation Evolution Model", in which we detail three phases of evolution: technical, organizational, and business evolution. With our contribution, we aim to provide guidance to practitioners on how to develop and scale continuous experimentation in software organizations with the purpose of becoming data-driven at scale.
Conference Paper
Continuous Experimentation (CE) has become increasingly popular across industry and academic communities. Given its rapid evolution in software engineering (SE), the lack of a common understanding of CE can jeopardize new implementations and justifies further research efforts. Therefore, this literature study characterizes CE in SE based on its definitions, processes, and strategies for experimentation available in the technical literature. Seventy-six sources of information provided many different definitions, processes, and experimental procedures used to describe CE in SE. Despite the increasing use of CE in SE, a common terminology to support its characterization and use cannot yet be observed.
Article
Stochastic models are widely used to verify whether systems satisfy their reliability, performance and other nonfunctional requirements. However, the validity of the verification depends on how accurately the parameters of these models can be estimated using data from component unit testing, monitoring, system logs, etc. When insufficient data are available, the models are affected by epistemic parametric uncertainty, the verification results are inaccurate, and any engineering decisions based on them may be invalid. To address these problems, we introduce VERACITY, a tool-supported iterative approach for the efficient and accurate verification of nonfunctional requirements under epistemic parameter uncertainty. VERACITY integrates confidence-interval quantitative verification with a new adaptive uncertainty reduction heuristic that collects additional data about the parameters of the verified model by unit-testing specific system components over a series of verification iterations. VERACITY supports the quantitative verification of discrete-time Markov chains, deciding which components are to be tested in each iteration based on factors that include the sensitivity of the model to variations in the parameters of different components, and the overheads (e.g., time or cost) of unit-testing each of these components. We show the effectiveness and efficiency of VERACITY by using it for the verification of the nonfunctional requirements of a tele-assistance service-based system and an online shopping web application.
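The verification machinery the abstract builds on can be illustrated in a few lines: reachability probabilities of a discrete-time Markov chain are obtained by solving a linear system over the transient states. The sketch below is a generic textbook computation, not VERACITY itself (whose confidence intervals and uncertainty-reduction loop go well beyond this); the chain, its states, and its probabilities are invented for the example.

```python
# Minimal DTMC reachability: probability of eventually reaching a target
# (absorbing) state, via the standard linear system x = Qx + b.
import numpy as np

# States: 0 = init, 1 = retry, 2 = success (absorbing), 3 = failure (absorbing)
P = np.array([
    [0.0, 0.3, 0.6, 0.1],   # from init
    [0.5, 0.0, 0.4, 0.1],   # from retry
    [0.0, 0.0, 1.0, 0.0],   # success stays in success
    [0.0, 0.0, 0.0, 1.0],   # failure stays in failure
])

transient, target = [0, 1], 2
Q = P[np.ix_(transient, transient)]          # transitions among transient states
b = P[np.ix_(transient, [target])].ravel()   # one-step jumps into the target

# x[i] = Pr(reach target | start in transient state i): solve (I - Q) x = b.
x = np.linalg.solve(np.eye(len(transient)) - Q, b)
print(f"P(reach success from init) = {x[0]:.4f}")   # ~0.847 for these numbers
```

Epistemic parameter uncertainty, the paper's subject, enters when entries of P are themselves estimates with confidence intervals rather than known constants.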
Article
Full-text available
Online experimentation platforms abstract away many of the details of experimental design, ensuring experimenters do not have to worry about sampling, randomisation, subject tracking, data collection, metric definition and interpretation of results. The recent success and rapid adoption of these platforms in the industry might in part be attributed to the ease-of-use these abstractions provide. Previous authors have pointed out there are common pitfalls to avoid when running controlled experiments on the web and emphasised the need for experts familiar with the entire software stack to be involved in the process. In this paper, we argue that these pitfalls and the need to understand the underlying complexity are not the result of shortcomings specific to existing platforms which might be solved by better platform design. We postulate that they are a direct consequence of what is commonly referred to as "the law of leaky abstractions". That is, it is an inherent feature of any software platform that details of its implementation leak to the surface, and that in certain situations, the platform's consumers necessarily need to understand details of underlying systems in order to make proficient use of it. We present several examples of this concept, including examples from literature, and suggest some possible mitigation strategies that can be employed to reduce the impact of abstraction leakage. The conceptual framework put forward in this paper allows us to explicitly categorize experimentation pitfalls in terms of which specific abstraction is leaking, thereby aiding implementers and users of these platforms to better understand and tackle the challenges they face.
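A small, concrete instance of such leakage (a generic illustration, not any particular platform's implementation): platforms typically hide randomisation behind a deterministic hash-based assignment call. The abstraction leaks when two experiments reuse the same hashing salt, because their assignments then become perfectly correlated and the experiments confound each other, something a consumer of the API cannot see without understanding the layer below.

```python
import hashlib

def assign_variant(user_id: str, salt: str, variants=("control", "treatment")) -> str:
    """Deterministic bucketing: hash (salt, user_id) into a variant."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def overlap(xs, ys):
    """Fraction of users assigned to the same variant in both experiments."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

users = [f"user{i}" for i in range(10_000)]
exp_a = [assign_variant(u, "shared-salt") for u in users]
exp_b = [assign_variant(u, "shared-salt") for u in users]   # same salt reused!
exp_c = [assign_variant(u, "fresh-salt") for u in users]    # independent salt

print(overlap(exp_a, exp_b))   # 1.0  -> fully correlated assignments
print(overlap(exp_a, exp_c))   # ~0.5 -> effectively independent
```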
Conference Paper
Full-text available
Online controlled experiments (for example A/B tests) are increasingly being performed to guide product development and accelerate innovation in online software product companies. The benefits of controlled experiments have been shown in many cases with incremental product improvement as the objective. In this paper, we demonstrate that the value of controlled experimentation at scale extends beyond this recognized scenario. Based on an exhaustive and collaborative case study in a large software-intensive company with highly developed experimentation culture, we inductively derive the benefits of controlled experimentation. The contribution of our paper is twofold. First, we present a comprehensive list of benefits and illustrate our findings with five case examples of controlled experiments conducted at Microsoft. Second, we provide guidance on how to achieve each of the benefits. With our work, we aim to provide practitioners in the online domain with knowledge on how to use controlled experimentation to maximize the benefits on the portfolio, product and team level.
Chapter
Full-text available
The Internet connectivity of client software (e.g., apps running on phones and PCs), websites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments allow establishing a causal relationship with high probability. Experimenters can utilize the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users.
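To make the hypothesis template concrete, here is a minimal sketch of the statistical core of such an evaluation: a two-sample comparison of a key metric between control and treatment. The data are simulated and the analysis is deliberately bare; production systems layer variance reduction, guardrail metrics, and multiple-testing corrections on top.

```python
# Evaluating "if change X is introduced, will it improve metric M?" with a
# Welch two-sample t-test on simulated per-user metric values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=4.0, size=50_000)    # e.g. sessions/user
treatment = rng.normal(loc=10.1, scale=4.0, size=50_000)  # simulated +1% lift

t, p = stats.ttest_ind(treatment, control, equal_var=False)
lift = treatment.mean() / control.mean() - 1
print(f"lift = {lift:+.2%}, t = {t:.2f}, p = {p:.4f}")
```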
Article
Full-text available
The emerging internet of things (IoT) technology has immense potential for unprecedented business offerings in various domains. To provide reliable IoT products and services that comply with regulatory demands, businesses must meet users’ data protection and privacy needs. With the General Data Protection Regulation (GDPR) coming into force from 24th May, 2016 and applicable from 25th May, 2018, IoT businesses must strategise privacy alignment for their products or services by incorporating in their design the privacy and data protection capabilities necessary for regulatory compliance and gaining user trust. This paper discusses the associated data protection and user privacy concerns, making reference to such IoT service offerings as smart retail, the smart home, smart wearables, smart health devices, smart television and smart toys. The three steps to privacy alignment strategy discussed in this paper comprise the privacy inquisition (PI) analysis model, the IoT privacy impact assessment (iPIA) and the privacy state transition process through which IoT businesses pass on their path to attaining ‘perfect alignment’ with respect to the GDPR data protection requirements and user privacy needs. Privacy inquisition, iPIA and privacy state transition should be performed on a periodic basis, preferably under the guidance of a privacy governance board with supervisory authority and representation from the organisation’s board of directors, the controller and the data protection officer. Available at: http://www.ingentaconnect.com/content/hsp/jdpp/2016/00000001/00000001/art00009
Conference Paper
Full-text available
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested on web sites, mobile applications, desktop applications, services, and operating system features. One of the key challenges for organizations that run controlled experiments is to select an Overall Evaluation Criterion (OEC), i.e., the criterion by which to evaluate the different variants. The difficulty is that short-term changes to metrics may not predict the long-term impact of a change. For example, raising prices likely increases short-term revenue but also likely reduces long-term revenue (customer lifetime value) as users abandon. Degrading search results in a search engine causes users to search more, thus increasing query share short-term, but increasing abandonment and thus reducing long-term customer lifetime value. Ideally, an OEC is based on metrics in a short-term experiment that are good predictors of long-term value. To assess long-term impact, one approach is to run long-term controlled experiments and assume that long-term effects are represented by observed metrics. In this paper we share several examples of long-term experiments and the pitfalls associated with running them. We discuss cookie stability, survivorship bias, selection bias, and perceived trends, and share methodologies that can be used to partially address some of these issues. While there is clearly value in evaluating long-term trends, experimenters running long-term experiments must be cautious, as results may be due to the above pitfalls more than the true delta between the Treatment and Control. We hope our real examples and analyses will sensitize readers to the issues and encourage the development of new methodologies for this important problem.
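The pricing example can be given rough numbers with the standard geometric-retention formula for customer lifetime value, LTV = ARPU / (1 - r). The figures below are invented purely to illustrate why a metric that looks good short-term (revenue per user) can hide a long-term loss (lifetime value), which is exactly the OEC selection problem the abstract describes.

```python
# Illustrative only: a short-term revenue win that is a long-term loss.
def ltv(arpu: float, retention: float) -> float:
    """Lifetime value under geometric retention: arpu * sum(r^t) = arpu / (1 - r)."""
    return arpu / (1.0 - retention)

before = ltv(arpu=10.0, retention=0.80)  # $10/month, 80% retained -> LTV $50
after = ltv(arpu=12.0, retention=0.70)   # +20% price, higher churn -> LTV $40
print(f"LTV before: ${before:.0f}, after the price increase: ${after:.0f}")
```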
Article
Full-text available
The proper alignment of requirements engineering and testing (RET) can be key to software's success. Three practices can provide effective RET alignment: using test cases as requirements, harvesting trace links, and reducing distances between requirements engineers and testers. The Web extra https://youtu.be/M65ZKxfxqME is an audio podcast of author Elizabeth Bjarnason reading the Requirements column she cowrote with Markus Borg.
Article
Full-text available
Logging statements are used to record valuable runtime information about applications. Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. However, prior research finds that developers often have difficulties when determining the appropriate level for their logging statements. In this paper, we propose an approach to help developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid), and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.
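As a toy rendering of the modelling idea (ordinal regression over the ordered set of log levels), the sketch below fits statsmodels' OrderedModel on synthetic data. The features, their effects, and the data are all invented; the paper's actual models are trained on rich code-context metrics mined from real project histories.

```python
# Toy ordinal regression for suggesting log levels (trace < debug < info <
# warn < error); data and features are synthetic, for illustration only.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 2_000
X = pd.DataFrame({
    "in_catch_block": rng.integers(0, 2, n),   # is the statement in a catch block?
    "block_loc": rng.integers(1, 50, n),       # size of the containing block
    "log_text_len": rng.integers(5, 120, n),   # length of the log message
})
# Synthetic ground truth: error-handling context pushes levels upward.
latent = 2.5 * X.in_catch_block + 0.02 * X.block_loc + rng.normal(0, 1, n)
y = pd.cut(latent, bins=[-np.inf, 0.5, 1.5, 2.5, 3.5, np.inf],
           labels=["trace", "debug", "info", "warn", "error"])

fit = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(fit.params)  # a positive coefficient raises the suggested level
```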
Conference Paper
Full-text available
Continuous deployment is the software engineering practice of deploying many small incremental software updates into production, leading to a continuous stream of 10s, 100s, or even 1,000s of deployments per day. High-profile Internet firms such as Amazon, Etsy, Facebook, Flickr, Google, and Netflix have embraced continuous deployment. However, the practice has not been covered in textbooks and no scientific publication has presented an analysis of continuous deployment. In this paper, we describe the continuous deployment practices at two very different firms: Facebook and OANDA. We show that continuous deployment does not inhibit productivity or quality even in the face of substantial engineering team and code size growth. To the best of our knowledge, this is the first study to show it is possible to scale the size of an engineering team by 20X and the size of the code base by 50X without negatively impacting developer productivity or software quality. Our experience suggests that top-level management support of continuous deployment is necessary, and that given a choice, developers prefer faster deployment. We identify elements we feel make continuous deployment viable and present observations from operating in a continuous deployment environment.
Conference Paper
Full-text available
With agile teams becoming increasingly multi-disciplinary and including all functions, the role of customer feedback is gaining momentum. Today, companies collect feedback directly from customers, as well as indirectly from their products. As a result, companies face a situation in which the amount of data from which they can learn about their customers is larger than ever before. In previous studies, the collection of data is often identified as challenging. However, and as illustrated in our research, the challenge is not the collection of data but rather how to share this data among people in order to make effective use of it. In this paper, and based on case study research in three large software-intensive companies, we (1) provide empirical evidence that ‘lack of sharing’ is the primary reason for insufficient use of customer and product data, and (2) develop a model in which we identify what data is collected, by whom data is collected and in what development phases it is used. In particular, the model depicts critical hand-overs where certain types of data get lost, as well as the implications associated with this. We conclude that companies benefit from a very limited part of the data they collect, and that lack of sharing of data drives inaccurate assumptions of what constitutes customer value.
Article
Full-text available
Context: An experiment-driven approach to software product and service development is gaining increasing attention as a way to channel limited resources to the efficient creation of customer value. In this approach, software capabilities are developed incrementally and validated in continuous experiments with stakeholders such as customers and users. The experiments provide factual feedback for guiding subsequent development. Objective: This paper explores the state of the practice of experimentation in the software industry. It also identifies the key challenges and success factors that practitioners associate with the approach. Method: A qualitative survey based on semi-structured interviews and thematic coding analysis was conducted. Ten Finnish software development companies, represented by thirteen interviewees, participated in the study. Results: The study found that although the principles of continuous experimentation resonated with industry practitioners, the state of the practice is not yet mature. In particular, experimentation is rarely systematic and continuous. Key challenges relate to changing the organizational culture, accelerating the development cycle speed, and finding the right measures for customer value and product success. Success factors include a supportive organizational culture, deep customer and domain knowledge, and the availability of the relevant skills and tools to conduct experiments. Conclusions: It is concluded that the major issues in moving towards continuous experimentation are on an organizational level; most significant technical challenges have been solved. An evolutionary approach is proposed as a way to transition towards experiment-driven development.
Article
Full-text available
Context: Development of software-intensive products and services increasingly occurs by continuously deploying product or service increments, such as new features and enhancements, to customers. Product and service developers must continuously find out what customers want by direct customer feedback and usage behaviour observation. Objective: This paper examines the preconditions for setting up an experimentation system for continuous customer experiments. It describes the RIGHT Model for Continuous Experimentation (Rapid Iterative value creation Gained through High-frequency Testing), illustrating the building blocks required for such a system. Method: An initial model for continuous experimentation is analytically derived from prior work. The model is matched against empirical case study findings from two startup companies and further developed. Results: Building blocks for a continuous experimentation system and infrastructure are presented. Conclusions: A suitable experimentation system requires at least the ability to release minimum viable products or features with suitable instrumentation, design and manage experiment plans, link experiment results with a product roadmap, and manage a flexible business strategy. The main challenges are proper and rapid design of experiments, advanced instrumentation of software to collect, analyse, and store relevant data, and the integration of experiment results in both the product development cycle and the software development process. Our findings suggest that it is important to identify fundamental assumptions before designing experiments and to validate those first in order to avoid unnecessary experimentation. Deriving experiments that properly test product strategies requires special expertise and skill. Finally, we claim that integrating experimentation outcomes into decision-making is a particular challenge for product management in companies.
Article
Full-text available
While innovation, such as development of new features, is critical for any organization, it is hard to get right. In both our case companies, the selection of ideas is usually driven by previous experiences, and very often the process becomes politicized and based on people's opinions. To address this, we present the Hypothesis Experiment Data-Driven Development (HYPEX) model. Our model is an alternative development process that helps companies shorten the feedback loop to customers. The model supports companies in running feature experiments and advocates development of small parts of features that are continuously evaluated with customers. In our study we validate the model in two software development companies. Although the companies involved in the study have not yet completed a full experiment cycle, we see that feature experiments are beneficial for improving at least four activities within the companies: (1) data-driven development (the ease of collecting customer feedback allows for a real-time connection between the quantified business goals of the organization and the operational metrics collected from the installed customer base), (2) customer responsiveness (the ease of collecting customer feedback allows product management to respond rapidly and dynamically to any changes to the use of the products, as well as to emerging customer requests), (3) R&D efficiency (the ease of collecting customer feedback gives the development teams a real-time goal and metrics to strive for and provides focus for their work), and (4) R&D accuracy (the ease of collecting customer feedback enables the development teams to align their efforts with what the customers appreciate the most). The HYPEX model is a development process that helps software development companies move away from building large chunks of functionality with little feedback from customers and instead continuously validate with customers that the functionality under development is of value to customers.
Conference Paper
Full-text available
Software-intensive product companies are becoming increasingly data-driven as can be witnessed by the big data and Internet of Things trends. However, optimally prioritizing customer needs in a mass-market context is notoriously difficult. While most companies use product owners or managers to represent the customer, research shows that the prioritization made is far from optimal. In earlier research, we have coined the term ‘the open loop problem’ to characterize this challenge. For instance, research shows that up to half of all the features in products are never used. This paper presents a conceptual model that emphasizes the need for combining qualitative feedback in early stages of development with quantitative customer observation in later stages of development. Our model is inductively derived from an 18 months close collaboration with six large global software-intensive companies.
Conference Paper
Full-text available
In many companies, product management struggles in getting accurate customer feedback. Often, validation and confirmation of functionality with customers takes place only after the product has been deployed, and there are no mechanisms that help product managers to continuously learn from customers. Although there are techniques available for collecting customer feedback, these are typically not applied as part of a continuous feedback loop. As a result, the selection and prioritization of features becomes far from optimal, and the product deviates from what the customers need. In this paper, we present a literature review of currently recognized techniques for collecting customer feedback. We develop a model in which we categorize the techniques according to their characteristics. The purpose of this literature review is to provide an overview of current software engineering research in this area and to better understand the different techniques that are used for collecting customer feedback.
Conference Paper
Full-text available
Continuous Delivery (CD) has emerged as an auspicious software development discipline, with the promise of providing organizations the capability to release valuable software continuously to customers. Our organization has been implementing CD for the last two years. Thus far, we have moved 22 software applications to CD. I observed that CD has created a new context for architecting these applications. In this paper, I will try to characterize such a context of CD, explain why we need to architect for CD, describe the implications of architecting for CD, and discuss the challenges this new context creates. This information can provide insights to other practitioners for architecting their software applications, and provide researchers with input for developing their research agendas to further study this increasingly important topic.
Conference Paper
Full-text available
Traditional software development focuses on specifying and freezing requirements early in the, typically yearly, product development lifecycle. The requirements are defined based on product management’s best understanding. The adoption of SaaS and cloud computing has shown a different approach to managing requirements, adding frequent and rigorous experimentation to the development process with the intent of minimizing R&D investment between customer proof points. This offers several benefits including increased customer satisfaction, improved and quantified business goals and the transformation to a continuous rather than waterfall development process. In this paper, we present our learnings from studying software companies applying an innovation experiment system approach to product development. The approach is illustrated with three cases from Intuit, the case study company.
Article
Full-text available
In this article the author addresses the history of reliability and validity in qualitative research as this method of inquiry has progressed through various paradigms. The importance of the concepts of reliability and validity in research findings is traced from the traditional era, where there was only a modest distinction between qualitative and quantitative researchers involving their definitions of research reliability and validity, through the current era, where some researchers question the need to be restricted in their research by attempting to control for or account for the reliability and validity of their research findings. The author rejects a strict need for reliability and validity as traditionally defined in quantitative research and outlines a less restrictive approach to ensuring reliability and validity in qualitative research.
Conference Paper
Full-text available
Rapid value delivery requires a company to utilize empirical evaluation of new features and products in order to avoid unnecessary product risks. This helps to make data-driven decisions and to ensure that the development is focused on features that provide real value for customers. Short feedback loops are a prerequisite as they allow for fast learning and reduced reaction times. Continuous experimentation is a development practice where the entire R&D process is guided by constantly conducting experiments and collecting feedback. Although principles of continuous experimentation have been successfully applied in domains such as game software or SaaS, it is not obvious how to transfer continuous experimentation to the B2B domain. In this article, a case study from a medium-sized software company in the B2B domain is presented. The study objective is to analyze the challenges, benefits and organizational aspects of continuous experimentation in the B2B domain. The results suggest that technical challenges are only one part of the challenges a company encounters in this transition. The company also has to address challenges related to the customer and organizational culture. Unique properties in each customer's business play a major role and need to be considered when designing experiments. Additionally, the speed by which experiments can be conducted is relative to the speed by which production deployments can be made. Finally, the article shows how the study results can be used to modify the development in the case company in a way that more feedback and data are used instead of opinions.
Conference Paper
Full-text available
System logs are widely used in various tasks of software system management. It is crucial to avoid logging too little or too much. To achieve this, developers need to make informed decisions on where to log and what to log in their logging practices during development. However, there exists no work on studying such logging practices in industry or helping developers make informed decisions. To fill this significant gap, in this paper, we systematically study the logging practices of developers in industry, with a focus on where developers log. We obtain six valuable findings by conducting source code analysis on two large industrial systems (2.5M and 10.4M LOC, respectively) at Microsoft. We further validate these findings via a questionnaire survey with 54 experienced developers in Microsoft. In addition, our study demonstrates a high accuracy of up to 90% F-score in predicting where to log.
Conference Paper
Full-text available
Web site owners, from small web sites to the largest properties that include Amazon, Facebook, Google, LinkedIn, Microsoft, and Yahoo, attempt to improve their web sites, optimizing for criteria ranging from repeat usage and time on site to revenue. Having been involved in running thousands of controlled experiments at Amazon, Booking.com, LinkedIn, and multiple Microsoft properties, we share seven rules of thumb for experimenters, which we have generalized from these experiments and their results. These are principles that we believe have broad applicability in web optimization and analytics outside of controlled experiments, yet they are not provably correct, and in some cases exceptions are known. To support these rules of thumb, we share multiple real examples, most being shared in a public paper for the first time. Some rules of thumb have previously been stated, such as “speed matters,” but we describe the assumptions in the experimental design and share additional experiments that improved our understanding of where speed matters more: certain areas of the web page are more critical. This paper serves two goals. First, it can guide experimenters with rules of thumb that can help them optimize their sites. Second, it provides the KDD community with new research challenges on the applicability, exceptions, and extensions to these rules, one of the goals for KDD’s industrial track.
Article
Full-text available
Requirements engineering is an important phase in software development, in which all requirements of the software proposed for development are gathered. Agile methodologies produce high-quality software and take less time in comparison to traditional methods. Agile was conceived for managing the development process in environments where requirements can change during development. Requirements engineering is an important phase in agile development methodologies. In this paper, we provide a hybrid approach for requirements engineering in agile with the help of JAD (Joint Application Development), in which the prioritization of requirements is supported by viewpoints.
Article
Full-text available
Internet companies such as Facebook operate in a "perpetual development" mindset. This means that the website continues to undergo development with no predefined final objective, and that new developments are deployed so that users can enjoy them as soon as they're ready. To support this, Facebook uses both technical approaches such as peer review and extensive automated testing, and a culture of personal responsibility.
Conference Paper
Full-text available
Context: Development of software-intensive products and services increasingly occurs by continuously deploying product or service increments, such as new features and enhancements, to customers. Product and service developers need to continuously find out what customers want by direct customer feedback and observation of usage behaviour, rather than indirectly through up-front business analyses. Objective: This paper examines the preconditions for setting up an experimentation system for continuous customer experiments. It describes the building blocks required for such a system. Method: A model for continuous experimentation is analytically derived from prior work. The proposed model is validated against a case study examining a startup company. Results: Building blocks for a continuous experimentation system and infrastructure are presented. Conclusion: A suitable experimentation system requires at least the ability to release minimum viable products or features with suitable instrumentation, design and manage experiment plans, link experiment results with a product roadmap, and manage a flexible business strategy. The main challenges are proper and rapid design of experiments, advanced instrumentation of software to collect, analyse, and store relevant data, and the integration of experiment results in both the product development cycle and the software development process.
Article
Context Continuous experimentation guides development activities based on data collected on a subset of online users on a new experimental version of the software. It includes practices such as canary releases, gradual rollouts, dark launches, or A/B testing. Objective Unfortunately, our knowledge of continuous experimentation is currently primarily based on well-known and outspoken industrial leaders. To assess the actual state of practice in continuous experimentation, we conducted a mixed-method empirical study. Method In our empirical study consisting of four steps, we interviewed 31 developers or release engineers, and performed a survey that attracted 187 complete responses. We analyzed the resulting data using statistical analysis and open coding. Results Our results lead to several conclusions: (1) from a software architecture perspective, continuous experimentation is especially enabled by architectures that foster independently deployable services, such as microservices-based architectures; (2) from a developer perspective, experiments require extensive monitoring and analytics to discover runtime problems, consequently leading to developer on-call policies and influencing the role and skill sets required by developers; and (3) from a process perspective, many organizations conduct experiments based on intuition rather than clear guidelines and robust statistics. Conclusion Our findings show that more principled and structured approaches for release decision making are needed, striving for highly automated, systematic, and data- and hypothesis-driven deployment and experimentation.
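The more principled release decision making the conclusion calls for can be sketched in miniature: a canary guardrail that compares the canary's error rate against the baseline and recommends rollback when the difference is statistically significant. This is an illustrative sketch with invented counts and a plain two-proportion z-test, not a recipe from the paper.

```python
# Minimal canary guardrail: roll back if the canary's error rate is
# significantly worse than baseline (one-sided two-proportion z-test).
from math import sqrt
from statistics import NormalDist

def canary_worse(base_err: int, base_n: int, can_err: int, can_n: int,
                 alpha: float = 0.05) -> bool:
    p_base, p_can = base_err / base_n, can_err / can_n
    pooled = (base_err + can_err) / (base_n + can_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / can_n))
    z = (p_can - p_base) / se
    return 1 - NormalDist().cdf(z) < alpha   # small p-value -> canary is worse

if canary_worse(base_err=120, base_n=100_000, can_err=28, can_n=10_000):
    print("ROLL BACK: canary error rate significantly above baseline")
else:
    print("CONTINUE: no significant regression detected")
```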
Article
Continuous experimentation is an up-and-coming technique for requirements engineering and testing, particularly for Web-based systems. Based on a practitioner survey, we give an overview of challenges, implementation techniques, and current research in the field.
Conference Paper
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses, and tested in web sites, mobile applications, desktop applications, services, and operating systems. One of the key challenges for organizations that run controlled experiments is to come up with the right set of metrics [1] [2] [3]. Having good metrics, however, is not enough. In our experience of running thousands of experiments with many teams across Microsoft, we observed again and again how incorrect interpretations of metric movements may lead to wrong conclusions about the experiment's outcome, which if deployed could hurt the business by millions of dollars. Inspired by Steven Goodman's twelve p-value misconceptions [4], in this paper, we share twelve common metric interpretation pitfalls which we observed repeatedly in our experiments. We illustrate each pitfall with a puzzling example from a real experiment, and describe processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall. With this paper, we aim to increase the experimenters' awareness of metric interpretation issues, leading to improved quality and trustworthiness of experiment results and better data-driven decisions.
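One family of pitfalls the paper catalogues, spurious metric movements, is easy to reproduce. The simulation below runs an A/A-style comparison (no true effect) across many independent metrics; at alpha = 0.05, roughly five percent of them still "move" significantly, which is why a single surprising metric movement is weak evidence on its own.

```python
# Simulated A/A test over many metrics: with no real change, ~5% of metrics
# still cross the p < 0.05 threshold purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_metrics, n_users = 200, 5_000
false_positives = 0
for _ in range(n_metrics):
    a = rng.normal(size=n_users)   # control metric values
    b = rng.normal(size=n_users)   # treatment values, same distribution
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(f"{false_positives}/{n_metrics} metrics 'moved' despite no real change")
```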
Article
This paper describes the process of inducting theory using case studies, from specifying the research questions to reaching closure. Some features of the process, such as problem definition and construct validation, are similar to hypothesis-testing research. Others, such as within-case analysis and replication logic, are unique to the inductive, case-oriented process. Overall, the process described here is highly iterative and tightly linked to data. This research approach is especially appropriate in new topic areas. The resultant theory is often novel, testable, and empirically valid. Finally, framebreaking insights, the tests of good theory (e.g., parsimony, logical coherence), and convincing grounding in the evidence are the key criteria for evaluating this type of research.
Conference Paper
You get what you measure, and you can't manage what you don't measure. Metrics are a powerful tool used in organizations to set goals, decide which new products and features should be released to customers, which new tests and experiments should be conducted, and how resources should be allocated. To a large extent, metrics drive the direction of an organization, and getting metrics 'right' is one of the most important and difficult problems an organization needs to solve. However, creating good metrics that capture long-term company goals is difficult. They try to capture abstract concepts such as success, delight, loyalty, engagement, life-time value, etc. How can one determine that a metric is a good one? Or, that one metric is better than another? In other words, how do we measure the quality of metrics? Can the evaluation process be automated so that anyone with an idea of a new metric can quickly evaluate it? In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years. We believe that by applying a data driven approach to metric evaluation we have been able to substantially improve our metrics and, as a result, ship better features and improve search experience for Bing's users.
Conference Paper
User feedback like clicks and ratings on recommended items provides important information for recommender systems to predict users' interests in unseen items. Most systems rely on models trained using a single type of feedback, e.g., ratings for movie recommendation and clicks for online news recommendation. However, in addition to the primary feedback, many systems also allow users to provide other types of feedback, e.g., liking or sharing an article, or hiding all articles from a source. This additional feedback potentially provides extra information for the recommendation models. To optimize user experience and business objectives, it is important for a recommender system to use both the primary feedback and the additional feedback. This paper presents an empirical study on various training methods for incorporating multiple user feedback types based on LinkedIn recommendation products. We study three important problems that we face at LinkedIn: (1) whether to send an email based on clicks and complaints, (2) how to rank updates in LinkedIn feeds based on clicks and hides, and (3) how to jointly optimize for viral actions and clicks in LinkedIn feeds. Extensive offline experiments on historical data show the effectiveness of these methods in different situations. Online A/B testing results further demonstrate the impact of these methods on LinkedIn production systems.
Conference Paper
Online controlled experiments, also called A/B testing, have been established as the mantra for data-driven decision making in many web-facing companies. In recent years, emerging research has focused on building experimentation platforms and scaling them up, on best practices and lessons learned to obtain trustworthy results, and on experiment design techniques and various issues related to statistical inference and testing. However, although metrics play a central role in online controlled experiments, there is little published work treating metric development itself as a data-driven process. In this paper, we focus on how to develop meaningful and useful metrics for online services in their online experiments, and show how data-driven techniques and criteria can be applied in the metric development process. In particular, we emphasize two fundamental qualities for the goal metrics (or Overall Evaluation Criteria) of any online service: directionality and sensitivity. We share lessons on why these two qualities are critical, how to measure them for metrics of interest, how to develop metrics with clear directionality and high sensitivity by using approaches based on user behavior models and data-driven calibration, and how to choose the right goal metrics for the entire online service.
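A crude way to operationalise the two qualities, sketched below with invented data and simplified far beyond the paper's method: given past experiments labelled as known improvements or degradations, score a candidate metric by how often it moves significantly at all (sensitivity) and how often a significant movement points the same way as the label (directionality).

```python
# Scoring a candidate metric on labelled historical experiments.
# Each record: (t-statistic of the metric's movement, human label:
# +1 = known improvement, -1 = known degradation). Data invented.
experiments = [(3.1, +1), (2.4, +1), (-0.3, +1), (-2.9, -1), (1.8, -1),
               (-4.2, -1), (2.2, +1), (0.4, -1), (-2.1, -1), (2.8, +1)]

SIG = 1.96  # |t| above this counts as a statistically significant movement
significant = [(t, lbl) for t, lbl in experiments if abs(t) > SIG]

sensitivity = len(significant) / len(experiments)
directionality = sum((t > 0) == (lbl > 0) for t, lbl in significant) / len(significant)
print(f"sensitivity = {sensitivity:.0%}, directionality = {directionality:.0%}")
```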
Article
Two theoretical approaches have recently emerged to characterize new digital objects of study in the media landscape: infrastructure studies and platform studies. Despite their separate origins and different features, we demonstrate in this article how the cross-articulation of these two perspectives improves our understanding of current digital media. We use case studies of the Open Web, Facebook, and Google to demonstrate that infrastructure studies provides a valuable approach to the evolution of shared, widely accessible systems and services of the type often provided or regulated by governments in the public interest. On the other hand, platform studies captures how communication and expression are both enabled and constrained by new digital systems and new media. In these environments, platform-based services acquire characteristics of infrastructure, while both new and existing infrastructures are built or reorganized on the logic of platforms. We conclude by underlining the potential of this combined framework for future case studies.
Conference Paper
Large software organizations are transitioning to event data platforms as they culturally shift to better support data-driven decision making. This paper offers a case study at Microsoft during such a transition. Through qualitative interviews of 28 participants, and a quantitative survey of 1,823 respondents, we catalog a diverse set of activities that leverage event data sources, identify challenges in conducting these activities, and describe tensions that emerge in data-driven cultures as event data flow through these activities within the organization. We find that the use of event data spans every job role in our interviews and survey, that different perspectives on event data create tensions between roles or teams, and that professionals report social and technical challenges across activities.
Article
Working with diverse stakeholders is a fact of life for any requirements engineer. And learning to bring out the best in each of them is an art acquired over time. This article shares effective stakeholder interaction techniques for solving three common problems. The Web extra at https://youtu.be/1df3HmRTbBk is an audio podcast in which author Jane Cleland-Huang provides an audio recording of the Requirements column.
Book
This book provides essential insights on the adoption of modern software engineering practices at large companies producing software-intensive systems, where hundreds or even thousands of engineers collaborate to deliver on new systems and new versions of already deployed ones. It is based on the findings collected and lessons learned at the Software Center (SC), a unique collaboration between research and industry, with Chalmers University of Technology, Gothenburg University and Malmö University as academic partners and Ericsson, AB Volvo, Volvo Car Corporation, Saab Electronic Defense Systems, Grundfos, Axis Communications, Jeppesen (Boeing) and Sony Mobile as industrial partners. The 17 chapters present the "Stairway to Heaven" model, which represents the typical evolution path companies move through as they develop and mature their software engineering capabilities. The chapters describe theoretical frameworks, conceptual models and, most importantly, the industrial experiences gained by the partner companies in applying novel software engineering techniques. The book's structure consists of six parts. Part I describes the model in detail and presents an overview of lessons learned in the collaboration between industry and academia. Part II deals with the first step of the Stairway to Heaven, in which R&D adopts agile work practices. Part III of the book combines the next two phases, i.e., continuous integration (CI) and continuous delivery (CD), as they are closely intertwined. Part IV is concerned with the highest level, referred to as "R&D as an innovation system," while Part V addresses a topic that is separate from the Stairway to Heaven and yet critically important in large organizations: organizational performance metrics that capture data, and visualizations of the status of software assets, defects and teams. Lastly, Part VI presents the perspectives of two of the SC partner companies. The book is intended for practitioners and professionals in the software-intensive systems industry, providing concrete models, frameworks and case studies that show the specific challenges that the partner companies encountered, their approaches to overcoming them, and the results. Researchers will gain valuable insights on the problems faced by large software companies, and on how to effectively tackle them in the context of successful cooperation projects.
Article
BACKGROUND - The software intensive industry is moving towards the adoption of a value-driven and adaptive real-time business paradigm. The traditional view of software as an item that evolves through releases every few months is being replaced by continuous evolution of software functionality. OBJECTIVE - This study aims to classify and analyse literature related to continuous deployment in the software domain in order to scope the phenomenon, provide an overview of its state-of-the-art, investigate the scientific evidence in the reported results and identify areas that are suitable for further research. METHOD - We conducted a systematic mapping study and classified the continuous deployment literature. The benefits and challenges related to continuous deployment were also analyzed. RESULTS - The systematic mapping study includes 50 primary studies published between 2001 and 2014. An in-depth analysis of the primary studies revealed ten recurrent themes that characterize continuous deployment and provide researchers with directions for future work. In addition, a set of benefits and challenges of which practitioners may take advantage were identified. CONCLUSION - Overall, although the topic area is very promising, it is still in its infancy, thus offering a plethora of new opportunities for both researchers and software intensive companies.
Article
Nowadays, users can easily submit feedback about software products in app stores, social media, or user groups. Moreover, software vendors are collecting massive amounts of implicit feedback in the form of usage data, error logs, and sensor data. These trends suggest a shift toward data-driven user-centered identification, prioritization, and management of software requirements. Developers should be able to adopt the requirements of masses of users when deciding what to develop and when to release. They could systematically use explicit and implicit user data in an aggregated form to support requirements decisions. The goal is data-driven requirements engineering by the masses and for the masses.
Article
Managers regularly implement new ideas without evidence to back them up. They act on hunches and often learn very little along the way. That doesn't have to be the case. With the help of broadly available software and some basic investments in building capabilities, managers don't need a PhD in statistics to base consequential decisions on scientifically sound experiments. Some companies with rich consumer-transaction data (Toronto-Dominion, CKE Restaurants, eBay, and others) are routinely testing innovations well outside the realm of product R&D. As randomized testing becomes standard procedure in certain settings (website analysis, for instance), firms learn to apply it in other areas as well. Entire organizations that adopt a "test and learn" culture stand to realize the greatest benefits. That said, firms need to determine when formal testing makes sense. Generally, it's much more applicable to tactical decisions (such as choosing a new store format) than to strategic ones (such as figuring out whether to acquire a business). Tests are useful only if managers define and measure desired outcomes and formulate logical hypotheses about how proposed interventions will play out. To begin incorporating more scientific management into your business, acquaint managers at all levels with your organization's testing process. A shared understanding of what constitutes a valid test, and how it jibes with other processes, helps executives to set expectations and innovators to deliver on them. The process always begins with creating a testable hypothesis. Then the details of the test are designed, which means identifying sites or units to be tested, selecting control groups, and defining test and control situations. After the test is carried out for a specified period, managers analyze the data to determine results and appropriate actions. Results ideally go into a "learning library," so others can benefit from them.
Article
Customer relationship management is a buzzword around businesses. "CRM comprises a set of processes and enabling systems supporting a business strategy to build long term, profitable relationships with specific customers." There are a few components of CRM implementation, most importantly, the IT component. While revolutionary in many respects, CRM is also a natural and predictable extension of how marketing and sales have evolved over the years and, in many ways, are coming full circle. The Customer Relationship Management Cycle consists of an evaluation phase, a planning phase, an implementation phase, and a review phase. Several intrinsic business reasons are driving a great number of companies going onto the CRM highway. Companies can overlay their own data to better understand customer attributes and get better results. The most substantial impacts from CRM transitions fall into the category of information support. Customer information and the associated technology tools are the foundation upon which any successful CRM strategy is built. CRM is theoretically sound; however, in practice the implementation of CRM has some pitfalls. The critical success factors in implementing CRM are theoretically analyzed. Achieving a successful CRM environment needs not only the critical success factors analyzed but also some practical aspects, especially the cooperation of the IT and other functionalities within the enterprise.
Chapter
The experiment data from the operation is input to the analysis and interpretation. After collecting experimental data in the operation phase, we want to be able to draw conclusions based on this data. To be able to draw valid conclusions, we must interpret the experiment data.
Article
Designers have traditionally focused on enhancing the look and functionality of products. Recently, they have begun using design tools to tackle more complex problems, such as finding ways to provide low-cost health care throughout the world. Businesses were first to embrace this new approach, called design thinking; now nonprofits are beginning to adopt it too.
Article
General theories of software engineering must balance between providing full understanding of a single case and providing partial understanding of many cases. In this paper we argue that for theories to be useful in practice, they should give sufficient understanding of a sufficiently large class of cases, without having to be universal or complete. We provide six strategies for developing such theories of the middle range. In lab-to-lab strategies, theories of laboratory phenomena are developed and generalized to other laboratory phenomena. This is a characteristic strategy for basic science. In lab-to-field strategies, theories are developed of artifacts that first operate under idealized laboratory conditions, which are then scaled up until they can operate under uncontrolled field conditions. This is the characteristic strategy for the engineering sciences. In case-based strategies, we generalize about components of real-world cases, that are supposed to exhibit less variation than the cases as a whole. In sample-based strategies, we generalize about the aggregate behavior of samples of cases, which can exhibit patterns not visible at the case level. We discuss three examples of sample-based strategies. Throughout the paper, we use examples of theories and generalization strategies from software engineering to illustrate our analysis. The paper concludes with a discussion of related work and implications for empirical software engineering research.
Conference Paper
Today’s leading enterprises operate in a global multi-actor environment, cooperating with suppliers, partners and other stakeholders in order to deliver services and/or products. It is a question of value creation in a global economy, of how to create value in buyer–seller relationships within global product–service networks. The traditional supply chain can be used for product deliveries, but locally delivered services require a network of collaboration partners. The unique characteristics of service include a high level of customer contact and influence, simultaneity of production and consumption, intangibility, non-storability, perishability, and labour intensity. This paper discusses the challenges when moving from a linear product supply chain to a product–service supply network. The aim of this paper is to increase the understanding of how companies manage their customer value creation within integrated product–service supply networks. When analysing the benefits of product–service offerings, we focus on two categories: tangible and intangible determinants.
Article
This article traces the development of the design of experiments from origins in the mind and professional experience of R.A. Fisher between 1922 and 1926. The article indicates how the analysis of variance procedure stimulated design, being justified by the principle of randomization that Fisher introduced with the analysis, and exploited by his use of blocking and replication. The article indicates the radically new form and efficiency of factorial block designs, shows the further advantages accruing to factorial arrangements through confounding, and suggests how Fisher's close collaboration with experimenters stimulated these developments.
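To make the efficiency of factorial arrangements concrete: in a replicated 2×2 factorial with factors A and B, every observation informs every effect estimate. In Yates's standard notation, with (1), a, b, ab denoting the mean responses at the four treatment combinations, the main effects and interaction are estimated by the contrasts

```latex
\begin{aligned}
\widehat{A}  &= \tfrac{1}{2}\bigl[(a - (1)) + (ab - b)\bigr],\\
\widehat{B}  &= \tfrac{1}{2}\bigl[(b - (1)) + (ab - a)\bigr],\\
\widehat{AB} &= \tfrac{1}{2}\bigl[(ab - b) - (a - (1))\bigr].
\end{aligned}
```

Randomization justifies the analysis of variance applied to these contrasts, blocking and replication shrink the error variance against which they are judged, and confounding sacrifices a high-order interaction so that the design fits into smaller homogeneous blocks.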