Conference Paper

A Goal Driven Framework for Software Project Data Analytics


Abstract

The life-cycle activities of industrial software systems are often complex and encompass a variety of tasks. Such tasks are supported by integrated development environments (IDEs) that allow project data to be collected and analyzed. To date, most such analytics techniques are based on quantitative models that assess project features such as effort, cost, and quality. In this paper, we propose a project data analytics framework where, first, analytics objectives are represented as goal models with conditional contributions; second, goal models are transformed into rules that yield a Markov Logic Network (MLN); and third, goal models are assessed by an MLN probabilistic reasoner. The approach has been applied with promising results to a sizeable collection of software project data obtained from the ISBSG repository, and can yield results even with incomplete or partial data.
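The abstract does not detail the second step (goal models to rules). As a rough illustration only, the Python sketch below flattens a toy goal tree with weighted contribution links into MLN-style weighted clauses; the goal names and weights are hypothetical, not taken from the paper.

```python
# Minimal sketch (assumed structure): flatten a goal tree with weighted
# contribution links into MLN-style weighted first-order clauses.
# Goal names and weights are illustrative, not the paper's actual model.

goal_tree = {
    "LowCost":     [("ReuseComponents", 1.4), ("SmallTeam", 0.8)],
    "HighQuality": [("CodeReviews", 1.1), ("HighTestCoverage", 1.7)],
}

def to_mln_clauses(tree):
    """Emit one weighted implication per contribution link."""
    clauses = []
    for parent, contributions in tree.items():
        for child, weight in contributions:
            # In common MLN syntax a weighted clause reads "w  F1 => F2".
            clauses.append(f"{weight:.1f}  Satisfied({child}) => Satisfied({parent})")
    return clauses

for clause in to_mln_clauses(goal_tree):
    print(clause)
```

An MLN reasoner would then attach these clauses to project data evidence and compute the probability that root goals such as LowCost hold.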


... In this paper, we present an approach complementary to probabilistic reasoning, which aims to model, via a confidence score, the strength with which an expert modeler considers that one requirement or system configuration affects another. More specifically, we build on previous work we have conducted [13,14], and we introduce a fuzzy goal model reasoning process that can be parallelized and applied at run time. Such parallelization serves to tackle tractability issues either when large-scale models are involved or when models must be repeatedly and frequently evaluated against high-velocity, high-frequency logged event streams. ...
... Other examples of approaches that use quantitative reasoning over goal models are the ones presented in [14] and [11]. In [14], goal trees are utilized to denote dependencies between certain software project management metrics and specific goals related to the cost, effort, and quality of the software system being built or maintained. Using data from past similar projects, weights are assigned to contribution links, and an MLN reasoner [44] is used to calculate the probabilities that root goal nodes are satisfied. ...
Article
Full-text available
As software applications become highly interconnected in dynamically provisioned platforms, they form so-called systems-of-systems. A key issue that arises in such environments is whether specific requirements are violated when these applications interact in new, unforeseen ways as new resources or system components are dynamically provisioned. Such environments require the continuous use of frameworks for assessing compliance against specific mission-critical system requirements. Such frameworks should be able to (a) handle large requirements models, (b) assess system compliance repeatedly and frequently using events from possibly high-velocity, high-frequency data streams, and (c) use models that can reflect the vagueness that inherently exists in big data event collection and in modeling dependencies between components of complex, dynamically re-configured systems. In this paper, we introduce a framework for run-time reasoning over medium and large-scale fuzzy goal models, and we propose a process that allows for the parallel evaluation of such models. The approach has been evaluated for time and space performance on large goal models, showing that, in a simulation environment, the parallel reasoning process offers a significant performance improvement over a sequential one.
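As a minimal sketch of the parallelization idea, the snippet below evaluates independent subtrees of a tiny fuzzy goal model concurrently; the tree layout, satisfaction degrees, and the choice of min as the AND combinator are assumptions, not the paper's actual process.

```python
# Minimal sketch (assumptions: goal satisfaction in [0,1], AND = min over
# children, independent subtrees evaluated in parallel).
from concurrent.futures import ThreadPoolExecutor

tree = {
    "Root": ["SubA", "SubB"],
    "SubA": ["LeafA1", "LeafA2"],
    "SubB": ["LeafB1"],
}
leaf_degrees = {"LeafA1": 0.9, "LeafA2": 0.6, "LeafB1": 0.8}

def evaluate(goal):
    """Bottom-up fuzzy evaluation of one subtree."""
    if goal in leaf_degrees:
        return leaf_degrees[goal]
    return min(evaluate(child) for child in tree[goal])

# The root's children are independent subtrees, so they can be scored
# concurrently; the root then combines the partial results.
with ThreadPoolExecutor() as pool:
    partial = list(pool.map(evaluate, tree["Root"]))
print("Root satisfaction:", min(partial))  # 0.6
```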
Article
Full-text available
The use of end-to-end data mining methodologies such as CRISP-DM, KDD process, and SEMMA has grown substantially over the past decade. However, little is known as to how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used ‘as-is’ or adapted for specific purposes has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and ‘grey’ publications. We find that data mining methodologies are primarily applied ‘as-is’. At the same time, we also identify various adaptations of data mining methodologies, and we note that their number is growing rapidly. The dominant adaptation pattern is related to methodology adjustments at a granular level (modifications), followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects could help to mitigate these gaps.
Article
Context: Modern software systems are often distributed, run on virtualized platforms, implement complex tasks, and operate in dynamically changing and unpredictable environments. Such systems need to be dynamically reconfigured or evolved in order to continue to meet their functional and non-functional requirements as load and computational needs change. Such reconfiguration and/or evolution actions may cause other requirements to fail. Objective: Given models that describe, with a degree of confidence, the requirements that should hold in a running software system, along with their inter-dependencies, our objective is to propose a framework that can process these models and estimate the degree to which requirements hold as the system is dynamically altered or adapted. Method: We present an approach where requirements and their inter-dependencies are modeled using conditional goal models with weighted contributions. These models can be translated into fuzzy rules, and fuzzy reasoners can determine whether, and to what degree, a requirement may be affected by a system change or by actions related to other requirements. Results: The proposed framework is evaluated for its performance and stability on goal models of varying size and complexity. The experimental results indicate that the approach is tractable even for large models and allows for dealing with models where contribution links are of varying importance or weight. Conclusion: The use of conditional weighted goal models combined with fuzzy reasoners allowed for the tractable run-time evaluation of the degree to which system requirements are believed to hold when such systems are dynamically altered or adapted. The approach aims to pave the way toward run-time requirements verification and validation techniques for adaptive systems or systems that undergo continuous or frequent evolution.
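To make the weighted-contribution idea concrete, here is a minimal sketch of one plausible fuzzy-rule semantics: each link contributes weight times the child's degree, and the parent's degree is the bounded sum. The weights, goal names, and combination operator are illustrative assumptions, not the paper's definitions.

```python
# Minimal sketch (assumed semantics): each weighted contribution link acts
# as a fuzzy rule, and a parent's satisfaction degree is the bounded sum of
# the incoming weighted contributions, clamped to [0, 1].

contributions = {                      # parent: [(child, weight), ...]
    "ResponseTimeOK": [("CacheEnabled", 0.7), ("LoadBalanced", 0.5)],
}
degrees = {"CacheEnabled": 0.9, "LoadBalanced": 0.4}

def degree(goal):
    if goal in degrees:
        return degrees[goal]
    total = sum(w * degree(child) for child, w in contributions[goal])
    return min(1.0, total)             # clamp to the fuzzy range [0, 1]

print(degree("ResponseTimeOK"))        # 0.7*0.9 + 0.5*0.4 ≈ 0.83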
Conference Paper
Full-text available
Root cause analysis for software systems is a challenging diagnostic task, due to the complexity emanating from the interactions between system components and the sheer size of logged data. This diagnostic task is usually assisted by human experts who create mental models of the system-at-hand, in order to generate hypotheses and conduct the analysis. In this paper, we propose a root cause analysis framework based on requirement goal models. We consequently use these models to generate a Markov Logic Network that serves as a diagnostic knowledge repository. The network can be trained and used to provide inferences as to why and how a particular failure observation may be explained by collected logged data. The proposed framework improves over existing approaches by handling uncertainty in observations, using natively generated log data, and by providing ranked diagnoses. The framework is illustrated using a test environment based on commercial off-the-shelf software components.
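As a stand-in for the MLN inference the framework performs, the toy sketch below ranks candidate root causes by additive weights over the log observations each one explains; the diagnoses, observations, and weights are all hypothetical.

```python
# Minimal sketch (illustrative numbers): rank candidate root causes by how
# well each explains the observed log events. Additive scoring is a crude
# stand-in for proper MLN inference.
link_weight = {                        # (diagnosis, observation) -> weight
    ("DbDown", "timeout_logged"):    2.0,
    ("DbDown", "conn_refused"):      1.5,
    ("BadDeploy", "timeout_logged"): 0.5,
    ("BadDeploy", "500_errors"):     1.8,
}
observed = {"timeout_logged", "conn_refused"}

scores = {}
for (diag, obs), w in link_weight.items():
    if obs in observed:
        scores[diag] = scores.get(diag, 0.0) + w
for diag, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(diag, s)                     # DbDown ranks first (3.5 vs 0.5)
```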
Article
Full-text available
Gaming companies now routinely apply data mining to their user data in order to plan the next release of their software. We predict that such software development analytics will become commonplace in the near future. For example, as large software systems migrate to the cloud, they are divided and sold as dozens of smaller apps; when shopping inside the cloud, users are free to mix and match their apps from multiple vendors (e.g., Google Docs' word processor with Zoho's slide manager); to extend, or even retain, market share, cloud vendors must mine their user data in order to understand what features best attract their clients. This panel will address the open issues with analytics, including the following. What is the potential for software development analytics? What are the strengths and weaknesses of the current generation of analytics tools? How best can we mature those tools?
Conference Paper
Full-text available
Goal modeling is an important part of various types of activities such as requirements engineering, business management, and compliance assessment. The Goal-oriented Requirement Language is a standard and mature goal modeling language supported by the jUCMNav tool. However, recent applications of GRL to a regulatory context highlighted several analysis issues and limitations whose resolution is urgent, and likely applicable to other languages and tools. This paper investigates issues related to the computation of strategy and model differences, the management of complexity and uncertainty, sensitivity analysis, and various domain-specific considerations. For each, a solution is proposed, implemented in jUCMNav, and illustrated through simple examples. These solutions greatly increase the analysis capabilities of GRL and jUCMNav in order to handle real problems.
Conference Paper
Full-text available
Over the past decade, goal models have been used in Computer Science in order to represent software requirements, business objectives and design qualities. Such models extend traditional AI planning techniques for representing goals by allowing for partially defined and possibly inconsistent goals. This paper presents a formal framework for reasoning with such goal models. In particular, the paper proposes a qualitative and a numerical axiomatization for goal modeling primitives and introduces label propagation algorithms that are shown to be sound and complete with respect to their respective axiomatizations. In addition, the paper reports on preliminary experimental results on the propagation algorithms applied to a goal model for a US car manufacturer.
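As a minimal sketch of forward label propagation over an AND-decomposed goal graph, the snippet below uses a simplified three-value scale (FS/PS/N for fully/partially/not satisfied), which is a reduction of the richer qualitative labels the paper axiomatizes.

```python
# Minimal sketch: forward propagation of qualitative labels. For an AND
# node, the weakest child label dominates. The three-value scale is a
# simplification of the paper's framework.
ORDER = {"FS": 2, "PS": 1, "N": 0}     # fully / partially / not satisfied

and_children = {"Root": ["G1", "G2"], "G1": ["L1", "L2"]}
labels = {"L1": "FS", "L2": "PS", "G2": "FS"}

def propagate(goal):
    """AND node: return the minimum child label under the ordering."""
    if goal in labels:
        return labels[goal]
    return min((propagate(c) for c in and_children[goal]), key=ORDER.get)

print(propagate("Root"))               # PS: L2 caps G1, which caps Root
```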
Conference Paper
Full-text available
Communication & Co-ordination activities are central to large software projects, but are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes. OSS projects, on the other hand, use the internet as the communication medium, and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the communication and co-ordination activities of the participants. However, there are various challenges that must be addressed before this data can be effectively mined. Once this is done, we can construct social networks of email correspondents, and begin to address some interesting questions. These include questions relating to participation in the email; the social status of different types of OSS participants; the relationship of email activity and commit activity (in the CVS repositories); and the relationship of social status with commit activity. In this paper, we begin with a discussion of our infrastructure (including a novel use of Scientific Workflow software), then discuss our approach to mining the email archives, and finally present some preliminary results from our data analysis.
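In the spirit of the mining approach described above, a small sketch of the network-construction step: hypothetical sender/recipient pairs become an undirected graph, and degree centrality serves as a rough proxy for a participant's social status.

```python
# Minimal sketch (hypothetical data): build a social network of email
# correspondents and rank participants by degree centrality.
import networkx as nx

emails = [                             # (sender, recipient) pairs
    ("alice", "bob"), ("bob", "carol"),
    ("alice", "carol"), ("dave", "alice"),
]

G = nx.Graph()
G.add_edges_from(emails)

for person, score in sorted(nx.degree_centrality(G).items(),
                            key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```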
Conference Paper
Full-text available
Cost estimation is a very crucial field for software developing companies. The acceptance of an estimation technique is highly dependent on estimation accuracy. Often, this accuracy is only determined after an initial application. Possible further steps for improving the underlying estimation model typically do not influence the decision on whether to discard the technique or deploy it. In addition, most estimation techniques do not explicitly support the evolution of the underlying estimation model in an iterative manner. This increases the risk of overlooking some important cost drivers or data inconsistencies. This paper presents an enhanced process for developing a CoBRA® cost estimation model by systematically including iterative analysis and feedback cycles, and its evaluation in a software development unit of Oki Electric Industry Co., Ltd., Japan. During the model improvement cycles, estimation accuracy was improved from an initial 120% down to 14%. In addition, lessons learned with the iterative development approach are described.
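The percentages above are relative-error figures tracked across feedback cycles. A minimal sketch of that bookkeeping, with entirely made-up numbers, using MMRE (mean magnitude of relative error), a standard accuracy measure for effort estimation:

```python
# Minimal sketch (made-up numbers): track how estimation error improves
# across model refinement iterations using MMRE.

def mmre(actuals, estimates):
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

actual_effort = [100, 250, 80]         # person-days, hypothetical
iterations = {                         # estimates after each feedback cycle
    "initial model": [210, 520, 190],
    "after cycle 1": [140, 310, 100],
    "after cycle 2": [110, 270, 90],
}
for name, est in iterations.items():
    print(f"{name}: MMRE = {mmre(actual_effort, est):.0%}")
```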
Conference Paper
Full-text available
This paper seeks to combine two largely independent threads of multiagent systems research: agent specification and protocols. We specify agents in terms of goal models (as used in Tropos). We specify protocols in terms of the commitments among agents. We illustrate and formalize the semantic relationship between agents and protocols by exploiting the relationship between goals and commitments. Given an agent specification and a protocol, the semantics helps us perform two kinds of verification: (1) whether the protocol supports achieving particular agent goals, and (2) whether the agent's specification supports the satisfaction of particular commitments.
Article
Full-text available
Requirements Engineering (RE) research often ignores the context in which the system operates, or presumes it to be uniform. This assumption is no longer valid in emerging computing paradigms, such as ambient, pervasive and ubiquitous computing, where it is essential to monitor and adapt to an inherently varying context. Besides influencing the software, context may influence stakeholders' goals and their choices of how to meet them. In this paper, we propose a goal-oriented RE modeling and reasoning framework for systems operating in varying contexts. We introduce contextual goal models to relate goals and contexts; context analysis to refine contexts and identify ways to verify them; reasoning techniques to derive requirements reflecting the context and users' priorities at runtime; and, finally, design-time reasoning techniques to derive requirements for a system to be developed at minimum cost and valid in all considered contexts. We illustrate and evaluate our approach through a case study about a museum-guide mobile information system.
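A toy sketch of the core idea, with hypothetical goals and contexts loosely echoing the museum-guide case study: only goal alternatives whose context condition holds are considered, and the cheapest valid one is selected.

```python
# Minimal sketch (hypothetical goals and contexts): activate only the goal
# alternatives whose required context holds, then pick the cheapest one.

alternatives = [
    # (goal variant, required context, cost)
    ("NotifyBySMS",    "visitor_outdoors", 3),
    ("NotifyByScreen", "visitor_in_room",  1),
    ("NotifyByEmail",  None,               2),   # context-independent
]
current_context = {"visitor_in_room"}

valid = [(g, c) for g, ctx, c in alternatives
         if ctx is None or ctx in current_context]
goal, cost = min(valid, key=lambda gc: gc[1])
print(goal, cost)                      # NotifyByScreen 1
```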
Conference Paper
Full-text available
Service-oriented applications facilitate the exchange of business services among autonomous and heterogeneous participants. Traditional system modeling approaches either apply at a lower level of abstraction than required for such applications or do not accommodate the autonomous and heterogeneous nature of the participants. We present a business-level conceptual model that addresses the above shortcomings. The model gives primacy to the participants in a service-oriented application. A key feature of the model is that it cleanly decouples the specification of an application's architecture from the specification of individual participants. We formalize the connection between the two---the reasoning that would help a participant decide if a specific application is suitable for his needs. We implement the reasoning in datalog and apply it to a case study involving car insurance. We also demonstrate the scalability of our approach.
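The paper's reasoning is implemented in datalog; as a loose analogue, here is a naive bottom-up (fixpoint) evaluation of one datalog-style rule in Python. The facts and the suitability rule are invented for illustration, inspired by the car-insurance setting.

```python
# Minimal sketch (hypothetical facts): naive bottom-up evaluation of the
# rule  suitable(App, P) :- needs(P, S), offers(App, S).
facts = {("offers", "insurerA", "collision_cover"),
         ("needs", "driver", "collision_cover")}

def fixpoint(facts):
    derived = set(facts)
    while True:
        new = {("suitable", app, p)
               for (r1, p, s1) in derived if r1 == "needs"
               for (r2, app, s2) in derived if r2 == "offers" and s1 == s2}
        if new <= derived:             # no new facts: fixpoint reached
            return derived
        derived |= new

print(("suitable", "insurerA", "driver") in fixpoint(facts))  # True
```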
Article
Full-text available
Software developers are often faced with modification tasks that involve source code spread across a code base. Some dependencies between source code, such as those between source code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to determine change patterns - sets of files that were changed together frequently in the past - from the change history of the code base. Our hypothesis is that the change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying it to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced for actual modification tasks on these systems.
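A minimal sketch of the co-change idea with a made-up change history: count how often file pairs appear in the same commit, then recommend frequent partners of the file being modified. Real association-rule mining adds support and confidence thresholds.

```python
# Minimal sketch (hypothetical change history): mine pairwise co-change
# patterns and recommend likely companion files for a modification task.
from collections import Counter
from itertools import combinations

transactions = [                       # files changed together per commit
    {"ui.c", "ui.h"}, {"ui.c", "ui.h", "net.c"},
    {"net.c", "net.h"}, {"ui.c", "ui.h"},
]
pair_counts = Counter(
    pair for tx in transactions for pair in combinations(sorted(tx), 2)
)

def recommend(changed_file, min_support=2):
    out = []
    for (a, b), n in pair_counts.items():
        if n >= min_support and changed_file in (a, b):
            out.append(b if a == changed_file else a)
    return out

print(recommend("ui.c"))               # ['ui.h']
```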
Article
Full-text available
This paper is an attempt to understand the processes by which software ages. We define code to be aged or decayed if its structure makes it unnecessarily difficult to understand or change, and we measure the extent of decay by counting the number of faults in code in a period of time. Using change management data from a very large, long-lived software system, we explore the extent to which measurements from the change history are successful in predicting the distribution over modules of these incidences of faults. In general, process measures based on the change history are more useful in predicting fault rates than product metrics of the code: for instance, the number of times code has been changed is a better indication of how many faults it will contain than is its length. We also compare the fault rates of code of various ages, finding that if a module is, on average, a year older than an otherwise similar module, the older module will have roughly a third fewer faults. Our most successful model measures the fault potential of a module as the sum of contributions from all of the times the module has been changed, with large, recent changes receiving the most weight.
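A worked miniature of that final model: fault potential as a sum over all past changes, with size amplifying and age discounting each contribution. The exponential decay and the change data are illustrative assumptions, not the paper's fitted model.

```python
# Minimal sketch: fault potential as a recency-weighted sum over changes.
# Decay rate and change records are made up for illustration.
import math

changes = [                            # (lines_changed, age_in_years)
    (200, 0.1), (50, 1.5), (400, 3.0),
]

def fault_potential(changes, decay=1.0):
    return sum(size * math.exp(-decay * age) for size, age in changes)

print(f"{fault_potential(changes):.1f}")  # recent large change dominates
```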
Conference Paper
Full-text available
Goal models have been used in Computer Science in order to represent software requirements, business objectives and design qualities. In previous work we have presented a formal framework for reasoning with goal models, in a qualitative or quantitative way, and we have introduced an algorithm for forward propagating values through goal models. In this paper we focus on the qualitative framework and we propose a technique and an implemented tool for addressing two much more challenging problems: (1) find an initial assignment of labels to leaf goals which satisfies a desired final status of root goals by upward value propagation, while respecting some given constraints; and (2) find a minimum-cost assignment of labels to leaf goals which satisfies root goals. The paper also presents preliminary experimental results on the performance of the tool using the goal graph generated by a case study involving the Public Transportation Service of Trentino (Italy).
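For intuition about problem (2), here is a brute-force sketch on a toy model: enumerate leaf label assignments, keep those whose upward propagation satisfies the root, and return the cheapest. The label costs are invented, and exhaustive search is only viable for toy models; the paper's tool uses far more scalable techniques.

```python
# Minimal sketch (tiny model, brute force): minimum-cost leaf assignment
# that keeps the root at least partially satisfied under AND propagation.
from itertools import product

LABELS = {"FS": 2, "PS": 1, "N": 0}    # qualitative scale with an ordering
COST   = {"FS": 3, "PS": 1, "N": 0}    # assumed cost of achieving each label
leaves = ["L1", "L2"]
and_children = {"Root": ["L1", "L2"]}

def root_label(assign):
    # AND node: the weakest child label dominates.
    return min((assign[c] for c in and_children["Root"]), key=LABELS.get)

best = min(
    (dict(zip(leaves, combo)) for combo in product(LABELS, repeat=len(leaves))
     if LABELS[root_label(dict(zip(leaves, combo)))] >= LABELS["PS"]),
    key=lambda a: sum(COST[v] for v in a.values()),
)
print(best)                            # {'L1': 'PS', 'L2': 'PS'}
```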
Article
Business process compliance management has recently grabbed a lot of attention in both business and academia, as it helps organizations not only to control and monitor their business processes from a legal point of view but also to avoid financial penalties and undesirable consequences to their reputation. Balancing compliance obligations with business objectives remains, however, a difficult challenge. We believe goal-oriented compliance management using Key Performance Indicators (KPIs) to measure the compliance level of organizations is an area that can be further developed to tackle this challenge. Goal-oriented compliance management concepts have been explored before. However, there is little research on how to measure and improve the compliance level of organizations using KPIs while considering the impact of candidate adjustments on business goals. We discuss a proposal toward a framework to address the aforementioned problems.
Article
Digital records of software-engineering work are left by software developers during the development process. Source code is usually kept in a software repository, and software developers use issue-tracking repositories and online project-tracking software, as well as informal documentation, to support their activities. The research discipline of mining software repositories (MSR) uses these extant digital repositories to gain understanding of the system. MSR has not been applied to model-driven development or model-driven engineering (MDE). In particular, model management deserves attention. Model management covers challenges associated with "maintaining traceability links among model elements to support model evolution and roundtrip engineering", "tracking versions", and "using models during runtime". These problems can be addressed by investigating the models themselves and their relationship to other artifacts using MSR. The objective of this report is to survey state-of-the-art research in MSR and to discuss how these MSR techniques are applicable to the problems faced in MDE. Extracting information about what factors affect model quality, how people interact with models in the repository, and how models trace to other artifacts advances our understanding of software engineering when MDE is used.
Article
Suppose you're a software team manager who's responsible for delivering a software product by a specific date, and your team uses a code integration system (referred to as a build in IBM Rational Jazz and in this article) to integrate its work before delivery. When the build fails, your team needs to spend extra time diagnosing the integration issue and reworking code. As the manager, you suspect that your team failed to communicate about a code dependency, which broke the build. Your team needs to quickly disseminate information about its interdependent work to achieve a successful integration build. How can you understand your team's communication? Social-network analysis can give you insight into the team's communication patterns that might have caused the build's failure.
Conference Paper
Despite large volumes of data and many types of metrics, software projects continue to be difficult to predict and risky to conduct. In this paper we propose software analytics, which holds out the promise of helping the managers of software projects turn their plentiful information resources, produced readily by current tools, into insights they can act on. We discuss how analytics works, why it is a good fit for software engineering, and the research problems that must be overcome in order to realize its promise.
Conference Paper
Software repositories, such as source code, email archives, and bug databases, contain unstructured and unlabeled text that is difficult to analyze with traditional techniques. We propose the use of statistical topic models to automatically discover structure in these textual repositories. This discovered structure has the potential to be used in software engineering tasks, such as bug prediction and traceability link recovery. Our research goal is to address the challenges of applying topic models to software repositories.
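As one concrete instance of the statistical topic models proposed above, here is a minimal sketch that fits LDA over a toy corpus of repository text using scikit-learn; the documents are invented, and real applications would use far larger corpora of commit messages, bug reports, or source comments.

```python
# Minimal sketch (toy corpus): discover topics in unstructured repository
# text with Latent Dirichlet Allocation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "null pointer crash in parser module",
    "parser crash when input file empty",
    "add login button to settings screen",
    "settings screen layout broken on resize",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, topic in enumerate(lda.components_):
    top = [vocab[i] for i in topic.argsort()[-3:]]  # 3 strongest words
    print(f"topic {k}: {top}")
```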
Conference Paper
We present a recommender system that models and recommends product features for a given domain. Our approach mines product descriptions from publicly available online specifications, utilizes text mining and a novel incremental diffusive clustering algorithm to discover domain-specific features, generates a probabilistic feature model that represents commonalities, variants, and cross-category features, and then uses association rule mining and the k-Nearest-Neighbor machine learning strategy to generate product-specific feature recommendations. Our recommender system supports the relatively labor-intensive task of domain analysis, potentially increasing opportunities for re-use, reducing time-to-market, and delivering more competitive software products. The approach is empirically validated against 20 different product categories using thousands of product descriptions mined from a repository of free software applications.
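A toy sketch of the k-Nearest-Neighbor stage only, with a hypothetical binary feature matrix: features present in both nearest neighbors but missing from the query product are recommended.

```python
# Minimal sketch (hypothetical feature matrix): recommend features by
# majority vote over a product's k nearest neighbors (Hamming distance).
features = ["spellcheck", "pdf_export", "themes", "sync"]
products = {                           # binary feature vectors per product
    "editorA": [1, 1, 0, 0],
    "editorB": [1, 1, 1, 0],
    "editorC": [0, 0, 1, 1],
}
query = [1, 0, 0, 0]                   # new product: has spellcheck only

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

nearest = sorted(products, key=lambda p: dist(products[p], query))[:2]
votes = [sum(products[p][i] for p in nearest) for i in range(len(features))]
recs = [f for f, v, q in zip(features, votes, query) if v == 2 and not q]
print(recs)                            # ['pdf_export']
```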
Article
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
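A worked miniature of the semantics just described, using the classic smoking example from the MLN literature: one weighted formula over one constant yields a ground network with two boolean variables, and each world's probability is proportional to exp(w * n), where n counts the formula's true groundings in that world.

```python
# Minimal worked example of MLN semantics: one weighted ground formula
# Smokes(A) => Cancer(A), so the ground network has two boolean variables.
import itertools, math

w = 1.5                                # illustrative clause weight

def n_true(smokes, cancer):
    return 1 if ((not smokes) or cancer) else 0   # truth of the implication

worlds = list(itertools.product([False, True], repeat=2))
scores = {wd: math.exp(w * n_true(*wd)) for wd in worlds}
Z = sum(scores.values())               # partition function

for (smokes, cancer), s in scores.items():
    print(f"Smokes={smokes!s:5} Cancer={cancer!s:5} P={s / Z:.3f}")
```

The one world that violates the formula (Smokes true, Cancer false) is not impossible, merely exp(w) times less likely, which is exactly how MLNs soften hard first-order constraints.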
Article
This paper introduces a new technique for predicting latent software bugs, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project stored in its software configuration management repository. The trained classifier can classify changes as buggy or clean, with a 78 percent accuracy and a 60 percent buggy change recall on average. Change classification has several desirable qualities: 1) The prediction granularity is small (a change to a single file), 2) predictions do not require semantic information about the source code, 3) the technique works for a broad array of project types and programming languages, and 4) predictions can be made immediately upon the completion of a change. Contributions of this paper include a description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
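In the spirit of change classification, a minimal sketch: train a classifier on revision-history features and score a new change. The feature columns, training rows, and choice of random forest are illustrative assumptions; the paper extracts many more features, including terms from the diff itself.

```python
# Minimal sketch (made-up features): classify software changes as buggy or
# clean from revision-history features.
from sklearn.ensemble import RandomForestClassifier

# columns: lines added, lines deleted, files touched, hour of commit
X_train = [[120, 30, 4, 23], [5, 2, 1, 11], [300, 80, 9, 2], [12, 4, 1, 14]]
y_train = [1, 0, 1, 0]                 # 1 = change later required a bug fix

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[150, 40, 5, 22]]))  # predicted label for a new change
```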