March 2018 · 274 Reads · 16 Citations
Journal of Thermal Biology
March 2015 · 3,220 Reads · 1 Citation
Shell scripting is the primary way for programmers to interact at a high level with operating systems, and for decades Bash shell scripts have been used to accomplish a wide variety of tasks. But Bash has a counter-intuitive syntax that is not well understood by modern programmers and is no longer adequately supported, making such scripts difficult to maintain. Bash also suffers from poor performance, memory leakage problems, and limited functionality, which make continued dependence on it problematic. At the request of our industrial partner, we therefore developed a source-to-source translator, bash2py, which converts Bash scripts into Python. Bash2py leverages the open source Bash code, using the internal parser employed by Bash itself to parse any Bash script. However, bash2py re-implements the variable expansion that occurs in Bash to better generate correct Python code. Bash2py correctly converts most Bash into Python, but does require human intervention to handle constructs that cannot easily be translated automatically. In our experiments on real-world open source Bash scripts, bash2py successfully translates 90% of the code. Feedback from our industrial partner confirms the usefulness of bash2py in practice.
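To give a feel for the kind of mapping such a translator performs, here is a minimal Python sketch. It is not the bash2py implementation (which reuses Bash's own parser and handles full variable expansion); the two rewrite rules and the fallback comment are illustrative assumptions only.

```python
# Minimal sketch of Bash-to-Python rewriting for two trivial constructs:
# variable assignment and echo. Anything else is flagged for human review,
# mirroring the "requires human intervention" cases described above.
import re

RULES = [
    # FOO=bar            ->  FOO = "bar"
    (re.compile(r'^(\w+)=(\S*)$'), r'\1 = "\2"'),
    # echo "$FOO baz"    ->  print(f"{FOO} baz")
    (re.compile(r'^echo "(.*)"$'),
     lambda m: 'print(f"' + re.sub(r'\$(\w+)', r'{\1}', m.group(1)) + '")'),
]

def translate_line(line):
    line = line.strip()
    for pattern, repl in RULES:
        m = pattern.match(line)
        if m:
            return repl(m) if callable(repl) else pattern.sub(repl, line)
    return "# TODO (human intervention needed): " + line

script = ['GREETING=hello', 'echo "$GREETING world"', 'trap cleanup EXIT']
print("\n".join(translate_line(l) for l in script))
```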
February 2014 · 19 Reads
Owl Computing Technologies provides software and hardware that facilitate secure unidirectional data transfer across the Internet. Bash scripts are used to facilitate customer installation of Owl's client/server software, and to provide high-level management, control, and monitoring of client/server interfaces. With the evolution of more robust scripting languages, Owl now wishes to convert its Bash scripts to other scripting languages. As part of this conversion exercise, configuration and customization of the Bash scripts will no longer involve direct end-user modification of the script logic. It will instead be achieved through appropriate modification of a supporting XML configuration file, which is read by each script. This avoids the risk that end users erroneously change scripts, and makes legitimate end-user customization simpler, more obvious, and easier to discern. An open source fact extractor was implemented that determines the dynamic usage made of every variable within an arbitrary Bash script. This tool reports errors in a script and generates an XML configuration file that describes variable usage. Variables whose values may not be assigned by an end user are manually removed from this XML configuration file. A second program reads the configuration file and generates the appropriate Bash variable assignment statements, which are then applied within Bash using the eval command. Collectively, this provides a simple mechanism for altering arbitrary Bash scripts so that they use an external XML configuration file, as a first step in the larger exercise of migrating Bash scripts to other scripting languages.
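The following Python sketch shows one plausible shape for the second program described above: it reads an XML configuration file and emits Bash assignment statements that a script could apply via eval. The XML layout (<config><var name="..." value="..."/></config>) and file names are assumptions for illustration, not Owl's actual schema.

```python
# Hypothetical generator: XML config -> Bash assignments, to be applied with
#   eval "$(python gen_assignments.py owl_config.xml)"
import shlex
import sys
import xml.etree.ElementTree as ET

def emit_assignments(xml_path):
    root = ET.parse(xml_path).getroot()
    lines = []
    for var in root.iter("var"):                  # assumed element name
        name, value = var.get("name"), var.get("value", "")
        if name and name.isidentifier():          # skip malformed variable names
            lines.append(f"{name}={shlex.quote(value)}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(emit_assignments(sys.argv[1]))
```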
November 2013 · 453 Reads · 11 Citations
Predicting future behavior reliably and efficiently is vital for systems that manage virtual services. Such systems must be able to balance loads within a cloud environment to ensure that service level agreements (SLAs) are met at a reasonable expense. These virtual services, while often comparatively idle, are occasionally heavily utilized. Standard approaches to modeling system behavior (by analyzing the totality of the observed data, such as regression-based approaches) tend to predict average rather than exceptional system behavior and may ignore important patterns of change over time. Consequently, such approaches are of limited use in providing warnings of future peak utilization within a cloud environment. Skewing predictions to better fit peak utilizations results in a poor fit to low utilizations, which in turn compromises the ability to accurately predict peak utilizations due to false positives. In this paper, we present an adaptive approach that estimates, at run time, the best prediction value based on the performance of previously seen predictions. This algorithm has wide applicability. We applied this adaptive technique to two large-scale real-world case studies. In both studies, the results show that the adaptive approach is able to predict low, medium, and high utilizations more accurately than the other proposed approaches, at low cost, by adapting to changing patterns within the input time series. This facilitates better proactive management and placement of systems running within a cloud.
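The adaptive-selection idea can be sketched in a few lines of Python: keep a pool of candidate predictors and, at each step, use the one with the smallest recent error. This is an illustration of the selection mechanism only; the candidate predictors, window size, and decay factor are assumptions, not the paper's exact algorithm.

```python
from collections import deque
import statistics

PREDICTORS = {
    "last": lambda w: w[-1],
    "mean": lambda w: statistics.fmean(w),
    "peak": lambda w: max(w),                    # biased toward high utilization
}

def adaptive_forecast(series, window=12):
    history = deque(maxlen=window)
    errors = {name: 0.0 for name in PREDICTORS}  # decayed absolute errors
    forecasts = []
    for value in series:
        if len(history) == history.maxlen:
            best = min(errors, key=errors.get)
            forecasts.append((best, PREDICTORS[best](history)))
            # update each candidate's score against the newly observed value
            for name, f in PREDICTORS.items():
                errors[name] = 0.9 * errors[name] + abs(f(history) - value)
        history.append(value)
    return forecasts

cpu = [5, 6, 5, 7, 6, 5, 6, 40, 85, 90, 30, 8, 6, 5, 7, 6, 5, 88, 92, 35]
print(adaptive_forecast(cpu)[-3:])
```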
May 2013 · 42 Reads · 20 Citations
The Linux kernel is one of the largest configurable open source software systems implementing static variability. In Linux, variability is scattered over three different artifacts: source code files, Kconfig files, and Makefiles. Previous work detected inconsistencies between these artifacts that led to anomalies in the intended variability of Linux; we call these variability anomalies. However, no work has analyzed how these variability anomalies are introduced in the first place, and how they get fixed. In this work, we provide an analysis of the causes and fixes of variability anomalies in Linux. We first perform an exploratory case study that uses an existing set of patches which solve variability anomalies to identify patterns for their causes. The observations we make from this dataset allow us to develop four research questions, which we then answer in a confirmatory case study covering the whole Linux kernel. We show that variability anomalies exist for several releases in the kernel before they get fixed, and that, contrary to our initial suspicion, typos in feature names do not commonly cause these anomalies. Our results show that variability anomalies are often introduced through incomplete patches that change Kconfig definitions without properly propagating these changes to the rest of the system. Anomalies are then commonly fixed through changes to the code rather than to Kconfig files.
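One simple class of cross-artifact anomaly in this line of work is a feature that is referenced in code or Makefiles but never defined in Kconfig (or defined but never referenced). The Python sketch below illustrates that consistency idea only; real analyses must also evaluate Kconfig dependency constraints, and the kernel path used is a hypothetical example.

```python
# Toy cross-artifact consistency check: CONFIG_* references vs. Kconfig definitions.
import re
from pathlib import Path

CONFIG_REF = re.compile(r"\bCONFIG_([A-Za-z0-9_]+)")
KCONFIG_DEF = re.compile(r"^\s*(?:menu)?config\s+([A-Za-z0-9_]+)", re.MULTILINE)

def scan(tree):
    defined, referenced = set(), set()
    for path in Path(tree).rglob("*"):
        if not path.is_file():
            continue
        if path.name.startswith("Kconfig"):
            defined |= set(KCONFIG_DEF.findall(path.read_text(errors="ignore")))
        elif path.suffix in {".c", ".h"} or path.name == "Makefile":
            referenced |= set(CONFIG_REF.findall(path.read_text(errors="ignore")))
    return referenced - defined, defined - referenced

# Example (run from a kernel checkout; "linux" is a hypothetical path):
#   undefined_refs, unused_features = scan("linux")
```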
May 2013 · 19 Reads · 13 Citations
Predicting future behavior reliably and efficiently is key for systems that manage virtual services; such systems must be able to balance loads within a cloud environment to ensure that service level agreements are met at a reasonable expense. In principle, accurate predictions can be achieved by mining a variety of data sources that describe the historic behavior of the services, the requirements of the programs running on them, and the evolving demands placed on the cloud by end users. Of particular importance is accurate prediction of the maximal loads likely to be observed in the short term. However, standard approaches to modeling system behavior, by analyzing the totality of the observed data, tend to predict average rather than exceptional system behavior and ignore important patterns of change over time. In this paper, we study the ability of simple multivariate linear regression to forecast peaks of CPU utilization (storms) in an industrial cloud environment. We also propose several modifications to standard linear regression that adjust it for storm prediction.
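As a concrete baseline, the sketch below fits a multivariate linear regression on lagged utilization values with NumPy, plus a simple sample-weighting tweak that emphasizes high-utilization points. The weighting scheme, threshold, and synthetic data are illustrative assumptions, not necessarily the modifications the paper proposes.

```python
import numpy as np

def lagged_design(series, lags=4):
    # row t = [1, x_t, x_{t+1}, ..., x_{t+lags-1}] predicting x_{t+lags}
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    y = series[lags:]
    return np.column_stack([np.ones(len(y)), X]), y

def fit_weighted(X, y, emphasis=3.0, threshold=70.0):
    w = np.where(y >= threshold, emphasis, 1.0)        # up-weight "storm" samples
    Xw, yw = X * np.sqrt(w)[:, None], y * np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

rng = np.random.default_rng(1)
util = np.clip(20 + 10 * rng.standard_normal(200), 0, 100)
util[::25] = 95                                        # synthetic storm peaks
X, y = lagged_design(util)
beta = fit_weighted(X, y)
next_row = np.concatenate(([1.0], util[-4:]))
print("next-step forecast:", next_row @ beta)
```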
May 2012 · 43 Reads · 1 Citation
Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR
Industrial software systems often contain fragments of code that are vestigial; that is, they were created long ago for a specific purpose but are no longer useful within the current design of the system. In this work, we describe how we have adapted some research tools to remove such code: we use a hybrid static analysis of both source code and assembler to construct a model of the system, and then use graph querying to detect possibly dead functions. Suspected dead functions are then commented out of the source. The system is rebuilt and run against existing test suites to verify that the removals do not affect its semantics. Finally, we discuss the results of applying this technique to a large and long-lived industrial software system as well as a large open source system.
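At its core, the "possible dead function" query is a reachability question over a call graph. The Python sketch below is a greatly simplified illustration of that step; the entry points and graph are made-up, and a real analysis (as in the work above) must also handle function pointers, exported symbols, and assembler-level facts conservatively.

```python
from collections import deque

def unreachable_functions(call_graph, entry_points):
    """Return functions never reached from any entry point."""
    seen = set()
    work = deque(entry_points)
    while work:
        fn = work.popleft()
        if fn in seen:
            continue
        seen.add(fn)
        work.extend(call_graph.get(fn, ()))
    return set(call_graph) - seen

call_graph = {
    "main": ["parse_args", "run"],
    "run": ["log"],
    "parse_args": [],
    "log": [],
    "legacy_report": ["log"],        # vestigial: nothing calls it
}
print(unreachable_functions(call_graph, ["main"]))   # -> {'legacy_report'}
```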
October 2011 · 74 Reads · 21 Citations
The Linux kernel has long been an interesting subject of study in terms of its source code. Recently, it has also been studied in terms of its variability, since the Linux kernel can be configured to include or omit certain features according to the user's selection. These features are defined in the Kconfig files included in the Linux kernel code. Several articles study both the source code and the Kconfig files to ensure variability is correctly implemented and to detect anomalies. However, these studies ignore the Makefiles, another important component that controls the variability of the Linux kernel. The Makefiles are responsible for specifying what actually gets compiled and built into the final kernel. With over 1,300 Makefiles, more than 35,000 source code files, and over 10,000 Kconfig features, inconsistencies and anomalies are inevitable. In this paper, we explore the Linux kernel's Makefiles (Kbuild) to detect anomalies. We develop three rules to identify anomalies in the Makefiles. Using these rules, we detect 89 anomalies in the latest release of the Linux kernel (2.6.38.6). We also perform a longitudinal analysis to study the evolution of Kbuild anomalies over time and the solutions implemented to correct them. Our results show that many of the anomalies we detect are eventually corrected in future releases. This work is a first attempt at exploring the consistency of the variability implemented in Kbuild with the rest of the kernel. Such work opens the door to automatic anomaly detection in build systems, which can save developers time in the future.
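The sketch below shows what such a Kbuild check might look like in Python: for every "obj-$(CONFIG_FOO) += foo.o" line, verify that the feature exists in Kconfig and that the referenced source file exists beside the Makefile. These two checks are illustrative assumptions in the spirit of the work, not necessarily the paper's exact three rules.

```python
import re
from pathlib import Path

OBJ_LINE = re.compile(r"obj-\$\(CONFIG_(\w+)\)\s*\+?=\s*(.+)")

def check_makefile(makefile, kconfig_features):
    makefile = Path(makefile)
    anomalies = []
    for line in makefile.read_text(errors="ignore").splitlines():
        m = OBJ_LINE.search(line)
        if not m:
            continue
        feature, objects = m.groups()
        if feature not in kconfig_features:
            anomalies.append(f"{makefile}: CONFIG_{feature} not defined in Kconfig")
        for obj in objects.split():
            src = makefile.parent / obj.replace(".o", ".c")
            if obj.endswith(".o") and not src.exists():
                anomalies.append(f"{makefile}: {src.name} referenced but missing")
    return anomalies

# Example (hypothetical path and feature set):
#   check_makefile("drivers/foo/Makefile", kconfig_features={"FOO", "BAR"})
```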
May 2011 · 33 Reads · 3 Citations
Proceedings - International Conference on Software Engineering
Software development is difficult to model, particularly the noisy, non-stationary signals of changes per time unit extracted from version control systems (VCSs). Researchers currently use time-series analysis tools such as ARIMA to model these signals extracted from a project's VCS. Unfortunately, current approaches are not well suited to the underlying power-law distributions of this kind of signal. We propose modeling changes per time unit using multifractal analysis. This analysis can be used when a signal exhibits multi-scale self-similarity, as in the case of complex data drawn from power-law distributions. Specifically, we utilize multifractal analysis to demonstrate that software development is multifractal; that is, the signal is a fractal composed of multiple fractal dimensions along a range of Hurst exponents. Thus we show that software development has multi-scale self-similarity. We also pose questions that we hope multifractal analysis can answer.
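One simple way to probe for multifractality, shown in the Python sketch below, is to estimate the generalized Hurst exponent H(q) from q-order structure functions S_q(τ) ~ τ^(qH(q)); a q-dependent H(q) is one indicator of multifractal scaling. This is a generic structure-function method under assumed toy data, not necessarily the estimator used in the paper.

```python
import numpy as np

def generalized_hurst(x, qs=(1, 2, 3, 4), lags=range(2, 20)):
    """Estimate H(q) from structure functions S_q(tau) ~ tau^(q*H(q))."""
    x = np.asarray(x, dtype=float)
    hs = {}
    for q in qs:
        log_s, log_t = [], []
        for tau in lags:
            s = np.mean(np.abs(x[tau:] - x[:-tau]) ** q)
            if s > 0:
                log_s.append(np.log(s))
                log_t.append(np.log(tau))
        slope = np.polyfit(log_t, log_s, 1)[0]   # slope = q * H(q)
        hs[q] = slope / q
    return hs

# Toy stand-in for a changes-per-week signal with heavy-tailed increments.
rng = np.random.default_rng(0)
changes = np.cumsum(rng.pareto(2.5, size=512))
print(generalized_hurst(changes))
```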
March 2011 · 13 Reads
When using replication as a technique for achieving high performance and scalability, optimistic replication is the natural choice. But optimistic replication introduces uncertainty that may lead to confusion for clients. For example, consecutive requests from the same client may be processed out of order, the answer to a query may be too old to be useful, or an operation may cause unwanted side effects in the system when the client times out due to a long response time. In such situations, clients can benefit from being allowed to specify their tolerable levels of uncertainty, in other words their desired quality of service (QoS) from the system. This paper introduces a novel optimistic replication protocol, QRep, with a QoS-aware middleware that allows clients to specify three QoS parameters: (1) session guarantee, (2) freshness, and (3) deadline. QRep attempts to fulfill the specified QoS while processing requests, and supports both synchronous and asynchronous clients. We carried out experiments to demonstrate the effectiveness of the protocol in terms of both QoS fulfillment and scalability. We also demonstrate that there is no significant performance penalty for providing the QoS.
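The Python sketch below illustrates how the three client-specified QoS knobs might look at the middleware interface, with a toy check of whether a replica can serve a request. All names, default values, and the simplistic read-your-writes check are assumptions for illustration; they are not the QRep protocol, which also handles propagation, session tracking, and asynchronous clients.

```python
import time
from dataclasses import dataclass

@dataclass
class QoS:
    session_guarantee: str = "read-your-writes"   # assumed guarantee name
    max_staleness_s: float = 5.0                  # freshness bound
    deadline_s: float = 2.0                       # response-time bound

def replica_satisfies(qos, replica_last_sync, replica_applied_writes,
                      session_writes, estimated_latency_s):
    fresh_enough = (time.time() - replica_last_sync) <= qos.max_staleness_s
    sees_own_writes = session_writes <= replica_applied_writes   # subset check
    fast_enough = estimated_latency_s <= qos.deadline_s
    if qos.session_guarantee != "read-your-writes":
        sees_own_writes = True                    # other guarantees not modeled here
    return fresh_enough and sees_own_writes and fast_enough

qos = QoS(max_staleness_s=3.0, deadline_s=1.0)
print(replica_satisfies(qos,
                        replica_last_sync=time.time() - 1.5,
                        replica_applied_writes={"w1", "w2"},
                        session_writes={"w1"},
                        estimated_latency_s=0.4))
```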
... For this, we compiled 29 variables that describe the data in our study in four main aspects: (i) the type of trait measured, (ii) taxonomic information about the organism (kingdom and phylum), (iii) the shape of the TPC (e.g., its breadth, the degree of symmetry before and after the peak), and (iv) information related to sampling resolution (e.g., the number of distinct temperatures before the peak of the curve). The full list of variables, along with their descriptions, is available in Supplementary Table 2 in Supplementary Note 5. We then randomly split the trait data into training and testing subsets (80% and 20% of the data, respectively) and fitted multi-output conditional inference regression trees [59] with all possible combinations of the four aforementioned groups of predictors using the R package partykit [60] (v. 1.2-17). This method aims at predicting multiple response variables simultaneously (here, the AICc weights of all models) based on binary splits using the values of predictor variables, selected through nonparametric tests. ...
March 2018
Journal of Thermal Biology
... Recently it has been recognized that a wealth of data for observational studies of software engineering is available in software repositories [6] and the mining of software repositories has emerged as a promising area of research. Workshops on mining software repositories have been held concurrently with the International Conference on Software Engineering (ICSE) the last three years [8,9,12]. The research presented in these workshops has used data produced as a natural result of software development to answer many interesting and important questions about the state of Open Source Software development. ...
September 2005
ACM SIGSOFT Software Engineering Notes
... Prior methods for performance estimation basically fall into two categories. The first group of studies, which apply methods such as multilayer perceptrons (MLP) and linear regression [17], weighted multivariate linear regression (MVLR) [5], recurrent neural networks (RNN) [26], and even LSTM [34], focuses on improving the relationship between performance and time. The second group ignores the sequential effects and instead analyzes the workload to estimate performance. ...
May 2013
... Similar to these techniques, Casolari and Andreolini [39] proposed another trend-aware regression model using linear extrapolation. Regression-based methods [40] also exist in this space, as well as techniques based on autoregression, introduced by Roy et al. [41]. ...
November 2013
... These activities are also common during onboarding phases when developers are acquainting themselves with a new project. In fact, these activities are not only common, but essential for maintaining and evolving complex projects [84]. ...
May 2013
... It can contain restrictions on argument values (e.g., some instructions can only be used with an even register). It can give a .NET code template to be inlined for the instruction dependent on the actual values of the arguments [6]. It can describe properties of the instruction semantics (registers accessed implicitly, reading or writing access, jumping instructions, etc.) that can be used by the assembler to produce diagnostic reports and useful visualisations [27] and to optimise the generated code for performance [9,26]. ...
Reference:
Raincode assembler compiler (tool demo)
May 2012
Proceedings of the Euromicro Conference on Software Maintenance and Reengineering, CSMR
... Such multifractal governance, allowing for more independent self-organization at the micro-level and convergence on shared principles at the macro-level, can also be found in open-source software communities (Hindle et al. 2011). Turnu et al. (2013) show the "witches' broom" or cancerous growth effect for open source: excessively high fractal dimension correlates with the number of software bugs and other defects. ...
May 2011
Proceedings - International Conference on Software Engineering
... This additional information could be helpful in understanding legacy systems, as we have shown in our previous work (e.g. [12,17]). ...
January 2003
... A relevant example is given by software engineering graph decomposition [15, 20]. These methods are generally used to re-engineer large software systems, that is, to propose an organization of software components. ...
January 2002
... To compute the normalized coupling value, we divide the absolute coupling by N(N − 1)/2, since this is the number of possible inter-module couplings. Calls per function, computed by averaging the number of calls per function over all functions, is another frequently used metric for characterizing complexity [8]. The rationale for keeping this value low is that a high number of calls per function indicates a complex, hard-to-understand function. ...
January 2004