Minoru Kawahara's research while affiliated with Ehime University and other places

Publications (44)

Conference Paper
Any source code of a software product (production code) is expected to be tested to ensure its correct behavior. Whenever a developer updates production code, the developer should also update or create the corresponding test code to check if the updated parts still work correctly. Such a desirable co-evolution relationship between production and te...
Conference Paper
Successful naming of variables is key to making source code readable. Programmers may use a compound variable name by concatenating two or more words to make it easier to understand and more informative. While each compound variable name itself may be easy to understand, a collection of such variables sometimes makes a "confusing" variable pa...
Preprint
Deep metric learning (DML) learns a mapping into an embedding space in which similar data are near and dissimilar data are far apart. In this paper, we propose a new proxy-based loss and a new DML performance metric. This study makes the following two contributions: (1) we propose multi-proxies anchor (MPA) loss, and we show the effectiveness of the mul...
Chapter
Variables are fundamental elements of software, and their names hold vital clues to comprehending the source code. Ideally, a variable’s name should be informative enough that anyone can quickly understand its role. When a variable’s scope gets broader, the demand for such an informative name becomes higher. Although the standard naming conventions p...
Chapter
Methods (functions) are the fundamental components of the software. Programmers usually grasp a method’s behavior by looking at the method’s name. Hence, the name of a method should be a summary of what the method does. There has been a study utilizing Word2Vec, Doc2Vec, and the convolutional neural network (CNN) to evaluate the consistency between...
Preprint
We developed a novel scotoma detection system based on the time required for fixation on random targets, the "eye-guided scotoma detection method". To verify this method, we measured 78 eyes of 40 subjects and examined the measurement results in comparison with those obtained by Humphrey perimetry...
Conference Paper
To enhance the efficiency of software testing, researchers have studied various test case prioritization (TCP) methods. A topic model-based TCP is one of the promising methods, which expresses test cases by topic vectors and prioritizes them in an order such that the set of already-prioritized test cases has the maximum dispersion in the vector s...
Chapter
Static code analysis tools (code checkers) scan source programs and issue warnings about potentially problematic parts. Programmers can utilize a code checker whenever they change their source code to make sure that their code changes do not carry high risks of decreasing the code quality. Although code checkers would be helpful for detecting risky code c...
Conference Paper
Well-chosen variable names play significant roles in program comprehension and high-quality software development and maintenance. However, even though all variables have easy-to-understand names, attention needs to be paid to the similarity among those names as well because a highly similar pair of variables may decrease the readability of code and...
Article
Programmers are familiar with local variables, and in many cases, they can freely define the local variables they use. Thus, the properties of these variables are widely diverse, and this may cause variations in the quality of code. Although variables are named in accordance with coding conventions, the following matters have not received much atte...
Conference Paper
Comments in a source program can be helpful artifacts for program comprehension. While many comments are useful documents embedded in source programs, there are also poorly-informative comments in the real world. In order to quantitatively assess the value of comments, this paper proposes applying the Doc2Vec model to comment evaluation. Doc2Vec is...
Article
Once a bug is reported, it is a major concern whether or not the bug is resolved (closed) soon. This paper examines seven metrics quantifying the amount of clues to the early close of reported bugs through a case study. The results show that one of the metrics, the similarity to already-closed bug reports, is strongly related to early-closed bugs.
Conference Paper
Giving a name to a local variable is usually at the programmer's discretion. Since it depends on the programmer's preference and experience, there is a lot of individual variation, which may cause variability in code quality, such as readability. While there have been studies on the naming of local variables in the past, a relationship of names...
Conference Paper
Open source software (OSS) products have been widely used for information systems, and successful quality management of OSS development has become one of the key topics in the information technology world. Since the development and maintenance of an OSS product is driven by various developers, it would be worthwhile to focus on their contributions and th...
Conference Paper
Toward effective utilization of static code analysis tools, this paper investigates which violations are familiar to more programmers and appear widely in source files (having a high coverage), and which violations are really related to bugfixes (having a high importance), for six popular OSS projects. The results show: 1) the familiar violations...
Article
To support successful quality management of open source software (OSS) projects, this paper proposes to measure the balance of developers’ contributions to a source file as an entropy. Through an analysis of data collected from 10 popular OSS projects, the following trends are reported: a source file is more fault-prone as the developers’ contribu...
Conference Paper
In regression testing, an oversight of a regression is a serious problem to be avoided. A test engineer usually selects test cases to rerun for regression testing. While the selection is a useful expert decision, there is also a risk of missing some important test cases. To support more effective regression testing, this paper focuses on th...
Conference Paper
This paper proposes an application of survival analysis to bug-fix events occurring in source files. When a source file is modified, it has a risk of creating a bug (fault). In this paper, such a risk is analyzed from the viewpoint of the survival time, i.e., the time that the source file can survive without any bug fix. Through an empirical study with 1...
Conference Paper
The naming of local variables is usually at the programmer's discretion. Thus, there is a diversity in naming local variables and this may cause variations in the code quality. Many coding conventions say that the name of a local variable can/should be short. This paper focuses on such conventions, and aims to explore the trends of local variables'...
Article
This paper focuses on the evaluation of coding violations reported by a static code analysis tool, considering the change history of each violation and the authorship of the source file. Through an empirical study with data collected from seven open source software projects, the following findings are reported: (1) the variety and the evaluation of a codin...
Conference Paper
Many empirical studies have reported notable theories or methods for evaluating or predicting code quality through analyses of code repositories. This paper has yet another point of view: it focuses on "commits" rather than source code. That is to say, this paper proposes to evaluate commits themselves. When an aim of a commit is to fix a bug, th...
Conference Paper
This paper focuses on comments written in source programs. While comments can work for improving the readability of code, i.e., the quality of programs, there have also been concerns that comments can be added for complicated source code in order to compensate for a lack of readability. That is to say, well-written comments might be associated wit...
Conference Paper
This paper focuses on the relationship between a Java method's first word (prefix) and its implementation, for predicting fault-prone methods. As a pilot study of this way of analysis, this paper analyzes three major prefixes of methods' names: "get", "set" and "be." The empirical study in this paper collects many data of methods' names...
Conference Paper
While static code analysis tools would be helpful in reviewing source code, they have not been actively utilized in practice. One of the main reasons practitioners are said not to use them is that such tools output many warnings (violations of predefined rules), most of which are false positives. Thus, there have been studies evaluating vio...
Article
This paper focuses on differences in comment densities among individual programmers, and proposes to adjust the conventional code complexity metric (the cyclomatic complexity) by using the abnormality of the comment density. An empirical study with nine popular open source Java products (including 103,246 methods) shows that the proposed metric per...
Conference Paper
Open source software (OSS) products have been broadly utilized in IT business as well as for personal use in recent years. Software companies can benefit greatly from OSS products in terms of the cost to develop and maintain their products. However, there are also risks that products of interest might no longer be successfully maint...
Conference Paper
Code review is an essential activity to ensure the quality of code being developed, and there have been static code checkers for aiding effective code review. However, such tools have not been actively utilized by programmers because of the large number of coding violations (warnings) they produce and the tendency of those warnings to be false positives. In order to anal...
Conference Paper
This paper focuses on two types of artifacts: local variables and comments in a method (function). Both of them are usually used at the programmer's discretion. Thus, naming local variables and commenting code can vary among individuals, and such an individual difference may cause a dispersion in quality. This paper conducts an empirical analysis on...
Conference Paper
To enhance the cost effectiveness of regression testing, this paper proposes a method for prioritizing test cases. In general, a test case can be evaluated from various points of view, so whether it is worth rerunning should be judged using multiple criteria. This paper shows that the Mahalanobis-Taguchi (MT) method is a useful...
Conference Paper
The existence of comments in method bodies is a double-edged sword: on one side it helps code reviewers comprehend complex code, while on the other it could reflect the programmer's lack of confidence in the clarity of their code. While comments can be useful clues to find problematic code, the effects may vary from person t...
Article
This paper focuses on the power of comments to predict fault-prone programs. In general, comments along with executable statements enhance the understandability of programs. However, comments may also be used to mask a lack of readability in the program; for this reason, well-written comments are referred to as "deodorant to mask code smells" in the fie...
Conference Paper
This paper focuses on the local variable names and comments that are major artifacts reflecting the programmer's preference. It conducts an empirical analysis on the usefulness of those artifacts in assessing the software quality from the perspective of change-proneness in Java methods developed in six popular open source software products. The emp...
Article
[Context] Comments improve the readability of programs, so they are harmless to the software quality. However, comments may sometimes be added to compensate for the lack of readability in complicated programs. Some programmers want to add in-depth comments to code fragments that are hard for other developers to understand. In the field of...
Conference Paper
In recent years, many open source software (OSS) products have become popular and widely used in the information technology (IT) business. To successfully run IT business, it is important to properly understand the OSS development status. Having a proper understanding of development status is necessary to evaluate and predict the product quality. H...
Article
This paper proposes a cache replacement scheme named group replica caching for optical grid networks. In optical grid networks, data files for job execution are replicated at multiple servers in order to distribute loads. Clients download these files via lightpaths and store them as necessary. File downloading is blocked when the corresponding ligh...
Article
This paper proposes a contention resolution scheme considering multicast traffic in optical burst switching (OBS) networks. In OBS networks, for unicast bursts, contention can be avoided by deflection routing. However, deflection routing cannot be applied to multicast bursts because multicast bursts are transmitted along light-trees which are fixed...
Article
In optical grid networks, data files for job execution are replicated at multiple sites in order to distribute loads and achieve high performance computing. Those replicas are downloaded in parallel in order to reduce downloading time. Furthermore, each replica is downloaded with multiple wavelengths. Although parallel and multi-wavelength download...
Article
This paper proposes a replica caching scheme according to the status of neighboring nodes in optical grid networks. In optical grid networks, data files for job execution are replicated at multiple servers in order to distribute loads. Clients download replicas via lightpaths and store those replicas as necessary. The blocking probability of lightpath...

Citations

... And renaming, where a maintainer changes a previously given name, is a common form of refactoring [7,24]. Additional research has considered how easy it is to remember names [12,15], and whether similar names may cause confusion [28,5]. In all these contexts it is important to be able to compare names to each other. ...
... Names bear witness to what the developer thought about the role of each part of code in the whole program [26]. There has therefore been substantial research on names and their meanings [3,4,9,10,11,13,16,18,21,23,27]. ...
... The main motivation of these studies is to benefit from the capacity of RL to seamlessly adapt to the dynamic nature of CI (frequent changes in systems and test suites) and integrate new data into already constructed models without retraining them from scratch. RL studies share similar ideas about the creation of an RL environment by replaying CI logs and training an RL agent via interactions with the environment and receiving rewards. The trained RL agent can take a test case and assign a score that is used to sort (prioritize) test cases. ...
[Table rows spilled into the excerpt above; column headers not recovered: (category not recovered) [7,8,9,10,11,12,13]: 7, 24%, Yes, Yes; Clustering [14,15,16,17,18,19,20]: 7, 24%, Yes, Yes; Ranking [8,21,22,23,24,25,26,27,28,29,30,31,32]: 13, 45%, No, Yes; NLP-based [17,21,25,33,34,35]: 6, 21%, Yes, Yes]
... Analysis of software quality related factors. Researchers used the datasets labeled by B-SZZ and SZZ variations to investigate how software quality relates to several factors empirically [27,29,30,31,32,33]. For instance, Eyolfson et al. [30] studied the correlation between a commit's bugginess and its time-based properties (e.g., commit's frequency, time of the day, day of the week) in three open-source datasets where they labeled fix-inducing commits by B-SZZ. ...
... We considered, in this work, the following important observations. First, API documentation of the classes can be more useful than the name of code elements or comments to estimate the similarity between code fragments and bug reports [9]. Second, code fragments associated with previously fixed bug reports may be relevant also to the current report if these previously fixed bug reports are similar to a current bug report [10]. ...
... Developers' contribution can be described as the entire list of activities a developer performs during the development of the software (Gousios et al. 2008). Several approaches try to express the contribution of a developer in a timespan by, e.g., the number of commits performed in this interval, the number of Lines Of Code (LOC) written (Yamauchi et al. 2018; Bird et al. 2011), or the number of feature requests solved. ...
... Doc2vec is an unsupervised method developed by Le and Mikolov (2014). This method is an extension of word2vec that can express a document as a vector (Aman et al., 2018). Doc2vec provides the possibility to exploit the semantic information existing in a text. ...
... First, Aman et al. [2] show that such confusing identifier naming combinations do occur in production software. The study focused on string and semantic similarities exclusively. The second study by Tashima et al. [23] shows that methods with similar identifiers are 1.1-2.6 times more likely to have faults. The third study by Al Madi et al. [1] shows that similarity in identifiers could hinder debugging. ...
... In this section, we briefly summarize the findings with regard to each of the research questions previously defined. With regard to RQ1 (What software regression testing approaches are the most used in the industry?), it was observed that the test case selection technique is the most commonly used (24 papers) [18, 19, 20, 21, 25, 26, 27, 29, 31, 32, 35, 36, 37, 40, 41, 43, 45, 47, 48, 50, 51, 52, 53, 54], followed by the prioritization technique (14 papers) [15, 16, 17, 22, 23, 24, 28, 30, 34, 39, 43, 45, 47, 50]. We observe that the approach that considers artificial intelligence methods such as data mining is the least used with 2 papers found [32, 39]. ...
... Survival analysis can also be applied to the software itself, as demonstrated by Aman et al. [4] and Caivano et al. [6]. Aman et al. used survival analysis to analyze the time to a bug-fix for files modified by developers of different experience levels [4]. ...