Shinji Kusumoto

Osaka University | Handai · Graduate School of Information Science and Technology

About

206
Publications
35,692
Reads
4,572
Citations

Publications (206)
Article
Full-text available
In this retrospective article of our TSE paper ‘CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code’ [24], we revisit the reasons why we became deeply involved in code clone research, and explore what has driven its frequent citation in many studies. Furthermore, we reflect on why not only our own lab, but...
Chapter
Unit testing is a part of the process of developing software. In unit testing, developers verify that programs properly work as developers intend. Creating a test suite for a unit test is very time-consuming. For this reason, research is being conducted to generate a test suite for unit testing automatically, and before now, some test generation to...
Chapter
Debugging is a heavy task in software development. Computer-assisted debugging is expected to reduce these costs. Spectrum-based Fault Localization (SBFL) is one of the most actively studied computer-assisted debugging techniques. SBFL aims to identify the location of faulty code elements based on the execution paths of tests. Previous research rep...
Chapter
This paper investigates the negative effects of soft assertion on the accuracy of Spectrum-based Fault Localization (SBFL). Soft assertion is a kind of test assertion which continues test case execution even after an assertion failure occurs. In general, the execution path becomes longer if the test case fails by a soft assertion. Hence, soft asser...
Chapter
Decompiler is a system for recovering the original code from bytecode. A critical challenge in decompilers is that the decompiled code contains differences from the original code. These differences not only reduce the readability of the source code but may also change the program’s behavior. In this study, we propose a deep learning-based quirk fix...
Chapter
Full-text available
Code generation is a technique that generates program source code without human intervention. There has been much research on automated methods for writing code, such as code generation. However, many techniques are still in their infancy and often generate syntactically incorrect code. Therefore, automated metrics used in natural language processi...
Chapter
Regular expression is widely known as a powerful and general-purpose text processing tool for programming. Though the regular expression is highly versatile, there are various difficulties in using them. One promising approach to reduce the burden of the pattern composition is reuse by referring to past usages. Still, several source code-specialize...
Chapter
Automated program repair (APR) is a concept of automatically fixing bugs in source code to free developers from the burden of debugging. One of the issues facing search-based APR is that repaired code contains wasteful or meaningless statements that do not affect external behavior. This paper proposes a concept named source code tidying that elimin...
Chapter
Full-text available
In software maintenance process, software libraries are occasionally updated, and their APIs may also be updated. API changes can be classified into two categories: changes that break backward compatibility (in short, breaking changes) and changes that maintain backward compatibility (in short, maintaining changes). Detecting API changes and determ...
Article
Full-text available
In software development, ad hoc solutions that are intentionally implemented by developers are called self-admitted technical debt (SATD). Because the existence of SATD spreads poor implementations, it is necessary to remove it as soon as possible. Meanwhile, container virtualization has been attracting attention in recent years as a technology to...
Article
Full-text available
Method-level historical information is useful in various research on mining software repositories such as fault-prone module detection or evolutionary coupling identification. An existing technique named Historage converts a Git repository of a Java project to a finer-grained one. In a finer-grained repository, each Java method exists as a si...
Conference Paper
Full-text available
Refactoring evaluation is a challenging research topic because right and wrong of refactoring depend on various aspects of development context such as developers' skills, development cost, deadline and so on. Many techniques have been proposed to evaluate refactoring objectively. However, those techniques do not consider individual contexts of soft...
Conference Paper
Automated program repair (in short, APR) has been attracting much attention. A variety of APR techniques have been proposed, and they have been evaluated with actual bugs in open source software. Currently, the authors are trying to introduce APR techniques to industrial software development (in short, ISD) to reduce development cost drastically. H...
Conference Paper
Full-text available
Recently, a variety of studies have been conducted on source code analysis. If auto-generated code is included in the target source code, it is usually removed in a preprocessing phase because the presence of auto-generated code may have negative effects on source code analysis. A straightforward way to remove auto-generated code is searching speci...
Conference Paper
Full-text available
When we use code repositories, each commit should include code changes for only a single task and code changes for a single task should not be scattered over multiple commits. There are many studies on the former violation, often referred to as tangled commits, but the latter violation has been out of scope for MSR research. In this paper, we firstl...
Article
Full-text available
This paper proposes an approach to identify pitfalls of students in programming exercises by using snapshots of source code. The proposed method calculates distances between a snapshot and the source code submitted by the student. It identifies pitfalls based on these distances and then reports them to the lecturers. We applied our method into th...
Article
Full-text available
Programmers often copy and paste code fragments when they would like to reuse them. Although copy-and-paste operations enable rapid development of software systems, they create code clones. Some clones have negative impacts on software development. For example, if we modify a code fragment, we have to check whether its clones...
Conference Paper
Full-text available
Developers often reuse existing software by copy and paste. Source code reuse improves productivity and software quality. On the other hand, source code reuse requires several professional skills to developers. In source code reuse, developers must locate reusable code fragments, and judge whether such reusable code is adequate to copy and paste in...
Conference Paper
Full-text available
Developing software by teams which adopted the agile development methodology such as Scrum seems totally natural in industry. On the other hand, students belonging to graduate schools of information science who have some experience on the agile team software development are rare. In the initial education on the Scrum, there exists some challenge...
Article
Full-text available
Although there is a principle that states a commit should only include changes for a single task, it is not always respected by developers. This means that code repositories often include commits that contain tangled changes. The presence of such tangled changes hinders analyzing code repositories because most mining software repository (MSR) appro...
Article
Full-text available
This paper introduces a new dataset of clone references, which is a set of correct clones consisting of their locational information with their gapped lines. Bellon's dataset is one of widely used clone datasets. Bellon's dataset contains many clone references, thus the dataset is useful for comparing accuracies among clone detectors. However, Bell...
Article
Full-text available
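To train advanced software engineers and advanced ICT (Information and Communication Technology) personnel, an education and learning method called PBL (Project-based Learning), with software development as its theme, is practiced in various forms. PBL places importance on retrospectives: activities that aim to improve the project continuously by finding problems that arose during the project, analyzing their causes, and devising countermeasures. To find problems and analyze their causes quantitatively and objectively, accurate task records of who performed which task and when during the project are indispensable. However, in PBL, omissions and input errors can occur when tasks are recorded. In fact, in the past...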
Conference Paper
In the last decade, a variety of studies on mining software repositories has been conducted. Mining repositories has a potential to obtain useful knowledge for the future development and maintenance. When software repositories are mined, large commits in them are often excluded from mining targets because large commits include merging and we believ...
Conference Paper
Although source code search systems are well known as being helpful to reuse source code, they have an issue that they often suggest larger code than what users actually need. This is because they suggest code based on the structure of programming languages such as files or classes. In this paper, we propose a new code search technique that conside...
Conference Paper
Previous research efforts have proposed various techniques for supporting code clone removal. They identify removal candidates based on the states of the source code in the latest version. However, those techniques suggest many code clones that are not suited for removal in addition to appropriate candidates. That is because the presence of code cl...
Conference Paper
Many researchers have conducted a variety of research related to clone evolution. In order to grasp how clones have evolved, clones must be tracked. However, conventional clone tracking techniques are not feasible to track clones if they moved to another location in the source code. Consequently, in this research, we propose a new clone tracking te...
Conference Paper
On the software development PBL (SDPBL), the implementation of firmly-fused development environment for students and monitoring environment for teachers are required in order to succeed in education. We have proposed the service, named "DaaS BADER" in compliance with demands from practical teachers to decrease the cost for preparation and maintenan...
Conference Paper
Full-text available
A variety of methods detecting code clones has been proposed before. In order to detect gapped code clones, AST-based technique, PDG-based technique, metric-based technique and text-based technique using the LCS algorithm have been proposed. However, each of those techniques has limitations. For example, existing AST-based techniques and PDG-based...
Conference Paper
In order to understand source code, humans sometimes execute the program in their mind. When they do so, they must memorize the values of all the variables throughout the execution. If there are many variables in the program, memorizing them is hard. However, it is possible to ease to mem...
Article
Full-text available
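Many tools have been developed to automatically detect code clones (code fragments that are similar to each other) in source code. These detection tools detect many code clones, but not all of them are necessarily useful for software maintenance. Moreover, the criteria for judging whether a given code clone is useful are likely to differ from developer to developer, or depending on the purpose for which the clone information is used. This paper therefore proposes a method that uses machine learning to automatically identify useful code clones. The proposed method has the user classify some of the code clones detected by a tool as useful or not, and uses them as training data to automatically identify, from the remaining code clones, those that are useful to that user. We also implemented the proposed method and, with human subjects, an evaluation experiment...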
Article
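Detecting code clones that span multiple software systems is beneficial from viewpoints such as improving development efficiency by turning processing that appears frequently in many projects into libraries, and identifying source code reuse that violates licenses. However, existing research requires a great deal of time to detect such code clones, and even file-level detection methods, which run quickly, cannot detect cases in which only part of a file is a code clone. This study proposes a method that quickly detects method-level code clones from a large collection of software. In the experiment, the proposed method finished code clone detection on about 360 million lines of source code in about 4.45 hours, and 40% of the detected code clones could not be detected by the file-level method...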
Article
Full-text available
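We propose C3PV, a system that records students' coding processes during programming exercises, visualizes them, and presents them to instructors. The system consists of an online editor that runs on the web and a coding process view. The online editor records all student actions in the coding process, such as character input, compilation, execution, and submission. The coding process view visualizes the progress of assignments and each student's delay relative to the others, and presents them to the instructor. With C3PV, instructors can identify students who have been struggling with a particular error for a long time or who are behind the overall progress, and follow up with support such as individual guidance. In this study, we applied C3PV to a Java programming exercise taken by first-year undergraduates, and with C3PV...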
Conference Paper
In software maintenance, grasping characteristics of software systems by metrics measurement is a basic activity. However, metrics do not always represent characteristics of software systems. For example, Cyclomatic Complexity is a metric counting the number of branches in a given module, and it does not consider its content. One factor that Cyclom...
Conference Paper
A variety of application results of code clone detection and analysis has been reported. There are many reports of code clone detection and analysis on open source software whereas few reports on industrial systems are open to the public. This paper reports an experience of code clone analysis on a governmental project. In the project, a software s...
Article
Full-text available
Sensor-driven services often cause chain reactions, since one service may generate an environmental impact that automatically triggers another service. We first propose a framework that can formalize and detect such service chains based on ECA (event, condition, action) rules. Although the service chain can be a major source of feature interactions...
Conference Paper
Full-text available
Libraries created from commonly used functionalities offer a variety of benefits to developers. To locate such widely used functionalities, clone detection on a large corpus of source code could be useful. However, existing clone detection techniques did not address the creation of libraries. Therefore, existing clone detectors are sometimes unbefi...
Conference Paper
Full-text available
A variety of code clone detection methods have been proposed before now. However, only a small part of them is widely used. Widely-used methods are line-based and token-based ones. They have high scalability because they neither require deep source code analysis nor constructing complex intermediate structures for the detection. High scalability is...
Conference Paper
It is difficult to keep source code consistent. Unintended inconsistencies occur unless we recognize all the code fragments that need to be modified in a given bug fix or functional addition. Before modifying source code, keyword-based search tools like grep or code clone detection tools can be used to prevent code fragments from being overlooked. Howev...
Conference Paper
Full-text available
In order to reduce coupling and increase cohesion, we refactor program source code. Previous research efforts for suggesting candidates of such refactorings are based on static analysis, which obtains relations among classes or methods from source code. However, these approaches cannot obtain runtime information such as repetition count of loop, dy...
Conference Paper
In recent years, the increase in failures caused by software defects in various information systems has become a serious social problem. In order to build high-quality software, cultivation of ICT (Information and Communication Technology) human resources such as software engineers is required. A software development PBL (Project-based Learning)...
Conference Paper
Full-text available
Results from code clone detectors may contain plentiful useless code clones, and judging whether a code clone is useful varies from user to user based on their different purposes. We are planning a system to study the judgment of each individual user by applying machine learning algorithms on code clones. We describe the reason why individual judg...
Article
Full-text available
When we reuse a code fragment, some of the identifiers in the fragment might be systematically changed to others. Failing these changes would become a potential bug in the copied fragment. We have developed a tool CloneInspector to detect such inconsistent changes in the code clones, and applied it to two mobile software systems. Using this tool, w...
Conference Paper
Refactoring is important for efficient software maintenance. However, manual operations for refactoring are complicated, and human-related errors easily occur. Tool support can help users to apply such a complicated refactoring. This paper proposes a refactoring support tool with Form Template Method pattern. The developed tool automatically identi...
Article
Full-text available
It is said that the presence of duplicate code is one of the factors that make software maintenance more difficult. Many research efforts have been performed on detecting, removing, or managing duplicate code on this basis. However, some researchers doubt this basis in recent years and have conducted empirical studies to investigate the influence o...
Article
Many research efforts have been performed on removing code clones. Especially, it is highly expected that clone removal techniques by applying Form Template Method have high applicability because they can be applied to code clones that have some gaps. Consequently some researchers have proposed techniques to support refactoring with Form Template M...
Conference Paper
Full-text available
This paper proposes a new mechanism to measure a variety of source code metrics at low cost. The proposed mechanism is very promising because it realizes to add new metrics as necessary. Users do not need to use multiple measurement tools for measuring multiple metrics. The proposed mechanism has been implemented as an actual software tool MASU. Th...
Conference Paper
Full-text available
It has been noted in recent years that the presence of code clones makes software maintenance more difficult. Unintended code inconsistencies may occur due to the presence of code clones. In order to avoid problems caused by code clones, it is necessary to identify where code clones exist in a software system. Consequently, various kinds of code cl...
Article
A function point (FP) is a unit of measurement that expresses the degree of functionality that an information system provides to a user. Many software organizations use FPs to estimate the effort required for software development. However, it is essential that the definition of 1 FP be based on the software development experience of the organizatio...
Article
Full-text available
Refactoring is important for efficient software maintenance. However, tool support is highly required for refactoring because manual refactoring operations are troublesome and error-prone. This paper proposes a technique that suggests Extract Method candidates automatically. Extract Method refactoring is to create a new method from a code fra...
Conference Paper
PDG-based code clone detection is suitable for detecting non-contiguous code clones while other detection techniques, line-, token-, or AST-based techniques are not. However, PDG-based detection has lower performance for detecting contiguous code clones than the other techniques. Moreover, PDG-based detection is time consuming, so that application t...
Conference Paper
Full-text available
Various kinds of research efforts have been performed on the basis that the presence of duplicate code has a negative impact on software evolution. A typical example is that, if we modify a code fragment that has been duplicated to other code fragments, it is necessary to consider whether the other code fragments have to be modified simultaneously...
Conference Paper
Full-text available
In this paper, we describe the Software Tag which makes software development visible to software purchasers (users). A software tag is a partial set of empirical data about a software development project shared between the purchaser and developer. The purchaser uses the software tag to evaluate the software project, allowing them to recognize the q...
Conference Paper
The present paper discusses how clone sets can be generated from a very large amount of source code. The knowledge of clone sets can help to manage software assets. For example, we can figure out the state of the assets more easily, or we can build more useful libraries based on the knowledge.
Article
Large, virtualized pools of computational resources raise the possibility of a new, advantageous computing paradigm for scientific research. To help achieve this, new tools make the cloud platform behave virtually like a local homogeneous computer cluster, giving users access to high-performance clusters without requiring them to purchase or mainta...
Conference Paper
Full-text available
Most code clones are generated by copy-and-paste programming. Copy-and-paste programming shortens the time required for implementation because pasted code is a template of the required functionality. However, it sometimes introduces new bugs into the source code. After copy-and-paste, pasted code is somewhat changed fitting for the context of the region...
Conference Paper