Abram Hindle's research while affiliated with University of Alberta and other places

Publications (172)

Conference Paper
Full-text available
Electrocardiogram (ECG) abnormalities are linked to cardiovascular diseases, but may also occur in other non-cardiovascular conditions such as mental, neurological, metabolic and infectious conditions. However, most of the recent success of deep learning (DL) based diagnostic predictions in selected patient cohorts have been limited to a small set...
Preprint
Full-text available
Defect detection at commit check-in time prevents the introduction of defects into software systems. Current defect detection approaches rely on metric-based models which are not very accurate and whose results are not directly useful for developers. We propose a method to detect bug-inducing commits by comparing the incoming changes with all past...
Article
Full-text available
Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from accept...
Preprint
Docker is becoming ubiquitous with containerization for developing and deploying applications. Previous studies have analyzed Dockerfiles that are used to create container images in order to better understand how to improve Docker tooling. These studies obtain Dockerfiles using either Docker Hub or Github. In this paper, we revisit the findings of...
Conference Paper
Full-text available
Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and...
Article
Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams e.g., to discuss issue reports to facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popu...
Preprint
Full-text available
Software is prone to bugs and failures. Security bugs are those that expose or share privileged information and access in violation of the software's requirements. Given the seriousness of security bugs, there are centralized mechanisms for supporting and tracking these bugs across multiple products, one such mechanism is the Common Vulnerabilities...
Article
Full-text available
Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher’s conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that in the...
Preprint
The 12-lead electrocardiogram (ECG) is a commonly used tool for detecting cardiac abnormalities such as atrial fibrillation, blocks, and irregular complexes. For the PhysioNet/CinC 2020 Challenge, we built an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnosis. For each lead,...
Conference Paper
Full-text available
Can we generate drum synthesizers automatically? We present an approach for the automatic generation of synthesizer programs for one-shot percussive sounds. Recent advancements in digital synthesis, heuristic search, and neural networks can be utilized for sound generation. Yet the need for data, the problem of open set recognition, and high comput...
Preprint
Full-text available
Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher's conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that in the...
Article
Full-text available
Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications (apps) are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet researchers have...
Preprint
One problem when studying how to find and fix syntax errors is how to get natural and representative examples of syntax errors. Most syntax error datasets are not free, open, and public, or they are extracted from novice programmers and do not represent syntax errors that the general population of developers would make. Programmers of all skill lev...
Preprint
Full-text available
Online resources today contain an abundant amount of code snippets for documentation, collaboration, learning, and problem-solving purposes. Their executability in a "plug and play" manner enables us to confirm their quality and use them directly in projects. But, in practice that is often not the case due to several requirements violations or inco...
Preprint
Massive Open Online Courses are educational programs that are open and accessible to a large number of people through the internet. To facilitate learning, MOOC discussion forums exist where students and instructors communicate questions, answers, and thoughts related to the course. The primary objective of this paper is to investigate tracing disc...
Article
Full-text available
Machine learning is a popular method of learning functions from data to represent and to classify sensor inputs, multimedia, emails, and calendar events. Smartphone applications have been integrating more and more intelligence in the form of machine learning. Machine learning functionality now appears on most smartphones as voice recognition, spell...
Article
Full-text available
Bug deduplication or duplicate bug report detection is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically to de-duplicate bug reports developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or Github Issues. These search capabilities range...
Article
Examines the concept of "keeping it simple" with respect to software engineering. It’s funny how keeping it simple in software development can often mean revising and refactoring an existing system until it is elegant enough to afford adaptation and change. Simplicity and elegance are the goals of many developers when they’re designing software. De...
Conference Paper
Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has been already taken to properly quantify "what features and apps were tested a...
Article
Full-text available
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To t...
Preprint
On the worldwide web, not only are webpages connected but source code is too. Software development is becoming more accessible to everyone and the licensing for software remains complicated. We need to know if software licenses are being maintained properly throughout their reuse and evolution. This motivated the development of the Sourcerer's Appr...
Article
Full-text available
Execution logs are debug statements that developers insert into their code. Execution logs are used widely to monitor and diagnose the health of software applications. However, logging comes with costs, as it uses computing resources and can have an impact on an application’s performance. Compared with desktop applications, one additional critical...
Conference Paper
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. Prev...
Article
Full-text available
On mobile phones, users and developers use apps official marketplaces serving as repositories of apps. The Google Play Store and Apple Store are the official marketplaces of Android and Apple products which offer more than a million apps. Although both repositories offer description of apps, information concerning performance is not available. Due...
Preprint
Full-text available
Minor syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of intuition that help them resolve these tiny errors. Standard LR parsers typically resolve syntax errors and their precise location poorly. We propose a methodology that helps locate where syntax errors occur, but also suggests pos...
Preprint
Full-text available
Minor syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of intuition that help them resolve these tiny errors. Standard LR parsers typically resolve syntax errors and their precise location poorly. We propose a methodology that helps locate where syntax errors occur, but also suggests pos...
Article
Context: Virtual machines provide isolation of services at the cost of hypervisors and more resource usage. This spurred the growth of systems like Docker that enable single hosts to isolate several applications, similar to VMs, within a low-overhead abstraction called containers. Motivation: Although containers tout low overhead performance, do th...
Article
Music transcription involves the transformation of an audio recording to common music notation, colloquially referred to as sheet music. Manually transcribing audio recordings is a difficult and time-consuming process, even for experienced musicians. In response, several algorithms have been proposed to automatically analyze and transcribe the note...
Preprint
Full-text available
Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet many researchers have f...
Article
Full-text available
This work investigates the properties of crash reports collected from Ubuntu Linux users. Understanding crash reports is important to better store, categorize, prioritize, parse, triage, assign bugs to, and potentially synthesize them. Understanding what is in a crash report, and how the metadata and stack traces in crash reports vary will help sol...
Article
Full-text available
This work investigates the properties of crash reports collected from Ubuntu Linux users. Understanding crash reports is important to better store, categorize, prioritize, parse, triage, assign bugs to, and potentially synthesize them. Understanding what is in a crash report, and how the metadata and stack traces in crash reports vary will help sol...
Article
Bug deduplication, ie, recognizing bug reports that refer to the same problem, is a challenging task in the software‐engineering life cycle. Researchers have proposed several methods primarily relying on information‐retrieval techniques. Our work motivated by the intuition that domain knowledge can provide the relevant context to enhance effectiven...
Article
Full-text available
Machine learning is a popular method of learning functions from data to represent and to classify sensor inputs, multimedia, emails, and calendar events. Smartphone applications have been integrating more and more intelligence in the form of machine learning. Machine learning functionality now appears on most smartphones as voice recognition, spell...
Article
Full-text available
Machine learning is a popular method of learning functions from data to represent and to classify sensor inputs, multimedia, emails, and calendar events. Smartphone applications have been integrating more and more intelligence in the form of machine learning. Machine learning functionality now appears on most smartphones as voice recognition, spell...
Article
Full-text available
Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet many researchers have f...
Article
Full-text available
Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet many researchers have f...
Preprint
Full-text available
Bug deduplication is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically to de-duplicate bug reports developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or Github Issues. These search capabilities range from simple SQL string search to I...
Preprint
Full-text available
Bug deduplication is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically to de-duplicate bug reports developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or Github Issues. These search capabilities range from simple SQL string search to I...
Article
Full-text available
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models advocate Test Driven Development (TDD) as one among their key practices for reducing costs and improving code quality. In this paper we comparatively analyze GitHub repositories that adopt TDD...
Article
Full-text available
Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models advocate Test Driven Development (TDD) as one among their key practices for reducing costs and improving code quality. In this paper we comparatively analyze GitHub repositories that adopt TDD...
Conference Paper
We created detailed profiles of the energy consumed by common operations done on Java List, Map, and Set abstractions. The results show that the alternative data types for these abstractions differ significantly in terms of energy consumption depending on the operations. For example, an ArrayList consumes less energy than a LinkedList if items are...
Conference Paper
Software energy consumption is a relatively new concern for mobile application developers. Poor energy performance can harm adoption and sales of applications. Unfortunately for the developers, the measurement of software energy consumption is expensive in terms of hardware and difficult in terms of expertise. Many prior models of software energy c...
Conference Paper
Developers summarize their changes to code in commit messages. When a message seems "unusual", however, this puts doubt into the quality of the code contained in the commit. We trained n-gram language models and used cross-entropy as an indicator of commit message "unusualness" of over 120,000 commits from open source projects. Build statuses colle...
Conference Paper
Full-text available
Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu?s reposi...
Conference Paper
The improvement in battery technology for battery-driven devices is insignificant compared to their computing ability. In spite of the overwhelming advances in processing ability, adoption of sophisticated applications is hindered by the fear of shorter battery life. This is one of the several reasons software developers are becoming conscious of w...
Article
Full-text available
Server energy consumption has been a subject of research for more than a decade now. With Internet scaling rapidly all over the world, more servers are being added continuously. With global warming and financial cost associated with running servers, it has now become a more pressing concern to optimize the power consumption of these servers while s...
Article
Full-text available
Server energy consumption has been a subject of research for more than a decade now. With Internet scaling rapidly all over the world, more servers are being added continuously. With global warming and financial cost associated with running servers, it has now become a more pressing concern to optimize the power consumption of these servers while s...
Article
Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repe...
Article
Full-text available
The business models of software/platform as a service have contributed to developers dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyper link. But, developers are not limited to referencing software changes to one another through the web. Other shared hypermedi...
Article
Full-text available
The business models of software/platform as a service have contributed to developers dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyper link. But, developers are not limited to referencing software changes to one another through the web. Other shared hypermedi...
Article
Full-text available
This paper studies the relationship between Test Driven Development (TDD), productivity and developer sentiment in order to assess the impact of TDD on software development. We used a set of 256572 Java repositories archived from GitHub in September 2015 and made available through the Boa language and infrastructure. This research found that of the...
Conference Paper
Full-text available
Bug triaging, i.e., assigning a bug report to the “best” person to address it, involves identifying a list of developers that are qualified to understand and address the bug report, and then ranking them according to their expertise. Most research in this area examines the description of the bug report and the developers’ prior development and bug-...
Conference Paper
Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoo...
Article
Full-text available
This paper studies the relationship between Test Driven Development (TDD), productivity and developer sentiment in order to assess the impact of TDD on software development. We used a set of 256572 Java repositories archived from GitHub in September 2015 and made available through the Boa language and infrastructure. This research found that of the...
Article
Nine rising stars in software engineering describe how software engineering research will evolve, highlighting emerging opportunities and groundbreaking solutions. They predict the rise of end-user programming, the monitoring of developers through neuroimaging and biometrics sensors, analysis of data from unstructured documents, the mining of mobil...
Conference Paper
Full-text available
Computer Science often seems distant from its natural science cousins, especially software engineering which feels closer to sociology and psychology than to physics. Physical measurements are often rare in software engineering, except in a few niches. One such important niche is that of software energy consumption, green mining, green IT, and sust...