Sonia Haiduc

Sonia Haiduc
Florida State University | FSU · Department of Computer Science

Ph.D.

About

52
Publications
13,341
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
2,274
Citations

Publications

Publications (52)
Preprint
Accurate automatic code extraction from tutorial videos is crucial for software developers seeking to reuse the code contained in these videos. Current methods using optical character recognition (OCR) often yield inaccurate results due to code complexity and variations in screencast formats. To address this issue, we introduce CodeT5-OCRfix, an ap...
Article
Full-text available
Software developers are often using instant messaging platforms to communicate with each other and other stakeholders. Among these platforms, Gitter has emerged as a popular choice and the messages it contains can reveal important information to researchers studying open source software systems. Uncovering what developers are communicating about th...
Preprint
Full-text available
Mobile apps are one of the most widely used types of software systems in existence today and more programmers and students learn how to develop them everyday. One of the most popular resources for learning mobile programming are videos hosted on social platforms such as YouTube. While useful, this type of resource has also its limitations, especial...
Article
Full-text available
As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed to make a system safer and more reliable. Unfortunately, manually attempting to locate bugs solely from...
Preprint
Full-text available
Mobile applications demand is on the rise, leading to more programmers learning to develop or having to maintain this kind of programs. Developers often refer to online resources to find inspiration or answers to questions they have about mobile programming topics and screencasts are a popular resource. However, given the multitude of screencasts a...
Preprint
Full-text available
The need for mobile applications and mobile programming is increasing due to the continuous rise in the pervasiveness of mobile devices. Developers often refer to video programming tutorials to learn more about mobile programming topics. To find the right video to watch, developers typically skim over several videos, looking at their title, descrip...
Preprint
Full-text available
Team communication is essential for the development of modern software systems. For distributed software development teams, such as those found in many open source projects, this communication usually takes place using electronic tools. Among these, modern chat platforms such as Gitter are becoming the de facto choice for many software projects due...
Preprint
Full-text available
Software developers use modern chat platforms to communicate about the status of a project and to coordinate development and release efforts, among other things. Developers also use chat platforms to ask technical questions to other developers. While some questions are project-specific and require an experienced developer familiar with the system t...
Preprint
Full-text available
Programming screencasts can be a rich source of documentation for developers. However, despite the availability of such videos, the information available in them, and especially the source code being displayed is not easy to find, search, or reuse by programmers. Recent work has identified this challenge and proposed solutions that identify and ext...
Article
Full-text available
Programming screencasts are growing in popularity and are often used by developers as a learning source. The source code shown in these screencasts is often not available for download or copy-pasting. Without having the code readily available, developers have to frequently pause a video to transcribe the code. This is time-consuming and reduces the...
Conference Paper
Full-text available
Background: Video programming tutorials are becoming a popular resource for developers looking for quick answers to a specific programming problem or trying to learn a programming topic in more depth. Since the most important source of information for developers in many such videos is source code, it is important to be able to accurately extract th...
Preprint
Previous studies have shown that software traceability, the ability to link together related artifacts from different sources within a project (e.g., source code, use cases, documentation, etc.), improves project outcomes by assisting developers and other stakeholders with common tasks such as impact analysis, concept location, etc. Establishing tr...
Conference Paper
Software development video tutorials are emerging as a new resource for developers to support their information needs. However, when trying to find the right video to watch for a task at hand, developers have little information at their disposal to quickly decide if they found the right video or not. This can lead to missing the best tutorials or w...
Article
Software development video tutorials have seen a steep increase in popularity in recent years. Their main advantage is that they thoroughly illustrate how certain technologies, programming languages, etc. are to be used. However, they come with a caveat: there is currently little support for searching and browsing their content. This makes it diffi...
Article
Context: Since the mid-2000s, numerous recommendation systems based on text retrieval (TR) have been proposed to support software engineering (SE) tasks such as concept location, traceability link recovery, code reuse, impact analysis, and so on. The success of TR-based solutions highly depends on the query submitted, which is either formulated by...
Conference Paper
Traceability Link Recovery (TLR) is a fundamental software maintenance task in which links are established between related software artifacts of different types (e.g., source code, documentation, requirements specifications, etc.) within a system. Existing approaches to TLR often require a human to analyze a long list of potential links and disting...
Conference Paper
When knowledgeable colleagues are not available, developers resort to offline and online resources, e.g., tutorials, mailing lists, and Q&A websites. These, however, need to be found, read, and understood, which takes its toll in terms of time and mental energy. A more immediate and accessible resource are video tutorials found on the web, which in...
Conference Paper
Code reuse is a common practice among software developers, whether novices or experts. Developers often rely on online resources in order to find code to reuse. For Python, the Python Package Index (PyPI) contains all packages developed for the community and is the largest catalog of reusable, open source packages developers can consult. While a va...
Conference Paper
Nowadays developers heavily rely on sources of informal documentation, including Q&A forums, slides, or video tutorials, the latter being particularly useful to provide introductory notions for a piece of technology. The current practice is that developers have to browse sources individually, which in the case of video tutorials is cumbersome, as t...
Conference Paper
Humanitarian Free Open Source Software (HFOSS) serves philanthropic goals that usually benefit non-profit organizations meant to improve the human condition. The altruistic goals these systems serve can offer developers additional motivation for contributing to OSS and have been seen as a way to attract more women to computing majors and to improve...
Conference Paper
This technical briefing presents the state of the art Text Retrieval and Natural Language Processing techniques used in Software Engineering and discusses their applications in the field.
Conference Paper
Text Retrieval (TR) approaches have been used to leverage the textual information contained in software artifacts to address a multitude of software engineering (SE) tasks. However, TR approaches need to be configured properly in order to lead to good results. Current approaches for automatic TR configuration in SE configure a single TR approach an...
Article
Text Retrieval (TR) techniques have been successfully used to leverage the textual information found in software artifacts with the purpose of aiding developers with their daily tasks. TR techniques require a query as input and the usefulness of the results they retrieve depends greatly on this query. While some queries retrieve relevant informatio...
Conference Paper
Software development knowledge resides in the source code and in a number of other artefacts produced during the development process. To extract such a knowledge, past software engineering research has extensively focused on mining the source code, i.e., the final product of the development effort. Currently, we witness an emerging trend where rese...
Conference Paper
Text retrieval (TR) techniques have been widely used to support concept and bug location. When locating bugs, developers often formulate queries based on the bug descriptions. More than that, a large body of research uses bug descriptions to evaluate bug location techniques using TR. The implicit assumption is that the bug descriptions and the rele...
Conference Paper
There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as, traceability link recovery, feature location, refactoring, reuse, etc. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to...
Conference Paper
Developers search source code frequently during their daily tasks, to find pieces of code to reuse, to find where to implement changes, etc. Code search based on text retrieval (TR) techniques has been widely used in the software engineering community during the past decade. The accuracy of the TR-based search results depends largely on the quality...
Chapter
Concept location in source code is an essential activity during software change. It starts with a change request and results in a place in the source code where the change is to be implemented. As a program comprehension activity, it is also part of other software evolution tasks, such as, bug localization, recovery of traceability links between so...
Conference Paper
Text-based search and retrieval is used by developers in the context of many SE tasks, such as, concept location, traceability link retrieval, reuse, impact analysis, etc. Solutions for software text search range from regular expression matching to complex techniques using text retrieval. In all cases, the results of a search depend on the query fo...
Conference Paper
Text retrieval approaches have been used to address many software engineering tasks. In most cases, their use involves issuing a textual query to retrieve a set of relevant software artifacts from the system. The performance of all these approaches depends on the quality of the given query (i.e., its ability to describe the information need in such...
Conference Paper
Concept location is an essential task during software maintenance and in particular program comprehension activities. One of the approaches to this task is based on leveraging the lexical information found in the source code by means of Information Retrieval techniques. All IR-based approaches to concept location are highly dependent on the queries...
Conference Paper
Experienced programmers choose identifier names carefully, in the attempt to convey information about the role and behavior of the labeled code entity in a concise and expressive way. In fact, during program understanding the names given to code entities represent one of the major sources of information used by developers. We conjecture that lexico...
Conference Paper
Concept location is an essential task during software maintenance and in particular program comprehension activities. One of the approaches to this task is the based on leveraging the lexical information found in the source code by means of Information Retrieval techniques. All IR-based approaches to concept location are highly dependent on the que...
Article
Full-text available
During maintenance developers cannot read the entire code of large systems. They need a way to get a quick understanding of source code entities (such as, classes, methods, packages, etc.), so they can efficiently identify and then focus on the ones related to their task at hand. Sometimes reading just a method header or a class name does not tell...
Article
Full-text available
One of the main challenges faced by today's developers is keeping up with the staggering amount of source code that needs to be read and understood. In order to help developers with this problem and reduce the costs associated with it, one solution is to use simple textual descriptions of source code entities that developers can grasp easily, while...
Article
Full-text available
Concept location is a critical activity during software evolution as it produces the location where a change is to start in response to a modification request, such as, a bug report or a new feature request. Lexical-based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The e...
Article
Source code is a mixed software artifact, containing information for both the compiler and the developers. While programming language grammar dictates how the source code is written, developers have a lot of freedom in writing identifiers and comments. These are intentional in nature and become means of communication between developers.The goal of...
Conference Paper
We introduce the notion of "lexicon bad smell", which parallels that of "code smell" and indicates some potential lexicon construction problems that can be addressed through refactoring (e.g., renaming). We created a catalog of lexicon bad smells and we developed a publicly available suite of detectors to locate them. The paper presents a case stud...
Conference Paper
Information about the problem domain of the software and the solution it implements is often embedded by developers in comments and identifiers. When using software developed by others or when are new to a project, programmers know little about how domain information is reflected in the source code. Programmers often learn about the domain from ext...

Network

Cited By