Westley Weimer’s research while affiliated with University of Michigan and other places


Publications (184)


Towards a Cognitive Model of Dynamic Debugging: Does Identifier Construction Matter?
  • Article

November 2024

IEEE Transactions on Software Engineering

Danniell Hu · Priscila Santiesteban · Madeline Endres · Westley Weimer

Debugging is a vital and time-consuming process in software engineering. Recently, researchers have begun using neuroimaging to understand the cognitive bases of programming tasks by measuring patterns of neural activity. While exciting, prior studies have only examined small sub-steps in isolation, such as comprehending a method without writing any code or writing a method from scratch without reading any already-existing code. We propose a simple multi-stage debugging model in which programmers transition between Task Comprehension, Fault Localization, Code Editing, Compiling, and Output Comprehension activities. We conduct a human study of n=28 participants using a combination of functional near-infrared spectroscopy and standard coding measurements (e.g., time taken, tests passed, etc.). Critically, we find that our proposed debugging stages are both neurally and behaviorally distinct. To the best of our knowledge, this is the first neurally-justified cognitive model of debugging. At the same time, there is significant interest in understanding how programmers from different backgrounds, such as those grappling with challenges in English prose comprehension, are impacted by code features when debugging. We use our cognitive model of debugging to investigate the role of one such feature: identifier construction. Specifically, we investigate how features of identifier construction impact neural activity while debugging by participants with and without reading difficulties. While we find significant differences in cognitive load as a function of morphology and expertise, we do not find significant differences in end-to-end programming outcomes (e.g., time, correctness, etc.). This nuanced result suggests that prior findings on the cognitive importance of identifier naming in isolated sub-steps may not generalize to end-to-end debugging. 
Finally, in a result relevant to broadening participation in computing, we find no behavioral outcome differences for participants with reading difficulties.


Fig. 1: We investigate the relationship between authors' demographic information (gender and age) and potential human biases for online technical articles in SE. The experiment controls the article text with varying author profiles based on gender and age.
Fig. 2: Example stimuli for the pre-study experiment. Participants are asked to rank the likelihood of these profile pictures appearing as an author photo for an online technical article by dragging the pictures provided. In this example, profile picture 4 was ranked as the least likely to appear as an author photo by 75% of the participants and was eliminated from the final dataset.
Fig. 3: Example of the final survey stimuli. These examples demonstrate a controlled article paired with six different profile pictures that represent all six groups (i.e., YM, YF, MM, MF, OM, and OF) in six versions of the final survey (i.e., V1-V6 in Table 1). The target question for this controlled article is "What are the steps for creating a 3D model based on a 2D image?"
Fig. 6: Survey duration by gender. The average response time for female participants is 182.86 seconds (median = 177 seconds), while the average response time for male participants is 143.97 seconds (median = 128 seconds). The 95% confidence interval for the difference in response time between these two groups is [23.69, 54.08] seconds.
Demographics of survey participants. This table shows the gender and age distribution of the 540 participants who completed this study.
A Controlled Experiment in Age and Gender Bias When Reading Technical Articles in Software Engineering
  • Article
  • Full-text available

October 2024 · 28 Reads

IEEE Transactions on Software Engineering

Online platforms and communities are a critical part of modern software engineering, yet are often affected by human biases. While previous studies investigated human biases and their potential harms against the efficiency and fairness of online communities, they have mainly focused on the open source and Q&A platforms, such as GitHub and Stack Overflow, but overlooked the audience-focused online platforms for delivering programming and SE-related technical articles, where millions of software engineering practitioners share, seek for, and learn from high-quality software engineering articles (i.e., technical articles for SE). Furthermore, most of the previous work has revealed gender and race bias, but we have little knowledge about the effect of age on software engineering practice. In this paper, we propose to investigate the effect of authors’ demographic information (gender and age) on the evaluation of technical articles on software engineering and potential behavioral differences among participants. We conducted a survey-based and controlled human study and collected responses from 540 participants to investigate developers’ evaluation of technical articles for software engineering. By controlling the gender and age of the author profiles of technical articles for SE, we found that raters tend to have more positive content depth evaluations for younger male authors when compared to older male authors and that male participants conduct technical article evaluations faster than female participants, consistent with prior study findings. Surprisingly, different from other software engineering evaluation activities (e.g., code review, pull request, etc.), we did not find a significant difference in the genders of authors on the evaluation outcome of technical articles in SE.


Self-organization in computation and chemistry: Return to AlChemy

September 2024 · 6 Reads

How do complex adaptive systems, such as life, emerge from simple constituent parts? In the 1990s, Walter Fontana and Leo Buss proposed a novel modeling approach to this question, based on a formal model of computation known as the λ calculus. The model demonstrated how simple rules, embedded in a combinatorially large space of possibilities, could yield complex, dynamically stable organizations, reminiscent of biochemical reaction networks. Here, we revisit this classic model, called AlChemy, which has been understudied over the past 30 years. We reproduce the original results and study the robustness of those results using the greater computing resources available today. Our analysis reveals several unanticipated features of the system, demonstrating a surprising mix of dynamical robustness and fragility. Specifically, we find that complex, stable organizations emerge more frequently than previously expected, that these organizations are robust against collapse into trivial fixed points, but that these stable organizations cannot be easily combined into higher order entities. We also study the role played by the random generators used in the model, characterizing the initial distribution of objects produced by two random expression generators, and their consequences on the results. Finally, we provide a constructive proof that shows how an extension of the model, based on the typed λ calculus, could simulate transitions between arbitrary states in any possible chemical reaction network, thus indicating a concrete connection between AlChemy and chemical reaction networks. We conclude with a discussion of possible applications of AlChemy to self-organization in modern programming languages and quantitative approaches to the origin of life.


Self-Organization in Computation & Chemistry: Return to AlChemy

August 2024 · 23 Reads

How do complex adaptive systems, such as life, emerge from simple constituent parts? In the 1990s Walter Fontana and Leo Buss proposed a novel modeling approach to this question, based on a formal model of computation known as λ calculus. The model demonstrated how simple rules, embedded in a combinatorially large space of possibilities, could yield complex, dynamically stable organizations, reminiscent of biochemical reaction networks. Here, we revisit this classic model, called AlChemy, which has been understudied over the past thirty years. We reproduce the original results and study the robustness of those results using the greater computing resources available today. Our analysis reveals several unanticipated features of the system, demonstrating a surprising mix of dynamical robustness and fragility. Specifically, we find that complex, stable organizations emerge more frequently than previously expected, that these organizations are robust against collapse into trivial fixed-points, but that these stable organizations cannot be easily combined into higher order entities. We also study the role played by the random generators used in the model, characterizing the initial distribution of objects produced by two random expression generators, and their consequences on the results. Finally, we provide a constructive proof that shows how an extension of the model, based on typed λ calculus, could simulate transitions between arbitrary states in any possible chemical reaction network, thus indicating a concrete connection between AlChemy and chemical reaction networks. We conclude with a discussion of possible applications of AlChemy to self-organization in modern programming languages and quantitative approaches to the origin of life.





Automatically Mitigating Vulnerabilities in Binary Programs via Partially Recompilable Decompilation

January 2024 · 1 Citation

IEEE Transactions on Dependable and Secure Computing

Vulnerabilities are challenging to locate and repair, especially when source code is unavailable and binary patching is required. Manual methods are time-consuming, require significant expertise, and do not scale to the rate at which new vulnerabilities are discovered. Automated methods are an attractive alternative, and we propose Partially Recompilable Decompilation (PRD) to help automate the process. PRD lifts suspect binary functions to source, available for analysis, revision, or review, and creates a patched binary using source- and binary-level techniques. Although decompilation and recompilation do not typically succeed on an entire binary, our approach does because it is limited to a few functions, such as those identified by our binary fault localization. We evaluate the assumptions underlying our approach and find that, without any grammar or compilation restrictions, up to 79% of individual functions are successfully decompiled and recompiled. In comparison, only 1.7% of the full C-binaries succeed. When recompilation succeeds, PRD produces test-equivalent binaries 93.0% of the time. We evaluate PRD in two contexts: a fully automated process incorporating source-level Automated Program Repair (APR) methods; and human-edited source-level repairs. When evaluated on DARPA Cyber Grand Challenge (CGC) binaries, we find that PRD-enabled APR tools, operating only on binaries, perform as well as, and sometimes better than full-source tools, collectively mitigating 85 of the 148 scenarios, a success rate consistent with the same tools operating with access to the entire source code. PRD achieves similar success rates as the winning CGC entries, sometimes finding higher-quality mitigations than those produced by top CGC teams. For generality, the evaluation includes two independently developed APR tools and C++, Rode0day, and real-world binaries.



CirFix: Automated Hardware Repair and Its Real-World Applications

July 2023 · 28 Reads · 2 Citations

IEEE Transactions on Software Engineering

This article presents CirFix, a framework for automatically repairing defects in hardware designs implemented in languages like Verilog. We propose a novel fault localization approach based on assignments to wires and registers, and a fitness function tailored to the hardware domain to bridge the gap between software-level automated program repair and hardware descriptions. We also present a benchmark suite of 32 defect scenarios corresponding to a variety of hardware projects. Overall, CirFix produces plausible repairs for 21/32 and correct repairs for 16/32 of the defect scenarios. Additionally, we evaluate CirFix's fault localization independently through a human study (n=41), and find that the approach may be a beneficial debugging aid for complex multi-line hardware defects.


Citations (66)


... Following [119] and [91], the partial search string on eye tracking should include the terms for method and device each in the two common spelling variants with and without hyphens (i.e., "eye tracking", "eye-tracking", "eye tracker", and "eye-tracker"). Unlike in [119], we do not include the term "RFV" (short for restricted focus viewer); also, departing from both previous SLRs, we add the more general terms "eye movement" or "eye movements". For the partial search term on SE-related artifacts, we take the overlap of the two previous SLRs: "code" (from "source code" in [119] and "code" or "pseudo code" in [91]), "program*", and "uml"; deviating from the two works, we add the terms "software" and "requirement". ...

Reference: On Eye Tracking in Software Engineering
How Do We Read Formal Claims? Eye-Tracking and the Cognition of Proofs about Algorithms
  • Citing Conference Paper
  • May 2023

... For instance, LGBTQIA+ software professionals, women experiencing harassment within software development environments, and software professionals contending with mental health conditions such as depression. Within the software engineering literature, some publications addressed hidden populations, including experienced green software practitioners [30], LGBTQIA+ software professionals [12,13,18], LGBTQIA+ software engineering students [34], cyber-physical systems engineers [46], software professionals on the autism spectrum [26], programmers who use cannabis [16], software professionals who use psychoactive substances at work [33], and whistleblowers in the software industry [14]. Additionally, some publications treated software professionals as a hidden population in a broader sense [11,28,29,45], employing this approach as a strategy to mitigate sampling bias [4]. ...

From Organizations to Individuals: Psychoactive Substance Use By Professional Programmers
  • Citing Conference Paper
  • May 2023

... In addition, the attacker is capable of spoofing system IDs. The work by Highnam et al. [117] described a realistic scenario of many drones operating under the MAVLink protocol being compromised. By capturing a flight mission's system ID and spoofing MAVLink packets, the considered specimen threat scenario exhibits an attacker's capacity to carry out a stealthy assault. ...

An Uncrewed Aerial Vehicle Attack Scenario and Trustworthy Repair Architecture
  • Citing Conference Paper
  • June 2016

... Notable findings include Gordon et al.'s [11] observation that female teachers received lower teaching evaluation scores than did male teachers. Santiesteban et al. [12] found a gender bias in computer science teaching evaluations that mainly affected the evaluation results of professors and had a smaller impact on the evaluation results of student teachers. Arrona-Palacios et al. [13] noted that undergraduate students rarely consider gender when evaluating professors but prefer male teachers to recommend the best professor. ...

An analysis of sex differences in computing teaching evaluations
  • Citing Conference Paper
  • December 2022

... In comparison, our work focuses on the online testing of UAS using OCL-based test oracles. Recently, Leach et al. (2022) presented a framework for assessing and handling the security situation during UAV operation. The evaluation using ArduPilot and Red Team attacks shows that the proposed framework can increase the dependability of autonomous systems. ...

START: A Framework for Trusted and Resilient Autonomous Vehicles (Practical Experience Report)
  • Citing Conference Paper
  • October 2022

... Software development encompasses more than just programming and teamwork; it also involves actively seeking knowledge online and in knowledge management systems [12], conducting testing and code reviewing [14], and taking advantage of software such as Integrated Development Environments. IDEs have been a researched topic for quite some time [26] as well as how online resources [8,18] aid and enhance the development process, both in speed and quality across varying ranges of experience. Through the use of IDEs, software engineers have gotten access to capabilities for refactoring, debugging, source repositories, third-party plugins [26], and auto-completion of code [6]. ...

Debugging with stack overflow: web search behavior in novice and expert programmers
  • Citing Conference Paper
  • October 2022

... Consequently, some software engineers relax by finding escape on drugs, consuming cannabis [45], at risk of losing their job. According to research [45] among 803 software developers, an astonishing 35% performed programming under the influence of cannabis, while 18% continue to do so at least once a month. ...

Hashing it out: a survey of programmers' cannabis usage, perception, and motivation
  • Citing Conference Paper
  • July 2022

... Alternatively, reducing the search space by "searching in the right place" (Ahmad et al. 2022) involves profiling the target application in order to identify hot methods. As such, in GI the targeted applications are often profiled. ...

Digging into Semantics: Where Do Search-Based Software Repair Methods Search?
  • Citing Chapter
  • August 2022

Lecture Notes in Computer Science

... More recent research by Li et al. (2022) examined how novice and expert programmers utilize Stack Overflow for debugging, emphasizing the importance of nuanced strategies such as query formulation and code reviews. While the distinction between novices and experts does not necessarily align with high and low performing students, these insights remain valuable. ...

Debugging with Stack Overflow: Web Search Behavior in Novice and Expert Programmers
  • Citing Conference Paper
  • May 2022

... APR systems, as depicted in Fig. 1(b), receive design codes and test cases, and attempt to enact targeted modifications with predefined templates to ensure all tests are passed. Innovations such as CirFix [7], Strider [8], and RTLrepair [9] demonstrate the potential of APR to reduce the labor and time required for hardware design verification. However, these APR methodologies predominantly rely on fixed templates and focus on addressing functional errors, limiting the scope and effectiveness of the repairs. ...

CirFix: automatically repairing defects in hardware design code
  • Citing Conference Paper
  • February 2022