November 2023
·
7 Reads
·
3 Citations
September 2023
·
2 Reads
August 2023
·
25 Reads
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code -- supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution -- avoiding performance bottlenecks and semantically inequivalent results. We present our ongoing work on an automated refactoring approach that assists developers in specifying whether and how their otherwise eagerly-executed imperative DL code could be reliably and efficiently executed as graphs at run-time in a semantics-preserving fashion. The approach, based on a novel tensor analysis specifically for imperative DL code, consists of refactoring preconditions for automatically determining when it is safe and potentially advantageous to migrate imperative DL code to graph execution and modifying decorator parameters or eagerly executing code already running as graphs. The approach is being implemented as a PyDev Eclipse IDE plug-in and uses the WALA Ariadne analysis framework. We discuss our ongoing work towards optimizing imperative DL code to its full potential.
October 2022
·
8 Reads
·
3 Citations
October 2022
·
15 Reads
·
8 Citations
May 2022
·
7 Reads
·
3 Citations
February 2022
·
77 Reads
Issue tracking systems enable users and developers to comment on problems plaguing a software system. Empirical Software Engineering (ESE) researchers study (open-source) project issues and the comments and threads within to discover -- among others -- challenges developers face when, e.g., incorporating new technologies, platforms, and programming language constructs. However, issue discussion threads accumulate over time and thus can become unwieldy, hindering any insight that researchers may gain. While existing approaches alleviate this burden by classifying issue thread comments, there is a gap between searching popular open-source software repositories (e.g., those on GitHub) for issues containing particular keywords and feeding the results into a classification model. In this paper, we demonstrate a research infrastructure tool called QuerTCI that bridges this gap by integrating the GitHub issue comment search API with the classification models found in existing approaches. Using queries, ESE researchers can retrieve GitHub issues containing particular keywords, e.g., those related to a certain programming language construct, and subsequently classify the kinds of discussions occurring in those issues. We hope that, with our tool, ESE researchers can uncover challenges related to particular technologies by searching popular open-source repositories for certain keywords more seamlessly than previously possible. A tool demonstration video may be found at: https://youtu.be/fADKSxn0QUk.
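As a rough illustration of the retrieval step described in this abstract, the sketch below composes a keyword query for GitHub's REST issue search endpoint (`/search/issues`). It is not QuerTCI's code: the function name, the example keyword and repository, and the choice of the `is:issue` and `in:comments` qualifiers are assumptions made for the example, and the classification step is omitted.

```python
from urllib.parse import urlencode

# GitHub's REST endpoint for searching issues and pull requests.
SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def build_issue_search_url(keyword: str, repo: str, per_page: int = 30) -> str:
    """Build a search URL for issues in `repo` whose comments mention `keyword`.

    Hypothetical helper for illustration only; it performs no network request.
    """
    # Qualifiers are joined with spaces inside the single `q` parameter.
    qualifiers = [keyword, "is:issue", "in:comments", f"repo:{repo}"]
    params = {"q": " ".join(qualifiers), "per_page": per_page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Example keyword and repository chosen purely for demonstration.
    print(build_issue_search_url("tf.function", "tensorflow/tensorflow"))
```

The JSON returned for such a URL could then be passed to whichever comment classifier a study uses; that step is left out here because the abstract does not specify the model interface.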
January 2022
·
107 Reads
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code that supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development tends to produce DL code that is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, less error-prone imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. While hybrid approaches aim for the "best of both worlds," the challenges in applying them in the real world are largely unknown. We conduct a data-driven analysis of challenges -- and resultant bugs -- involved in writing reliable yet performant imperative DL code by studying 250 open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually examined code patches and bug reports, respectively. The results indicate that hybridization: (i) is prone to API misuse, (ii) can result in performance degradation -- the opposite of its intention, and (iii) has limited application due to execution mode incompatibility. We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code, potentially benefiting DL practitioners, API designers, tool developers, and educators.
December 2021
·
45 Reads
Logging is a significant programming practice. Due to the highly transactional nature of modern software applications, massive amounts of logs are generated every day, which may overwhelm developers. Logging information overload can be dangerous to software applications. Using log levels, developers can print the useful information while hiding the verbose logs during software runtime. As software evolves, the log levels of logging statements associated with the surrounding software feature implementation may also need to be altered. Maintaining log levels necessitates a significant amount of manual effort. In this paper, we demonstrate an automated approach that can rejuvenate feature log levels by matching the interest level of developers in the surrounding features. The approach is implemented as an open-source Eclipse plug-in, using two external plug-ins (JGit and Mylyn). It was tested on 18 open-source Java projects consisting of ~3 million lines of code and ~4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by ~20%, and increases the focus of logs in bug fix contexts ~83% of the time. For further details, interested readers can watch our demonstration video (https://www.youtube.com/watch?v=qIULoAXoDv4).
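To make the role of log levels concrete, here is a minimal sketch of level-based filtering. It uses Python's standard `logging` module rather than the Java logging frameworks the tool targets, and the logger name and messages are invented for the example. It shows only how a threshold hides verbose logs, not the rejuvenation approach itself.

```python
import io
import logging


def emit_at_threshold(threshold: int) -> list[str]:
    """Log one message per level and return the lines that pass `threshold`."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    # Hypothetical logger name; a fresh handler is attached on every call.
    logger = logging.getLogger(f"demo.threshold.{threshold}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.propagate = False  # keep output out of the root logger
    logger.setLevel(threshold)

    # Messages below the threshold are dropped before reaching the handler.
    logger.debug("cache lookup took 3 ms")
    logger.info("order 42 accepted")
    logger.warning("payment gateway slow")
    logger.error("order 42 failed")

    return buffer.getvalue().splitlines()


if __name__ == "__main__":
    print(emit_at_threshold(logging.WARNING))
```

With the threshold at `WARNING`, only the warning and error lines survive. Lowering a statement's level (say, from `info` to `debug`) is therefore what keeps it out of the output at typical production thresholds.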
October 2021
·
14 Reads
·
12 Citations
Science of Computer Programming
Logging -- used for system events and security breaches to describe more informational yet essential aspects of software features -- is pervasive. Given the high transactionality of today's software, logging effectiveness can be reduced by information overload. Log levels help alleviate this problem by correlating a priority to logs that can be later filtered. As software evolves, however, levels of logs documenting surrounding feature implementations may also require modification as features once deemed important may have decreased in urgency and vice-versa. We present an automated approach that assists developers in evolving levels of such (feature) logs. The approach, based on mining Git histories and manipulating a degree of interest (DOI) model, transforms source code to revitalize feature log levels based on the "interestingness" of the surrounding code. Built upon JGit and Mylyn, the approach is implemented as an Eclipse IDE plug-in and evaluated on 18 Java projects with ~3 million lines of code and ~4K log statements. Our tool successfully analyzes 99.22% of logging statements, increases log level distributions by ~20%, and increases the focus of logs in bug fix contexts ~83% of the time. Moreover, pull (patch) requests were integrated into large and popular open-source projects. The results indicate that the approach is promising in assisting developers in evolving feature log levels.
... As shown in Table 1, we focused on six papers for replication. In 16 studies [10,11,16,25,43,60,61,64,67,85,89], the detection of security weaknesses and bugs in code snippets was not automated; instead, these studies relied on manual processes to label and identify security issues and/or bugs in code snippets. This approach typically involved human annotators reviewing and classifying the code for potential weaknesses, which is hard to compare in a replication study with different human evaluators. ...
November 2023
... Unlike the work of , they exclude all generic (i.e., non-DL specific) bugs in DL applications before building the taxonomy, and identify API misuse as a leaf category of the whole taxonomy. Vélez et al. (2022) examined code patches and bug reports of tf.function, and found that hybridization approach could lead to API misuses. The work of Vélez et al. (2022) is not an exclusive study targeting API misuse. ...
October 2022
... With the rapid development of the Internet, users generate massive amounts of log information while using the Internet. When faced with massive amounts of Internet information, it is difficult for users to obtain the information they are interested in, resulting in information overload problems [1,2]. Therefore, various recommendation methods have become research hotspots, enabling user groups to obtain realtime and effective information that they are interested in (such as microblog recommendation, product recommendation, movie recommendation, etc.). ...
October 2022
... regarding the benefits and costs of logging practice. More recently, Tang et al. (2022) study the logging practices specific to log levels and present an automated tool (Tang et al. 2021) to help developers rejuvenate log levels. In addition, metrics used in our study to measure logging characteristics, such as the density and churn rate of logging statements, are also adopted in many prior studies (Yuan et al. 2012b; Shang et al. 2015; Chen and Jiang 2017c; Kabinna et al. 2018; Zeng et al. 2019). ...
Reference:
Studying logging practice in test code
May 2022
... regarding the benefits and costs of logging practice. More recently, Tang et al. (2022) study the logging practices specific to log levels and present an automated tool (Tang et al. 2021) to help developers rejuvenate log levels. In addition, metrics used in our study to measure logging characteristics, such as the density and churn rate of logging statements, are also adopted in many prior studies (Yuan et al. 2012b; Shang et al. 2015; Chen and Jiang 2017c; Kabinna et al. 2018; Zeng et al. 2019). ...
Reference:
Studying logging practice in test code
October 2021
Science of Computer Programming
... While existing research has delved into various facets of model maintenance -- ranging from technical debt [6,59], library usage [16], to architectural frameworks [42] -- there remains a notable gap in comprehensively categorizing and analyzing the changes made to models over time. Specifically, no prior study has applied a multifaceted taxonomy of changes to ML repositories to systematically understand how these models are maintained and improved in practice. ...
Reference:
How do Machine Learning Models Change?
May 2021
... ensuring that messages are sent or received in expected orders [4,34,63], including the possibility that a message arrives before or after an actor is using a behaviour ready to process that particular message. Static checking of message ordering in message passing systems is a classic problem, studied in many settings. ...
November 2020
Proceedings of the ACM on Programming Languages
... Furthermore, tf.function may be used as a first-class function instead of a decorator. To this end, we are working towards building a fluent API typestate analysis for imperative DL code by adapting the work of Khatchadourian et al. [32]. Existing work for determining tensor shapes only works for procedural TensorFlow (TF v1) code. ...
May 2019
... Khatchadourian et al. [15] focus on analyzing stream operations for safe parallelization, defining conditions to ensure performance gains when converting sequential to parallel streams. Kayak [5], a semantics-driven refactoring tool, utilizes program synthesis to transform external iterations into Java 8 Streams. ...
May 2020
Science of Computer Programming
... Zhang et al. [14] discuss a case where fixing a bug related to an underdetermined reflection API method (the order of the elements returned by getDeclaredFields) adds multiple lines to the code. Bug patterns involving Java streams are discussed in Khatchadourian et al. [7]. Java exception handling bugs are explored in Ebert et al. [5]. ...
April 2020
Lecture Notes in Computer Science