Cristina Videira Lopes’s research while affiliated with University of California, Irvine and other places


Publications (189)


Integrating AI Tutors in a Programming Course
  • Conference Paper

December 2024

·

6 Reads

Iris Ma

·

Alberto Krone-Martins

·

Cristina Videira Lopes

Figure 1: RAGMan Architecture
Integrating AI Tutors in a Programming Course
  • Preprint
  • File available

July 2024

·

63 Reads

RAGMan is an LLM-powered tutoring system that can support a variety of course-specific and homework-specific AI tutors. RAGMan leverages Retrieval Augmented Generation (RAG), as well as strict instructions, to ensure the alignment of the AI tutors' responses. By using RAGMan's AI tutors, students receive assistance with their specific homework assignments without directly obtaining solutions, while also having the ability to ask general programming-related questions. RAGMan was deployed as an optional resource in an introductory programming course with an enrollment of 455 students. It was configured as a set of five homework-specific AI tutors. This paper describes the interactions the students had with the AI tutors, the students' feedback, and a comparative grade analysis. Overall, about half of the students engaged with the AI tutors, and the vast majority of the interactions were legitimate homework questions. When students posed questions within the intended scope, the AI tutors delivered accurate responses 98% of the time. Among the students who used the AI tutors, 78% reported that the tutors helped their learning. Beyond the AI tutors' ability to provide valuable suggestions, students reported appreciating them for fostering a safe learning environment free from judgment.
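As a rough illustration of the retrieval-augmented pattern RAGMan is described as using, the sketch below pairs a toy keyword retriever with strict tutoring instructions. All names, documents, and prompt text are hypothetical; a real deployment would use embeddings and an LLM API rather than this toy template.

```python
# Minimal sketch of a RAG-style tutor prompt builder (hypothetical names;
# a real system would use embeddings and an LLM API, not a keyword match).

COURSE_DOCS = {
    "hw3-recursion": "HW3 asks for a recursive directory walker over folders",
    "loops": "A for-loop iterates over a sequence of items one by one",
}

STRICT_INSTRUCTIONS = (
    "You are a homework tutor. Give hints and explanations, "
    "but never provide a complete solution."
)

def retrieve(question: str, docs: dict, k: int = 1) -> list:
    """Toy retriever: rank documents by shared word count with the question."""
    words = set(question.lower().split())
    scored = sorted(docs.items(),
                    key=lambda kv: len(words & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question: str) -> str:
    """Assemble strict instructions, retrieved context, and the question."""
    context = "\n".join(retrieve(question, COURSE_DOCS))
    return f"{STRICT_INSTRUCTIONS}\n\nContext:\n{context}\n\nStudent: {question}"

prompt = build_prompt("How do I start the recursive walker for HW3?")
```

The strict system instructions are what keeps the tutor from handing out full solutions; the retrieved homework context is what keeps its answers grounded in the specific assignment.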


Towards AI-Assisted Synthesis of Verified Dafny Methods

July 2024

·

3 Reads

·

16 Citations

Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs’ specifications as well as code so that the code can be proved correct with respect to the specifications. Unfortunately, existing large language models show a severe lack of proficiency in verified programming. In this paper, we demonstrate how to improve two pretrained models’ proficiency in the Dafny verification-aware language. Using 178 problems from the MBPP dataset, we prompt two contemporary models (GPT-4 and PaLM-2) to synthesize Dafny methods. We use three different types of prompts: a direct Contextless prompt; a Signature prompt that includes a method signature and test cases; and a Chain of Thought (CoT) prompt that decomposes the problem into steps and includes example problems and solutions generated via retrieval augmentation. Our results show that GPT-4 performs better than PaLM-2 on these tasks and that both models perform best with the retrieval-augmented CoT prompt. GPT-4 was able to generate verified, human-evaluated Dafny methods for 58% of the problems; however, it managed only 19% of the problems with the Contextless prompt, and even fewer (10%) with the Signature prompt. We are thus able to contribute 153 verified Dafny solutions to MBPP problems: 50 that we wrote manually and 103 synthesized by GPT-4. Our results demonstrate that the benefits of formal program verification are now within reach of code-generating large language models.
Likewise, program verification systems can benefit from large language models, whether to synthesize code wholesale, to generate specifications, or to act as a "programmer’s verification apprentice" that constructs annotations, such as loop invariants, which are hard for programmers to write and for verification tools to find. Finally, we expect that the approach we have pioneered here — generating candidate solutions that are subsequently formally checked for correctness — should transfer to other domains (e.g., legal arguments, transport signaling, structural engineering) where solutions must be correct, and where that correctness must be demonstrated, explained, and understood by designers and end-users.
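The generate-then-verify loop the paper relies on can be sketched as follows. Here `generate_candidate` and `verify` are hypothetical stand-ins: a real pipeline would query an LLM and shell out to the Dafny verifier instead.

```python
# Sketch of the generate-and-verify loop. The two helpers are hypothetical
# stand-ins: a real pipeline would call an LLM API and run `dafny verify`.

def generate_candidate(prompt: str, attempt: int) -> str:
    """Stand-in for an LLM call; returns a fake Dafny method per attempt."""
    return f"method Abs(x: int) returns (y: int) // attempt {attempt}"

def verify(program: str) -> bool:
    """Stand-in for the Dafny verifier; pretend only the 3rd attempt passes."""
    return "attempt 3" in program

def synthesize(prompt: str, max_attempts: int = 5):
    """Keep sampling candidates until one passes the verifier."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(prompt, attempt)
        if verify(candidate):
            return candidate  # only formally verified code is accepted
    return None  # no verified candidate within the budget

result = synthesize("Write a verified absolute-value method in Dafny")
```

The key design point is that the verifier, not the model, is the arbiter of correctness: unverifiable candidates are simply discarded and resampled.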


SourcererJBF: A Java Build Framework For Large-Scale Compilation

December 2023

·

15 Reads

·

2 Citations

ACM Transactions on Software Engineering and Methodology

Researchers and tool developers working on dynamic analysis, software testing, automated program repair, verification, and validation need large compiled, compilable, and executable code corpora to test their ideas. The publicly available corpora are relatively small, non-compilable, and/or non-executable. Developing a compiled code corpus is a laborious activity demanding significant manual effort and human intervention. To facilitate large-scale program analysis research, we develop SourcererJBF, a Java Build Framework that can automatically build a large Java code corpus without project-specific instructions or human intervention. To generate a compiled code corpus, SourcererJBF creates an offline knowledge base by collecting external dependencies from the project directories and existing build scripts (if available). It constructs indices of those collected external dependencies that enable a fast search for resolving dependencies during project compilation. As the output of the large-scale compilation, it produces JAigantic, a compilable Java corpus containing compiled projects, their bytecode, dependencies, normalized build scripts, and build commands. We evaluated SourcererJBF’s effectiveness, correctness, performance, and scalability on a large collection of Java projects. Our experimental results demonstrate that SourcererJBF is effective and scalable in building a large Java code corpus, with reasonable performance and correctness comparable to the projects’ existing build systems.
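At its core, the offline knowledge base described above is an index from fully-qualified type names to the jars that provide them. The toy Python sketch below illustrates that inverted lookup; the data and function names are hypothetical, not SourcererJBF's actual implementation.

```python
# Toy version of a dependency index: map fully-qualified class names to the
# jars that provide them, then resolve a project's unresolved imports.
# (Illustrative only; not SourcererJBF's actual code.)

from collections import defaultdict

def build_index(jar_contents: dict) -> dict:
    """Invert {jar: [classes]} into {class: [jars]} for fast lookup."""
    index = defaultdict(list)
    for jar, classes in jar_contents.items():
        for cls in classes:
            index[cls].append(jar)
    return index

def resolve(missing_imports: list, index: dict) -> dict:
    """Pick one providing jar per unresolved import (first match wins)."""
    return {imp: index[imp][0] for imp in missing_imports if imp in index}

jars = {
    "guava-31.1.jar": ["com.google.common.collect.ImmutableList"],
    "junit-4.13.jar": ["org.junit.Test", "org.junit.Assert"],
}
index = build_index(jars)
classpath = resolve(["org.junit.Test", "com.example.Missing"], index)
```

Unresolvable imports (like `com.example.Missing` above) simply stay unresolved, which is why a large, pre-built index of known jars matters for compilation success rates.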




Fig. 1. Original GitHub Desktop Application
Table: Distribution of Commit Message Types
Improving the Quality of Commit Messages in Students' Projects

April 2023

·

31 Reads

Commit messages play a crucial role in collaborative software development. They provide a clear and concise description of the changes made to the source code. However, many commit messages in students' projects lack useful information. This is a concern, as low-quality commit messages can negatively impact communication during software development and future maintenance. To address this issue, this research aims to help students write high-quality commit messages by "nudging" them in the right direction. We modified the GitHub Desktop application to require specific parts in commit messages, namely a "what" part and a "why" part. To test whether this affects the quality of commit messages, we divided students from an Information Retrieval class into two groups, with one group using the modified application and the other using other interfaces. The results show that the quality of commit messages improved in terms of informativeness, clarity, and length.
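The "what"/"why" nudge can be approximated outside the modified GitHub Desktop client with a simple message check, for instance in a Git `commit-msg` hook. The sketch below is a hypothetical re-creation of the idea, not the authors' modification:

```python
# Hypothetical re-creation of the "what"/"why" commit-message nudge,
# similar in spirit to the modified GitHub Desktop client described above.

def check_commit_message(msg: str) -> list:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lower = msg.lower()
    if "what:" not in lower:
        problems.append("missing a 'What:' section describing the change")
    if "why:" not in lower:
        problems.append("missing a 'Why:' section giving the rationale")
    if len(msg.strip()) < 20:
        problems.append("message is too short to be informative")
    return problems

good = "What: cache index postings in memory\nWhy: avoids disk reads per query"
bad = "fix stuff"
```

A hook like this rejects (or merely warns about) messages such as `bad` while letting structured messages such as `good` through, which is the nudge the study evaluates.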


Figure: DNN function.
Figure: Frequency of labels with unconstrained random inputs.
Figure: Similarity values for arbitrary models.
Figure: Similarity box plots (min, 1st quartile, median, 3rd quartile, max).
Black Boxes, White Noise: Similarity Detection for Neural Functions

February 2023

·

24 Reads

Similarity, or clone, detection has important applications in copyright violation, software theft, code search, and the detection of malicious components. There is now a good number of open source and proprietary clone detectors for programs written in traditional programming languages. However, the increasing adoption of deep learning models in software poses a challenge to these tools: these models implement functions that are inscrutable black boxes. As more software includes these DNN functions, new techniques are needed in order to assess the similarity between deep learning components of software. Previous work has unveiled techniques for comparing the representations learned at various layers of deep neural network models by feeding canonical inputs to the models. Our goal is to be able to compare DNN functions when canonical inputs are not available -- because they may not be available in many application scenarios. The challenge, then, is to generate appropriate inputs and to identify a metric that, for those inputs, is capable of representing the degree of functional similarity between two comparable DNN functions. Our approach uses random input with values between -1 and 1, in a shape that is compatible with what the DNN models expect. We then compare the outputs by performing correlation analysis. Our study shows how it is possible to perform similarity analysis even in the absence of meaningful canonical inputs. The response to random inputs of two comparable DNN functions exposes those functions' similarity, or lack thereof. Of all the metrics tried, we find that Spearman's rank correlation coefficient is the most powerful and versatile, although in special cases other methods and metrics are more expressive. We present a systematic empirical study comparing the effectiveness of several similarity metrics using a dataset of 56,355 classifiers collected from GitHub.
This is accompanied by a sensitivity analysis that reveals how certain models' training-related properties affect the effectiveness of the similarity metrics. To the best of our knowledge, this is the first work that shows how similarity of DNN functions can be detected by using random inputs. Our study of correlation metrics, and the identification of Spearman's rank correlation coefficient as the most powerful among them for this purpose, establishes a complete and practical method for DNN clone detection that can be used in the design of new tools. It may also serve as inspiration for other program analysis tasks whose approaches break in the presence of DNN components.
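The core technique (identical random probes in [-1, 1] fed to two black-box functions, followed by rank correlation of their outputs) can be sketched in pure Python. Toy scalar functions stand in for DNN models here; the Spearman computation itself is standard:

```python
# Sketch of similarity detection with random inputs: feed the same random
# probes to two black-box functions and compute Spearman's rank correlation
# of the outputs. Toy scalar functions stand in for DNN models.

import random

def spearman(xs: list, ys: list) -> float:
    """Spearman's rho via Pearson correlation of the ranks (no ties assumed)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

random.seed(0)
probes = [random.uniform(-1.0, 1.0) for _ in range(200)]  # inputs in [-1, 1]

model_a = lambda x: 3.0 * x + 0.1                   # "original" function
model_b = lambda x: 3.0 * x + 0.1 + 0.001 * x ** 2  # near-clone of model_a
model_c = lambda x: -x ** 3                         # unrelated function

rho_clone = spearman([model_a(x) for x in probes], [model_b(x) for x in probes])
rho_diff = spearman([model_a(x) for x in probes], [model_c(x) for x in probes])
```

The near-clone preserves the ranking of outputs across probes, so its Spearman coefficient against the original is near 1, while the unrelated (here, anti-monotone) function lands far from it; this rank-only view is what makes the metric robust when outputs differ in scale.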



Figure 1: An architectural view of the simulation platform.
A Simulation Analysis of Large Contests with Thresholding Agents

December 2019

·

115 Reads

·

3 Citations

Running contests has been an effective way to solicit efforts from a large pool of participants. Existing research mostly focuses on small contests that typically consist of two or several perfectly rational agents. In practice, however, agents are often situated in complex environments that involve large numbers of players, and they usually use thresholding policies to make decisions. Despite this, there is a surprising lack of understanding of how contest factors influence contest outcomes. Here, we present the first simulation analysis of how the parameters of the contest success function, the population dynamics, and the agents' cutoff policies influence the outcomes of contests with non-cooperative thresholding agents. Experimental results demonstrate that stakeholders can design (approximately) optimal contests that satisfy both their interests and the agents' by choosing a relatively low bias factor. Our work brings new insights into how to design proper competitions to coordinate thresholding agents.
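In contest theory the "bias factor" is commonly the discriminatory exponent r of a Tullock-style contest success function, under which player i wins with probability e_i^r / Σ_j e_j^r. A toy sketch under that assumption (illustrative numbers, not the authors' simulation platform):

```python
# Toy Tullock-style contest success function with bias factor r: a player's
# winning probability is effort**r / sum(effort**r). A higher r makes the
# outcome more deterministic in effort. (Not the authors' simulation code.)

def win_probabilities(efforts: list, r: float) -> list:
    """Winning probability of each player under bias factor r."""
    powered = [e ** r for e in efforts]
    total = sum(powered)
    return [p / total for p in powered]

efforts = [1.0, 2.0, 4.0]
low_bias = win_probabilities(efforts, r=0.5)   # outcome closer to a lottery
high_bias = win_probabilities(efforts, r=4.0)  # top effort almost surely wins
```

With a low r, even the weakest player retains a meaningful chance of winning, which is the lever the abstract points to for keeping large pools of thresholding agents engaged.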


Citations (63)


... Work exploring the role of prompt engineering and search-based approaches for the synthesis of Dafny code includes that of Misu et al. [4] and Branfonbrener et al. [1]. ...

Reference:

Dafny as Verification-Aware Intermediate Language for Code Generation
Towards AI-Assisted Synthesis of Verified Dafny Methods
  • Citing Article
  • July 2024

... They excel in installing models for fast analysis and low latency, enabling immediate alterations in quality control procedures, but needing additional development experience [25]. Java is known for its stability and cross-platform interoperability, making it ideal for large-scale installations that prioritize reliable performance and easy maintenance [26]. Because of this, it is a great choice for adding quality control models to refinery equipment that is already in place. ...

SourcererJBF: A Java Build Framework For Large-Scale Compilation
  • Citing Article
  • December 2023

ACM Transactions on Software Engineering and Methodology

... This paradigm shift introduces unique challenges in software development [15], such as debugging, optimization, and ensuring the correctness of quantum circuits. Prior work by Jhaveri et al. [16] explored a novel approach for code clone detection by expressing it as a subgraph isomorphism problem solved using quantum annealing. While this represents an important first step, it focuses on the application of quantum computing to classical software problems rather than investigating clones specifically within quantum programs. ...

Cloning and Beyond: A Quantum Solution to Duplicate Code
  • Citing Conference Paper
  • October 2023

... The application of AR as educational technology is a significant area of study in today's era. AR application research in education has been developed for several fields, including medicine and engineering [3][4][5][6][7][8][9][10]. In addition to designing AR applications, verifying their efficiency, usability, and user experience is crucial. ...

Abstract T MP43: Utility of Augmented Reality in Relation to Virtual Reality in Stroke Rehabilitation
  • Citing Article
  • February 2014

Stroke

[...]

·

Steven C Cramer

... Their approach outperformed state-of-the-art tools, achieving recall and precision of 84% and 86% respectively, and was able to produce a suggestion in less than 2ms. D'Souza et al. [17] presented PyReco, a code completion system for Python that exploits a nearest neighbor classifier to sort the suggested APIs by relevance rather than the conventional alphabetic order. Thanks to the rich contextual information collected, such as imported libraries and API methods or attributes, they were able to achieve a recall of 84%. ...

Collective Intelligence for Smarter API Recommendations in Python
  • Citing Conference Paper
  • October 2016

... Retrieving Candidate Implementations: Once the functional abstractions have been identified, their syntactic signatures are used to perform an interface-based code search [9,20] for candidate implementations using a traditional full-text search engine [21,30,36,40]. These use advanced NLP techniques [32] to retrieve code units which appear to be implementations of the functional abstraction based on their input/output parameter types and identifier names. ...

An Exploratory Study of Interface Redundancy in Code Repositories
  • Citing Conference Paper
  • October 2016

... Svajlenko and Roy also published a mutation and injection framework to evaluate the recall of clone detectors for C [22]. Some studies have attempted to advance the evaluation of precision [64,65]. However, current precision evaluations still require random sampling and manual judgment. ...

Towards Automating Precision Studies of Clone Detectors
  • Citing Conference Paper
  • May 2019

... The seminal article by Terwiesch and Xu (2008) presented the conceptual underpinnings of this claim, wherein the authors state that as the number of competitors in a contest increases, the winning probability of the focal solver reduces, leading to weaker incentives for the solver to exert greater efforts (Terwiesch and Xu 2008, 1536-37). A related phenomenon is the 'bias factor,' which measures the extent to which the focal solver's win in a CC is deterministic as a function of the solver's efforts (Shen, Achar, and Lopes 2019). It is observed that as the bias factor grows, the solver's output (efforts) registers a sharp drop (Shen, Achar, and Lopes 2019, 186). ...

A Simulation Analysis of Large Contests with Thresholding Agents

... Svajlenko and Roy also published a mutation and injection framework to evaluate the recall of clone detectors for C [22]. Some studies have attempted to advance the evaluation of precision [64,65]. However, current precision evaluations still require random sampling and manual judgment. ...

On Precision of Code Clone Detection Tools
  • Citing Conference Paper
  • February 2019

... Although things have improved slightly, the availability of dynamic (observational) data remains significantly limited compared to syntactic data, hindering the development of fully morescient models. This is because real-world, high-quality corpora of run-time behavior (i.e., morescient data sets) are challenging to obtain and curate at a large scale [9,52]. A prerequisite is parsable and executable code. ...

NJR: a normalized Java resource
  • Citing Conference Paper
  • July 2018