Yuriy Brun’s research while affiliated with University of Massachusetts Amherst and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (154)


Rango: Adaptive Retrieval-Augmented Proving for Automated Software Verification
  • Preprint

December 2024

·

6 Reads

Kyle Thompson

·

Nuno Saavedra

·

Pedro Carrott

·

[...]

·

Emily First

Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
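As a rough sketch of the retrieval-at-every-step idea described above, the Python outline below re-runs retrieval for each proof step and feeds the results to the model. The helper objects (project_index, model, coq) and their methods are hypothetical stand-ins for illustration, not Rango's actual interfaces.

    def synthesize_proof(theorem, project_index, model, coq, max_steps=100):
        """Sketch: retrieve premises and similar proofs at every step and let a
        fine-tuned LLM propose the next tactic (assumed interfaces throughout)."""
        state = coq.start_proof(theorem)
        tactics = []
        for _ in range(max_steps):
            if state.is_complete():
                return tactics                          # proof finished
            # Retrieval adapts to the current proof state, not just the theorem.
            premises = project_index.retrieve_premises(state, k=20)
            similar_proofs = project_index.retrieve_similar_proofs(state, k=5)
            prompt = "\n".join(premises + similar_proofs + [state.goal_text()])
            tactic = model.generate(prompt)             # fine-tuned LLM suggests a tactic
            state = coq.apply(state, tactic)
            if state is None:                           # tactic failed; this sketch just gives up
                return None
            tactics.append(tactic)
        return None

A real tool would add backtracking and candidate ranking; the point of the sketch is only that retrieval happens inside the loop, so the context evolves with the proof.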


Cobblestone: Iterative Automation for Formal Verification
  • Preprint
  • File available

October 2024

·

8 Reads

Formal verification using proof assistants, such as Coq, is an effective way of improving software quality, but it is expensive. Writing proofs manually requires both significant effort and expertise. Recent research has used machine learning to automatically synthesize proofs, reducing verification effort, but these tools are able to prove only a fraction of the desired software properties. We introduce Cobblestone, a new proof-synthesis approach that improves on the state of the art by taking advantage of partial progress in proof synthesis attempts. Unlike prior tools, Cobblestone can produce multiple unsuccessful proofs using a large language model (LLM), identify the working portions of those proofs, and combine them into a single, successful proof, taking advantage of internal partial progress. We evaluate Cobblestone on two benchmarks of open-source Coq projects, controlling for training data leakage in LLM datasets. Fully automatically, Cobblestone can prove 48% of the theorems, while Proverbot9001, the previous state-of-the-art, learning-based, proof-synthesis tool, can prove 17%. Cobblestone establishes a new state of the art for fully automated proof synthesis tools for Coq. We also evaluate Cobblestone in a setting where it is given external partial proof progress from oracles, serving as proxies for a human proof engineer or another tool. When the theorem is broken down into a set of subgoals and Cobblestone is given a set of relevant lemmas already proven in the project, it can prove up to 58% of the theorems. We qualitatively study the theorems Cobblestone is and is not able to prove to outline potential future research directions to further improve proof synthesis, including developing interactive, semi-automated tools. Our research shows that tools can make better use of partial progress made during proof synthesis to more effectively automate formal verification.
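The "combine partial progress" idea can be pictured with a short, hypothetical Python sketch; the model/coq interfaces and helper method names below are assumptions, and the actual tool's algorithm is more involved.

    def cobble_proof(theorem, model, coq, n_samples=8):
        """Sketch: sample several whole-proof candidates, keep the portion of
        each that checks, and try to stitch the pieces into one full proof."""
        fragments = []
        for _ in range(n_samples):
            candidate = model.sample_proof(theorem)          # usually wrong as a whole
            ok_steps, open_goals = coq.longest_checking_prefix(theorem, candidate)
            fragments.append((ok_steps, open_goals))
        # Start from the attempt that leaves the fewest open subgoals...
        fragments.sort(key=lambda f: len(f[1]))
        best_steps, open_goals = fragments[0]
        for goal in open_goals:
            # ...and look for steps from another attempt that close this subgoal.
            patch = next((steps for steps, _ in fragments[1:]
                          if coq.closes_goal(goal, steps)), None)
            if patch is None:
                return None                                  # could not assemble a full proof
            best_steps = best_steps + patch
        return best_steps

In the oracle experiments the abstract mentions, the open subgoals and helper lemmas would instead be supplied externally rather than mined from the sampled attempts.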


QEDCartographer: Automating Formal Verification Using Reward-Free Reinforcement Learning

August 2024

·

3 Reads

Formal verification is a promising method for producing reliable software, but the difficulty of manually writing verification proofs severely limits its utility in practice. Recent methods have automated some proof synthesis by guiding a search through the proof space using a theorem prover. Unfortunately, the theorem prover provides only the crudest estimate of progress, resulting in effectively undirected search. To address this problem, we create QEDCartographer, an automated proof-synthesis tool that combines supervised and reinforcement learning to more effectively explore the proof space. QEDCartographer incorporates the proofs' branching structure, enabling reward-free search and overcoming the sparse reward problem inherent to formal verification. We evaluate QEDCartographer using the CoqGym benchmark of 68.5K theorems from 124 open-source Coq projects. QEDCartographer fully automatically proves 21.4% of the test-set theorems. Previous search-based proof-synthesis tools Tok, Tac, ASTactic, Passport, and Proverbot9001, which rely only on supervised learning, prove 9.6%, 9.8%, 10.9%, 12.5%, and 19.8%, respectively. Diva, which combines 62 tools, proves 19.2%. Comparing to the most effective prior tool, Proverbot9001, QEDCartographer produces 26% shorter proofs 27% faster, on average over the theorems both tools prove. Together, QEDCartographer and non-learning-based CoqHammer prove 31.8% of the theorems, while CoqHammer alone proves 26.6%. Our work demonstrates that reinforcement learning is a fruitful research direction for improving proof-synthesis tools' search mechanisms.
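To make the search-guidance point concrete, here is a minimal best-first-search sketch in Python: a learned estimate of remaining proof effort orders the frontier, so the search does not depend on a sparse end-of-proof reward. The policy, value_fn, and coq objects are assumed interfaces for illustration, not QEDCartographer's actual API.

    import heapq

    def guided_search(theorem, policy, value_fn, coq, budget=500):
        """Sketch: best-first proof search ordered by a learned estimate of
        remaining proof effort rather than by a sparse success reward."""
        start = coq.start_proof(theorem)
        frontier = [(value_fn(start), 0, start, [])]   # (est. cost-to-go, tiebreak, state, tactics)
        counter = 1
        for _ in range(budget):
            if not frontier:
                return None
            _, _, state, tactics = heapq.heappop(frontier)
            if state.is_complete():
                return tactics                         # QED reached
            for tactic in policy.top_k_tactics(state, k=5):
                next_state = coq.apply(state, tactic)
                if next_state is None:                 # tactic failed; prune this branch
                    continue
                heapq.heappush(frontier,
                               (value_fn(next_state), counter, next_state, tactics + [tactic]))
                counter += 1
        return None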




My Model is Unfair, Do People Even Care? Visual Design Affects Trust and Perceived Bias in Machine Learning

October 2023

·

13 Reads

·

10 Citations

IEEE Transactions on Visualization and Computer Graphics

Machine learning technology has become ubiquitous, but, unfortunately, often exhibits bias. As a consequence, disparate stakeholders need to interact with and make informed decisions about using machine learning models in everyday systems. Visualization technology can support stakeholders in understanding and evaluating trade-offs between, for example, accuracy and fairness of models. This paper aims to empirically answer "Can visualization design choices affect a stakeholder's perception of model bias, trust in a model, and willingness to adopt a model?" Through a series of controlled, crowd-sourced experiments with more than 1,500 participants, we identify a set of strategies people follow in deciding which models to trust. Our results show that men and women prioritize fairness and performance differently and that visual design choices significantly affect that prioritization. For example, women trust fairer models more often than men do, participants value fairness more when it is explained using text than as a bar chart, and being explicitly told a model is biased has a bigger impact than showing past biased performance. We test the generalizability of our results by comparing the effect of multiple textual and visual design choices and offer potential explanations of the cognitive mechanisms behind the difference in fairness perception and trust. Our research guides design considerations to support future work developing visualization systems for machine learning.


Figure 2: An example question using the bar chart representation.
My Model is Unfair, Do People Even Care? Visual Design Affects Trust and Perceived Bias in Machine Learning

August 2023

·

31 Reads

Machine learning technology has become ubiquitous, but, unfortunately, often exhibits bias. As a consequence, disparate stakeholders need to interact with and make informed decisions about using machine learning models in everyday systems. Visualization technology can support stakeholders in understanding and evaluating trade-offs between, for example, accuracy and fairness of models. This paper aims to empirically answer "Can visualization design choices affect a stakeholder's perception of model bias, trust in a model, and willingness to adopt a model?" Through a series of controlled, crowd-sourced experiments with more than 1,500 participants, we identify a set of strategies people follow in deciding which models to trust. Our results show that men and women prioritize fairness and performance differently and that visual design choices significantly affect that prioritization. For example, women trust fairer models more often than men do, participants value fairness more when it is explained using text than as a bar chart, and being explicitly told a model is biased has a bigger impact than showing past biased performance. We test the generalizability of our results by comparing the effect of multiple textual and visual design choices and offer potential explanations of the cognitive mechanisms behind the difference in fairness perception and trust. Our research guides design considerations to support future work developing visualization systems for machine learning.





Citations (82)


... Thus far, many of the solutions data scientists have produced to mitigate the risks of algorithmic bias have been technical or procedural in nature, including debiasing techniques which focus on ensuring datasets are either more representative of their target population, or relying on achieving statistical parity when comparing the outcomes of different groups based on protected characteristics (Galhotra et al., 2017). Other approaches have included finding new ways to operationalize the concept of fairness within a statistical framework, allowing data scientists to better perform statistical checks on their models (Bellamy et al., 2018). ...

Reference:

Practitioner Interventions in Data Power
Fairness Testing: Testing Software for Discrimination
  • Citing Preprint
  • September 2017
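For readers unfamiliar with the statistical-parity check mentioned in the citation excerpt above, the snippet below shows the basic computation: compare positive-outcome rates across groups defined by a protected attribute. This is a generic Python illustration, not code from any of the cited works.

    def statistical_parity_difference(predictions, groups, protected_value):
        """predictions: 0/1 model outputs; groups: protected-attribute value per person."""
        in_group = [p for p, g in zip(predictions, groups) if g == protected_value]
        out_group = [p for p, g in zip(predictions, groups) if g != protected_value]
        # Difference in positive-outcome rates; 0 means the two groups fare alike.
        return sum(in_group) / len(in_group) - sum(out_group) / len(out_group)

    # Example: six predictions, two groups; both groups receive positive outcomes at rate 2/3.
    print(statistical_parity_difference([1, 0, 1, 1, 1, 0],
                                        ["a", "a", "a", "b", "b", "b"],
                                        "a"))   # prints 0.0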

... Although developers are open to utilizing patches created by APR techniques, they are skeptical of the quality of auto-generated patches [15,55,70]. As a result, they still need to manually comprehend, validate, and compare candidate patches to answer the question: why is this patch correct? ...

Automated Program Repair, What Is It Good For? Not Absolutely Nothing!
  • Citing Conference Paper
  • April 2024

... Interactive proof assistants such as Lean (de Moura et al., 2015), Isabelle (Wenzel et al., 2008), and Coq (Barras et al., 1999) enable the formal verification of mathematical proofs and software by leveraging specialized programming languages (Avigad, 2023; Ringer et al., 2019). Neural theorem proving, which integrates neural language models with interactive proof assistants, has emerged as a promising approach to automating formal reasoning (First et al., 2023; Polu & Sutskever, 2020b; Yang et al., 2023b). This integration is mutually beneficial: proof assistants enforce formal correctness, while language models assist in proof construction by predicting and suggesting logical steps. ...

Baldur: Whole-Proof Generation and Repair with Large Language Models
  • Citing Conference Paper
  • November 2023

... Other authors have explored the effects of visualization in the context of ML fairness and bias assessment (Yan et al. 2024; Gaba et al. 2024; van Berkel et al. 2021; Yuan et al. 2024). Yan et al. (2024) explored the effects of visualization in the context of AI education and found that visualization helped with concept understanding and that interaction improved the understanding of metrics. ...

My Model is Unfair, Do People Even Care? Visual Design Affects Trust and Perceived Bias in Machine Learning
  • Citing Article
  • October 2023

IEEE Transactions on Visualization and Computer Graphics

... By definition, Ochiai(e) > 0 if and only if e ∈ E_F (Equation (1)). • IRFL: We employ an unsupervised statement-level IRFL technique, Blues, which was proposed in a recent APR study [31] and builds upon BLUiR [32], an unsupervised file-level IRFL technique. We directly utilise the pre-computed Blues suspiciousness scores for the Defects4J bugs available in their replication package. ...

Better Automatic Program Repair by Using Bug Reports and Tests Together
  • Citing Conference Paper
  • May 2023
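The Ochiai score referenced in the excerpt above is the standard spectrum-based fault-localization formula, Ochiai(e) = ef / sqrt(tf * (ef + ep)), where ef and ep count the failing and passing tests covering element e and tf is the total number of failing tests; the score is positive exactly when e is covered by at least one failing test. A small generic Python illustration (not the cited paper's implementation):

    from math import sqrt

    def ochiai(ef, ep, tf):
        """ef/ep: failing/passing tests covering the element; tf: total failing tests."""
        if ef == 0:
            return 0.0                    # never covered by a failing test => score 0
        return ef / sqrt(tf * (ef + ep))

    # An element covered by 3 of the 4 failing tests and by 1 passing test:
    print(ochiai(ef=3, ep=1, tf=4))       # 3 / sqrt(4 * 4) = 0.75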

... Formal languages can be extended to be more expressive, to capture privacy properties [83], data-based properties [59], [60], fairness properties [12], [27], among others. Some of these kinds of properties can be automatically verified probabilistically [4], [29], [33], [53], [81]. ...

Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
  • Citing Conference Paper
  • May 2023

... This result is backed up in Section 4. The setup of CoqPilot consists of a list of model parameters for each of the chosen services. While implementing the described approach, we encountered several difficulties that affected CoqPilot's final architecture. ... (Footnote 1: Language Server Protocol: https://microsoft.github.io/language-server-protocol/)

PRoofster: Automated Formal Verification
  • Citing Conference Paper
  • May 2023

... Search-Based Software Testing (SBST) techniques [8,11,23,25,28,42,52,60], exemplified by tools like EvoSuite [12], aim to improve the effectiveness of test generation but still struggle with large search spaces and high computational costs [30]. Recently, deep learning-based approaches, such as AthenaTest [45], have emerged, utilizing neural models to generate more diverse test inputs and better capture the functional intent of code [4,11,46,54,59]. ...

Avgust: A Tool for Generating Usage-Based Tests from Videos of App Executions
  • Citing Conference Paper
  • May 2023

... For example, the code required to verify CompCert is 8 times as large as the code implementing its functionality [47]. A promising line of work towards automating formal verification is to automatically generate the proofs using machine learning techniques [10], [24], [25], [45], [73], [74]. Within this research space, there has been a recent and exciting line of work on exploring large language models (LLMs) for proof generation [32], [36], [37], [70], [91]. ...

Passport: Improving Automated Formal Verification Using Identifiers
  • Citing Article
  • April 2023

ACM Transactions on Programming Languages and Systems

... Due to ML, computers can recognize patterns and make intelligent decisions without requiring explicit programming (Alves et al., 2023). The following three fundamental elements make up an ML algorithm: representation, evaluation, and optimization (Johnson et al., 2023). The current study utilizes two categories of ML, as illustrated in Fig. 2. ...

Fairkit, Fairkit, on the Wall, Who’s the Fairest of Them All? Supporting Fairness-Related Decision-Making
  • Citing Article
  • March 2023

EURO Journal on Decision Processes