
Avraham Shinnar, PhD
IBM · Programming Technologies and Software Engineering
About
55 Publications
6,257 Reads
1,143 Citations
Publications (55)
Bias mitigators can improve algorithmic fairness in machine learning models, but their effect on fairness is often not stable across data splits. A popular approach to train more stable models is ensemble learning, but unfortunately, it is unclear how to combine ensembles with mitigators to best navigate trade-offs between fairness and predictive p...
While artificial intelligence (AI) models have improved at understanding large-scale data, understanding AI models themselves at any scale is difficult. For example, even two models that implement the same network architecture may differ in frameworks, datasets, or even domains. Furthermore, attempting to use either model often requires much manual...
SQL is by far the most widely used and implemented query language. Yet, on some key features, such as correlated queries and NULL value semantics, many implementations diverge or contain bugs. We leverage recent advances in the formalization of SQL and query compilers to develop DBCert, the first mechanically verified compiler from SQL queries writ...
Stochastic approximation algorithms are iterative procedures which are used to approximate a target value in an environment where the target is unknown and direct observations are corrupted by noise. These algorithms are useful, for instance, for root-finding and function minimization when the target function or model is not directly known. Origina...
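To make the idea above concrete, here is a minimal Robbins-Monro-style sketch in Python (illustrative only, not taken from the paper): it approximates the root of an unknown function from noisy observations using decreasing step sizes.

```python
# Minimal Robbins-Monro-style sketch (illustrative, not the paper's algorithm):
# approximate the root of an unknown function g using only noisy observations.
import random

def noisy_g(x):
    # the "environment": g(x) = x - 2, observed with additive Gaussian noise
    return (x - 2.0) + random.gauss(0.0, 0.5)

x = 0.0
for n in range(1, 10001):
    a_n = 1.0 / n              # decreasing step sizes: sum a_n diverges, sum a_n^2 converges
    x = x - a_n * noisy_g(x)   # move against the noisy observation

print(x)  # converges toward the true root x = 2
```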
There are several bias mitigators that can reduce algorithmic bias in machine learning models but, unfortunately, the effect of mitigators on fairness is often not stable when measured across different data splits. A popular approach to train more stable models is ensemble learning. Ensembles, such as bagging, boosting, voting, or stacking, have be...
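The instability referred to above can be seen with a small, self-contained sketch (synthetic data and plain scikit-learn, not the paper's mitigators or benchmarks): the same model's disparate impact varies noticeably across random train/test splits.

```python
# Illustrative sketch (not the paper's experiments): the same model's disparate
# impact can fluctuate across random train/test splits, motivating ensembling.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)                        # synthetic protected attribute
X = np.column_stack([rng.normal(group, 1.0, n), rng.normal(0, 1, n)])
y = (X[:, 0] + rng.normal(0, 1, n) > 0.5).astype(int)

for seed in range(5):
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, group, test_size=0.3, random_state=seed)
    pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)
    rates = [pred[g_te == g].mean() for g in (0, 1)]  # selection rate per group
    print(f"split {seed}: disparate impact = {rates[1] / rates[0]:.2f}")
```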
Reinforcement learning algorithms solve sequential decision-making problems in probabilistic environments by optimizing for long-term reward. The desire to use reinforcement learning in safety-critical settings inspires a recent line of work on formally constrained reinforcement learning; however, these methods place the implementation of the learn...
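As a rough illustration of runtime-enforced constraints (a generic "shield" pattern, not the construction in this paper), the following sketch overrides any proposed action that violates a hypothetical safety predicate.

```python
# Generic "shield" sketch (illustrative, not the paper's construction): a runtime
# monitor that overrides any proposed action violating a safety constraint.
import random

def is_safe(state, action):
    # hypothetical constraint: never step right when already at the boundary
    return not (state >= 9 and action == +1)

def shielded_policy(state, learner_policy):
    action = learner_policy(state)
    if not is_safe(state, action):
        action = 0          # fall back to a known-safe default action
    return action

learner = lambda s: random.choice([-1, 0, +1])
print(shielded_policy(9, learner))   # never returns +1 at the boundary state
```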
Automated machine learning makes it easier for data scientists to develop pipelines by searching over possible choices for hyperparameters, algorithms, and even pipeline topologies. Unfortunately, the syntax for automated machine learning tools is inconsistent with manual machine learning, with each other, and with error checks. Furthermore, few to...
JSON is a popular data format used pervasively in web APIs, cloud computing, NoSQL databases, and increasingly also machine learning. JSON Schema is a language for declaring the structure of valid JSON data. There are validators that can decide whether a JSON document is valid with respect to a schema. Unfortunately, like all instance-based testing...
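For readers unfamiliar with JSON Schema, here is a small schema and an instance-level check using the Python jsonschema package (assumed installed); the schema and documents are made up for illustration.

```python
# Small example of declaring and checking JSON structure with JSON Schema,
# using the jsonschema package (assumed available via `pip install jsonschema`).
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["name"],
}

validate({"name": "Ada", "age": 36}, schema)      # passes silently

try:
    validate({"age": -1}, schema)                 # missing "name", negative "age"
except ValidationError as err:
    print("invalid:", err.message)
```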
Cloud computing has made the resources needed to execute large-scale in-memory distributed computations widely available. Specialized programming models, e.g., MapReduce, have emerged to offer transparent fault tolerance and fault recovery for specific computational patterns, but they sacrifice generality. In contrast, the Resilient X10 programming...
Machine-learning automation tools, ranging from humble grid-search to hyperopt, auto-sklearn, and TPOT, help explore large search spaces of possible pipelines. Unfortunately, each of these tools has a different syntax for specifying its search space, leading to lack of portability, missed relevant points, and spurious points that are inconsistent w...
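Below is a sketch of the combinator style this line of work advocates, assuming the Lale library's `>>` (pipe) and `|` (choice) operators and its scikit-learn wrappers; the exact class and parameter names are best-effort and may differ across versions.

```python
# Sketch of a combinator-style search space, assuming the Lale library's
# `>>` (pipe) and `|` (choice) operators over scikit-learn wrappers.
from lale.lib.sklearn import PCA, LogisticRegression, RandomForestClassifier
from lale.lib.lale import NoOp, Hyperopt
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# One planned pipeline doubles as the search space: optional PCA, then a
# choice of classifier, with hyperparameters left for the optimizer to tune.
planned = (PCA | NoOp) >> (LogisticRegression | RandomForestClassifier)

trained = planned.auto_configure(X, y, optimizer=Hyperopt, cv=3, max_evals=10)
print(trained.predict(X)[:5])
```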
Stan is a popular probabilistic programming language with a self-contained syntax and semantics that is close to graphical models. Unfortunately, existing embeddings of Stan in Python use multi-line strings. That approach forces users to switch between two different language styles, with no support for syntax highlighting or simple error reporting...
Chatbots are reactive applications with a conversational interface. They are usually implemented as compositions of client-side components and cloud-hosted services, including artificial-intelligence technology. Unfortunately, programming such reactive multi-tier applications with traditional programming languages is cumbersome. This paper introduc...
There is a paradigm shift in web-based services towards conversational user interfaces. Companies increasingly offer conversational interfaces, or chatbots, to let their customers or employees interact with their services in a more flexible and mobile manner. Unfortunately, this new paradigm faces a major problem, namely toxic content in user input...
There are various probabilistic modeling and inference packages for Python. Unfortunately, they either put probabilistic models in Python strings and thus lack integration benefits, or have leaky abstractions and thus are hard to code or debug. This paper introduces Yaps, which overcomes these issues by reinterpreting Python syntax to give it Stan...
Deep probabilistic programming combines deep neural networks (for automatic hierarchical representation learning) with probabilistic models (for principled handling of uncertainty). Unfortunately, it is difficult to write deep probabilistic models, because existing programming frameworks lack concise, high-level, and clean ways to express them. To...
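As a flavor of what such a program looks like, here is a small coin-bias model written in Pyro (an existing deep probabilistic programming framework, not the system described above), mixing sampling statements with ordinary Python.

```python
# Illustrative probabilistic program in Pyro (not the system described above):
# a coin-bias model mixing sampling statements with ordinary Python code.
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine

def coin_model(flips):
    theta = pyro.sample("theta", dist.Beta(1.0, 1.0))         # prior on coin bias
    with pyro.plate("data", len(flips)):
        pyro.sample("obs", dist.Bernoulli(theta), obs=flips)  # observed flips

flips = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])
trace = poutine.trace(coin_model).get_trace(flips)
print(trace.log_prob_sum())   # joint log-probability of this execution
```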
Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, this dynamic language has less support for e...
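A generic example (not from the paper) of how optional static type annotations help: a checker such as mypy can reject an ill-typed call before any training code runs.

```python
# Generic example (not from the paper): optional type annotations let a static
# checker such as mypy flag mistakes before any long-running ML code executes.
from typing import List

def normalize(values: List[float]) -> List[float]:
    total = sum(values)
    return [v / total for v in values]

weights = normalize([0.2, 0.3, 0.5])    # fine

# A static checker would reject this call: str is not compatible with List[float].
# normalize("0.2, 0.3, 0.5")
```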
We tackle the problem of automatically generating chatbots from Web API specifications using embedded natural language metadata, focusing on the intent classification subtask. One of the main challenges for such a use case comes from the lack of a sufficiently representative training sample for utterance classification, which hinders the traditiona...
Companies want to offer chat bots to their customers and employees which can answer questions, enable self-service, and showcase their products and services. Implementing and maintaining chat bots by hand costs time and money. Companies typically have web APIs for their services, which are often documented with an API specification. This paper pres...
Designing and prototyping new features is important in many industrial projects. Functional programming and formal verification tools can prove valuable for that purpose, but lead to challenges when integrating with existing product code or when planning technology transfer. This article reports on our experience using the Coq proof assistant as a...
Algebras based on combinators, i.e., variable-free, have been proposed as a better representation for query compilation and optimization. A key benefit of combinators is that they avoid the need to handle variable shadowing or accidental capture during rewrites. This simplifies both the optimizer specification and its correctness analysis, but the...
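A toy Python sketch of the variable-free idea: queries are built by composing combinators, so rewrites manipulate whole operators and never have to reason about variable names or capture. (Illustrative only; the paper's algebra is far richer.)

```python
# Toy sketch of a variable-free (combinator) query representation: queries are
# built by composing functions, so rewrites never touch tuple variable names.
def compose(f, g):
    return lambda rows: g(f(rows))

def select(pred):
    return lambda rows: [r for r in rows if pred(r)]

def project(*cols):
    return lambda rows: [{c: r[c] for c in cols} for r in rows]

# "SELECT name FROM people WHERE age >= 18" as a pipeline of combinators.
adult_names = compose(select(lambda r: r["age"] >= 18), project("name"))

people = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 12}]
print(adult_names(people))   # [{'name': 'Ada'}]
```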
We present Q*cert, a platform for the specification, verification, and implementation of query compilers written using the Coq proof assistant. The Q*cert platform is open source and includes some support for SQL and OQL, and for code generation to Spark and Cloudant. It internally relies on familiar database intermediate representations, notably t...
Humans and computers increasingly converse via natural language. Those conversations are moving from today's simple question answering and command-and-control to more complex dialogs. Developers must specify those dialogs. This paper explores how to assist developers in this specification. We map out the staggering variety of applications for human...
In the course of building a compiler from business rules to a database run-time, we encounter the need for a type system that includes a class hierarchy and subtyping in the presence of complex record operations. Since our starting point is based on structural typing and targets a data-centric language, we develop an approach inspired by Wadler’s w...
Businesses that receive events in the form of messages and react to them quickly can take advantage of opportunities and avoid risks as they occur. Since quick reactions are important, event processing middleware is a core technology in many businesses. However, the need to act quickly must be balanced against the need to act profitably, and the be...
X10 is a high-performance, high-productivity programming language aimed at large-scale distributed and shared-memory parallel applications. It is based on the Asynchronous Partitioned Global Address Space (APGAS) programming model, supporting the same fine-grained concurrency mechanisms within and across shared-memory nodes. We demonstrate that X10...
A software transactional memory system is described which utilizes decomposed software transactional memory instructions as well as runtime optimizations to achieve efficient performance. The decomposed instructions allow a compiler with knowledge of the instruction semantics to perform optimizations which would be unavailable on traditional softwa...
A method for information flow tracking is provided using, for example, a functional programming language based on lambda calculus, λI. The method provides a unified information-tracking framework that supports multiple, interdependent dimensions of information. An expressive policy-specification system is separated from the underlying information-f...
We present a formal small-step structural operational semantics for a large fragment of X10, unifying past work. The fragment covers multiple places, mutable objects on the heap, sequencing, try/catch, async, finish, and at constructs. This model accurately captures the behavior of a large class of concurrent, multi-plac...
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-l...
This dissertation introduces a framework enabling the dynamic verification of expressive specifications. Inspired by formal verification methods, this framework supports assertion, framing, and separation contracts. Assertion contracts specify what code should do, whereas framing contracts specify what code must not do. Separation contracts, inspi...
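A minimal, generic illustration of dynamically checked assertion contracts (plain Python decorators, not the dissertation's framework): preconditions state what callers must ensure, postconditions state what the code should deliver.

```python
# Minimal sketch of dynamically checked assertion contracts as Python decorators
# (illustrative only; not the dissertation's contract language).
def contract(pre=None, post=None):
    def wrap(f):
        def checked(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), "precondition violated"
            result = f(*args, **kwargs)
            if post is not None:
                assert post(result, *args, **kwargs), "postcondition violated"
            return result
        return checked
    return wrap

@contract(pre=lambda xs: len(xs) > 0, post=lambda r, xs: r in xs)
def maximum(xs):
    return max(xs)

print(maximum([3, 1, 4]))   # both contracts hold at runtime
```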
We report on our experience implementing a lightweight, fully verified relational database management system (RDBMS). The functional specification of RDBMS behavior, RDBMS implementation, and proof that the implementation meets the specification are all written and verified in Coq. Our contributions include: (1) a complete specification of the rela...
This paper presents λI, a language for dynamic tracking of information flow across multiple, interdependent dimensions of information. Typical dimensions of interest are integrity and confidentiality. λI supports arbitrary domain-specific policies that can be developed independently. λI treats information-flow metadata as a first-class entity an...
We present a new approach for constructing and verifying higher-order, imperative programs using the Coq proof assistant. We build on the past work on the Ynot system, which is based on Hoare Type Theory. That original system was a proof of concept, where every program verification was accomplished via laborious manual proofs, with much code devote...
We describe an axiomatic extension to the Coq proof assistant, that supports writing, reasoning about, and extracting higher-order, dependently-typed programs with side-effects . Coq already includes a powerful functional language that supports dependent types, but that language is limited to pure, total functions. The key contribution of our exten...
Provenance describes how an object came to be in its present state. Intelligence dossiers, medical records and corporate financial reports capture provenance information. Many of these applications call for security, but existing security models are not up to the task. Provenance is a causality graph with annotations. The causality graph connects t...
Atomic blocks allow programmers to delimit sections of code as 'atomic', leaving the language's implementation to enforce atomicity. Existing work has shown how to implement atomic blocks over word-based transactional memory that provides scalable multi-processor performance without requiring changes to the basic structure of objects in the heap. H...
Most security models are designed to protect data. Some also deal with traditional metadata. Provenance metadata introduces additional complexity, as does the delicate interactions between provenance metadata and the data it describes. We designed a security model for provenance metadata. Our requirements were derived from potential users. The secu...
One of the important tasks of exception handling is to restore program state and invariants. Studies suggest that this is often done incorrectly. We introduce a new language construct that integrates automated memory recovery with exception handling. When an exception occurs, memory can be automatically restored to its previous state. We also provi...
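A rough Python analogue of the idea (not the proposed language construct): snapshot mutable state on entry to a block and restore it automatically if an exception escapes.

```python
# Rough Python analogue of the idea (not the proposed language construct):
# snapshot mutable state on entry and restore it if an exception escapes.
import copy
from contextlib import contextmanager

@contextmanager
def restoring(obj):
    snapshot = copy.deepcopy(obj.__dict__)
    try:
        yield obj
    except Exception:
        obj.__dict__.clear()
        obj.__dict__.update(snapshot)   # restore the pre-block state
        raise

class Account:
    def __init__(self):
        self.balance = 100

acct = Account()
try:
    with restoring(acct):
        acct.balance -= 30
        raise RuntimeError("transfer failed")
except RuntimeError:
    pass
print(acct.balance)   # 100: the partial update was rolled back
```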