Andrew Arnold
Carnegie Mellon University | CMU · Machine Learning Department

About

37 Publications
5,766 Reads
979 Citations
Since 2017: 20 research items, 541 citations
[Chart: citations per year, 2017–2023]

Publications (37)
Preprint
Full-text available
Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain and inspect a QA system's answer. In order to better generate such entailment trees, we propose an a...
Preprint
Full-text available
People frequently interact with information retrieval (IR) systems; however, IR models exhibit biases and discrimination towards various demographics. In-processing fair ranking methods provide a trade-off between accuracy and fairness by adding a fairness-related regularization term to the loss function. However, there haven't been intui...
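As an illustration of the in-processing idea mentioned in this abstract (not the preprint's actual method), a ranking loss can be augmented with a fairness-related regularization term. The group labels, the exposure-gap penalty, and the weighting factor lambda_fair below are all assumptions made for the sketch.

```python
# Minimal sketch (assumed formulation): pairwise ranking loss plus a
# fairness regularizer that penalizes the gap in mean predicted score
# between two demographic groups. Illustrative only.
import torch

def ranking_loss(scores_pos, scores_neg):
    # Standard pairwise logistic (RankNet-style) loss.
    return torch.nn.functional.softplus(scores_neg - scores_pos).mean()

def fairness_penalty(scores, group_ids):
    # Squared difference between the mean scores of group 0 and group 1.
    g0 = scores[group_ids == 0]
    g1 = scores[group_ids == 1]
    return (g0.mean() - g1.mean()) ** 2

def fair_ranking_loss(scores_pos, scores_neg, scores_all, group_ids, lambda_fair=0.1):
    # Accuracy/fairness trade-off controlled by lambda_fair.
    return ranking_loss(scores_pos, scores_neg) + lambda_fair * fairness_penalty(scores_all, group_ids)

# Toy usage with random scores and binary group labels.
scores_pos = torch.randn(16)
scores_neg = torch.randn(16)
scores_all = torch.cat([scores_pos, scores_neg])
group_ids = torch.randint(0, 2, (32,))
print(fair_ranking_loss(scores_pos, scores_neg, scores_all, group_ids).item())
```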
Preprint
Full-text available
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model...
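For context on what "jointly distill and quantize" can look like in general (the preprint's specific approach is not reproduced here), a common recipe combines a distillation loss against a teacher's output distribution with fake-quantized student weights during training. The 8-bit symmetric quantizer, loss weights, and temperature below are illustrative assumptions.

```python
# Illustrative sketch only: knowledge distillation combined with
# quantization-aware training via a straight-through fake quantizer.
import torch
import torch.nn.functional as F

def fake_quantize(w, num_bits=8):
    # Symmetric uniform quantization with a straight-through estimator.
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()  # forward uses w_q, gradients flow to w

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # Soft KL term against the teacher plus hard cross-entropy on labels.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a linear "student" whose weight is fake-quantized each step.
student = torch.nn.Linear(10, 5)
x = torch.randn(4, 10)
teacher_logits = torch.randn(4, 5)
labels = torch.randint(0, 5, (4,))
logits = F.linear(x, fake_quantize(student.weight), student.bias)
loss = distill_loss(logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```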
Preprint
Full-text available
Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency. However, previous prompt-based methods for few-shot NER have limitations such as higher computational complexity, poor zero-shot ability, requiring manual pro...
Preprint
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data from a new domain that deviates from what the PTLM was initially trained on, or newly emerged data that contains out-of-distributio...
Preprint
Full-text available
Pretrained Language Models (PLMs) have established a new paradigm by learning informative contextualized representations on large-scale text corpora. This new paradigm has revolutionized the entire field of natural language processing and set new state-of-the-art performance for a wide variety of NLP tasks. However, though PLMs could store...
Preprint
Full-text available
The performance of state-of-the-art neural rankers can deteriorate substantially when exposed to noisy inputs or applied to a new domain. In this paper, we present a novel method for fine-tuning neural rankers that can significantly improve their robustness to out-of-domain data and query perturbations. Specifically, a contrastive loss that compare...
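As a rough illustration of contrastive fine-tuning for robustness (the exact loss in the preprint is not reproduced here), one can pull the representation of a query and its perturbed variant together while pushing apart the other queries in the batch. The InfoNCE-style formulation and temperature below are assumptions for the sketch.

```python
# Sketch (assumed InfoNCE-style formulation): encourage a query and its
# perturbed/noisy version to have similar representations.
import torch
import torch.nn.functional as F

def contrastive_loss(clean_emb, perturbed_emb, temperature=0.07):
    # clean_emb, perturbed_emb: (batch, dim); row i of each is the same query.
    clean = F.normalize(clean_emb, dim=-1)
    pert = F.normalize(perturbed_emb, dim=-1)
    logits = clean @ pert.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(clean.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with random embeddings standing in for encoder outputs.
clean = torch.randn(8, 128)
perturbed = clean + 0.1 * torch.randn(8, 128)  # simulated query perturbation
print(contrastive_loss(clean, perturbed).item())
```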
Preprint
Full-text available
A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an...
Article
The SLIF project combines text-mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., mi...
Article
Full-text available
SLIF uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu). SLIF originally focuse...
Conference Paper
In this paper we explore the usefulness of various types of publication-related metadata, such as citation networks and curated databases, for the task of identifying genes in academic biomedical publications. Specifically, we examine whether knowing something about which genes an author has previously written about, combined with information about...
Conference Paper
In this work we try to bridge the gap often encountered by researchers who find themselves with few or no labeled examples from their desired target domain, yet still have access to large amounts of labeled data from other related, but distinct source domains, and seemingly no way to transfer knowledge from one to the other. Experimentally, we focu...
Conference Paper
Full-text available
Many ranking models have been proposed in information retrieval, and recently machine learning techniques have also been applied to ranking model construction. Most of the existing methods do not take into consideration the fact that significant differences exist between queries, and only resort to a single function in ranking of documents. In...
Conference Paper
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task,...
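The abstract above does not spell out the prior, so as a generic illustration of how a shared prior can transfer information between a source and a target task, the sketch below ties each task's weights to a common mean through an L2 penalty. The two-level Gaussian form is an assumption, not the paper's construction.

```python
# Generic sketch of transfer via a shared hierarchical prior (assumption):
# source and target weight vectors are both pulled toward a shared mean,
# which itself is lightly regularized toward zero.
import numpy as np

def hierarchical_penalty(w_src, w_tgt, w_shared, tau=1.0, sigma=1.0):
    # Negative log of a two-level Gaussian prior (up to additive constants).
    task_term = (np.sum((w_src - w_shared) ** 2) +
                 np.sum((w_tgt - w_shared) ** 2)) / (2 * tau ** 2)
    top_term = np.sum(w_shared ** 2) / (2 * sigma ** 2)
    return task_term + top_term

# Toy usage: target weights near the shared mean are penalized less.
w_shared = np.ones(5)
print(hierarchical_penalty(np.ones(5), np.ones(5) * 1.1, w_shared))
print(hierarchical_penalty(np.ones(5), np.ones(5) * 3.0, w_shared))
```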
Conference Paper
The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. While previous work has studied the supervised version of this problem, we study the more challenging case of unsupervised transductive transfer learning, where no labeled data...
Conference Paper
Full-text available
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently, graphical modeling with the concept of "Granger causality", b...
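As a quick refresher on the underlying idea referenced above (plain bivariate Granger causality, not the graphical extension the paper develops), one series is said to Granger-cause another if adding its lags to an autoregressive model reduces prediction error. The lag order and the F-style comparison below are a standard textbook setup, not the paper's formulation.

```python
# Minimal bivariate Granger-causality check: does adding lags of x help
# predict y beyond y's own lags?
import numpy as np

def lagged(series, p):
    # Columns are lags 1..p of `series`, aligned to targets series[p:].
    return np.column_stack([series[p - k: len(series) - k] for k in range(1, p + 1)])

def rss(X, y):
    # Residual sum of squares of an OLS fit with intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def granger_f_stat(x, y, p=2):
    # Restricted model: y's own lags.  Full model: y's and x's lags.
    y_target = y[p:]
    restricted = lagged(y, p)
    full = np.column_stack([lagged(y, p), lagged(x, p)])
    rss_r, rss_f = rss(restricted, y_target), rss(full, y_target)
    n, q = len(y_target), p
    k_full = full.shape[1] + 1
    return ((rss_r - rss_f) / q) / (rss_f / (n - k_full))

# Toy usage: x leads y by one step, so the F statistic should be large.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * np.roll(x, 1) + rng.normal(scale=0.5, size=500)
print(granger_f_stat(x, y, p=2))
```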
Article
Full-text available
Automated learning environments collect large amounts of information on the activities of their students. Unfortunately, analyzing and interpreting these data manually can be tedious and requires substantial training and skill. Although automatic techniques do exist for mining data, the results are often hard to interpret or incorporate into existi...
Article
Full-text available
Students in two classes making extensive use of online courseware in the fall of 2004 were logged as they visited over 500 different "learning pages", which varied in length and difficulty. We computed the time spent on each page by each student during each session in which they were logged in. We then modeled the time spent for a particular visit as a fu...
Article
In this paper we examine the problem of domain adaptation for protein name extraction. First we define the general problem of transfer learning and the particular sub-problem of domain adaptation. We then describe some current state-of-the-art supervised and transductive approaches involving support vector machines and maximum entropy models. Us...
Article
Full-text available
Feature induction is used to reduce the complexity of the model search space of a Bayes network. The Bayes net is used to model student behavior in an on-line course. Specifically, the frequency of student self-assessments is used to predict quiz performance. By moving most of the search from the model space to the feature space, prior knowledge an...
Article
Full-text available
We present a method for predicting disk access response times given a trace of previous disk activity using a linear regression model. We build two models and corresponding features to describe different situations: the request-based model, in which we have all the information about the prior requests, and the trace-based model, in which we predict t...
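To make the regression setup concrete (the paper's actual feature set is not listed in this excerpt), here is a minimal sketch that fits a least-squares model from a few assumed per-request features, such as seek distance and queue depth, to response time; the synthetic data exists only so the example runs end to end.

```python
# Minimal sketch of the regression setup (features are assumptions, not
# the paper's feature set): predict response time from per-request features.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
seek_distance = rng.uniform(0, 1e6, n)   # assumed feature: LBA distance from previous request
queue_depth = rng.integers(1, 32, n)     # assumed feature: outstanding requests
request_size = rng.uniform(4, 1024, n)   # assumed feature: KB transferred

# Synthetic response times (ms) for the toy example only.
response_ms = (0.5 + 2e-6 * seek_distance + 0.05 * queue_depth
               + 0.002 * request_size + rng.normal(0, 0.2, n))

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), seek_distance, queue_depth, request_size])
coef, *_ = np.linalg.lstsq(X, response_ms, rcond=None)
pred = X @ coef
print("coefficients:", np.round(coef, 4))
print("mean absolute error (ms):", np.abs(pred - response_ms).mean())
```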
Article
The need for modeling causality, beyond mere statistical correlations, for meaningful application of data mining to real world problems has been recognized widely. The framework of Bayesian networks, along with the related causal networks, is well suited for the modeling of causal structure, and its applicability to various application domains ha...
