About
37 Publications
5,766 Reads
979 Citations (since 2017)
Publications (37)
Large language models have achieved high performance on various question answering (QA) benchmarks, but the explainability of their output remains elusive. Structured explanations, called entailment trees, were recently suggested as a way to explain and inspect a QA system's answer. In order to better generate such entailment trees, we propose an a...
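The snippet breaks off before the proposed approach, but the entailment-tree representation it builds on is concrete enough to illustrate. A minimal sketch, assuming each node holds a natural-language statement and internal nodes are conclusions entailed by their children; the field names and example sentences are illustrative, not taken from the paper:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EntailmentNode:
    """A statement in an entailment tree: leaves are retrieved facts,
    internal nodes are intermediate conclusions entailed by their children."""
    statement: str
    children: List["EntailmentNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Pretty-print the tree as an indented proof of the root hypothesis."""
        lines = ["  " * depth + self.statement]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)

# Toy example: the hypothesis (root) is entailed by two supporting facts.
tree = EntailmentNode(
    "An iron nail conducts electricity",
    [EntailmentNode("An iron nail is made of iron"),
     EntailmentNode("Iron is a metal, and metals conduct electricity")],
)
print(tree.render())
```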
People frequently interact with information retrieval (IR) systems; however, IR models exhibit biases and discrimination towards various demographics. In-processing fair ranking methods provide a trade-off between accuracy and fairness by adding a fairness-related regularization term to the loss function. However, there haven't been intui...
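The entry ends mid-sentence, but the in-processing recipe it describes, an accuracy loss plus a fairness regularizer, can be sketched directly. A minimal PyTorch illustration, assuming a pointwise relevance loss and a hypothetical exposure-gap penalty between two demographic groups; the regularizer and `lambda_fair` are illustrative, not the paper's:

```python
import torch

def fair_ranking_loss(scores, labels, group_mask, lambda_fair=0.1):
    """In-processing fair ranking objective:
    loss = ranking_loss + lambda * fairness_penalty."""
    # Accuracy term: pointwise loss against graded relevance labels.
    ranking_loss = torch.nn.functional.mse_loss(scores, labels)
    # Illustrative fairness term: squared gap between the mean scores
    # (a crude proxy for exposure) of the two demographic groups.
    gap = scores[group_mask].mean() - scores[~group_mask].mean()
    return ranking_loss + lambda_fair * gap ** 2

scores = torch.randn(8, requires_grad=True)   # ranker outputs
labels = torch.rand(8)                        # relevance labels
group_mask = torch.tensor([True, False] * 4)  # demographic group per item
fair_ranking_loss(scores, labels, group_mask).backward()
```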
Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model...
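As a rough illustration of the "jointly distill and quantize" idea named here, the sketch below combines a standard distillation loss with simulated (fake) quantization of the student's weights via a straight-through estimator. The loss weights, bit width, and rounding scheme are generic assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def fake_quantize(w, num_bits=8):
    """Quantize-dequantize a weight tensor, passing gradients straight
    through to the full-precision weights (straight-through estimator)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label cross-entropy plus temperature-scaled KL to the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    return alpha * hard + (1 - alpha) * soft

# Toy step: quantize the student's weights in the forward pass, then distill.
w = torch.randn(10, 4, requires_grad=True)       # student weight matrix
x, labels = torch.randn(2, 4), torch.tensor([1, 3])
teacher_logits = torch.randn(2, 10)              # stand-in for a teacher model
student_logits = x @ fake_quantize(w).t()
distill_loss(student_logits, teacher_logits, labels).backward()
```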
Recently, prompt-based learning for pre-trained language models has succeeded in few-shot Named Entity Recognition (NER) by exploiting prompts as task guidance to increase label efficiency. However, previous prompt-based methods for few-shot NER have limitations such as higher computational complexity, poor zero-shot ability, requiring manual pro...
Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data from a new domain that deviates from what the PTLM was initially trained on, or newly emerged data that contains out-of-distributio...
Pretrained Language Models (PLMs) have established a new paradigm through learning informative contextualized representations on large-scale text corpora. This new paradigm has revolutionized the entire field of natural language processing, and set new state-of-the-art performance for a wide variety of NLP tasks. However, though PLMs could store...
The performance of state-of-the-art neural rankers can deteriorate substantially when exposed to noisy inputs or applied to a new domain. In this paper, we present a novel method for fine-tuning neural rankers that can significantly improve their robustness to out-of-domain data and query perturbations. Specifically, a contrastive loss that compare...
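The description of the contrastive loss is cut off, so the following is only a plausible sketch of a loss in that family: it pulls the ranker's representation of a clean input toward the same input under perturbation and pushes it away from other in-batch examples (an InfoNCE-style objective; the exact form used in the paper may differ):

```python
import torch
import torch.nn.functional as F

def contrastive_robustness_loss(clean_emb, perturbed_emb, temperature=0.1):
    """InfoNCE-style loss: each clean embedding should be most similar to
    its own perturbed counterpart among all perturbed embeddings in the batch."""
    clean = F.normalize(clean_emb, dim=-1)
    pert = F.normalize(perturbed_emb, dim=-1)
    logits = clean @ pert.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(clean.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

clean = torch.randn(4, 32, requires_grad=True)
perturbed = clean + 0.05 * torch.randn(4, 32)  # stand-in for a noisy query
contrastive_robustness_loss(clean, perturbed).backward()
```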
A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an...
The SLIF project combines text-mining and image processing to extract structured information from biomedical literature. SLIF extracts images and their captions from published papers. The captions are automatically parsed for relevant biological entities (protein and cell type names), while the images are classified according to their type (e.g., mi...
SLIF uses a combination of text-mining and image processing to extract information from figures in the biomedical literature. It also uses innovative extensions to traditional latent topic modeling to provide new ways to traverse the literature. SLIF provides a publicly available searchable database (http://slif.cbi.cmu.edu).
SLIF originally focuse...
In this paper we explore the usefulness of various types of publication-related metadata, such as citation networks and curated databases, for the task of identifying genes in academic biomedical publications. Specifically, we examine whether knowing something about which genes an author has previously written about, combined with information about...
In this work we try to bridge the gap often encountered by researchers who find themselves with few or no labeled examples from their desired target domain, yet still have access to large amounts of labeled data from other related, but distinct source domains, and seemingly no way to transfer knowledge from one to the other. Experimentally, we focu...
Many ranking models have been proposed in information retrieval, and recently machine learning techniques have also been applied to ranking model construction. Most of the existing methods do not take into consideration the fact that significant differences exist between queries, and only resort to a single function in ranking of documents. In...
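To make the query-dependent idea concrete, here is a toy sketch of one common realization: cluster queries, then fit a separate ranking function per cluster and route each incoming query to its cluster's ranker. This is a generic illustration of using more than one ranking function, not the paper's actual method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
query_feats = rng.normal(size=(100, 5))   # one feature vector per query
doc_feats = rng.normal(size=(100, 8))     # features of a candidate document
relevance = rng.normal(size=100)          # relevance target per (query, doc)

# Cluster the queries, then train one ranking function per cluster.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(query_feats)
rankers = {c: LinearRegression().fit(doc_feats[km.labels_ == c],
                                     relevance[km.labels_ == c])
           for c in range(3)}

# At query time, route the query to its cluster's ranker.
c = km.predict(query_feats[:1])[0]
score = rankers[c].predict(doc_feats[:1])
```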
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task,...
The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. While previous work has studied the supervised version of this problem, we study the more challenging case of unsupervised transductive transfer learning, where no labeled data...
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical modeling with the concept of "Granger causality", b...
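The snippet introduces Granger causality for temporal data. The basic pairwise test is standard and can be run directly with statsmodels; the example below shows the textbook two-series test, not the graphical-model extension the abstract goes on to describe:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
# y is driven by lagged x, so x should Granger-cause y.
y = np.r_[0.0, 0.8 * x[:-1]] + 0.1 * rng.normal(size=n)

# Tests whether the second column (x) Granger-causes the first (y).
results = grangercausalitytests(np.column_stack([y, x]), maxlag=2)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: ssr F-test p-value = {tests['ssr_ftest'][1]:.4f}")
```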
Automated learning environments collect large amounts of information on the activities of their students. Unfortunately, analyzing and interpreting these data manually can be tedious and requires substantial training and skill. Although automatic techniques do exist for mining data, the results are often hard to interpret or incorporate into existi...
Students in two classes in the fall of 2004 making extensive use of online courseware were logged as they visited over 500 different "learning pages" which varied in length and in difficulty. We computed the time spent on each page by each student during each session in which they were logged in. We then modeled the time spent for a particular visit as a fu...
In this paper we examine the problem of domain adaptation for protein name extraction. First we define the general problem of transfer learning and the particular subproblem of domain adaptation. We then describe some current state-of-the-art supervised and transductive approaches involving support vector machines and maximum entropy models. Us...
Feature induction is used to reduce the complexity of the model search space of a Bayes network. The Bayes net is used to model student behavior in an on-line course. Specifically, the frequency of student self-assessments is used to predict quiz performance. By moving most of the search from the model space to the feature space, prior knowledge an...
We present a method for predicting disk access response times given a trace of previous disk activity using a linear regression model. We build two models and corresponding features to describe different situations, the request-based model, in which we have all the information about the prior requests, and trace-based model, in which we predict t...
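A minimal sketch of the request-based flavor of this idea: fit a linear model on features of the preceding requests and predict the next response time. The two features used here, inter-arrival time and request size, are illustrative stand-ins for whatever the paper actually derives from the trace:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
inter_arrival = rng.exponential(2.0, size=n)   # ms since previous request
request_size = rng.integers(4, 512, size=n)    # request size in KiB
# Synthetic response times loosely driven by both features.
response = (0.5 + 0.3 * inter_arrival + 0.01 * request_size
            + rng.normal(0, 0.2, size=n))

X = np.column_stack([inter_arrival, request_size])
model = LinearRegression().fit(X, response)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```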
The need for modeling causality, beyond mere statistical correlations, for meaningful application of data mining to real world problems has been recognized widely. The framework of Bayesian networks, along with the related causal networks, is well suited for the modeling of causal structure, and its applicability to various application domains ha...