Jude W. Shavlik’s research while affiliated with University of Wisconsin–Madison and other places


Publications (251)


Detecting Semantic Uncertainty by Learning Hedge Cues in Sentences Using an HMM: Natural Language Processing and Beyond
  • Chapter

November 2017 · 25 Reads · Xiujun Li · Wei Gao · Jude W. Shavlik

Detecting speculative assertions is essential to distinguishing semantically uncertain information from factual information in text. This is critical to the trustworthiness of many intelligent systems based on information retrieval and natural language processing techniques, such as question answering or information extraction. We empirically explore three fundamental issues of uncertainty detection: (1) the predictive ability of different learning methods on this task; (2) whether using unlabeled data can lead to a more accurate model; and (3) whether closed-domain or cross-domain training is better. For these purposes, we adopt two statistical learning approaches to this problem: the commonly used bag-of-words model based on Naive Bayes, and a sequence-labeling approach using a Hidden Markov Model (HMM). We compare our two approaches empirically, and also compare externally against prior results on the CoNLL-2010 Shared Task 1. Overall, our results are promising: (1) on the Wikipedia and biomedical datasets, the HMM model improves over Naive Bayes by up to 17.4% and 29.0%, respectively, in absolute F score; (2) compared to the CoNLL-2010 systems, our best HMM model achieves a 62.9% F score with MLE parameter estimation and 64.0% with EM parameter estimation on the Wikipedia dataset, both outperforming the best result (60.2%) of the CoNLL-2010 systems, although our results on the biomedical dataset are less impressive; (3) when the expressive power of a model (e.g., Naive Bayes) is not strong enough, cross-domain training is helpful, whereas when a model is powerful (e.g., an HMM), cross-domain training may produce biased parameters; and (4) under Maximum Likelihood Estimation, combining unlabeled examples with the labeled ones helps. © 2018 by World Scientific Publishing Co. Pte. Ltd. All rights reserved.
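
The HMM side of the comparison can be sketched as a tiny MLE-trained tagger with Viterbi decoding. The corpus, tag set, and add-one smoothing below are illustrative stand-ins, not the paper's actual setup (which trains on the CoNLL-2010 Wikipedia and biomedical corpora and also considers EM estimation):

```python
import math
from collections import defaultdict

# Toy labeled corpus (invented for illustration). "CUE" marks hedge-cue
# tokens such as "may" or "suggest"; "O" marks everything else.
train = [
    [("the", "O"), ("drug", "O"), ("may", "CUE"), ("help", "O")],
    [("results", "O"), ("suggest", "CUE"), ("a", "O"), ("link", "O")],
    [("we", "O"), ("believe", "CUE"), ("it", "O"), ("works", "O")],
    [("the", "O"), ("drug", "O"), ("works", "O")],
]
TAGS = ["O", "CUE"]

# MLE counts for start, transition, and emission distributions
start, trans, emit = defaultdict(int), defaultdict(int), defaultdict(int)
for sent in train:
    start[sent[0][1]] += 1
    for w, t in sent:
        emit[(t, w)] += 1
    for (_, t1), (_, t2) in zip(sent, sent[1:]):
        trans[(t1, t2)] += 1

VOCAB = len({w for s in train for w, _ in s}) + 1  # +1 slot for unseen words

def logp(table, key, total, k):
    # add-one smoothing over k outcomes keeps unseen events nonzero
    return math.log((table[key] + 1) / (total + k))

def viterbi(words):
    s_tot = sum(start.values())
    t_tot = {t: sum(c for (a, _), c in trans.items() if a == t) for t in TAGS}
    e_tot = {t: sum(c for (a, _), c in emit.items() if a == t) for t in TAGS}
    score = {t: logp(start, t, s_tot, len(TAGS)) +
                logp(emit, (t, words[0]), e_tot[t], VOCAB) for t in TAGS}
    back = []
    for w in words[1:]:
        new, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[p] +
                       logp(trans, (p, t), t_tot[p], len(TAGS)))
            new[t] = (score[prev] + logp(trans, (prev, t), t_tot[prev], len(TAGS))
                      + logp(emit, (t, w), e_tot[t], VOCAB))
            ptr[t] = prev
        score = new
        back.append(ptr)
    # follow back-pointers from the best final tag
    tags = [max(TAGS, key=score.get)]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return tags[::-1]
```

Once tokens are tagged, a sentence can be flagged as semantically uncertain whenever at least one token is labeled as a cue.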


Learning Relational Dependency Networks for Relation Extraction

July 2017 · 14 Reads · 7 Citations · Lecture Notes in Computer Science

We consider the task of KBP slot filling – extracting relation information from newswire documents for knowledge base construction. We present our pipeline, which employs Relational Dependency Networks (RDNs) to learn linguistic patterns for relation extraction. Additionally, we demonstrate how several components such as weak supervision, word2vec features, joint learning and the use of human advice, can be incorporated in this relational framework. We evaluate the different components in the benchmark KBP 2015 task and show that RDNs effectively model a diverse set of features and perform competitively with current state-of-the-art relation extraction methods.
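
As a loose illustration of learning linguistic patterns between entity slots for slot filling (the paper does this with Relational Dependency Networks and much richer features such as word2vec; the relations and sentences below are invented):

```python
from collections import Counter

# Hypothetical mini-corpus: sentences with two marked entity slots and a
# gold relation label. This sketch only counts the words between the
# slots; it is not the paper's RDN learner.
train = [
    ("[E1] was born in [E2]", "city_of_birth"),
    ("[E1] , born in [E2] ,", "city_of_birth"),
    ("[E1] is the founder of [E2]", "founded"),
    ("[E1] founded [E2] in 1998", "founded"),
]

def between(sentence):
    """Tokens between the two entity slots."""
    toks = sentence.split()
    i, j = toks.index("[E1]"), toks.index("[E2]")
    return tuple(toks[i + 1:j])

# relation -> Counter of between-entity tokens seen in training
patterns = {}
for sent, rel in train:
    patterns.setdefault(rel, Counter()).update(between(sent))

def score(sentence, rel):
    """Fraction of between-entity tokens also seen for `rel` in training."""
    toks = between(sentence)
    if not toks:
        return 0.0
    bag = patterns[rel]
    return sum(1 for t in toks if bag[t] > 0) / len(toks)

def predict(sentence):
    return max(patterns, key=lambda rel: score(sentence, rel))
```

A real slot-filling pipeline would replace these unigram overlaps with learned relational features (dependency paths, entity types, word embeddings) and a probabilistic model over them.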


Learning Relational Dependency Networks for Relation Extraction

July 2016 · 22 Reads · 5 Citations

We consider the task of KBP slot filling -- extracting relation information from newswire documents for knowledge base construction. We present our pipeline, which employs Relational Dependency Networks (RDNs) to learn linguistic patterns for relation extraction. Additionally, we demonstrate how several components such as weak supervision, word2vec features, joint learning and the use of human advice, can be incorporated in this relational framework. We evaluate the different components in the benchmark KBP 2015 task and show that RDNs effectively model a diverse set of features and perform competitively with current state-of-the-art relation extraction methods.


Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge

December 2015 · 17 Reads · 7 Citations · Lecture Notes in Computer Science · Jose Picado · [...] · Jude Shavlik

One of the challenges in information extraction is the requirement of human-annotated examples, commonly called gold-standard examples. Many successful approaches alleviate this problem by employing some form of distant supervision, i.e., looking into knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant-supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. For example, they assume all sentences containing Person X and Person Y are positive examples of the relation married(X, Y). In this work, we take a different approach: we infer weakly supervised examples for relations from models learned using knowledge outside the natural language task. We argue that this method creates more robust examples that are particularly useful when learning the entire information-extraction model (the structure and the parameters). We demonstrate on three domains that this form of weak supervision yields superior results when learning structure compared to using distant-supervision labels or a smaller set of gold-standard labels.
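
The hand-coded distant-supervision baseline the abstract argues against can be sketched in a few lines (rules and sentences are invented for illustration; the paper instead infers labels from models learned on knowledge outside the NLP task):

```python
# Weakly label unlabeled sentences for the relation married(X, Y)
# using hand-written textual cues -- the style of distant supervision
# this paper improves upon.
unlabeled = [
    "Alice married Bob in 2001 .",
    "Alice met Bob at a party .",
    "Carol and her husband Dan moved away .",
]

def weak_label(sentence):
    """1 = weak positive for married(X, Y), 0 = weak negative, None = abstain."""
    s = sentence.lower()
    if any(cue in s for cue in ("married", "husband", "wife")):
        return 1
    if "met" in s:
        return 0
    return None  # abstain: leave the sentence unlabeled

labels = [weak_label(s) for s in unlabeled]
```

The weakness of such rules is exactly what the abstract notes: they fire on surface patterns, so a sentence like "Alice never married Bob" would also be labeled positive.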


What About Statistical Relational Learning?

December 2015 · 43 Reads · 1 Citation · Communications of the ACM

This chapter presents background on the SRL models on which our work is based. We start with a brief technical background on first-order logic and graphical models. In Sect. 2.2, we present an overview of SRL models, followed by details on two popular SRL models. We then present the learning challenges in these models and the approaches taken in the literature to solve them. In Sect. 2.3.3, we present functional-gradient boosting, an ensemble approach, which forms the basis of our learning approaches. Finally, we present details about the evaluation metrics and datasets we used.


Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems

August 2015 · 63 Reads · 16 Citations

While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide model development. We present Advice-Based-Learning (ABLe), a framework for incorporating expert clinical knowledge into machine learning models, and show results for an example task: estimating the probability of malignancy following a non-definitive breast core needle biopsy. By applying ABLe to this task, we demonstrate a statistically significant improvement in specificity (24.0% with p=0.004) without missing a single malignancy.
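
One common way to fold expert advice into a learned model is to nudge its score whenever an advice rule fires. The sketch below is illustrative only, not the paper's ABLe formulation; the rule conditions, adjustment sizes, and field names are invented:

```python
# Hedged sketch: expert advice as score adjustments on a learned model's
# probability estimate (all names and numbers here are hypothetical).
def base_model_score(case):
    # stand-in for a learned model's estimated malignancy probability
    return case["model_prob"]

advice_rules = [
    # (condition, adjustment): e.g., "if the radiologist rated the finding
    # highly suspicious, raise the malignancy estimate"
    (lambda c: c["birads"] >= 5, +0.2),
    (lambda c: c["birads"] <= 2, -0.2),
]

def advised_score(case):
    s = base_model_score(case)
    for condition, delta in advice_rules:
        if condition(case):
            s += delta
    return min(1.0, max(0.0, s))  # clamp to a valid probability
```

Approaches in this family differ mainly in whether advice adjusts predictions post hoc, as here, or shapes the objective during training; the paper takes the latter, tighter integration.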


Gradient-based boosting for statistical relational learning: the Markov logic network and missing data cases

July 2015 · 34 Reads · 44 Citations · Machine Learning

Recent years have seen a surge of interest in Statistical Relational Learning (SRL) models that combine logic with probabilities. One prominent and highly expressive SRL model is Markov Logic Networks (MLNs), but the expressivity comes at the cost of learning complexity. Most of the current methods for learning MLN structure follow a two-step approach: first they search through the space of possible clauses (i.e., structures) and then learn weights via gradient descent for these clauses. We present a functional-gradient boosting algorithm to learn both the weights (in closed form) and the structure of the MLN simultaneously. Moreover, most learning approaches for SRL apply the closed-world assumption, i.e., whatever is not observed is assumed to be false in the world. We attempt to relax this assumption. We extend our algorithm for MLN structure learning to handle missing data using an EM-based approach, and show that this algorithm can also be used to learn Relational Dependency Networks and relational policies. Our results in many domains demonstrate that our approach can effectively learn MLNs even in the presence of missing data.
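
The functional-gradient idea is easiest to see propositionally: each round fits a weak learner to the pointwise gradient y − P(y=1|x) of the log-likelihood and adds it to the current model ψ. A minimal sketch with single-feature "stumps" standing in for the paper's relational regression trees (data, features, and round count are invented; the EM treatment of missing data is omitted):

```python
import math

# Toy propositional data: x is a dict of boolean features, y in {0, 1}.
data = [
    ({"smokes": 1, "friends_smoker": 1}, 1),
    ({"smokes": 1, "friends_smoker": 0}, 1),
    ({"smokes": 0, "friends_smoker": 1}, 1),
    ({"smokes": 0, "friends_smoker": 0}, 0),
]
FEATURES = ["smokes", "friends_smoker"]

def prob(psi):
    return 1.0 / (1.0 + math.exp(-psi))

def fit_stump(gradients):
    """Pick the feature split whose leaf means best fit the gradients."""
    best = None
    for f in FEATURES:
        left = [g for (x, _), g in zip(data, gradients) if x[f] == 1]
        right = [g for (x, _), g in zip(data, gradients) if x[f] == 0]
        means = (sum(left) / len(left) if left else 0.0,
                 sum(right) / len(right) if right else 0.0)
        sse = (sum((g - means[0]) ** 2 for g in left) +
               sum((g - means[1]) ** 2 for g in right))
        if best is None or sse < best[0]:
            best = (sse, f, means)
    _, f, means = best
    return lambda x: means[0] if x[f] == 1 else means[1]

# psi starts at 0; each round fits a stump to the functional gradient
# y - P(y=1|x) and adds it to the ensemble.
psi = [0.0] * len(data)
ensemble = []
for _ in range(20):
    grads = [y - prob(p) for (x, y), p in zip(data, psi)]
    stump = fit_stump(grads)
    ensemble.append(stump)
    psi = [p + stump(x) for (x, _), p in zip(data, psi)]

def predict(x):
    return prob(sum(s(x) for s in ensemble))
```

In the relational setting the stumps become first-order regression trees whose inner nodes test logical conditions, but the gradient computation and the additive update are the same.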


Support Vector Machines for Differential Prediction
  • Conference Paper
  • Full-text available

September 2014 · 145 Reads · 45 Citations · Lecture Notes in Computer Science

Machine learning is continually being applied to a growing set of fields, including the social sciences, business, and medicine. Some fields present problems that are not easily addressed using standard machine learning approaches and, in particular, there is growing interest in differential prediction. In this type of task we are interested in producing a classifier that specifically characterizes a subgroup of interest by maximizing the difference in predictive performance for some outcome between subgroups in a population. We discuss adapting maximum margin classifiers for differential prediction. We first introduce multiple approaches that do not affect the key properties of maximum margin classifiers, but which also do not directly attempt to optimize a standard measure of differential prediction. We next propose a model that directly optimizes a standard measure in this field, the uplift measure. We evaluate our models on real data from two medical applications and show excellent results.
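
The uplift measure the paper optimizes can be sketched as follows: rank each subgroup by classifier score and compare, at every targeting depth, the cumulative positives in the subgroup of interest against the control group. The scores and labels below are toy values (the paper's SVM is what produces the scores):

```python
# (score, outcome) pairs for the subgroup of interest and the control
# group -- invented numbers for illustration.
target  = [(0.9, 1), (0.8, 1), (0.6, 0), (0.3, 0)]
control = [(0.9, 1), (0.7, 0), (0.5, 0), (0.2, 0)]

def positives_at(group, k):
    """Positives among the k highest-scoring examples of a group."""
    top = sorted(group, key=lambda p: -p[0])[:k]
    return sum(y for _, y in top)

def uplift_curve(target, control):
    n = len(target)
    return [positives_at(target, k) - positives_at(control, k)
            for k in range(1, n + 1)]

# Area under the uplift curve: higher means the classifier better
# separates responders in the target subgroup relative to control.
auc_uplift = sum(uplift_curve(target, control))
```

Normalized variants divide by group sizes when the two subgroups differ in size; this sketch assumes equal sizes for simplicity.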


Elementary

September 2014 · 22 Reads · 47 Citations

Researchers have approached knowledge-base construction (KBC) with a wide range of data resources and techniques. The authors present Elementary, a prototype KBC system that is able to combine diverse resources and different KBC techniques via machine learning and statistical inference to construct knowledge bases. Using Elementary, they have implemented a solution to the TAC-KBP challenge with quality comparable to the state of the art, as well as an end-to-end online demonstration that automatically and continuously enriches Wikipedia with structured data by reading millions of webpages on a daily basis. The authors describe several challenges and their solutions in designing, implementing, and deploying Elementary. In particular, the authors first describe the conceptual framework and architecture of Elementary to integrate different data resources and KBC techniques in a principled manner. They then discuss how they address scalability challenges to enable Web-scale deployment. The authors empirically show that this decomposition-based inference approach achieves higher performance than prior inference approaches. To validate the effectiveness of Elementary's approach to KBC, they experimentally show that its ability to incorporate diverse signals has positive impacts on KBC quality.


Relational One-Class Classification: A Non-Parametric Approach

June 2014 · 22 Reads · 18 Citations · Proceedings of the AAAI Conference on Artificial Intelligence

One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on a feature vector or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples. We update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on probabilistic combination functions and density estimation. We experimentally show that our approach can discover relevant features and outperform three baseline approaches. Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
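
The core intuition of a tree-based distance can be sketched propositionally: two examples are close when many trees route them to the same leaf, and a new example is scored by its average distance to the known class. Single random feature tests stand in for the paper's learned first-order trees here, and all data is invented:

```python
import random
from statistics import mean

random.seed(0)

# Toy propositional stand-in: each "tree" is a single random feature
# test; real first-order trees would test relational conditions.
FEATURES = ["f1", "f2", "f3"]
one_class = [  # labeled examples of the single known class
    {"f1": 1, "f2": 1, "f3": 0},
    {"f1": 1, "f2": 1, "f3": 1},
    {"f1": 1, "f2": 0, "f3": 0},
]

trees = [random.choice(FEATURES) for _ in range(25)]

def dist(a, b):
    """Fraction of trees whose feature test separates a and b."""
    return mean(1.0 if a[f] != b[f] else 0.0 for f in trees)

def score(x):
    """Mean tree-distance to the known class (lower = more typical)."""
    return mean(dist(x, member) for member in one_class)

def is_member(x, threshold=0.4):
    return score(x) < threshold
```

The paper goes further by learning which splits to introduce (rather than sampling them) and by tuning the distance to maximize one-class performance.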


Citations (78)


... The results obtained with experiments (E1), (E2) and (E3) confirm findings in the literature regarding the relevance of mass density (Davis et al., 2005, 2007; Ferreira et al., 2011; Woods et al., 2010, 2011), and also show that good classifiers can be obtained to predict outcome (with a high percentage of correctly classified instances and reasonable values of precision and recall, according to F). ...

Reference:

Predicting Malignancy from Mammography Findings and Image-Guided Core Biopsies
Learning a New View of a Database: With an Application in Mammography
  • Citing Chapter
  • August 2007

... The imbalance between the potential number of positive and negative examples makes this a potential problem to approach from the perspective of statistical relational learning (SRL). The authors apply the state-of-the-art statistical relational learning system BoostSRL, based on the relational functional gradient boosting algorithm [13]. Gradient-boosted tree learners generally set up the problem in the form of learning a series of regression trees, where each tree is a relatively weak learner that fits toward correcting the error of the previous. ...

Boosted statistical relational learners: From benchmarks to data-driven medicine
  • Citing Chapter
  • January 2014

... This model uses a large amount of unlabeled data with a diverse set of linguistic attributes to solve the problem of a small labeled corpus, but it cannot solve the problem of a long context. Ameet Soni et al. [9] introduced a framework for relation extraction based on relational dependency networks (RDNs) that uses many features such as word2vec, joint learning, weak supervision, and human advice for learning linguistic patterns. The results show that weak supervision and word2vec did not significantly improve performance. ...

Learning Relational Dependency Networks for Relation Extraction
  • Citing Conference Paper
  • July 2017

Lecture Notes in Computer Science

... Odom et al. (2015a) adapted the preferencesbased advice to extract the mentions of adverse drug events from medical abstracts. Soni et al. (2016) extended this work to a more general relation extraction task from text documents. MacLeod et al. (2016) adapted the class imbalance approach for learning to predict rare diseases from survey data. ...

Learning Relational Dependency Networks for Relation Extraction
  • Citing Article
  • July 2016

... We consider two variations for a corpus: extracting positive sentences from the actual training/testing corpus (CDS) (i.e., newswire documents) versus using sentences from external data sources (EDS) (e.g., Wikipedia articles). • Knowledge-based weak supervision (KWS): Natarajan et al. (2014) showed that we can encode the "world knowledge" of domain experts, who have some inherent rules for identifying positive training examples during manual annotation (e.g., "home teams are more likely to win a game" for a sports corpus). Using these rules, we can automatically generate new positive examples that simulate the human expert's annotations in a training corpus. ...

Effectively Creating Weakly Labeled Training Examples via Approximate Domain Knowledge
  • Citing Chapter
  • December 2015

Lecture Notes in Computer Science

... For Q3, two additional relational methods were implemented and applied to two larger relational datasets, Movielens and YAGO3-10, for comparison. The first method OCC (Khot et al., 2014) is a one-class classifier for relational data that uses a distance metric based on first-order trees. These trees are learned one by one on the relational data points (using Tilde). ...

Relational One-Class Classification: A Non-Parametric Approach
  • Citing Article
  • June 2014

Proceedings of the AAAI Conference on Artificial Intelligence

... Additionally, the technical processing required with machine learning and expertise will require interdisciplinary collaboration of clinicians with engineers, computer scientists, or other related researchers. Clinicians benefit from the use of machine learning, and the performance of machine learning models improves with the addition of expert knowledge from clinicians [69,70]. While machine learning can improve physical activity monitoring in research settings and the market for more affordable open-source accelerometers is increasing, these technical and cost constraints may limit the use of machine learning and accelerometry in clinical practice. ...

Leveraging Expert Knowledge to Improve Machine-Learned Decision Support Systems
  • Citing Article
  • August 2015