Christos Christodoulopoulos

University of Illinois, Urbana-Champaign | UIUC · Department of Computer Science

PhD

43
Publications
10,235
Reads
2,007
Citations
Introduction
I'm a postdoc at the Cognitive Computation Group at the University of Illinois Urbana-Champaign. I work with Dan Roth on extending Semantic Role Labeling (SRL) and Cindy Fisher on computational models of language acquisition.
Additional affiliations
September 2013 - present
University of Illinois, Urbana-Champaign
Position
  • PostDoc Position
Education
January 2010 - August 2013
The University of Edinburgh
Field of study
  • Computational Linguistics
September 2007 - September 2008
The University of Edinburgh
Field of study
  • Artificial Intelligence
September 2002 - September 2007
University of Piraeus
Field of study
  • Digital Systems and Technology Education

Publications (43)
Article
Full-text available
The ability to generalize well is one of the primary desiderata for models of natural language processing (NLP), but what ‘good generalization’ entails and how it should be evaluated is not well understood. In this Analysis we present a taxonomy for characterizing and understanding generalization research in NLP. The proposed taxonomy is based on a...
Preprint
Full-text available
Extracting structured and grounded fact triples from raw text is a fundamental task in Information Extraction (IE). Existing IE datasets are typically collected from Wikipedia articles, using hyperlinks to link entities to the Wikidata knowledge base. However, models trained only on Wikipedia have limitations when applied to web domains, which ofte...
Conference Paper
This paper describes the augmentation of an existing corpus of child-directed speech. The resulting corpus is a gold-standard labeled corpus for supervised learning of semantic role labels in adult-child dialogues. Semantic role labeling (SRL) models assign semantic roles to sentence constituents, thus indicating who has done what to whom (and in w...
Preprint
Full-text available
The ability to generalise well is one of the primary desiderata of natural language processing (NLP). Yet, what ‘good generalisation’ entails and how it should be evaluated is not well understood, nor are there any common standards to evaluate it. In this paper, we aim to lay the groundwork to improve both of these issues. We present a taxonomy for...
Preprint
We introduce ReFinED, an efficient end-to-end entity linking model which uses fine-grained entity types and entity descriptions to perform linking. The model performs mention detection, fine-grained entity typing, and entity disambiguation for all mentions within a document in a single forward pass, making it more than 60 times faster than competit...
Preprint
Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities which can distract evidence retrieval models. A...
Preprint
Full-text available
Fact verification has attracted a lot of attention in the machine learning and natural language processing communities, as it is one of the key methods for detecting misinformation. Existing large-scale benchmarks for this task have focused mostly on textual sources, i.e. unstructured information, and thus ignored the wealth of information availabl...
Preprint
Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any fact-checking mechanism or retrieving any supporting evidence. In this work, we take a closer look at these da...
Preprint
Full-text available
The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this pa...
Preprint
We present the results of the first Fact Extraction and VERification (FEVER) Shared Task. The task challenged participants to classify whether human-written factoid claims could be Supported or Refuted using evidence retrieved from Wikipedia. We received entries from 23 competing teams, 19 of which scored higher than the previously published baseli...
Conference Paper
Full-text available
In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. The claims are classified as SUPPORT...
Article
Full-text available
Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision...
Article
Full-text available
Unlike other tasks and despite recent interest, research in textual claim verification has been hindered by the lack of large-scale manually annotated datasets. In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,441 claims generated by alt...
Article
Full-text available
Many real world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks; social networks; and information extraction systems processing unstructured data to convert raw text to knowledge gr...
Article
Full-text available
We introduce a method for transliteration generation that can produce transliterations in every language. Where previous results are only as multilingual as Wikipedia, we show how to use training data from Wikipedia as surrogate training for any language. Thus, the problem becomes one of ranking Wikipedia languages in order of suitability with resp...
Article
Commas and the surrounding sentence structure often express relations that are essential to understanding the meaning of the sentence. This paper proposes a set of relations commas participate in, expanding on previous work in this area, and develops a new dataset annotated with this set of labels. We identify features that are important to achieve...
Conference Paper
Full-text available
Commas and the surrounding sentence structure often express relations that are essential to understanding the meaning of the sentence. This paper proposes a set of relations commas participate in, expanding on previous work in this area, and develops a new dataset annotated with this set of labels. We identify features that are important to achieve...
Conference Paper
Full-text available
Most events described in a news article are background events ‐ only a small number are noteworthy, and an even smaller number serve as the trigger for the writing of that article. Although these events are difficult to identify, they are crucial to NLP tasks such as first story detection, document summarization and event coreference, and to many applic...
Conference Paper
Full-text available
Syntactic bootstrapping is the hypothesis that learners can use the preliminary syntactic structure of a sentence to identify and characterise the meanings of novel verbs. Previous work has shown that syntactic bootstrapping can begin using only a few seed nouns (Connor et al., 2010; Connor et al., 2012). Here, we relax their key assumption: rather...
Conference Paper
Full-text available
Nearly all work in unsupervised grammar induction aims to induce unlabeled dependency trees from gold part-of-speech-tagged text. These clean linguistic classes provide a very important, though unrealistic, inductive bias. Conversely, induced clusters are very noisy. We show here, for the first time, that very limited human supervision (three freq...
Article
Full-text available
We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparis...
Conference Paper
Full-text available
Statistical parsers trained on labeled data suffer from sparsity, both grammatical and lexical. For parsers based on strongly lexicalized grammar formalisms (such as CCG, which has complex lexical categories but simple combinatory rules), the problem of sparsity can be isolated to the lexicon. In this paper, we show that semi-supervised Viterbi-EM...
Thesis
Computational approaches to linguistic analysis have been used for more than half a century. The main tools come from the field of Natural Language Processing (NLP) and are based on rule-based or corpora-based (supervised) methods. Despite the undeniable success of supervised learning methods in NLP, they have two main drawbacks: on the practical s...
Conference Paper
Full-text available
Most unsupervised dependency systems rely on gold-standard Part-of-Speech (PoS) tags, either directly, using the PoS tags instead of words, or indirectly in the back-off mechanism of fully lexicalized models (Headden et al., 2009).
Conference Paper
Full-text available
In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type...
Conference Paper
Full-text available
Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction...
Article
Full-text available
Group formation in e-learning environments is a challenging area. In this paper we present a web-based group formation tool that supports the instructor to create both homogeneous and heterogeneous groups based on up to three criteria and the learner to negotiate the grouping. Moreover, the instructor is allowed to manually group students based o...
Conference Paper
Full-text available
In this paper we present a Web-based group formation tool that supports the instructor to automatically create both homogeneous and heterogeneous groups based on up to three criteria and the learner to negotiate the grouping. Moreover, the instructor is allowed to manually group learners based on specific criteria. A discriminative feature of this...
Conference Paper
Full-text available
This paper investigates the problem of target localization by a communicating robotic swarm in an unknown environment. Robots have to collaboratively search for the target while avoiding obstacles in their way. Emphasis is placed on how physical constraints such as obstacles and communication links affect the swarm's operation. Finally, simulation res...
Article
Full-text available
Designing tools that support group formation is a challenging goal for both the areas of adaptive and collaborative e-learning environments. Group formation may be used for a variety of purposes such as for grouping students that could potentially benefit from cooperation based on their individual characteristics or needs, for mediating peer help b...
