John Chen’s scientific contributions


Publications (4)


Cross document person name disambiguation using entity
  • Article

May 2012 · 37 Reads · 7 Citations

John Chen · Rohini Srihari

Given an ambiguous person name as input, a cross-document person name disambiguation system clusters documents so that each cluster contains all and only those documents referring to the same person. In this paper we present our approach to this task. We introduce novel features based on topic models and also document-level entity profiles (sets of information that are collected for each ambiguous person in the entire document). We also introduce a modified term frequency-inverse document frequency (TF-IDF) weighting scheme to represent entities in a vector-space model (VSM). Disambiguation is then performed via single-link hierarchical agglomerative clustering. Experiments show that an average F-measure of 94.03% is achieved using our proposed enhanced VSM model. This is an improvement over previous best results on the same test corpora.
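As a rough illustration of the pipeline this abstract outlines, the sketch below represents documents as TF-IDF vectors and groups them with single-link agglomerative clustering. It is not the authors' system: it uses a standard smoothed TF-IDF rather than the paper's modified weighting, it omits the topic-model and entity-profile features, and the function names, the toy documents, and the 0.3 similarity threshold are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (standard smoothed IDF, NOT the paper's modified scheme)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_link_clusters(vectors, threshold):
    """Merge clusters while the closest pair of members is at least `threshold` similar."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster similarity is the maximum pairwise similarity.
                sim = max(
                    cosine(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if sim > best_sim:
                    best_sim, best_pair = sim, (a, b)
        if best_sim < threshold:
            break
        a, b = best_pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Hypothetical toy documents that all mention the same ambiguous name.
example_docs = [
    "john chen nlp parsing tagging research",
    "john chen parsing tagging nlp paper",
    "john chen restaurant chef menu cooking",
    "john chen chef cooking restaurant recipes",
]
example_clusters = single_link_clusters(tfidf_vectors(example_docs), threshold=0.3)
```

With these toy inputs, the two NLP-related documents end up in one cluster and the two restaurant-related documents in another, because the shared name tokens alone carry too little weight to exceed the threshold.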


Unsupervised Russian POS Tagging with Appropriate Context

August 2011 · 49 Reads

Lecture Notes in Computer Science

Li Yang · Erik Peterson · John Chen · [...] · Rohini Srihari

While adopting the contextualized hidden Markov model (CHMM) framework for unsupervised Russian POS tagging, we investigate the possibility of utilizing the left, right, and unambiguous context in the CHMM framework. We propose a backoff smoothing method that incorporates all three types of context into the transition probability estimation during the expectation-maximization process. The resulting model with this new method achieves overall and disambiguation accuracies comparable to a CHMM using the classic backoff smoothing method for HMM-based POS tagging from [17].

Keywords: unsupervised Russian part-of-speech tagging, CHMM, left, right, unambiguous context, transition probability, expectation-maximization (EM)


Confidence measures and thresholding in coreference resolution

November 2009 · 7 Reads

This volume brings together revised versions of a selection of papers presented at the Sixth International Conference on “Recent Advances in Natural Language Processing” (RANLP) held in Borovets, Bulgaria, 27–29 September 2007. These papers cover a wide variety of Natural Language Processing (NLP) topics: ontologies, named entity extraction, translation and transliteration, morphology (derivational and inflectional), part-of-speech tagging, parsing (incremental processing, dependency parsing), semantic role labeling, word sense disambiguation, temporal representations, inference and metaphor, semantic similarity, coreference resolution, clustering (topic modeling, topic tracking), summarization, cross-lingual retrieval, lexical and syntactic resources, multi-modal processing. The aim of this volume is to present new results in NLP based on modern theories and methodologies, making it of interest to researchers in NLP and, more specifically, to those who work in Computational Linguistics, Corpus Linguistics, and Machine Translation.


Automatically Extracting Nominal Mentions of Events with a Bootstrapped Probabilistic Classifier

January 2006 · 25 Reads · 12 Citations

Most approaches to event extraction focus on mentions anchored in verbs. However, many mentions of events surface as noun phrases. Detecting them can increase the recall of event extraction and provide the foundation for detecting relations between events. This paper describes a weakly-supervised method for detecting nominal event mentions that combines techniques from word sense disambiguation (WSD) and lexical acquisition to create a classifier that labels noun phrases as denoting events or non-events. The classifier uses bootstrapped probabilistic generative models of the contexts of events and non-events. The contexts are the lexically-anchored semantic dependency relations that the NPs appear in. Our method dramatically improves with bootstrapping, and comfortably outperforms lexical lookup methods which are based on very much larger hand-crafted resources.

Citations (2)


... There are multiple ways to define the similarity function φ(e_m, c_t) based on features, including the name string matching, document surface, entity context, concept, KB link features, profiling, topic, popularity, etc. [246], [50], [2], [38], [37], [27], [147], [216], [207], [102], [245], [2], [26], [168]. In general, if the features have been constructed, some traditional similarity metrics such as cosine or Jaccard can be used. ...

Reference:

Machine Learning with World Knowledge: The Position and Survey
Cross document person name disambiguation using entity
  • Citing Article
  • May 2012
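
The citing excerpt above names cosine and Jaccard as traditional similarity metrics over constructed features. A minimal sketch of both follows, assuming features are given either as a set of strings (Jaccard) or as a sparse dict of weights (cosine). The function names and the sample feature values are hypothetical and do not come from the cited survey.

```python
import math


def jaccard_similarity(features_a, features_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


def cosine_similarity(weights_a, weights_b):
    """Cosine similarity between two sparse feature-weight dicts."""
    dot = sum(w * weights_b[k] for k, w in weights_a.items() if k in weights_b)
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical feature sets for a mention and a candidate entity.
mention_features = {"buffalo", "nlp", "professor"}
candidate_features = {"nlp", "professor", "tagging"}
overlap = jaccard_similarity(mention_features, candidate_features)
```

Jaccard ignores feature weights and only counts shared members, whereas cosine takes weights into account, so the choice between them depends on whether the constructed features are binary or weighted.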

... Although most of these works show a certain awareness of the linguistic distinction between event and result nominalizations, none of them applies this distinction in their systems. The notion of event appears in the work of Creswell et al. (2006), in which a classifier that distinguishes between nominal mentions of events and non-events is presented. Their distinction is not comparable to our event and result distinction for one main reason, however: they do not focus on nominalizations but on nouns in general, and therefore the difficulty in distinguishing events from non-events among all types of nouns is less than distinguishing between event and result nominalizations, which, as has been seen, are highly ambiguous. ...

Automatically Extracting Nominal Mentions of Events with a Bootstrapped Probabilistic Classifier
  • Citing Conference Paper
  • January 2006