A Framework for Integrating Deep and Shallow Semantic Structures in Text Mining.
ABSTRACT Recent work in knowledge representation undertaken as part of the Semantic Web initiative has enabled a common infrastructure (Resource Description Framework (RDF) and RDF Schema) for sharing knowledge of ontologies and instances. In this paper we present a framework for combining the shallow levels of semantic description commonly used in MUC-style information extraction with the deeper semantic structures available in such ontologies. The framework is implemented within the PIA project software called Ontology Forge. Ontology Forge offers a server-based hosting environment for ontologies, a server-side information extraction system for reducing the effort of writing annotations and a many-featured ontology/annotation editor. We discuss the knowledge framework, some features of the system and summarize results from extended named entity experiments designed to capture instances in texts using support vector machine software.
- Source available from: upenn.edu
ABSTRACT: This paper introduces an approach to sentiment analysis which uses support vector machines (SVMs) to bring together diverse sources of potentially pertinent information, including several favorability measures for phrases and adjectives and, where available, knowledge of the topic of the text. Models using the features introduced are further combined with unigram models which have been shown to be effective in the past (Pang et al., 2002) and lemmatized versions of the unigram models. Experiments on movie review data from Epinions.com demonstrate that hybrid SVMs which combine unigram-style feature-based SVMs with those based on real-valued favorability measures obtain superior performance, producing the best results yet published using this data. Further experiments using a feature set enriched with topic information on a smaller dataset of music reviews hand-annotated for topic are also reported, the results of which suggest that incorporating topic information into such models may also yield improvement.
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, EMNLP 2004, A meeting of SIGDAT, a Special Interest Group of the ACL, held in conjunction with ACL 2004, 25-26 July 2004, Barcelona, Spain; 01/2004
- ABSTRACT: Text Mining is a relatively new area of research, of interest to both computational linguists and data miners. It involves the collection and analysis of quantities of textual data by domain experts, whose main task is the manual revision of markup. We describe a suite of tools used to simplify the process: the Parmenides System, which consists of a data warehouse, an ontology, and semi-automatic information extraction and data mining tools. Here we focus on the Annotation Editor, which incorporates linguistic tools that initialize the markup automatically.
- ABSTRACT: Text mining has an important role to play in aiding experts to construct domain specific ontologies by highlighting the important classes, properties and relations that occur within large text collections. In this paper we propose a systematic framework for discovery of ontological types using typing information complemented with statistical filtering. Preliminary experiments are conducted on three corpora in the domain of molecular biology and results show that the top level types we obtain closely meet the intuitions and expectations of domain experts.