Mining knowledge from natural language texts using fuzzy associated concept mapping

Knowledge Management Research Centre, Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Information Processing & Management 01/2008; DOI: 10.1016/j.ipm.2008.05.002
Source: DBLP

ABSTRACT Natural Language Processing (NLP) techniques have been used successfully to extract information automatically from unstructured text through detailed analysis of its content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for converting abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the user to have any prior knowledge of NLP techniques. It incorporates rule-based reasoning (RBR) and case-based reasoning (CBR) for anaphora resolution, and it aims to extract the propositions in a text so that a concept map can be constructed automatically. The recommendation module is built on fuzzy set theory. It is an interactive process that suggests propositions for further human refinement of the automatically generated concept maps. The suggested propositions are relationships among the concepts that are not explicitly stated in the paragraphs. This technique helps to stimulate individual reflection and generate new knowledge. The evaluation used the Science Citation Index (SCI) abstract database and CNET News as test data; both are well-known sources whose text quality is assured. Experimental results show that the automatically generated concept maps conform to those produced manually by domain experts, with only a proportionally small degree of difference between them. The method lets users convert scientific and short texts into a structured format that can be easily processed by computer. Moreover, it gives knowledge workers extra time to rethink their written text and to view their knowledge from another angle.

  • ABSTRACT: Choosing the optimal terms to represent a search engine query is not trivial, and may involve an iterative process such as relevance feedback, repeated unaided attempts by the user or the automatic suggestion of additional terms, which the user may select or reject. This is particularly true of a multimedia search engine which searches on concepts as well as user-input terms, since the user is unlikely to be familiar with all the system-known concepts. We propose three concept suggestion strategies: suggestion by normalised textual matching, by semantic similarity, and by the use of a similarity matrix. We have evaluated these three strategies by comparing machine suggestions with the suggestions produced by professional annotators, using the measures of micro- and macro-precision and recall. The semantic similarity strategy outperformed the use of a similarity matrix at a range of thresholds. Normalised textual matching, which is the simplest strategy, performed almost as well as the semantic similarity one on recall-based measures, and even better on precision-based and F-based measures.
    Computer Science and Information Technology, 2009. IMCSIT '09. International Multiconference on; 11/2009
  • ABSTRACT: Striving for greater competitiveness is routine in the aeronautical sector. Airline competitiveness is affected by such factors as time, price, reliability, availability, safety, technology, quality, and information management. To remain competitive, airlines must promptly identify and correct failures found in their fleet. This study aims to reduce the time spent identifying and correcting such logged failures. Utilizing Text Mining techniques during the pre-processing phase, our study processes an extensive database of events from commercial regional jets. The result is a unique list of keywords that describes each reported failure. Later, an Artificial Neural Network (ANN) identifies and classifies failure patterns, yielding a corresponding disposition for each failure pattern. Approximately five years of historical data was used to build and validate the present model. Results obtained were promising.
    Journal of Intelligent Information Systems 06/2012; 38(3). · 0.83 Impact Factor