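STUDI ANALISIS METODE-METODE PARSING DAN INTERPRETASI SEMANTIK PADA NATURAL LANGUAGE PROCESSING (An Analytical Study of Parsing and Semantic Interpretation Methods in Natural Language Processing)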

Article · January 2004
Source: OAI
Abstract
The three main processes in natural language processing are syntax analysis (parsing), semantic interpretation, and contextual interpretation. This paper discusses the first two. Parsing is the recognition of sentence structure based on a grammar and a lexicon. It can be done top-down or bottom-up, and each method has its own advantages and disadvantages: top-down parsers cannot handle grammars with left recursion, whereas bottom-up parsers cannot handle grammars with empty productions. The best parsers therefore combine the two approaches. Semantic interpretation is the process of mapping a sentence to its context-independent meaning representation, called the logical form. Building the logical form requires two processes: identifying the semantic role that each word and phrase plays in the sentence, and choosing the correct sense of each word so that the sentence is plausible, which is called word-sense disambiguation. Semantic roles may be represented using predicate-argument relations or thematic roles; word-sense disambiguation can be done with selectional restrictions or context activation.
Keywords: natural language processing, syntax analysis, semantic interpretation, grammar, lexicon.
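The abstract's point that top-down parsing cannot handle left recursion can be illustrated with a minimal sketch. The recognizer below is a naive recursive-descent (top-down) recognizer written for illustration only; the toy grammars, the token list, and the names `parse` and `recognizes` are assumptions, not taken from the paper. On the left-recursive rule `NP -> NP PP`, the recognizer re-enters `NP` at the same input position without consuming a token, so it never terminates (Python reports this as a `RecursionError`). Rewriting the rule without left recursion avoids the problem.

```python
from typing import Dict, List, Sequence

# A grammar maps each nonterminal to its list of productions.
# Any symbol that is not a key of the grammar is treated as a terminal.
Grammar = Dict[str, List[List[str]]]

# Hypothetical left-recursive toy grammar: NP -> NP PP | det n ; PP -> p NP
LEFT_RECURSIVE: Grammar = {
    "NP": [["NP", "PP"], ["det", "n"]],
    "PP": [["p", "NP"]],
}

# The same language without left recursion: NP -> det n PPs ; PPs -> PP PPs | (empty)
RIGHT_RECURSIVE: Grammar = {
    "NP": [["det", "n", "PPs"]],
    "PPs": [["PP", "PPs"], []],
    "PP": [["p", "NP"]],
}


def parse(grammar: Grammar, symbol: str, tokens: Sequence[str], pos: int) -> List[int]:
    """Return every input position reachable after expanding `symbol` from `pos`."""
    if symbol not in grammar:  # terminal: must match the next token exactly
        if pos < len(tokens) and tokens[pos] == symbol:
            return [pos + 1]
        return []
    ends: List[int] = []
    for production in grammar[symbol]:
        positions = [pos]
        for part in production:
            # Expand each part in turn from every position reached so far.
            positions = [e for p in positions for e in parse(grammar, part, tokens, p)]
            if not positions:
                break
        ends.extend(positions)
    return ends


def recognizes(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    """True if some expansion of `start` consumes the whole token sequence."""
    return len(tokens) in parse(grammar, start, tokens, 0)


# Part-of-speech tags for a phrase such as "the book on the table".
TOKENS = ["det", "n", "p", "det", "n"]

if __name__ == "__main__":
    print(recognizes(RIGHT_RECURSIVE, "NP", TOKENS))
    try:
        recognizes(LEFT_RECURSIVE, "NP", TOKENS)
    except RecursionError:
        print("left-recursive grammar: the top-down recognizer does not terminate")
```

The rewritten grammar uses an empty production for `PPs`, which this top-down recognizer handles without difficulty; the sketch does not attempt to show the bottom-up side of the trade-off.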
    • "Suciadi conducted research on syntactic and semantic analysis in NLP interpretation. Suciadi found that no parsing method is ideal for all kinds of NLP problems, that thematic roles further clarify the role of each element of a sentence, and that the word-sense hierarchy used by selectional restrictions is very helpful in word-sense disambiguation [5]. Margaretha et al. found that Latent Semantic Analysis (LSA) is able to capture a variety of semantic information from relatively little training data [6]."
    ABSTRACT: Document image recognition can be used to help translate ancient documents written in Javanese characters. If these documents are rewritten in Latin characters, they can be read by young people in Indonesia today for various purposes, which also helps preserve the rich culture of Javanese literature in particular. One problem in document image recognition for Javanese literature is how to group the syllable sequence produced by Javanese character recognition into words that are correct under the rules of the Javanese language, because Javanese script is written without spaces. This paper describes the Widiarti-Winarko algorithm, which can be used to group Javanese syllables, combined with Lucene as the software for building a dictionary of Javanese words. The dictionary is used to check whether the output words are correctly formed. In a test on the document image recognition output for two pages of the book Hamong Tani, with the dictionary built from all pages of the book, the system achieved 62.96% correct word formation in sentence context, and 75% of the words found were correct Javanese words. Because the percentage of correctly formed words in sentence context is still below 70%, the algorithm needs further improvement.
    Full-text · Conference Paper · Jul 2012
  • Conference Paper · Aug 2015
  • ABSTRACT: In Indonesia, patient complaints are recorded as free-text or narrative data by the doctor when taking the medical history or conducting the medical interview. Although this text is recorded in electronic medical records (EMR), it is difficult to process computationally because the computer does not recognize natural language. The structure of the Indonesian language differs from that of English. Moreover, the language of patient complaints is structured differently from Indonesian in general: it does not follow the S-P-O-K (Subject-Predicate-Object-Adverb) structure used in Indonesian sentences. There is also a wide range of local languages in Indonesia. Based on data on patient complaints obtained from physicians, this study develops production rules for mapping patient complaints. The aim of the study is to develop a parsing method that automatically maps patient complaints from unstructured text into structured text that can be recognized by the computer. In the parsing process developed in this research, a narrative text that has been split into words and/or separate phrases or clauses is used to search the lexicon for suitable matches. The lexicon entry that exceeded the minimum suitability value (threshold) and had the highest suitability value was selected as the candidate. The study takes into account the important information in the free text of patient complaints, and its results could subsequently be used to support a wide variety of clinical decisions.
    Conference Paper · Nov 2015