Ruli Manurung’s research while affiliated with University of Indonesia and other places


Publications (8)


DBpedia Entities Expansion in Automatically Building Dataset for Indonesian NER
  • Conference Paper

October 2016 · 321 Reads · 15 Citations · Ruli Manurung

Named Entity Recognition (NER) plays a significant role in Information Extraction (IE). For English, NER systems have achieved excellent performance, but for the Indonesian language they still need considerable improvement. Building a reliable NER system with a machine learning approach requires a massive dataset to train the classifier. Several studies have proposed methods for automatically building a dataset for Indonesian NER, using Indonesian Wikipedia articles as the source and DBpedia as the reference for determining entity types automatically. The objective of our research is to improve the quality of the automatically tagged dataset. We propose a new method of using DBpedia as the reference for named entities, and we have created rules for expanding the DBpedia entity corpus for the categories person, place, and organization. The resulting training dataset is used with the Stanford NER tool to build an Indonesian NER classifier. The evaluation shows that our method improves recall significantly but has lower precision than the previous research.


Fig. 1. Argument Component Annotation Scheme (Stab & Gurevych, 2014a) 
Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays
  • Article
  • Full-text available

January 2016 · 349 Reads · 2 Citations

Aside from the proper usage of grammar, diction and punctuation, a good essay must have cohesion and coherence. In a persuasive essay, argumentative discourse is important as a parameter for judging the cohesion and coherence among the arguments. An argument is characterized by one's stance (claim), which is strengthened with facts (premises) to establish the validity of the stance. Ideally, claims are followed by premises, whether they support or attack the claims. In this paper, we try to identify four kinds of argument components (major claim, claim, premise, and non-argumentative) using some predefined features, and we measure the performance of word vector representations in identifying argument components. We also present the results of our initial experiment using deep learning to classify the argument components.


Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

January 2016 · 119 Reads · 2 Citations

International Journal of Engineering & Technology



Fig. 1. Prototype vectors m_i with i ∈ {4, 5, 6} cannot be the BMU for x_i, since d(m, m_i) ≥ 2 · d(x_i, m). These prototype vectors can therefore be ignored when searching for the BMU.
Fig. 2. While searching for the BMU, m_2 is found to be closer to x_i. Moving the hypersphere's centroid from m (left) to m_2 (right) shrinks the radius.
SOM Training Optimization Using Triangle Inequality

January 2016 · 366 Reads

Advances in Intelligent Systems and Computing

Triangle inequality optimization is one of several strategies for the k-means algorithm that can reduce the search space when finding the nearest prototype vector. This optimization can also be applied to Self-Organizing Map (SOM) training, particularly when finding the best matching unit (BMU) in the batch training approach. This paper investigates various implementations of this optimization and measures the efficiency gained across various datasets, dimensions, maps, cluster sizes and densities. Our experiments on synthetic and real-life datasets show that the number of comparisons can be reduced to 24% and the running time can also be reduced to between 63% and 87%.
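
The pruning idea behind this abstract can be illustrated with a minimal sketch. It is not the authors' implementation; it only assumes the rule stated in the Fig. 1 caption, namely that a prototype m_i with d(m, m_i) ≥ 2 · d(x, m) cannot be closer to x than the current candidate m. The function names `pairwise_distances` and `find_bmu_pruned` are hypothetical, and Euclidean distance is an assumption.

```python
import math


def pairwise_distances(prototypes):
    """Precompute the distance between every pair of prototype vectors."""
    n = len(prototypes)
    return [[math.dist(prototypes[a], prototypes[b]) for b in range(n)]
            for a in range(n)]


def find_bmu_pruned(x, prototypes, proto_dist, start=0):
    """Find the best matching unit for x, skipping prototypes that the
    triangle inequality rules out.

    Returns (bmu_index, bmu_distance, number_of_distances_computed).
    """
    best = start
    best_d = math.dist(x, prototypes[best])
    computed = 1
    for i in range(len(prototypes)):
        if i == start:
            continue  # already evaluated as the initial candidate
        # If d(m_best, m_i) >= 2 * d(x, m_best), then by the triangle
        # inequality d(x, m_i) >= d(x, m_best), so m_i cannot win.
        if proto_dist[best][i] >= 2.0 * best_d:
            continue
        d = math.dist(x, prototypes[i])
        computed += 1
        if d < best_d:
            # A closer prototype shrinks the search radius for later checks.
            best, best_d = i, d
    return best, best_d, computed
```

The third return value makes it possible to compare the number of distance computations against a brute-force scan over all prototypes. How much is saved depends on the data and on the starting candidate, so this sketch makes no claim about reproducing the figures reported in the abstract.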


Knowledge representation system for copula sentence in Bahasa Indonesia based on Web Ontology Language (OWL)

October 2015 · 36 Reads · 1 Citation

Knowledge sources in natural-language text are now available in large quantities. This creates an increasing need for knowledge representation and, in turn, for automatic processing of the knowledge in text. Previous research built OWLizr, a knowledge representation system for natural-language text. However, that work could not handle the representation of concepts that describe a particular object. This paper develops a knowledge representation system for copula sentences containing concepts in Bahasa Indonesia. If such concepts can be handled, the relationships between them and the components of an existing ontology can be defined. This supports the ontology engineering process, which builds a domain ontology by enriching an existing knowledge base. The system combines NLP (Natural Language Processing) techniques and OWL (Web Ontology Language) to model the knowledge contained in copula sentences. The system is evaluated through unit testing and through testing on a collection of sentences adapted from Wikipedia. The results show that the system can represent the knowledge in copula sentences containing concepts in Bahasa Indonesia, and that it can generate an OWL knowledge base that can be shared and reused by other systems.


Measuring the Structural and Conceptual Similarity of Folktales using Plot Graphs

January 2015 · 56 Reads

This paper presents an approach to organizing folktales based on a data structure called a plot graph, which captures the narrative flow of events in a folktale. The similarity between two folktales can be computed as the structural similarity between their corresponding plot graphs. This is performed using the well-known Needleman-Wunsch algorithm. To test the efficacy of this approach, experiments are carried out using a small collection of 24 folktales grouped into 5 categories based on the Aarne-Thompson index. The best result is obtained by combining the proposed structural-based similarity measure with a more conventional bag of words vector space model, where 19 out of the 24 folktales (79.16%) yield higher average similarity with folktales within their respective categories as opposed to across categories.
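
As a rough illustration of the alignment step named in this abstract, the sketch below computes a Needleman-Wunsch global alignment score between two sequences of event labels. It assumes each plot graph has already been flattened into an ordered list of labels, which the abstract does not specify. The function name `needleman_wunsch` and the scoring values (match +1, mismatch −1, gap −1) are assumptions for illustration, not the paper's settings.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per element.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Either align a[i-1] with b[j-1], or insert a gap in one sequence.
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]
```

For example, calling it on two hypothetical event sequences such as `["departure", "trial", "return"]` and `["departure", "return"]` aligns the shared events and pays one gap for the missing one. A raw score like this would normally be normalized before being combined with another similarity measure; how the paper does so is not stated here.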


Automatic Identification of Age-Appropriate Ratings of Song Lyrics

January 2015 · 43 Reads · 3 Citations

This paper presents a novel task, namely the automatic identification of age-appropriate ratings of a musical track or album based on its lyrics. Details are provided on the construction of a dataset of lyrics from 12,242 tracks across 1,798 albums, together with age-appropriate ratings obtained from various web resources and results from various text classification experiments. The best accuracy of 71.02% for classifying albums by age group is achieved by combining a vector space model with psycholinguistic features.


Automatic Wayang Ontology Construction using Relation Extraction from Free Text

January 2014 · 20 Reads · 9 Citations

This paper reports on our work to automatically construct and populate an ontology of wayang (Indonesian shadow puppet) mythology from free text using relation extraction and relation clustering. A reference ontology, which contains concepts and properties within the wayang character domain, is used to evaluate the generated ontology. We examined the influence of corpus data variations, threshold value variations in the relation clustering process, and the use of entity pairs or entity pair types during the feature extraction stage. The constructed ontology is examined using three evaluation methods: cluster purity (CP), instance knowledge (IK), and relation concept (RC). Based on the evaluation results, the proposed method generates the best ontology when a consolidated corpus is used, the threshold value in relation clustering is 1, and entity pairs are used during feature extraction.

Citations (5)


... A more recent work [12] implemented CNN to recognize insufficiently supported arguments. Another work focused on the implementation of word vector representation and Long Short Term Memory unit to identify argument components [13]. ...

Reference:

It Takes Two To Tango: Modification of Siamese Long Short Term Memory Network with Attention Mechanism in Recognizing Argumentative Relations in Persuasive Essay
Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays
  • Citing Article
  • January 2016

International Journal of Engineering & Technology

... It also uses deep neural architectures that are best for contextual entity linking, constituency parsing, writing style recognition, sentiment analysis, etc. [46]. Target words are explored through Neural Network, whose hidden layer encodes the word representation [47]. Word2vec is implemented through two models that are Skip-gram and Continuous Bag of Words (CBOW). ...

Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

... However, the need for an annotated corpus makes it not applicable for many languages. In (Sanabila & Manurung, 2014), they worked on automatic synset extraction from free text. Their methodology is to retrieve the candidate relation patterns and then cluster them based on same semantic tendency which works well as long as the included text patterns are all known to the system. ...

Automatic Wayang Ontology Construction using Relation Extraction from Free Text
  • Citing Conference Paper
  • January 2014

... The highlight is on topic classification. Maulidyani and Manurung [8] explore automatic identification of age-appropriate ratings of song lyrics. The research highlight is the same as this study, but this study works on Indonesian, while Maulidyani and Manurung work in English. ...

Automatic Identification of Age-Appropriate Ratings of Song Lyrics
  • Citing Article
  • January 2015