I Made Suwija Putra’s research while affiliated with Sepuluh Nopember Institute of Technology and other places


Publications (5)


SNLI Indo: A Recognizing Textual Entailment Dataset in Indonesian Derived from the Stanford Natural Language Inference Dataset
  • Article

December 2023 · 11 Reads · 6 Citations

Data in Brief

I Made Suwija Putra

·

Daniel Siahaan

·

Recognizing textual entailment (RTE) is an essential task in natural language processing (NLP). It determines the inference relationship between two text fragments, a premise and a hypothesis; the relationship is either entailment (true), contradiction (false), or neutral (undetermined). Neural networks are the most popular approach to RTE and have produced the best-performing RTE models. Neural network approaches, deep learning in particular, are data-driven, so the quantity and quality of the data significantly influence their performance. We therefore introduce SNLI Indo, a large-scale RTE dataset in the Indonesian language, derived from the Stanford Natural Language Inference (SNLI) corpus by translating the original sentence pairs. SNLI is a large-scale dataset of premise-hypothesis pairs generated using a crowdsourcing framework. It comprises 569,027 sentence pairs in total: 549,365 pairs for training, 9,840 for model validation, and 9,822 for testing. We translated the original SNLI sentence pairs from English to Indonesian using the Google Cloud Translation API. SNLI Indo addresses a resource gap in NLP for the Indonesian language: large datasets are available in other languages, in particular English, and SNLI Indo now enables better development of deep learning models for RTE in Indonesian.
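The structure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `SentencePair` class, the `translate_pair` helper, and the `toy_translate` lookup are hypothetical names introduced here, and the lookup table stands in for a real translation service rather than calling the Google Cloud Translation API. Only the three labels and the split sizes come from the abstract; the example sentences are invented.

```python
from dataclasses import dataclass
from typing import Callable

# The three inference relationships named in the abstract.
LABELS = ("entailment", "contradiction", "neutral")

# Split sizes as stated in the abstract.
SPLIT_SIZES = {"train": 549_365, "validation": 9_840, "test": 9_822}


@dataclass(frozen=True)
class SentencePair:
    """One premise-hypothesis pair with its inference label (hypothetical schema)."""
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


def translate_pair(pair: SentencePair, translate: Callable[[str], str]) -> SentencePair:
    """Translate both sentences; the label is carried over unchanged."""
    return SentencePair(translate(pair.premise), translate(pair.hypothesis), pair.label)


def toy_translate(text: str) -> str:
    """Stand-in for a real translation service (assumption: a tiny lookup table)."""
    lookup = {
        "A man is sleeping.": "Seorang pria sedang tidur.",
        "A man is running.": "Seorang pria sedang berlari.",
    }
    return lookup.get(text, text)


english = SentencePair("A man is sleeping.", "A man is running.", "contradiction")
indonesian = translate_pair(english, toy_translate)
total_pairs = sum(SPLIT_SIZES.values())

print(indonesian)
print(total_pairs)
```

Passing the translator in as a function keeps the sketch independent of any particular service, so a real client could be swapped in without changing the pair handling.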





Deteksi Kesamaan Teks Jawaban pada Sistem Test Essay Online dengan Pendekatan Neural Network
  • Article
  • Full-text available

December 2021 · 206 Reads · 5 Citations

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

E-learning is an online learning system that applies information technology to the teaching process. It is used to facilitate the delivery of information, learning materials, and online tests or assignments. Online tests that evaluate students' abilities can be multiple choice or essay. Online tests with essay answers are considered the most appropriate method for assessing the results of complex learning activities. However, evaluating students' essay answers poses some challenges. One of them is ensuring that a student's answers are not the same as, or copied and pasted from, other students' answers. This study builds a similarity detection system (Similarity Checking) for students' essay answers that is automatically embedded in the e-learning system to prevent plagiarism between students. In this paper, we use Artificial Neural Network (ANN), Latent Semantic Indexing (LSI), and Jaccard methods to calculate the percentage of similarity between students' essays. The preprocessed essay text is converted into an array that represents word frequencies. We evaluate the results with the mean absolute percentage error (MAPE), taking the Jaccard result as the actual value. The experimental results show that the ANN method performs closer to the Jaccard method than the LSI method does in detecting text similarity, which suggests that the ANN method has the potential to be developed in further research.
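The two measures named in the abstract, Jaccard similarity and MAPE, can be illustrated with a short Python sketch. It rests on assumptions that the abstract does not confirm: the `tokenize` function is a simple lowercase word splitter rather than the paper's preprocessing, Jaccard is computed over sets of unique words, and the sample answers and predicted scores are invented for illustration.

```python
import re
from typing import Sequence


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(answer_a: str, answer_b: str) -> float:
    """Jaccard similarity of the two answers' word sets, as a percentage."""
    words_a, words_b = set(tokenize(answer_a)), set(tokenize(answer_b))
    if not words_a and not words_b:
        return 100.0  # two empty answers are treated as identical
    return 100.0 * len(words_a & words_b) / len(words_a | words_b)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error; pairs with a zero actual value are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Jaccard scores serve as the reference ("actual") values.
reference = [
    jaccard_similarity("the cat sat", "the cat ran"),
    jaccard_similarity("neural networks learn", "neural networks learn"),
]
# Hypothetical scores from some other similarity method, for illustration only.
other_method = [45.0, 90.0]

print(reference)
print(mape(reference, other_method))
```

A lower MAPE means the other method's similarity percentages track the Jaccard reference more closely, which is the sense in which the abstract compares ANN and LSI.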


Citations (3)


... For example, annotation errors were found in the CONLL-2003 corpus, a corpus commonly used in NLP tasks in English [6]. Meanwhile, corpora for basic NLP tasks in Indonesian are still being developed [7], [8]. Several researchers have developed a corpus for POS Tagging, including Lim [9] and Fu [10]. ...

Reference:

Annotation Error Detection and Correction for Indonesian POS Tagging Corpus
SNLI Indo: A Recognizing Textual Entailment Dataset in Indonesian Derived from the Stanford Natural Language Inference Dataset
  • Citing Article
  • December 2023

Data in Brief

... Examining models' performance on fine-grained and subjective semantic relations can explore and unveil their underlying linguistic competence and biases, contributing to the ongoing effort to interpret and refine their behaviour. In order for the models to be better in understanding language dynamics, we think that the development of specialized evaluation datasets is a foundational step towards benchmarking and improving their ability to disentangle complex semantic phenomena (Putra et al., 2024). ...

Recognizing textual entailment: A review of resources, approaches, applications, and challenges
  • Citing Article
  • September 2023

ICT Express

... Tokenization is a step that breaks text into words [25], which allows a more in-depth analysis of each word in the text. The result of tokenization is a series of tokens for each sentence. ...

Deteksi Kesamaan Teks Jawaban pada Sistem Test Essay Online dengan Pendekatan Neural Network

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)