Maria das Graças Volpe Nunes

University of São Paulo | USP · Institute of Mathematical and Computer Sciences (ICMC) (São Carlos)

About

Publications: 164
Reads: 16,413
Citations: 1,570
Since 2017: 6 research items, 492 citations
(Chart: citations per year, 2017 to 2023)


Publications (164)
Article
Full-text available
With the advancement of Natural Language Processing (NLP), corpora have come to occupy a prominent place among its resources. Beyond supporting linguistic studies, they form the basis for training Machine Learning models and for developing cutting-edge computational applications. In particular, there is a great need...
Conference Paper
Full-text available
This paper presents the project of a large multi-genre treebank for Brazilian Portuguese, called Porttinari. We address relevant research questions in its construction and annotation, reporting the work already done. The treebank is affiliated with the “Universal Dependencies” international model, widely adopted in the area, and must be the basis f...
Article
The large amount of data available in social media, forums and websites motivates research in several areas of Natural Language Processing, such as sentiment analysis. The popularity of the area, due to its subjective and semantic characteristics, motivates research on novel methods and approaches for classification. Hence, there is a high demand f...
Conference Paper
Full-text available
We report in this paper the coreference annotation process of the CSTNews corpus as part of a collective task of the IberEval 2017 conference. The annotated corpus is composed of 140 news texts written in Brazilian Portuguese and has several annotation layers, including annotations in the morphosyntax/syntax, semantics, and discour...
Article
Full-text available
Text normalization techniques based on rules, lexicons or supervised training requiring large corpora are neither scalable nor interchangeable across domains, and this makes them unsuitable for normalizing user-generated content (UGC). Current tools available for Brazilian Portuguese make use of such techniques. In this work we propose a technique based on dis...
Conference Paper
Recently, spell checking (or spelling correction systems) has regained attention due to the need of normalizing user-generated content (UGC) on the web. UGC presents new challenges to spellers, as its register is much more informal and contains much more variability than traditional spelling correction systems can handle. This paper proposes two ne...
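The two methods proposed in this paper are cut off in the abstract, so the sketch below is not the paper's approach. It only illustrates the classic edit-distance baseline that spellers usually start from: rank the words of a lexicon by Levenshtein distance to a noisy token. The toy lexicon and the `max_distance` cutoff are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def suggest(token: str, lexicon: list[str], max_distance: int = 2) -> list[str]:
    """Lexicon words within max_distance edits of token, closest first, ties alphabetical."""
    scored = sorted((levenshtein(token, word), word) for word in lexicon)
    return [word for dist, word in scored if dist <= max_distance]


# Hypothetical toy lexicon; a real speller would use a full Brazilian Portuguese word list.
LEXICON = ["casa", "caso", "cada", "coisa", "muito"]

print(suggest("cas", LEXICON))     # ['casa', 'caso', 'cada']
print(suggest("muitoo", LEXICON))  # ['muito']
```

Plain edit distance treats every typo alike, which is one reason informal UGC, with its abbreviations and phonetic spellings, is harder for such baselines than traditional text.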
Research
Full-text available
In 1990 the Portuguese-speaking countries signed an agreement on the reform of Portuguese orthography. The implementation of this reform was scheduled for the period between 2008 and 2012 and was subsequently postponed to 2016. In this work we describe the adaptation process of the Brazilian Portuguese dictionary embedded in the Unitex s...
Conference Paper
This paper presents some results on lexicon-based classification of sentiment polarity in web reviews of products written in Brazilian Portuguese. They represent a first step towards a robust opinion miner from reviews of technology products. The evaluation shows the performance of 3 different sentiment lexicons combined with simple strategies. It...
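A lexicon-based polarity classifier of the general kind evaluated here can be sketched as summing word polarities taken from a sentiment lexicon. The toy lexicon, the negator list and the rule that a negator flips the next polar word are illustrative assumptions, not the three lexicons or the strategies used in the paper.

```python
# Hypothetical toy lexicon (word -> polarity); the paper evaluates three real sentiment lexicons.
LEXICON = {"bom": 1, "ótimo": 1, "excelente": 1, "ruim": -1, "péssimo": -1, "lento": -1}
NEGATORS = {"não", "nunca", "jamais"}


def polarity(review: str) -> str:
    """Classify a review by summing lexicon polarities, flipping the first polar word after a negator."""
    score = 0
    negate = False
    for token in review.lower().split():
        token = token.strip(".,;:!?")
        if token in NEGATORS:
            negate = True  # flip the next polar word
            continue
        if token in LEXICON:
            score += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Não é bom."))                          # negative
print(polarity("Produto excelente, entrega rápida!"))  # positive
```

Words missing from the lexicon (here, "rápida" or the feminine "ótima") contribute nothing, which is why lexicon coverage matters so much for this family of methods.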
Technical Report
Full-text available
Hunsrückisch is today the most widely spoken variety of German in Brazil. This work aims to build an aligned bilingual Hunsrückisch German-Brazilian Portuguese corpus and, from it, to obtain a bilingual lexicon that can be used in building a statistical machine translation (SMT) system between the two languages. Despite...
Article
Full-text available
The number of citations received by authors in scientific journals has become a major parameter to assess individual researchers and the journals themselves through the impact factor. A fair assessment therefore requires that the criteria for selecting references in a given manuscript should be unbiased with regard to the authors or journals cited....
Article
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore...
Article
Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional metrics of complex networks, whose...
Article
Topological and dynamic features of complex networks have proven in recent years to be suitable for capturing text characteristics, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinctio...
Article
Full-text available
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD) despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach...
Article
Full-text available
This article gives a brief presentation of the Interinstitutional Center for Computational Linguistics (Núcleo Interinstitucional de Linguística Computacional, NILC), one of the main Brazilian groups dedicated to research in Natural Language Processing, particularly of Brazilian Portuguese. After a brief history of its formation, we show how the current research areas...
Article
Full-text available
The number of citations received by authors in scientific journals has become a major parameter to assess individual researchers and the journals themselves through the impact factor. A fair assessment therefore requires that the criteria for selecting references in a given manuscript should be unbiased with respect to the authors or the journals c...
Article
Full-text available
Motivated by governmental, commercial and academic interests, and due to the growing amount of information, mainly online, the area of automatic text summarization has seen a growing body of research and products, which has led to countless summarization methods. In this paper, we present a comprehensive comparative evaluation of th...
Article
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of...
Article
Full-text available
Topological and dynamic features of complex networks have proven to be suitable for capturing text characteristics in recent years, with various applications in natural language processing. In this article we show that texts with positive and negative opinions can be distinguished from each other when represented as complex networks. The distinctio...
Article
Full-text available
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic r...
Article
Full-text available
Sentence fusion is a task that consists of producing, from a set of related sentences, a single sentence that summarizes the common information presented in the set. This task is of great interest in several Natural Language Processing (NLP) applications, such as Automatic Summarization, Machine Translation, the...
Article
Automatic summarization of texts is now crucial for several information retrieval tasks owing to the huge amount of information available in digital media, which has increased the demand for simple, language-independent extractive summarization strategies. In this paper, we employ concepts and metrics of complex networks to select sentences for an...
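The abstract is cut off before naming the network metrics used, so the following is only a minimal sketch of the general idea under stated assumptions: each sentence is a node, two sentences are linked when they share a content word, and the highest-degree sentences are extracted in text order. The stopword list, the linking rule and the use of degree are hypothetical choices.

```python
import re

# Hypothetical stopword list; real systems use a full list for the target language.
STOPWORDS = {"a", "an", "the", "of", "and", "is", "in", "to", "from", "with"}


def content_words(sentence: str) -> set[str]:
    return {w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS}


def summarize(sentences: list[str], k: int) -> list[str]:
    """Pick the k sentences with the highest degree in a sentence network, kept in text order.

    Two sentences are linked when they share at least one content word.
    """
    words = [content_words(s) for s in sentences]
    degree = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if words[i] & words[j]:
                degree[i] += 1
                degree[j] += 1
    # Highest degree first; ties broken by position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-degree[i], i))
    return [sentences[i] for i in sorted(ranked[:k])]


SENTENCES = [
    "Complex networks model texts.",
    "Networks capture text structure.",
    "The weather is nice.",
    "Summaries select sentences from texts with networks.",
]
print(summarize(SENTENCES, 3))  # drops the unconnected weather sentence
```

Because the method relies only on word overlap and graph structure, it needs no language-specific resources beyond tokenization and an optional stopword list.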
Article
This paper presents a Portuguese sentence fusion model. Sentence fusion is a text-to-text generation task which takes a set of similar sentences as input and combines these into a single output sentence. This process is of extreme relevance in many NLP applications, for instance, to treat redundancies in Multidocument Summarization by fusing inform...
Article
Full-text available
In this article we present the development and evaluation of an automatic discourse analyzer for Brazilian Portuguese. Following Rhetorical Structure Theory, DiZer is a symbolic system based on the occurrence of textual markers, using discourse templates extracted from a corpus of scientific texts to identi...
Conference Paper
Full-text available
Motivated by governmental, commercial and academic interests, the area of automatic text summarization has seen a growing body of research and products, which has led to countless summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on rhetoric...
Conference Paper
Full-text available
In this paper we present experiments concerned with automatically learning bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out with Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora: pt–en and pt–es. They were designed to investigate the releva...
Conference Paper
Full-text available
Identifying similar text passages plays an important role in many applications in NLP, such as paraphrase generation, automatic summarization, etc. This paper presents some experiments on detecting and clustering similar sentences of texts in Brazilian Portuguese. We propose an evaluation framework based on an incremental and unsupervised clustering...
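An incremental, unsupervised clustering of similar sentences can be sketched as a single pass: each sentence joins the existing cluster whose centroid is most similar to it (cosine similarity over bag-of-words counts) if that similarity reaches a threshold, and otherwise starts a new cluster. The bag-of-words representation and the threshold value are assumptions for illustration; the abstract is truncated before it specifies them.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def incremental_clusters(sentences: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single pass over the sentences; returns the sentence indices grouped by cluster."""
    centroids: list[Counter] = []  # summed term counts (cosine ignores scale, so sum ~ mean)
    members: list[list[int]] = []
    for idx, sentence in enumerate(sentences):
        vec = bag_of_words(sentence)
        best, best_sim = None, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            centroids[best].update(vec)  # add the sentence's counts to the cluster
            members[best].append(idx)
        else:
            centroids.append(Counter(vec))  # start a new cluster
            members.append([idx])
    return members


SENTENCES = [
    "O presidente viajou para Brasília.",
    "O presidente viajou ontem para Brasília.",
    "Chove muito em São Paulo.",
]
print(incremental_clusters(SENTENCES))  # [[0, 1], [2]]
```

The result depends on sentence order and on the threshold, which is one reason an evaluation framework for such clustering is useful.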
Article
Full-text available
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex networks concepts to text quality measurement, summary evaluation, and author charac...
Conference Paper
Full-text available
Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made towards integrating both tasks to prove that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results started to conve...
Article
Full-text available
This paper presents a freely available online lexical alignment tool based on the LIHLA lexical aligner. LIHLA aligns tokens, words and multiword units based on language-independent heuristics (cognates, position, etc.) and automatically built language-dependent resources (bilingual dictionaries). VisualLIHLA allows the online usage, visualizat...
Article
Full-text available
The availability of machine-readable bilingual linguistic resources is crucial not only for machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources demands extensive manual work. This paper describes a methodology to automatically build bilingual dictionaries and tr...
Article
Full-text available
This paper presents a modeling technique of texts as complex networks and the investigation of the correlation between the properties of such networks and author characteristics. In an experiment with several books from eight authors, we show that the networks produced for each author tend to have specific features, which indicates that complex net...
Article
Full-text available
In this letter the authors discuss the relationship between structure and random walk dynamics in directed complex networks, with an emphasis on identifying whether a topological hub is also a dynamical hub. They establish the necessary conditions for networks to be topologically and dynamically fully correlated (e.g., word adjacency and airport ne...
Conference Paper
Full-text available
We present a novel approach to the word sense disambiguation problem which makes use of corpus-based evidence combined with background knowledge. Employing an inductive logic programming algorithm, the approach generates expressive disambiguation rules which exploit several knowledge sources and can also model relations between them. The ap...
Article
Full-text available
We describe an approach to the automatic creation of a sense tagged corpus intended to train a word sense disambiguation (WSD) system for English-Portuguese machine translation. The approach uses parallel corpora, translation dictionaries and a set of straightforward heuristics. In an evaluation with nine corpora containing 10 ambiguous verbs,...
Article
Full-text available
This paper presents the challenge of Natural Language Processing, in particular the case of the Portuguese language, in the scope of Computer Science and its disciplines. Questions related to natural language processing are associated with the challenges of knowledge access, information management in data intensive repositories, and the complex and inter...
Article
Full-text available
Translation lexicons are one of the most important linguistic resources for machine translation. However, this bilingual set of word and multiword correspondences requires a lot of manual work to be built. This paper describes a method to automatically build translation lexicons. The lexicons are built by extracting knowledge from PoS-tagged and le...
Article
Full-text available
We describe two systems participating in the English Lexical Sample task in SemEval-2007. The systems make use of Inductive Logic Programming for supervised learning in two different ways: (a) to build Word Sense Disambiguation (WSD) models from a rich set of background knowledge sources; and (b) to build interesting features from the same knowled...
Article
Full-text available
In this article we address the usefulness of language-independent methods in extractive Automatic Summarization, arguing that linguistic knowledge is not only useful, but may be necessary to improve the informativeness of automatic extracts. An assessment of four diverse AS methods on Brazilian Portuguese texts is presented to support our c...
Conference Paper
Full-text available
We describe two systems participating in the English Lexical Sample task in SemEval-2007. The systems make use of Inductive Logic Programming for supervised learning in two different ways: (a) to build Word Sense Disambiguation (WSD) models from a rich set of background knowledge sources; and (b) to build interesting features from the same knowledg...
Article
Full-text available
Previous efforts in complex networks research focused mainly on the topological features of such networks, but now also encompass the dynamics. In this Letter we discuss the relationship between structure and dynamics, with an emphasis on identifying whether a topological hub, i.e. a node with high degree or strength, is also a dynamical hub, i.e....
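Whether a topological hub (high degree or strength) is also a dynamical hub can be checked numerically. The sketch below assumes the dynamics is an unbiased random walk, as in the related letter above, and takes the walk's stationary visit probabilities as the dynamical measure; it computes them by power iteration and compares them with in-strength. The two small networks are hypothetical, and power iteration is assumed to converge (every node has an outgoing edge, and the walk is irreducible and aperiodic).

```python
def stationary_distribution(weights, iterations=10_000, tol=1e-12):
    """Visit probabilities of a random walk on a weighted digraph, by power iteration.

    weights[i][j] is the weight of edge i -> j; from node i the walker follows i -> j
    with probability weights[i][j] / out_strength(i). Assumes every node has an
    outgoing edge and the walk is irreducible and aperiodic, so the iteration converges.
    """
    n = len(weights)
    out_strength = [sum(row) for row in weights]
    p = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                if weights[i][j]:
                    nxt[j] += p[i] * weights[i][j] / out_strength[i]
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def in_strength(weights):
    """Topological measure: total weight of edges arriving at each node."""
    n = len(weights)
    return [sum(weights[i][j] for i in range(n)) for j in range(n)]


# Undirected example (symmetric weights): a triangle 0-1-2 plus a pendant node 3 on node 0.
UNDIRECTED = [[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]]
# Directed example: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
DIRECTED = [[0, 1, 0],
            [0, 0, 1],
            [1, 1, 0]]

print(in_strength(DIRECTED))              # [1, 2, 1]
print(stationary_distribution(DIRECTED))  # approximately [0.2, 0.4, 0.4]
```

In the undirected example the visit probabilities come out proportional to degree (3/8, 2/8, 2/8, 1/8), so topological and dynamical hubs coincide. In the directed example node 2 has in-strength 1 but is visited as often as node 1, whose in-strength is 2, so the two notions of hub diverge.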
Conference Paper
Full-text available
In this paper, we present and analyze the results of the application of a text summarization system – GistSumm – to the task of monolingual question answering at CLEF 2006 for Portuguese texts. We hypothesized that topic-oriented summarization techniques could be able to produce more accurate answers. However, our results showed that there is a big...
Conference Paper
Feature engineering is known as one of the most important challenges for knowledge acquisition, since any inductive learning system depends upon an efficient representation model to find good solutions to a given problem. We present an NLP-driven constructive learning method for building features based upon noun phrase structures, which are suppo...
Conference Paper
Full-text available
We propose a strategy to support Word Sense Disambiguation (WSD) which is designed specifically for multilingual applications, such as Machine Translation. Co-occurrence information extracted from the translation context, i.e., the set of words which have already been translated, is used to define the order in which disambiguation rules produced...
Conference Paper
Full-text available
The identification of the correct sense of a word is necessary for many tasks in automatic natural language processing like machine translation, information retrieval, speech and text processing. Automatic Word Sense Disambiguation (WSD) is difficult and accuracies with state-of-the-art methods are substantially lower than in other areas of t...
Article
Full-text available
In spite of its potential for bidirectionality, Extensible Dependency Grammar (XDG) has so far been used almost exclusively for parsing. This paper represents one of the first steps towards an XDG-based integrated generation architecture by tackling what is arguably the most basic among generation tasks: lexicalization. Herein we present a constra...
Conference Paper
Full-text available
The ability to access embedded knowledge makes complex networks extremely promising for natural language processing, which normally requires deep knowledge representation that is not accessible with first-order statistics. In this paper, we demonstrate that features of complex networks, which have been shown to correlate with text quality, can be u...
Conference Paper
Full-text available
It is generally agreed that the ultimate goal of research into Word Sense Disambiguation (WSD) is to provide a technology which can benefit applications; however, most of the work in this area has focused on the development of application-independent models. Taking Machine Translation as the application, we argue that this strategy is not appropria...
Conference Paper
Full-text available
This paper presents a summary evaluation method based on a complex network measure. We show how to model summaries as complex networks and establish a possible correlation between summary quality and the measure known as dynamics of the network growth. It is a generic and language-independent method that enables easy and fast comparative evaluation...
Conference Paper
Full-text available
This paper presents the review and evaluation of DiZer – an automatic discourse analyzer for Brazilian Portuguese. Based on Rhetorical Structure Theory, DiZer is a symbolic analyzer that makes use of linguistic patterns learned from a corpus of scientific texts to identify and build the discourse structure of texts. DiZer evaluation shows satisfact...
Article
Full-text available
The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work, and,...
Conference Paper
Full-text available
We present a statistical generative model for unsupervised learning of verb argument structures. The model was used to automatically induce the argument structures for the 1,500 most frequent verbs of English. In an evaluation carried out for a representative sample of verbs, more than 90% of the induced argument structures were judged correct...
Chapter
We present a system that applies Argumentative Zoning (AZ) (Teufel and Moens, 2002), a method of determining argumentative structure in texts, to the task of advising novice graduate writers on their writing. For this task, it is important to automatically determine the rhetorical/argumentative status of a given sentence in the text. On the basis o...
Article
Full-text available
While it is generally agreed that Word Sense Disambiguation (WSD) is an application-dependent task, the great majority of systems pursue application-independent approaches. We propose a strategy to support WSD for Machine Translation which is designed specifically for this application. It relies on the analysis of co-occurrences in the context t...
Article
Full-text available
This paper presents a statistical generative model for unsupervised learning of verb argument structures. The model is based on the noisy-channel model and is trained with the Expectation-Maximization algorithm. The model was used to induce the argument structures for the 1,500 most frequent verbs in English. The evaluation of a sample of this verb...
Article
Full-text available
We investigate the use of ILP for the task of Word Sense Disambiguation (WSD) in two different ways: (a) as a stand-alone constructor of models for WSD; and (b) to build interesting features, which can then be used by standard model builders such as SVM. Experiments examining a multilingual WSD task in the context of English-Portuguese machine trans...
Book
Since 1993, PROPOR Workshops have become an important forum for researchers involved in the Computational Processing of Portuguese, both written and spoken. This PROPOR Workshop follows previous workshops held in 1993 (Lisbon, Portugal), 1996 (Curitiba, Brazil), 1998 (Porto Alegre, Brazil), 1999 (Évora, Portugal), 2000 (Atibaia, Brazil) and 2003 (...
Article
Full-text available
In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of tools (NATools) and language-independent heuristics to find links between single words and multiword units in sentence-aligned parallel texts. The method has achieved an alignment error rate of 22.72% and 44.49% on E...
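Alignment error rate (AER) is commonly computed with the definition of Och and Ney, which compares a proposed link set A against reference sure links S and possible links P (with S a subset of P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The abstract does not say which variant produced the reported figures, so this is a sketch of that standard formula only, and the toy links are hypothetical.

```python
def alignment_error_rate(proposed, sure, possible):
    """AER (Och and Ney) for sets of (source_index, target_index) links.

    Sure links are treated as also possible (S is a subset of P). Lower is better;
    0.0 means the proposed links match the reference perfectly.
    """
    a = set(proposed)
    s = set(sure)
    p = set(possible) | s  # enforce S ⊆ P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical reference and system links for a three-word sentence pair.
SURE = {(0, 0), (1, 1)}
POSSIBLE = {(2, 1)}
PROPOSED = {(0, 0), (2, 1), (2, 2)}

print(alignment_error_rate(PROPOSED, SURE, POSSIBLE))  # 0.4, i.e. 1 - (1 + 2) / (3 + 2)
```

Multiplying by 100 gives the percentage form used in the abstract. The function assumes at least one proposed or sure link, since the denominator is otherwise zero.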
Conference Paper
Full-text available
This work documents the project and development of various computational linguistic resources that support the Brazilian Portuguese language according to the formal methodology used by the corpus processing system called UNITEX. The delivered resources include computational lexicons, libraries to access compressed lexicons, and additional tools to...
Article
Full-text available
Concepts of complex networks have been used to obtain metrics that were correlated to text quality established by scores assigned by human judges. Texts produced by high-school students in Portuguese were represented as scale-free networks (word adjacency model), from which typical network features such as the in/outdegree, clustering coefficient a...
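The word adjacency model can be sketched as a directed graph in which each distinct word is a node and an edge links each word to the word that immediately follows it. The sketch computes in-degree, out-degree and a clustering coefficient on the undirected version of the graph. Lower-casing, regex tokenization, an unweighted graph and the absence of stopword removal or lemmatization are simplifying assumptions; the paper's preprocessing may differ.

```python
import re
from collections import defaultdict


def adjacency_network(text: str):
    """Directed word adjacency network: an edge w1 -> w2 whenever w2 immediately follows w1."""
    tokens = re.findall(r"\w+", text.lower())
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for w1, w2 in zip(tokens, tokens[1:]):
        if w1 != w2:  # skip self-loops from repeated words
            successors[w1].add(w2)
            predecessors[w2].add(w1)
    return set(tokens), successors, predecessors


def clustering_coefficient(word, successors, predecessors) -> float:
    """Fraction of pairs of the word's neighbours (ignoring direction) that are themselves linked."""
    neighbours = (successors[word] | predecessors[word]) - {word}
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and (v in successors[u] or u in successors[v]))
    return 2 * links / (k * (k - 1))


nodes, succ, pred = adjacency_network("the cat saw the dog and the dog saw the cat")
print(len(succ["the"]), len(pred["the"]))          # 2 2 (out-degree, in-degree)
print(clustering_coefficient("the", succ, pred))   # 0.5
```

Averaging such per-word measures over a whole essay yields text-level features that can then be correlated with the scores assigned by human judges.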
Conference Paper
This paper focuses on how multiparadigm – namely, constraint, object-oriented and higher-order – programming can be drawn upon not only to specify multiparameterized linguistic realization engines but also and above all to rationalize their configuration into full-fledged generation modules for specific language-application pairs. We describe Manat...
Article
Full-text available
This paper describes the automatic generation and the evaluation of sets of rules for word sense disambiguation (WSD) in machine translation. The ultimate aim is to identify high-quality rules that can be used as knowledge sources in a relational WSD model. The evaluation was carried out both automatically, by means of four objective measures (erro...