Rafael Dueire Lins
Federal University of Pernambuco | UFPE · Center of Informatics (CIn)

Ph.D. in Computing

About

311
Publications
103,496
Reads
4,058
Citations
Introduction
Lins's pioneering contributions encompass the lambda-calculus with explicit substitutions, the solution to cyclic reference counting, and the removal of back-to-front interference in documents. He pioneered document engineering and digital libraries in Latin America. Lins is a founding member of the doctoral programs in Computer Science (1990) and in Electrical Engineering (2000), both at the Federal University of Pernambuco, where he has supervised 48 M.Sc. and 11 Ph.D. theses.
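The cycle problem in reference counting mentioned above can be illustrated with a minimal Python sketch of the local mark-scan idea: a trial deletion subtracts internal references, a scan repaints externally reachable objects, and whatever stays white is reclaimed. This is a didactic toy under simplifying assumptions, not Lins's published algorithm; all names here are illustrative.

```python
class Obj:
    """A heap object with a reference count and outgoing pointers."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.children = []
        self.colour = "black"

heap = set()  # every live object

def new(name):
    o = Obj(name)
    heap.add(o)
    return o

def add_ref(parent, child):
    parent.children.append(child)
    child.rc += 1

def free(o):
    """Reclaim o and recursively drop its outgoing references."""
    for c in list(o.children):
        c.rc -= 1
        if c.rc == 0:
            free(c)
    o.children = []
    heap.discard(o)

def del_ref(parent, child):
    parent.children.remove(child)
    child.rc -= 1
    if child.rc == 0:
        free(child)
    else:
        # child may now belong to an unreachable cycle:
        # run a local three-phase mark-scan from it
        mark_grey(child)
        scan(child)
        collect_white(child)

def mark_grey(o):
    """Trial deletion: subtract references internal to the subgraph."""
    if o.colour != "grey":
        o.colour = "grey"
        for c in o.children:
            c.rc -= 1
            mark_grey(c)

def scan(o):
    """Objects still externally referenced get repainted black."""
    if o.colour == "grey":
        if o.rc > 0:
            scan_black(o)
        else:
            o.colour = "white"   # candidate garbage
            for c in o.children:
                scan(c)

def scan_black(o):
    o.colour = "black"
    for c in o.children:
        c.rc += 1                # undo the trial deletion
        if c.colour != "black":
            scan_black(c)

def collect_white(o):
    """Reclaim everything left white by the scan phase."""
    if o.colour == "white":
        o.colour = "black"       # avoid revisiting
        for c in o.children:
            collect_white(c)
        heap.discard(o)
```

Deleting the last external reference to a two-object cycle leaves both counts nonzero, so plain counting would leak them; the local mark-scan reclaims both.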
Additional affiliations
December 1986 - present
Federal University of Pernambuco
Position
  • Full Professor in Computing
January 1986 - December 1998
University of Kent
Education
October 1984 - December 1986
The University of Kent at Canterbury
Field of study
  • Computer Science
March 1979 - July 1982
Federal University of Pernambuco
Field of study
  • Electronics

Publications

Publications (311)
Chapter
Full-text available
The Covid-19 pandemic has forced the use of means of interaction and project management in an iterative and non-face-to-face manner. Virtual interaction and monitoring tools were evidenced as a solution in this context. The legislation requires companies that manufacture computer goods in the region of the Manaus Free Trade Zone (ZFM), Amazonas, Br...
Article
Full-text available
Smartphones with an in-built camera are omnipresent today in the life of over eighty percent of the world’s population. They are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This paper assesses the quality, file size and time performance of sixty-eight binarization algorithms...
Article
Full-text available
The intrinsic features of documents, such as paper color, texture, aging, translucency, the kind of printing, typing or handwriting, etc., are important with regard to how to process and enhance their image. Image binarization is the process of producing a monochromatic image having its color version as input. It is a key step in the document proce...
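The simplest family of binarization algorithms applies a single global threshold to the grayscale image. As a minimal, self-contained sketch, Otsu's classical global method (an illustration of the technique only, not one of the algorithms assessed in these papers) picks the threshold that maximizes between-class variance of the histogram:

```python
def otsu_threshold(pixels):
    """Return the grayscale threshold maximizing between-class variance.

    pixels: flat iterable of intensities in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_b = 0.0   # intensity mass of the background class
    w_b = 0       # pixel count of the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var = w_b * w_f * (mean_b - mean_f) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels, t):
    """Map every pixel to ink (0) or paper (255)."""
    return [0 if p <= t else 255 for p in pixels]
```

On a clearly bimodal image the threshold lands between the two modes; real historical documents, with aging, stains, and back-to-front interference, are exactly where such a global rule breaks down.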
Chapter
The recent Time-Quality Binarization Competitions have shown that no single binarization algorithm is good for all kinds of document images and that the time elapsed in binarization varies widely between algorithms and also depends on the document features. On the other hand, document applications for portable devices have space and processing limi...
Article
Full-text available
Social presence is an essential construct of the well-known Community of Inquiry (CoI) model, which is created to support design, facilitation, and analysis of asynchronous online discussions. Social presence focuses on the extent to which participants of online discussions can see each other as “real persons” in computer-mediated communication. In...
Chapter
The ICDAR 2021 Time-Quality Binarization Competition assessed the performance of 12 new and 49 other previously published binarization algorithms for scanned document images. Four test sets of “real-world” documents with different features were used. For each test set, the top twenty algorithms in the quality of the resulting two-tone images had th...
Conference Paper
Full-text available
At the DIB platform (https://dib.cin.ufpe.br) there is a rich material on Document Image Binarization, including test sets, publications and Calls for Competitors. Please enroll there.
Conference Paper
Full-text available
Supervised machine learning models have been widely used to address the classification of messages in online discussions. Supervised learning algorithms require a large set of annotated data to accurately create a predictive model. However, data annotation is a complex task due to three factors: (i) depends on specialists to accurately label data;...
Chapter
Full-text available
This paper investigates emerging roles in the context of the community of inquiry model. The paper reports the results of a study that demonstrated the application of epistemic network and clustering analyses to reveal the roles that different students assumed during an asynchronous course with online discussions. The proposed method highlights the...
Article
Full-text available
This paper investigates the impact of the use of data from different educational contexts in the automatic classification of online discussion messages according to cognitive presence, an essential construct of the community of inquiry model. In particular, this paper analyzed online discussion messages written in Brazilian Portuguese from two diff...
Article
Full-text available
Organizations increasingly need to develop new products and services on a continuous basis to maintain and increase their competitive advantage in the markets in which they operate. Multifunctional teams perform brainstorming sessions, generating ideas for new projects based on the proposed strategic objectives and creating a portfolio of...
Conference Paper
Postgraduate degrees are one of the most important propellers of all areas of science. M.Sc. and Ph.D. theses witness the important developments and provide a solid and global account of research projects. This paper describes a platform developed with the aim of generating digital libraries of theses and dissertations. Printed theses have to be sc...
Conference Paper
This paper details the features and the methodology adopted in the construction of the CNN-corpus, a test corpus for single document extractive text summarization of news articles. The current version of the CNN-corpus encompasses 3,000 texts in English, and each of them has an abstractive and an extractive summary. The corpus allows quantitative a...
Conference Paper
Document Camera digitalization devices are low-cost, easy to use, produce good quality images, are able to digitalize pages of bound books without damaging their spine, etc. On the other hand, they may bring two serious problems. The first one appears if the document to be digitalized is printed on glossy paper. The paper reflects the different ill...
Conference Paper
Full-text available
The DocEng'19 Competition on Extractive Text Summarization assessed the performance of two new and fourteen previously published extractive text sumarization methods. The competitors were evaluated using the CNN-Corpus, the largest test set available today for single document extractive summarization.
Conference Paper
Binarization algorithms are an important step in most document analysis and recognition applications. Many aspects of the document affect the performance of binarization algorithms, such as paper texture and color, noises such as the back-to-front interference, stains, and even the type and color of the ink. This work focuses on determining how...
Conference Paper
The ICDAR 2019 Time-Quality Binarization Competition assessed the performance of seventeen new together with thirty previously published binarization algorithms. The quality of the resulting two-tone image and the execution time were assessed. Comparisons were made both on “real-world” and synthetic scanned images, and on documents photographed...
Article
Full-text available
This paper presents a network-based approach to uncovering the relationship between the elements of social and cognitive presences in a community of inquiry. The paper demonstrates how epistemic network analysis (ENA) can provide new qualitative and quantitative insights into the students' development of social and critical thinking skills in commu...
Conference Paper
Full-text available
This paper presents a method for automated content analysis of students’ messages in asynchronous discussions written in Portuguese. In particular, the paper looks at the problem of coding discussion transcripts for the levels of cognitive presence, a key construct in a widely used Community of Inquiry model of online learning. Although there are t...
Conference Paper
Full-text available
In this tutorial, we consider important aspects (algorithms, approaches, considerations) for tagging both unstructured and structured text for downstream use. This includes summarization, in which text information is compressed for more efficient archiving, searching, and clustering. In the tutorial, we focus on the topic of automatic text summariz...
Article
Automatic Text Summarization is the process of creating a compressed representation of one or more related documents, keeping only the most valuable information. The extractive approach for summarization is the most studied and aims to generate a compressed version of a document by identifying, ranking, and selecting the most relevant sentences or...
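The identify-rank-select pipeline of extractive summarization described above can be sketched in a few lines. This is a toy word-frequency scorer under simplifying assumptions (naive sentence splitting, no stop-word removal), not the method of any particular paper:

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Select the k sentences with the highest average word frequency,
    returned in their original order."""
    # naive sentence segmentation on terminal punctuation
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]
```

Sentences dense in frequent content words are kept; preserving document order is a cheap way to retain some cohesion in the output.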
Article
Full-text available
Monochromatic documents demand far less network bandwidth for transmission and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, kind and color of ink used in handwr...
Preprint
Full-text available
Monochromatic documents demand far less network bandwidth for transmission and storage space than their color or even grayscale equivalents. The binarization of historical documents is far more complex than that of recent ones, as paper aging, color, texture, translucency, stains, back-to-front interference, kind and color of ink used in handwr...
Conference Paper
Image binarization is a technique widely used for documents, as monochromatic documents demand far less storage space and network bandwidth for transmission than their color or even grayscale equivalents. Paper color, texture, aging, translucency, kind and color of ink used in handwriting, printing process, digitalization process, e...
Article
Paraphrase identification consists in the process of verifying if two sentences are semantically equivalent or not. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, methods for assessing paraphrase identification perform three steps. First, the...
Book
This book constitutes the thoroughly refereed post-conference proceedings of the 11th International Workshop on Graphics Recognition, GREC 2015, held in Nancy, France, in August 2015. The 10 revised full papers presented were carefully reviewed and selected from 19 initial submissions. They contain both classical and emerging topics of Graphics Rec...
Article
David Turner's contributions to functional programming language design and implementation were seminal. He is perhaps best known for his pioneering work in combinator graph reduction and for the design and implementation of an influential series of pure, non-strict, functional programming languages: SASL, KRC and Miranda. David invented or co-inven...
Conference Paper
Automatic single-document summarization is a process that receives a single input document and outputs a condensed version with only the most relevant information. This paper proposes an unsupervised concept-based approach for single-document summarization using Integer Linear Programming (ILP). Such an approach maximizes the coverage of the importa...
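The concept-coverage objective that such an ILP maximizes can be approximated greedily. The sketch below assumes character-length budgets and a plain greedy heuristic instead of an exact ILP solver, so it only illustrates the objective, not the paper's formulation:

```python
def greedy_concept_summary(sentences, weight, budget):
    """Greedy approximation of weighted concept coverage.

    sentences: list of (text, concepts) pairs, concepts a set of strings
    weight:    dict mapping concept -> importance weight
    budget:    maximum total summary length in characters
    """
    covered, summary, used = set(), [], 0
    candidates = list(sentences)
    while candidates:
        # marginal gain per character: weight of newly covered
        # concepts divided by sentence length
        def gain(item):
            text, concepts = item
            new = concepts - covered
            return sum(weight.get(c, 0.0) for c in new) / max(len(text), 1)

        best = max(candidates, key=gain)
        if gain(best) == 0 or used + len(best[0]) > budget:
            break
        candidates.remove(best)
        summary.append(best[0])
        covered |= best[1]
        used += len(best[0])
    return summary
```

An exact ILP would optimize all selections jointly; the greedy rule picks one sentence at a time, which is cheaper and usually close on short documents.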
Conference Paper
Mobile devices such as smart phones and tablets are omnipresent in modern societies. Such devices allow browsing the Internet. This paper briefly describes two tools for news article summarization in mobile devices that attempts to automatically collect and sieve the most important information of news article in WebPages.
Conference Paper
The existing automatic text summarization systems whenever applied to web-pages of news articles show poor performance as the text is encapsulated within a HTML page. This paper takes advantage of the link identification and content extraction techniques. The results show the validity of such a strategy.
Conference Paper
This paper presents a new method for improving the cohesiveness of summaries generated by extractive summarization systems. The solution presented attempts to improve the legibility and cohesion of the generated summaries through coreference resolution. It is based on a post-processing step that binds dangling coreference to the most important enti...
Conference Paper
Some of the recent state-of-the-art systems for Automatic Text Summarization rely on the concept-based approach using Integer Linear Programming (ILP), mainly for multi-document summarization. A study on the suitability of such an approach to single-document summarization is still missing, however. This work presents an assessment of several method...
Article
The volume of text data has been growing exponentially in the last years, mainly due to the Internet. Automatic Text Summarization has emerged as an alternative to help users find relevant information in the content of one or more documents. This paper presents a comparative analysis of eighteen shallow sentence scoring techniques to compute the im...
Article
World Wide Web applications need to use, constantly update, and maintain large webgraphs for executing several tasks, such as calculating the web impact factor, finding hubs and authorities, performing link analysis by webometrics tools, and ranking webpages by web search engines. Such webgraphs need to use a large amount of main memory, and, frequ...
Article
The degree of similarity between sentences is assessed by sentence similarity methods. Sentence similarity methods play an important role in areas such as summarization, search, and categorization of texts, machine translation, etc. The current methods for assessing sentence similarity are based only on the similarity between the words in the sente...
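The word-overlap baseline that such sentence similarity methods improve upon can be sketched as bag-of-words cosine similarity. This is an illustrative baseline only, not the method proposed in the paper:

```python
import math
import re
from collections import Counter

def cosine_similarity(s1, s2):
    """Cosine of the angle between the bag-of-words vectors of two sentences."""
    v1 = Counter(re.findall(r'\w+', s1.lower()))
    v2 = Counter(re.findall(r'\w+', s2.lower()))
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

The limitation the paper points at is visible immediately: two paraphrases sharing no surface words score zero, which is why purely lexical comparison is insufficient.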
Conference Paper
Full-text available
The need for automatic generation of summaries gained importance with the unprecedented volume of information available in the Internet. Automatic systems based on extractive summarization techniques select the most significant sentences of one or more texts to generate a summary. This article makes use of Machine Learning techniques to assess the...
Conference Paper
An efficient way to automatically classify documents may be provided by automatic text summarization, the task of creating a shorter text from one or several documents. This paper presents an assessment of the 15 most widely used methods for automatic text summarization from the text classification perspective. A naive Bayes classifier was used sho...
Conference Paper
Text summarization is the process of automatically creating a shorter version of one or more text documents. This paper presents a qualitative and quantitative assessment of the 22 state-of-the-art extractive summarization systems using the CNN corpus, a dataset of 3,000 news articles.
Conference Paper
Full-text available
There have lately been a number of catastrophic landslides and mudslides in the mountainous region of Rio de Janeiro, Brazil, caused by intense rain in localities with unplanned occupation of the slopes of hills and mountains. Thus, it became imperative to create an inventory of landslide risk areas in densely populated ci...
Article
Human vision plays a very important role in the perception of the environment, communication and interaction between individuals. Machine vision is increasingly being embedded in electronic devices, as cameras are used with the function of perceiving the environment and identifying the elements inserted in a scene. Real-time image processing and pa...
Article
Full-text available
Pointwise-supported generalized wavelets are introduced, based on Dirac, doublet and further derivatives of delta. A generalized biorthogonal analysis leads to standard Taylor series and new Dual-Taylor series that may be interpreted as Laurent Schwartz distributions. A Parseval-like identity is also derived for Taylor series, showing that Taylor s...
Chapter
The Semantic Web, proposed by Berners-Lee, aims to make explicit the meaning of the data available on the Internet, making it possible for Web data to be processed both by people and intelligent agents. The Semantic Web requires Web data to be semantically classified and annotated with some structured representation of knowledge, such as ontologies...
Article
Full-text available
This article presents a simple non-destructive method to test if strokes or pieces of text written with ballpoint pens of the same or different manufacturers were added to a document at a later stage. This work was motivated by a real-world fraud case, in which a student was under suspicion of having altered the content of an exam paper. The propos...
Conference Paper
Text “underlining” is a practice of many interested readers, but it may be seen as a noise inserted by the user that damages the physical integrity of a document. This paper presents two different algorithms for underline removal. The first one addresses the case of monochromatic document images. The second algorithm is applied to remove the underl...
Conference Paper
Full-text available
Sentence similarity is used to measure the degree of likelihood between sentences. It is used in many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation. The current methods for assessing sentence similarity represent sentences as vectors of bag of words or the syntactic in...
Article
Relation extraction (RE) aims at finding the way entities, such as person, location, organization, date, etc., depend upon each other in a text document. Ontology Population, Automatic Summarization, and Question Answering are fields in which relation extraction offers valuable solutions. A relation extraction method based on inductive logic progra...
Article
Full-text available
The text data available on the Internet is not only huge in volume, but also in diversity of subject, quality and idiom. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for efficiently addressing such a problem, because it aims to sieve the relevant information...
Conference Paper
Full-text available
Text summarization is the process of creating a shorter version of one or more text documents. Automatic text summarization has become an important way of finding relevant information in large text libraries or in the Internet. Extractive text summarization techniques select entire sentences from documents according to some criteria to form a summa...
Conference Paper
The more complete the training set of an optical character recognition platform, the greater the chances of obtaining a better precision in transcription. The development of a database for such purpose is a task of paramount effort as it is performed manually and must be as extensive as possible in order to potentially cover all words in a language...
Article
The Semantic Web, proposed by Berners-Lee, aims to make explicit the meaning of the data available on the Internet, making it possible for Web data to be processed both by people and intelligent agents. The Semantic Web requires Web data to be semantically classified and annotated with some structured representation of knowledge, such as ontologies...
Conference Paper
Full-text available
Detecting the age of a document is an important subject for forensic purposes. As the paper ages, its color changes depending on a number of factors such as its original color, storage conditions, environment temperature, humidity, etc. In Brazil, documents such as birth and wedding certificates during the second half of the 20th century used stan...
Conference Paper
Document images digitalized with cameras are framed with parts of the background where the document lied on. In the case of images acquired with portable digital cameras such background may be complex, of non-uniform colors and texture. This paper presents a new algorithm designed to remove the background of document images acquired using portable...
Conference Paper
Full-text available
Pictures of documents have non-uniform illumination, causing shading which may yield images of bad quality for human visualization and unsuitable for some image processing algorithms. Most algorithms do not consider the scenario in which documents have large non-uniform regions such as photographs and illustrations. This paper proposes an algorithm t...