
Nuno C. Marques
- PhD
- Professor (Assistant) at Universidade NOVA de Lisboa
About
Publications: 77
Reads: 19,731
Citations: 515
Introduction
For any request, please send me a direct email: nmmfct.unl.pt .
Professor Auxiliar & Senior Researcher
DI-FCT/UNL & NOVA LINCS
https://www.cienciavitae.pt/pt/EA19-6C92-8928
Current institution
Additional affiliations
September 2001 - present
Publications (77)
The accurate estimation of dam natural frequencies and their evolution over time can be very important for dynamic behaviour analysis and structural health monitoring. However, automatic modal parameter estimation from ambient vibration measurements on dams can be challenging, e.g., due to the influence of reservoir level variations, operational ef...
For a favorable prognosis of breast cancer, early diagnosis is essential. Histopathological analysis is considered the gold standard for indicating the type of cancer. Histopathology consists of analyzing characteristics of the lesions through tissue sections stained with Hematoxylin and Eosin. In recent years, there has been much interest in deve...
Speech therapy games present a relevant application of business intelligence to real-world problems. However, many such models are only studied in a research environment and lack discussion of the practical issues related to their deployment. In this article, we depict the main aspects that are critical to the deployment of a real-time sound rec...
Financial data is increasingly made available in high quantities and in high quality for companies that trade in the stock market. However, such data is generally made available comprising many distinct financial indicators and most of these indicators are highly correlated and non-stationary. Computational tools for visualizing the huge diversity...
In order to develop computer tools for speech therapy that reliably classify speech productions, there is a need for speech production corpora that characterize the target population in terms of age, gender, and native language. Apart from including correct speech productions, in order to characterize the target population, the corpora should also...
Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, which is controlled by the children's voices in real time, with the purpose of helping children practice the production of European Portuguese (EP) sibilant consonants. For this, the game uses a sibilant consonant cla...
The diagnosis of breast cancer at an early stage is essential for successful treatment. Detection can be performed in several ways, the most common being through mammograms. The projections acquired by this type of examination are directly affected by the composition of the breast, whose density can be similar to that of the suspicious masses, being a challen...
Early-stage diagnosis of breast cancer is essential for successful treatment. Detection can be performed in several ways, the most common being the use of mammograms. The projections acquired by this type of exam make it difficult to detect masses, because the composition of the breast has a density similar to that of suspicious masses...
Many children suffering from speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game that is controlled by the children’s voices in real time and that allows children to practice the European Portuguese sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game do...
The comprehension of social network phenomena is closely related to data visualization. However, even with only hundreds of nodes, the visualization of dense networks is usually difficult. The strategy adopted in this work is data reduction using communities. Community detection in social network analysis is a very important issue and in particular...
Breast cancer detection is traditionally carried out through manual analysis of mammograms, which demands time and a high level of concentration from the specialist because of the subtle differences between diseased and healthy tissue. The objective of this work is the development of an application to support the diagnosis of masses in dig... mammograms
Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This article presents a knowledge discovery approach for studying manifestations of the lack of modularity support in that sort of languages. The study is focused on Matlab, as a...
Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This paper presents a business intelligence approach for studying the manifestations of lack of modularity support in that kind of languages. The study is focused on MATLAB as a...
Financial data provides a valuable up-to-date knowledge of the world economy. However, it is presented in extremely large data volumes, in diverse formats, and is constantly being updated at a high speed. The Ramex-Forum algorithm is oriented to guide financial experts in finding new and relevant information. We present a sensitivity analysis and n...
Corpus of Portuguese two-word phrases, manually classified in the interval [-3, 3]. The interval has the following meaning: [-3, -1] negative phrase, [0] neutral phrase, [1, 3] positive phrase.
Dataset described in the paper "Finding Compositional Rules for Determining the Semantic Orientation of Phrases", in Section 2.1.
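As a minimal sketch of how the scale described above could be read programmatically, the hypothetical helper below maps a manual score in [-3, 3] to its class. The function name `orientation_label` and the string labels are assumptions made for illustration; they are not part of the dataset or the paper.

```python
def orientation_label(score: int) -> str:
    """Map a manual score in [-3, 3] to a coarse semantic-orientation class.

    Hypothetical helper: the name and the label strings are illustrative only.
    """
    if not -3 <= score <= 3:
        raise ValueError(f"score must lie in [-3, 3], got {score}")
    if score < 0:
        return "negative"   # scores -3, -2, -1
    if score > 0:
        return "positive"   # scores 1, 2, 3
    return "neutral"        # score 0
```

Scores outside the interval are rejected rather than clamped, on the assumption that an out-of-range value indicates an annotation or parsing error.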
The semantic compositionality principle states that the meaning of an expression can be determined by its parts and the way they are put together. Based on that principle, this paper presents a method for finding the set of compositional rules that best explain the positive, negative, and neutral semantic orientation (SO) of two-word phrases, in te...
The Internet of things promises a continuous flow of data where traditional database and data-mining methods cannot be applied. This paper presents improvements on the Ubiquitous Self-Organized Map (UbiSOM), a novel variant of the well-known Self-Organized Map (SOM), tailored for streaming environments. This approach allows ambient intelligence sol...
This paper addresses the problem of classifying news headlines into sentiment categories. Using a supervised approach, we train a classifier for classifying each news headline as positive, negative, or neutral. A news headline is considered positive if it is associated with good things, negative if it is associated with bad things, and neutral in t...
We present a sensitivity analysis and new visualizations using an improved version of the Ramex-Forum algorithm applied to the study of the petroleum production chain. Different combinations of parameters and new ways to visualize data will be used. Results will highlight the importance of Ramex-Forum and its proper parameterizations for analyzing...
The Internet of Things promises a continuous flow of data where traditional database and data-mining methods cannot be applied. This paper presents a novel variant of the well-known Self-Organized Map (SOM), called Ubiquitous SOM (UbiSOM), that is being tailored for streaming environments. This approach allows ambient intelligence solutions using m...
In social network analysis the identification of communities and the discovery of brokers is a very important issue. Community detection typically uses partition techniques. In this work the information extracted from social networking goes beyond cohesive groups, enabling the discovery of brokers that interact between communities. The partition is...
Java implementation of the algorithm described in the paper «A Bootstrapping Algorithm for Learning the Polarity of Words», in section "3 The Polarity Propagation Algorithm".
Last version: https://github.com/i000313/phd.polarity.propagation
Java implementation of an algorithm proposed for learning the polarity of words. The algorithm is described in the paper «A Bootstrapping Algorithm for Learning the Polarity of Words», in section "3 The Polarity Propagation Algorithm".
Short description:
Given, a) a graph of synonym and antonym words; b) a small list of «positive», «negative», a...
This data set is a list of 524 Portuguese words manually classified as:
positive (1), negative (-1), neutral (0), ambiguous (A), and unknown (U).
This data set was used in the evaluation of the experiment reported in the «Determining the Polarity of Words through a Common Online Dictionary» paper.
The evaluation and data set are described in more detai...
The identification of the economic activities performed by a company, and their recognition from the text on the company's website, is a task that has not yet received much attention in text mining and business intelligence applications. In this paper, we present a system designed for recognising economic activities performed by companies from text o...
This position paper proposes a framework based on a feature clustering method using Emergent Self-Organizing Maps over streaming data (Ubi-SOM) and Ramex-Forum - a sequence pattern mining model for financial time series modeling based on observed instantaneous and long term relations over market data. The proposed framework aims at producing realis...
This paper presents work on the automatic creation of a polarity lexicon based on a lexical-semantic network. During this work, we noticed that the language registers of a relation should be considered in polarity propagation. After analysing the possible registers and performing some experiments, our intuition was confirmed – there are registers t...
Polarity lexicons are lists of words (or meanings) where each entry is labelled as positive, negative or neutral. These lists are not available for different languages and specific domains. This work proposes and evaluates a new algorithm to classify words as positive, negative or neutral, relying on a small seed set of words, a common dictionary a...
Ubiquitous Data Mining is a recent research topic that uses data mining techniques to extract useful knowledge from data continuously generated from devices with limited computational resources that move in time and space. The goal of this workshop is to convene researchers (from both academia and industry) who deal with machine learning and data...
This article presents and evaluates a system for extracting relations from news headlines. The system does not require the set of relations to be extracted to be defined in advance. To that end, it extracts relations of the form (subject, verb, object). Certain attributes of the elements of the relation are also extracted, as well as the interrelation between d...
Traditional stock market analysis is based on the assumption of a stationary market behavior. The recent financial crisis was an example of the inappropriateness of such an assumption, namely by detecting the presence of much higher variations than what would normally be expected by traditional models. Data stream methods present an alternative for mo...
We address the problem of mining data streams using Artificial Neural Networks (ANN). Usual data stream clustering models (e.g., k-means) are too dependent on assumptions regarding cluster statistical properties (i.e., number of clusters, cluster shape), while unsupervised ANN algorithms (Adaptive Resonance Theory - ART networks and Self-Organizing Maps...
Considerable attention has been given to the polarity of words and the creation of large polarity lexicons. Most of the approaches rely on advanced tools like part-of-speech taggers and rich lexical resources such as WordNet. In this paper we show and examine the viability of creating a moderate-sized polarity lexicon using only a common online dictionar...
Neuro-symbolic integration merges background knowledge and neural networks to provide a more effective learning system. It uses the Core Method as a means to encode rules. However, this method has several drawbacks in dealing with rules that have temporal extent. First, it demands some interface with the world which buffers the input patterns so th...
This paper presents a stop-loss - maximum return (SLMR) trading strategy based on improving the classic moving average technical indicator with neural networks. We propose an improvement in the efficiency of the long term moving average by using the limited recursion in Elman Neural Networks, jointly with hybrid neuro-symbolic neural network, while...
Recent results in hybrid neural networks using extended versions of the core method have shown that we can use background knowledge to guide back-propagation learning. This paper further explores these ideas by adding numeric functions to the encoded knowledge and using the traditional recursive Elman neural network model. An illustration of the pro...
Learning vector quantization (LVQ) is a supervised neural network method applicable in non-linear separation problems and widely used for data classification. Existing LVQ algorithms are mostly focused on numerical data. This paper presents a batch type LVQ algorithm used for classifying data with categorical values. The batch learning rules make p...
Learning vector quantization (LVQ) is a supervised learning algorithm for data classification. Since LVQ is based on prototype vectors, it is a neural network approach particularly applicable in non-linear separation problems. Existing LVQ algorithms are mostly focused on numerical data. This paper presents a batch type LVQ algorithm used for mixed...
This paper proposes an extension to the neuro-symbolic core method useful when observations are expressed by continuous values. Some theoretical results are presented regarding the learning process over these observations. An illustrative example is reported, demonstrating the problems of the original approach and justifying how this extension can...
Neural networks are one of the most efficient techniques for learning from scarce data. This property is very useful when trying to build a part-of-speech tagger. Available part-of-speech taggers need huge amounts of hand-tagged text, but for Portuguese there are no such corpora available. In this paper we propose a neural network that, apparently,...
We report on an experiment where we inserted symbolic rules into a neural network during the training process. This was done to guide the learning and to help escape local minima. The rules are constructed by analysing the errors made by the network after training. This process can be repeated, which makes it possible to improve the network performance again...
This paper presents FeaSANNT, an evolutionary feature selection and weight training procedure for neural network classifiers. FeaSANNT exploits the global nature of the evolutionary search to avoid sub-optimal peaks of performance. The novelty of the method lies in the implementation of the embedded approach in an evolutionary feature selec...
Part-of-speech tagging (POS) assigns grammatical tags (like noun, verb, etc.) to a word depending on its definition and its context. This is a first step before parsing may be applied. POS tagging, and more generally word tagging, plays an important role in computational linguistics and in many information retrieval and text mining tasks. Neither...
We propose a method for a parallel implementation of the Self-Organizing Map (SOM) algorithm, widely used in data-mining. We call this method Hybrid in the sense that it combines the advantages of the common network-partition and data-partition approaches, and is particularly effective when dealing with large maps. Based on the fact that a global...
In this paper, we propose a framework named UMT (User-profile Modeling based on Transactional data) for modeling user group profiles based on the transactional data. UMT is a generic framework for application systems that keep the historical transactions of their users. In UMT, user group profiles consist of three types: basic information attribut...
A new machine learning approach is presented for automatic detection of Mediterranean water eddies from sea surface temperature maps of the Atlantic Ocean. A pre-processing step uses Laws’ convolution kernels to reveal microstructural patterns of water temperature. Given a map point, a numerical vector containing information on local structural pr...
Self-organizing maps (SOM) have been recognized as a powerful tool in data exploration, especially for the tasks of clustering on high dimensional data. However, clustering on categorical data is still a challenge for SOM. This paper aims to extend standard SOM to handle feature values of categorical type. A batch SOM algorithm (NCSOM) is pre...
This paper presents an application of the biologically realistic JASTAP neural network model to classification tasks. The JASTAP neural network model is presented as an alternative to the basic multi-layer perceptron model. An evolutionary procedure previously applied to the simultaneous solution of feature selection and neural network training on...
In this paper we describe a five semester experiment on the introduction of Octave to teach computer programming to technical science students. We discuss the main advantages and disadvantages of this approach relative to more traditional programming languages. After a qualitative and quantitative analysis of student evaluation results we argue t...
In this paper, a new approach to Mediterranean Water Eddy border detection is proposed. Kohonen self-organizing maps (SOM) are used as data mining tools to cluster image pixels through an unsupervised process. The clusters are visualized on the SOM internal map. From the visualization, the borders can be detected in an interactive way. As a...
In this paper, we study the computational cost of extracting character n-grams from a corpus. We propose an approach for reducing this cost which is relevant especially for text mining and natural language applications. The underlying idea is to take under consideration only n-grams occurring above a given frequency in a corpus. This approach is ap...
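To make the idea of a frequency threshold concrete, the following is a naive sketch that counts all character n-grams in a string and then keeps only those that reach a minimum count. It does not reproduce the cost-reduction technique proposed in the paper; it only illustrates what a frequency-filtered n-gram set looks like. The function name `frequent_char_ngrams` and the parameter `min_count` are assumptions made for this example.

```python
from collections import Counter


def frequent_char_ngrams(text: str, n: int, min_count: int) -> dict[str, int]:
    """Return character n-grams of `text` that occur at least `min_count` times.

    Naive illustration only: every n-gram is counted first and filtered afterwards,
    so this shows the filtered result, not an efficient way of computing it.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n over the text and count each substring.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Keep only the n-grams whose count reaches the threshold.
    return {gram: count for gram, count in counts.items() if count >= min_count}
```

For example, `frequent_char_ngrams("banana", 2, 2)` keeps the bigrams "an" and "na" (two occurrences each) and drops "ba" (one occurrence).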
Data mining is usually associated with centralized data mining systems. Here we present an approach to develop a data mining system in distributed environments. The main difficulty in this approach is the unrestricted sharing of information and dynamic integration of components. In this paper, we present a Web Service-based approach to solve thes...
It is currently believed that POS-taggers need huge amounts of hand-tagged text for training (on the order of 10^5 pretagged words). In this paper we show how to generate POS-taggers trained with no more than 10^4 hand-tagged words. These taggers achieve precision results that are as good as those of the best-performing state-of-the-art POS-taggers. We overcome...
The analysis of textual data may start by classifying words using a predefined tag set. However, the assignment of part-of-speech tags to words in unrestricted text (called POS-tagging) is still a problem for natural language text understanding. Most current taggers require huge amounts of hand-tagged text for training (in the order of 10...
In this paper we show how several nonindependent features can be conjugated for loglinear statistical modeling of subcategorization information. Having this in mind we will present a method for unsupervised learning of statistical loglinear models for words with the same subcategorization frame, using huge collections of fully automatically part-of...
We will describe the use of neural networks to perform part-of-speech (POS) disambiguation of textual corpora. Available part-of-speech taggers need huge amounts of hand-tagged text, but for Portuguese there are no such corpora available. In this paper we propose a neural network that, apparently, is capable of overcoming the huge training corpus pr...
In this paper we will describe a process for mining syntactical verbal subcategorization, i.e. the information about the kind of phrases or clauses a verb goes with. We will use a large text corpus having almost 10,000,000 tagged words as our resource material. Loglinear modeling is used to analyze and automatically identify the subcategorization d...
In this paper we show how loglinear models can be used to cluster verbs based on their subcategorization preferences. We describe how the information about the phrases or clauses a verb goes with can be computationally learned from an automatically tagged corpus with 9,333,555 words. We will use loglinear modeling to describe the relation between t...
In this paper we will describe the work that is being cooperatively done by Portugal and Brazil. It uses Statistical Methods for Natural Language Processing. Namely, we will focus on the problem of Part-of-Speech (POS) Tagging. POS Tagging is a recent and successful technique for assigning each word in a sentence its correct POS tag. This technique...
Neural networks are one of the most efficient techniques for learning from scarce data. This property is very useful when trying to build a part-of-speech tagger. Available part-of-speech taggers need huge amounts of hand tagged text, but for Portuguese as well as for many other languages there are no such hand tagged corpora available. In this pap...
In this paper we show how a POS-tagger can be successfully adapted to a real world information retrieval system capable of extracting postal addresses from the Internet. We develop a particular tag-set for this system. Then we present and discuss the results acquired with the developed postal address tag-set. We conclude the paper by presenting a s...
Several organizations have made a substantial effort in building and processing text corpora, using different languages, development environments, and representation formats. To do so, each organization involved in the process used the resources it considered best suited to its work, with the resulting redund...
In this paper we present a theoretical model for evaluating agent's belief. We model the agent's belief as a threshold function of agent's certainty. The main problem is that agent's certainty is unobservable. This model is based on results from different tests. A propagation algorithm is presented. We also discuss an example for determining a stat...
This paper presents a Portuguese lexicon acquisition and retrieval system, as well as a survey of its possible applications. This system incorporates a graphical interface, PLAIN 2 (Portuguese Lexicon Acquisition INterface), a word derivation machine, and a lexical database system. POLARIS is intended for use in real proble...