About
133 Publications
186,433 Reads
6,975 Citations
Publications (133)
This study examines the interplay between text summarization techniques and embeddings from Language Models (LMs) in constructing expert systems dedicated to the retrieval of legal precedents, with an emphasis on achieving cost-efficiency. Grounded in the growing domain of Artificial Intelligence (AI) in law, our research confronts the perennial ch...
Although the imbalanced learning problem is best known in the context of classification tasks, it also affects other areas of learning algorithms, such as regression. For regression, the problem is characterized by the existence of a continuous target variable domain and the need for models capable of making accurate predictions about rare events....
The e-commerce industry’s rapid growth, accelerated by the COVID-19 pandemic, has led to an alarming increase in digital fraud and associated losses. To establish a healthy e-commerce ecosystem, robust cyber security and anti-fraud measures are crucial. However, research on fraud detection systems has struggled to keep pace due to limited real-worl...
The importance of legal precedents in ensuring consistent jurisprudence is undisputed. Particularly in jurisdictions following the Common law, but even in Civil law systems, uniformity in case law requires adherence to precedents. However, with the growing volume of cases, manual identification becomes a bottleneck, prompting the need for automatio...
In recent years, the field of Topic Modeling (TM) has grown in importance due to the increasing availability of digital text data. TM is an unsupervised learning technique that helps uncover latent semantic structures in large sets of documents, making it a valuable tool for finding relevant patterns. However, evaluating the performance of TM algor...
Illicit activity in Blockchain reached an all-time high in 2021. In this work, we combined two machine learning techniques, Autoencoder (AE) and Extreme Gradient Boosting (XGBoost), to improve the performance of predicting illicit activity at the account level. The choice of autoencoding technique allows us to detect new MOs (modus o...
Judges frequently base their reasoning on precedents. Courts must preserve uniformity in decisions while, depending on the legal system, previous cases compel rulings. The search for methods to accurately identify similar previous cases is not new and has been a vital input, for example, to case-based reasoning (CBR) methodologies. This literature...
Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text...
Organized retail crime (ORC) is a significant issue for retailers, marketplace platforms, and consumers. Its prevalence and influence have increased rapidly in lockstep with the expansion of online commerce, digital devices, and communication platforms. Today, it is a costly affair, wreaking havoc on enterprises’ overall revenues and continually jeopa...
The objective of this article is to provide a comparative analysis of two novel genetic programming (GP) techniques, differentiable Cartesian genetic programming for artificial neural networks (DCGPANN) and geometric semantic genetic programming (GSGP), with state-of-the-art automated machine learning (AutoML) tools, namely Auto-Keras, Auto-PyTorch...
The generation of synthetic data can be used for anonymization, regularization, oversampling, semi-supervised learning, self-supervised learning, and several other tasks. Such broad potential motivated the development of new algorithms, specialized in data generation for specific data formats and Machine Learning (ML) tasks. However, one of the mos...
Due to the difficulties inherent in diagnostics and prognostics, maintaining machine health remains a substantial issue in industrial production. Current approaches rely substantially on human engagement, making them costly and unsustainable, especially in high-volume industrial complexes like fulfillment centers. The length of time that fulfillmen...
Apples are ranked third, after bananas and oranges, in global fruit production. Fresh apples are more likely to be appreciated by consumers during the marketing process. However, apples inevitably suffer mechanical damage during transport, which can affect their economic performance. Therefore, the timely detection of apples with surface defects ca...
Active learning (AL) is a well-known technique to optimize data usage in training, through the interactive selection of unlabeled observations, out of a large pool of unlabeled data, to be labeled by a supervisor. Its focus is to find the unlabeled observations that, once labeled, will maximize the informativeness of the training dataset, therefore...
Precedent is the cornerstone of the Common law system. Even in jurisdictions that follow Civil law, precedents constrain decisions when case law is sufficiently uniform. A systematic disregard of precedents makes judgments less coherent and the law less just. Nevertheless, relying on precedents can also make courts more efficient, whereas recent ad...
Fraud, corruption, and collusion are the most common types of crime in public procurement processes; they produce significant monetary losses, inefficiency, and misuse of the public treasury. However, empirical research in this area to detect these crimes is still insufficient. This article presents a systematic literature review focusing on the mo...
Judges frequently base their reasoning on precedents. In every circumstance, courts must preserve uniformity in case law and, depending on the legal system, previous cases compel rulings. The search for methods to accurately identify similar previous cases is not new and has been a vital input, for example, to case-based reasoning (CBR) methodologi...
Precedents constitute the starting point of judges’ reasoning in national legal systems. Precedents are also an essential input for case-based reasoning (CBR) methodologies. Although considerable research has been done on CBR applied to legal practice, the precedent retrieval techniques are a relatively new and unexplored field of AI & Law. Only a...
In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amount of data and computation power. In real-world applications, these resources are not always available, motivating research on regularization methods. In addition, current and past research have...
In the age of the data deluge there are still many domains and applications restricted to the use of small datasets. The ability to harness these small datasets to solve problems through the use of supervised learning methods can have a significant impact in many important areas. The insufficient size of training data usually results in unsatisfact...
In this study, we use panel data to analyse the impact of an R&D tax credit on R&D personnel, particularly the impact on the allocation of Ph.D. holders, comparing low R&D intensity firms with medium-high and high R&D intensity firms. The results show that, in medium-high and high R&D intensity firms, the R&D tax credit had a significant impact on allocat...
Learning from imbalanced data sets is known to be a challenging task. There are many proposals to tackle the challenge for classification problems, but regarding regression the solutions are few. In the context of regression, imbalanced learning means that there is a concern with the accurate prediction of the target values in a subset of the conti...
Public procurement fraud is a plague that produces significant economic losses in any state and society, but empirical studies to detect it in this area are still scarce. This article presents a review of the most recent literature on public procurement to identify techniques for fraud detection using Network Science. Applying the PRISMA methodolog...
Shopping through Live-Streaming Shopping Apps (LSSAs) as an emerging consumption phenomenon has increased dramatically in recent years, especially during the COVID-19 lockdown period. However, insufficient studies have focused on the psychological processes undergone in different customer demographics while shopping via LSSAs under pandemic conditi...
Fraud in public funding can have deleterious consequences for societies’ economic, social, and political well-being. Fraudulent activity associated with public procurement contracts accounts for losses of billions of euros every year. Thus, it is of utmost relevance to explore analytical frameworks that can help public authorities identify agents t...
In remote sensing, Active Learning (AL) has become an important technique to collect informative ground truth data “on-demand” for supervised classification tasks. Despite its effectiveness, it is still significantly reliant on user interaction, which makes it both expensive and time-consuming to implement. Most of the current literature focuses on...
India has proven to be one of the most diverse and dynamic economic regions in the world. Its industry focuses predominantly on the service sector, and immediate economic growth seems to steer India toward becoming an economic superpower. India's unique business landscape is felt at a regional level, where massive urbanization has become an unavoidable conseq...
Land cover maps are a critical tool to support informed policy development, planning, and resource management decisions. With significant upsides, the automatic production of Land Use/Land Cover maps has been a topic of interest for the remote sensing community for several years, but it is still fraught with technical challenges. One such challenge...
Traditional supervised machine learning classifiers are challenged to learn highly skewed data distributions as they are designed to expect classes to equally contribute to the minimization of the classifier's cost function. Moreover, the classifier's design expects equal misclassification costs, causing a bias for overrepresented classes. Different...
New technologies applied to transportation services in the city enable the shift to sustainable transportation modes, making bike-sharing systems (BSS) more popular in the urban mobility scenario. This study focuses on understanding the spatiotemporal station and trip activity patterns in the Lisbon BSS, based on 2018 data taken as the baseline, an...
Fraud in public funding can have deleterious consequences for the economic, social, and political well-being of societies. Fraudulent activity associated with public procurement contracts accounts for losses of billions of euros every year. Thus, it is of utmost relevance to explore analytical frameworks that can help public authorities identify ag...
Injuries have become devastating and often under-recognized public health concerns. In Canada, injuries are the leading cause of potential years of life lost before the age of 65. The geographical patterns of injury, however, are evident both over space and time, suggesting the possibility of spatial optimization of policies at the neighborhood sca...
Cities are moving towards new mobility strategies to tackle smart cities’ challenges such as carbon emission reduction, urban transport multimodality and mitigation of pandemic hazards, emphasising the implementation of shared modes, such as bike-sharing systems. This paper poses a research question and introduces a corresponding systematic lite...
Wealth in the Greater Toronto Area (GTA) continues to grow each year as Toronto's consumer market and population increase. Using a machine learning segmentation based on self-organizing maps, this paper examines the demographics, socioeconomics, and expenditure consumption patterns of the GTA's consumers. The results suggest that SOM may contribute...
Owing to the convenience, reliability and contact-free nature of mobile payment (M-payment), it has been widely adopted in China during the COVID-19 pandemic to reduce direct and indirect contact in transactions, allowing social distancing to be maintained and facilitating stabilization of the social economy. This paper aims to comprehensi...
Mobile payment (M-payment), as an emerging financial transaction method, has been widely adopted in various contexts. In order to investigate the significant factors and espoused cultural moderators impacting users' M-payment continuance usage intention in China, this study proposes a comprehensive model integrating Unified Theory of Acceptance and...
Food delivery apps (FDAs), as an emerging online-to-offline mobile technology, have been widely adopted by catering businesses and customers. Especially, as they have provided two-way beneficial catering delivery services in rescuing catering enterprises and satisfying customers’ technological and mental expectations under the COVID-19 global pandemic...
The field of data science has had a significant impact in both academia and industry, and with good reason [...]
The automatic production of land use/land cover maps continues to be a challenging problem, with important impacts on the ability to promote sustainability and good resource management. The ability to build robust automatic classifiers and produce accurate maps can have a significant impact on the way we manage and optimize natural resources. The d...
This paper presents a systematic literature review investigating the factors that affect mobile payment adoption from the user perspective. A total of 58 selected papers were analyzed through a proposed five-step systematic literature review process. The results show that culture is an important factor affecting user adoption intention, m...
Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. The SMOTE algorithm, as well as any other oversampling method based on the SMOTE m...
Massive open online courses (MOOCs) contribute significantly to individual empowerment because they can help people learn about a wide range of topics. To realize the full potential of MOOCs, we need to understand their factors of success, here defined as the use, user satisfaction, along with the individual and organizational performance resulting fro...
Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versa...
This article presents an analysis of the global digital divide, based on data collected from 45 countries, including the ones belonging to the European Union, OECD, Brazil, Russia, India, and China (BRIC). The analysis shows that one factor can explain a large part of the variation in the seven ICT variables used to measure the digital development...
The interest in using information to improve the quality of living in large urban areas and the efficiency of its governance has been around for decades. Nevertheless, recent developments in information and communications technology have sparked new ideas in academic research, all of which are usually grouped under the umbrella term of Smart Cities...
Learning from imbalanced datasets is challenging for standard algorithms, as they are designed to work with balanced class distributions. Although there are different strategies to tackle this problem, methods that address the problem through the generation of artificial data constitute a more general approach compared to algorithmic modifications....
Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach compared to algorithmic modifications. The SMOTE algorithm and its variations generate synthetic samples along a line seg...
Learning from imbalanced datasets is a frequent but challenging task for standard classification algorithms. Although there are different strategies to address this problem, methods that generate artificial data for the minority class constitute a more general approach compared to algorithmic modifications. Standard oversampling methods are variati...
This paper analyzes the digital development of 110 countries and its relationship with economic development. Using factor analysis, we combined seven ICT-related variables into a single measure of digital development. This measure was then used as the dependent variable in an OLS model that allows non-linear effects, with the GDP per capita of coun...
In many remote-sensing projects, one is usually interested in a small number of land-cover classes present in a study area and not in all the land-cover classes that make up the landscape. Previous studies in supervised classification of satellite images have tackled the specific class mapping problem by isolating the classes of interest and combining...
In many remote sensing projects on land cover mapping, the interest is often in a sub-set of classes present in the study area. Conventional multi-class classification may lead to a considerable training effort and to the underestimation of the classes of interest. On the other hand, one-class classifiers require much less training, but may overe...
E-learning systems are emerging in many settings of our society. Schools, universities, and several other organizations use these systems. E-learning systems allow learning anytime and anywhere. This medium may seem to be the answer to all learning barriers, but the effect of non-cognitive skills on the success of e-learning systems is yet to be ex...
E-learning systems are enablers in the learning process, strengthening their importance as part of the educational strategy. Understanding the determinants of e-learning success is crucial for defining instructional strategies. Several authors have studied e-learning implementation and adoption, and various studies have addressed e-learning success...
This paper addresses the international and internal digital divides that exist across and within the European member states according to the educational attainment of their populations. Our results suggest that even for those European countries that are outperforming their counterparts in terms of digital development, such as Finland, some internal...
E-learning systems have witnessed a usage and research increase in the past decade. This article presents the e-learning concepts ecosystem. It summarizes the various scopes on e-learning studies. Here we propose an e-learning theoretical framework. This theoretical framework is based upon three principal dimensions: users, technology, and services rela...
This book is a collection of articles that will be submitted as full papers to the AGILE annual international conference. These papers go through a rigorous review process and report original and unpublished fundamental scientific research. Those published cover significant research in the domain of geographic information science systems. This...
There is a clear belief among academics and policy makers about the importance of ICT for sustainable development and welfare. Thus, all across the world, a variety of strategies to promote the digital development have been proposed and implemented by national and international authorities. Simultaneously, academics have been dedicating their effor...
E-Learning systems play an important role in our society; they facilitate instructors in the teaching process and also enable learners to access knowledge. However, e-learning is not the first concept that refers to the use of computerized systems in the learning process. This paper describes a bibliometric study. In this paper, we present the e-lear...
Massive open online courses (MOOCs) are black swans. A black swan is an unexpected event that emerges from reality and alters the reality itself. MOOCs have affected supply and demand in higher education. MOOCs distribute knowledge and classes in many areas of expertise for free. Millions of users from all over the world are enrolling in MOOC courses....
A plethora of national and regional applications need land-cover information covering large areas. Manual classification based on visual interpretation and digital per-pixel classification are the two most commonly applied methods for land-cover mapping over large areas using remote-sensing images, but both present several drawbacks. This paper tes...
Portugal is a country with a high per capita consumption of medical drugs. High levels of medication imply not only risk to the patient but also a strong burden to the National Health Service (Serviço Nacional de Saúde—SNS). Polymedication, according to many authors, is the consumption of at least five different drugs. Polymedication can have ser...
E-learning systems are widely used from academia to industry. The usage of e-learning systems raises new research contexts. Multiple collaborative learning systems were implemented to improve people interaction, communication, working, coordinating activities, socializing and learning. E-learning systems play a significant role in the learning act...
A problem that Portugal is facing, which needs urgent effective health policies, is the socio-economic differences and inequalities that arise in access to health care. In this study we used data from the National Health Survey of 2005/2006 to investigate if socio-economic differences are related both to the frequency with which health services are used and...
Our research analyses the digital divide within the European Union 27 between the years of 2008 and 2010. To accomplish this we use multivariate statistical methods, more specifically factor and cluster analysis, to address the European digital disparities. Our results lead to an identification of two latent dimensions and five groups of countries....
Clustering constitutes one of the most popular and important tasks in data analysis. This is true for any type of data, and geographic data is no exception. In fact, in geographic knowledge discovery the aim is, more often than not, to explore and let spatial patterns surface rather than develop predictive models. The size and dimensionality of the...
Our research aims to analyze the digital divide within the European Union 27 (EU-27). Hence we used a multivariate approach, more specifically Factor Analysis, to study the digital disparities between European Countries. Two latent dimensions on this subject were found. We also found statistical evidence that one of the dimensions on digital develo...
Not all wildfire ignitions result in burned areas of a similar size. The aim of this study was to explore whether there was a size-dependent pattern (in terms of resulting burned area) of fire ignitions in Portugal. For that purpose we characterised 71,618 fire ignitions occurring in the country in the period 2001–2003, in terms of population densi...
Clustering constitutes one of the most popular and important tasks in data analysis. The size and dimensionality of the existing geospatial databases stress the need for efficient and robust spatial clustering algorithms. In this paper we present the GeoSOM suite as a spatial clustering tool. GeoSOM suite implements the GeoSOM algorithm, which allo...
Portugal has the highest density of wildfire ignitions among southern European countries. The ability to predict the spatial patterns of ignitions constitutes an important tool for managers, helping to improve the effectiveness of fire prevention, detection and firefighting resources allocation. In this study, we analyzed 127 490 ignitions that occ...
A new methodology is presented that measures density in urban systems. By combining highly detailed height measurements with, amongst others, topographical data we are able to quantify urban volume. This new approach is demonstrated in two separate case studies that relate to the temporal and spatial dimension of the urban environment, respectively...
The large amount of spatial data available today demands the use of data mining tools for its analysis. One of the most used data mining techniques is clustering. Several methods for spatial clustering exist, but many consider space as just another variable. We present in this paper a tool particularly suited for spatial clustering: the GeoSOM suit...
This paper presents a simple way to compensate for the magnification effect of Self-Organizing Maps (SOM) when creating cartograms using Carto-SOM. It starts with a brief explanation of what a cartogram is, how it can be used, and what sort of metrics can be used to assess its quality. The methodology for creating a cartogram with a SOM is then pres...
The basic idea of a cartogram is to distort a map. This distortion comes from the substitution of area for some other variable (in most examples population). The objective is to scale each region according to the value it represents for the new variable, while keeping the map recognizable. The use of cartograms predates the use of computerize...
The method proposed in this paper supports autonomous UAV network path definition, taking into consideration the density of the detected events at each moment, in each place. We use self-organizing maps to detect event patterns in the field of view of the sensors, allowing unmanned aerial vehicle (UAV) path definition based on e...
To deal with the huge volume of information provided by remote sensing satellites, which produce images used for agriculture monitoring, urban planning, deforestation detection and so on, several algorithms for image classification have been proposed in the literature. This article compares two approaches, called Expectation-Maximization (EM) and S...
According to the statistics, Portugal has the highest density of wildfire ignitions among southern European countries. The ability to predict ignition occurrence constitutes an important tool for managers, helping to improve the effectiveness of fire prevention, detection and fire fighting resources allocation. In this study we used a database with...