Inês Dutra

University of Porto | UP · Departamento de Ciência de Computadores

Ph.D. in CS (Bristol Univ, UK)

About

142
Publications
8,369
Reads
691
Citations
Introduction
My main research topic is logic programming. Based on logic programming, I've been working with:
- knowledge representation of medical data
- alerts for health systems
- (statistical) relational learning:
  - inductive logic programming (learning first-order rules)
  - probabilistic inductive logic programming
- explainable AI
- parallelization of (probabilistic inductive) logic programming systems (for multi-cores and GPGPUs)

Besides logic programming, I've also been involved in the development of ExpertBayes, a Bayesian network system that interacts with the user and learns new networks from manually built networks, and in its parallelization.

Main applications: breast cancer diagnosis, heart pathology detection, bipolar disorder characterization, personalized diabetes monitoring.

Publications (142)
Article
Full-text available
One of the areas with the potential to be explored in quantum computing (QC) is machine learning (ML), giving rise to quantum machine learning (QML). In an era when there is so much data, ML may benefit from either speed, complexity or smaller amounts of storage. In this work, we explore a quantum approach to a machine learning problem. Based on th...
Article
Full-text available
Breast cancer is currently one of the main causes of death and tumoral diseases in women. Even if early diagnosis processes have evolved in the last years thanks to the popularization of mammogram tests, nowadays, it is still a challenge to have available reliable diagnosis systems that are exempt of variability in their interpretation. To this end...
Article
Full-text available
This work presents the mapping of the traveling salesperson problem (TSP) based on pseudo-Boolean constraints to a graph of the D-Wave Systems Inc. We first formulate the problem as a set of constraints represented in propositional logic and then resort to the SATyrus approach to convert the set of constraints to an energy minimization problem. Nex...
Article
Full-text available
Probabilistic inductive logic programming (PILP) is a statistical relational learning technique which extends inductive logic programming by considering probabilistic data. The ability to use probabilities to represent uncertainty comes at the cost of an exponential evaluation time when composing theories to model the given problem. For this reason...
Article
We implement a quantum binary classifier where, given a dataset of pairs of training inputs and target outputs, our goal is to predict the output of a new input. The script is based on a hybrid scheme inspired by an existing PennyLane variational classifier, and to encode the classical data we resort to PennyLane's amplitude encoding embedding templ...
Conference Paper
This work discusses a strategy named Map, Optimize and Learn (MOL) which analyzes how to change the representation of samples of a 2D dataset to generate useful patterns for classification tasks using Convolutional Neural Networks (CNN) architectures. The strategy is applied to a real-world scenario of children and teenagers with cardiac pathology...
Chapter
This paper presents an effort to timely handle 400+ GBytes of sensor data in order to produce Predictive Maintenance (PdM) models. We follow a data-driven methodology, using state-of-the-art Python libraries, such as Dask and Modin, which can handle big data. We use Dynamic Time Warping for sensors behavior description, an anomaly detection method...
Chapter
Full-text available
Bipolar Disorder (BD) is a chronic and severe psychiatric illness presenting with mood alterations, including manic, hypomanic and depressive episodes. Due to the high clinical heterogeneity and lack of biological validation, both BD treatment and diagnosis are still problematic. Patients and clinicians would benefit from better clinical and biologic...
Article
Full-text available
Quantum annealing provides a method to solve combinatorial optimization problems in complex energy landscapes by exploiting thermal fluctuations that exist in a physical system. This work introduces the mapping of a graph coloring problem based on pseudo-Boolean constraints to a working graph of the D-Wave Systems Inc. We start from the problem for...
Article
Background: The clinical decision-making process in pressure ulcer management is complex, and its quality depends on both the nurse's experience and the availability of scientific knowledge. This process should follow evidence-based practices incorporating health information technologies to assist health care professionals, such as the use of clin...
Preprint
Background: The clinical decision-making process in pressure ulcer management is complex, and its quality depends on both the nurse's experience and the availability of scientific knowledge. This process should follow evidence-based practices incorporating health information technologies to assist health care professionals, such as the use of clinic...
Chapter
Diabetes type I is a chronic disease that requires strict supervision. MyDiabetes is a utility application for diabetic users. This application served as basis to develop a logical unit, composed of logical rules, translated from medical protocols and guidelines, to advise the user. The data in the application is a source of knowledge about the use...
Preprint
Full-text available
Motivation: Traditional computational cluster schedulers are based on user inputs and run time needs request for memory and CPU, not IO. Heavily IO bound task run times, like ones seen in many big data and bioinformatics problems, are dependent on the IO subsystems scheduling and are problematic for cluster resource scheduling. The problematic resc...
Preprint
Quantum computers are different from binary digital electronic computers based on transistors. Common digital computing encodes the data into binary digits (bits), each of which is always in one of two definite states (0 or 1), whereas quantum computation uses quantum bits (qubits). A circuit-based qubit quantum computer exists and is available for experim...
Conference Paper
Full-text available
We explore the feasibility of a database storage engine housing up to 307 billion genetic Single Nucleotide Polymorphisms (SNP) for online access. We evaluate database storage engines and implement a solution utilizing factors such as dataset size, information gain, cost and hardware constraints. Our solution provides a full feature functional mode...
Poster
We performed experiments with Markov chains by modeling them as a quantum algorithm that can be simulated on a quantum computer. We used the Google Quantum Computing Playground, a browser-based WebGL Chrome Experiment that provides a GPU-accelerated simulated quantum computer with 3D quantum state visualization, programmed in the QScript language.
Conference Paper
Probabilistic Inductive Logic Programming (PILP) systems extend ILP by allowing the world to be represented using probabilistic facts and rules, and by learning probabilistic theories that can be used to make predictions. However, such systems can be inefficient both due to the large search space inherited from the ILP algorithm and to the probabil...
Data
Social contracts about cars and computers; Naming is a hard problem in science; Common naming problems in programming and modeling; Blacklisting confusing keywords in simulations of biology; Uniquified names by versioning or by hashing; Perspectives on naming from the humanities; Online references; Mini survey on improving names; Naming forms: debugging tools for...
Book
This book constitutes the thoroughly refereed post-conference proceedings of the 12th International Conference on High Performance Computing in Computational Science, VECPAR 2016, held in Porto, Portugal, in June 2016. The 20 full papers presented were carefully reviewed and selected from 36 submissions. The papers are organized in topical section...
Article
Full-text available
Names in programming are vital for understanding the meaning of code and big data. We define code2brain (C2B) interfaces as maps in compilers and brains between meaning and naming syntax, which help to understand executable code. While working toward an Evolvix syntax for general-purpose programming that makes accurate modeling easy for biologists,...
Conference Paper
Markov Logic is an expressive and widely used knowledge representation formalism that combines logic and probabilities, providing a powerful framework for inference and learning tasks. Most Markov Logic implementations perform inference by transforming the logic representation into a set of weighted propositional formulae that encode a Markov netwo...
Conference Paper
Statistical data analysis methods are well known for their difficulty in handling large number of instances or large number of parameters. This is most noticeable in the presence of "big data", i.e., of data that are heterogeneous, and come from several sources, which makes their volume increase very rapidly. In this paper, we study popular and wel...
Article
Full-text available
We describe GPU implementations of the matrix recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. Results on the GPU are better than the results of the multi-core versions (maximum speedup of 14.8).
Article
The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image guided core biopsy performed between October 2005 and December 2007 on 328 female subjects. W...
Article
While the use of machine learning methods in clinical decision support has great potential for improving patient care, acquiring standardized, complete, and sufficient training data presents a major challenge for methods relying exclusively on machine learning techniques. Domain experts possess knowledge that can address these challenges and guide...
Article
Bipolar Disorder (BD) is a chronic and disabling disease that usually appears around 20 to 30 years old. Patients who suffer with BD may struggle for years to achieve a correct diagnosis, and only 50% of them generally receive adequate treatment. In this work we apply a machine learning technique called Inductive Logic Programming (ILP) in order to...
Article
Full-text available
Probabilistic Inductive Logic Programming (PILP) is a relatively unexplored area of Statistical Relational Learning which extends classic Inductive Logic Programming (ILP). This work introduces SkILL, a Stochastic Inductive Logic Learner, which takes probabilistic annotated data and produces First Order Logic theories. Data in several domains suc...
Article
Full-text available
Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly with the increase in data size, making it impractical to solve important problems. In this work we present the design of a relational learning system, that takes advantage of graphics processing units (GPUs) to pe...
Article
Interest in the Map Reduce programming model has been rekindled by Google in the past 10 years; its popularity is mostly due to the convenient abstraction for parallelization details this framework provides. State-of-the-art systems such as Google's, Hadoop or SAGA often provide added features like a distributed file system, fault tolerance mechani...
Book
In the past two decades, grid computing has fostered advances in several scientific domains by making resources available to a wide community and bridging scientific gaps. Grid infrastructures have been harnessing computational resources all around the world allowing all kinds of parallelisms to be explored. Other approaches to parallel and distri...
Chapter
The development and use of computerized decision-support systems in the domain of breast cancer has the potential to facilitate the early detection of disease as well as spare healthy women unnecessary interventions. Despite encouraging trends, there is much room for improvement in the capabilities of such systems to further alleviate the burden of...
Conference Paper
This work describes a methodology and strategies to allow better execution performance of grid applications that make use of files located outside of the grid. Our methodology consists of two phases. The first phase, which is static, ranks machines in the grid according to file transfer sizes. The second phase uses this information to choose good m...
Article
Full-text available
Bayesian network structures are usually built using only the data and starting from an empty network or from a naive Bayes structure. Very often, in some domains, like medicine, a prior structure knowledge is already known. This structure can be automatically or manually refined in search for better performance models. In this work, we take Bayesia...
Book
This book constitutes the refereed proceedings of the 20th International Conference on Parallel and Distributed Computing, Euro-Par 2014, held in Porto, Portugal, in August 2014. The 68 revised full papers presented were carefully reviewed and selected from 267 submissions. The papers are organized in 15 topical sections: support tools environments...
Conference Paper
When mammography reveals a suspicious finding, a core needle biopsy is usually recommended. In 5% to 15% of these cases, the biopsy diagnosis is non-definitive and a more invasive surgical excisional biopsy is recommended to confirm a diagnosis. The majority of these cases will ultimately be proven benign. The use of excisional biopsy for diagnosis...
Conference Paper
Map-Reduce is a programming model that has its roots in early functional programming. In addition to producing short and elegant code for problems involving lists or collections, this model has proven very useful for large-scale highly parallel data processing. In this work, we present the design and implementation of a high-level parallel construc...
Conference Paper
We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a mul...
Article
Full-text available
One of the main areas of research in logic programming is the design and implementation of sequential and parallel (constraint) logic programming systems. This research goes broadly from the design and specification of novel implementation technology to its actual evaluation in real life situations. A series of workshops on Implementations of Logic...
Conference Paper
We evaluated a population of 7199 children between 2 and 19 years old to study the relations between the observed demographic and physiological features in the occurrence of a pathological/non-pathological heart condition. The data was collected at the Real Hospital Português, Pernambuco, Brazil. We performed a feature importance study, with the ai...
Article
Full-text available
In this work we build the first BI-RADS parser for Portuguese free texts, modeled after existing approaches to extract BI-RADS features from English medical records. Our concept finder uses a semantic grammar based on the BIRADS lexicon and on iterative transferred expert knowledge. We compare the performance of our algorithm to manual annotation b...
Conference Paper
Grids are infrastructures that allow resources to be widely and wisely used around the world. With the ever increasing interest in grid computing, many applications have been developed that benefit from using hundreds or even thousands of resources made available by the grid infrastructures worldwide. Areas such as medicine, biology, astronomy, phy...
Data
Full-text available
The DigiScope project aims at developing a digitally enhanced stethoscope capable of using state of the art technology in order to help physicians in their daily medical routine. One of the main tasks of DigiScope is to build a repository of auscultations (sound and medical related data). In this work, we present a preliminary analysis and stud...
Conference Paper
In this work we perform a detailed study of different or-scheduling strategies varying several parameters in two or-parallel systems, YapOr and ThOr, running on multi-core machines. Our results show that some kinds of applications are sensitive to the choice of scheduling strategy adopted. In particular, the choice of scheduling parameters mostly a...
Article
Breast screening is the regular examination of a woman's breasts to find breast cancer earlier. The sole exam approved for this purpose is mammography. Usually, findings are annotated through the Breast Imaging Reporting and Data System (BIRADS) created by the American College of Radiology. The BIRADS system determines a standard lexicon to be used...
Article
Full-text available
Digital stethoscopes are medical devices that can collect, store and sometimes transmit acoustic auscultation signals in a digital format. These can then be replayed, sent to a colleague for a second opinion, studied in detail after an auscultation, used for training or, as we envision it, can be used as a cheap powerful tool for screening cardiac...
Article
Full-text available
Most machine learning tools work with a single table where each row is an instance and each column is an attribute. Each cell of the table contains an attribute value for an instance. This representation prevents one important form of learning, which is, classification based on groups of correlated records, such as multiple exams of a single patien...
Conference Paper
Full-text available
Breast screening is the regular examination of a woman's breasts to find breast cancer in an initial stage. The sole exam approved for this purpose is mammography that, despite the existence of more advanced technologies, is considered the cheapest and most efficient method to detect cancer in a preclinical stage. We investigate, using machine lear...
Article
Traditional machine learning systems learn from non-relational data but in fact most of the real world data is relational. Normally the learning task is done using a single flat file, which prevents the discovery of effective relations among records. Inductive logic programming and statistical relational learning partially solve this problem. In th...
Article
Full-text available
In this work we show that combining physician rules and machine learned rules may improve the performance of a classifier that predicts whether a breast cancer is missed on percutaneous, image-guided breast core needle biopsy (subsequently referred to as "breast core biopsy"). Specifically, we show how advice in the form of logical rules, derived b...
Article
Full-text available
One of the main advantages of Logic Programming (LP) is that it provides an excellent framework for the parallel execution of programs. In this work we investigate novel techniques to efficiently exploit parallelism from real-world applications in low cost multi-core architectures. To achieve these goals, we revive and redesign the YapOr system to...
Article
The EELA (E-Infrastructure shared between Europe and Latin America) and EELA-2 (E-science grid facility for Europe and Latin America) projects, co-funded by the European Commission under FP6 and FP7, respectively, have been successful in building a high capacity, production-quality, scalable Grid Facility for a wide spectrum of applications (e.g. E...
Article
Full-text available
The EELA-2 (E-science grid facility for Europe and Latin America) project, co-funded by the European Commission, aims at consolidating the infrastructure started by EELA (E-infrastructure shared between Europe and Latin America) for the development of e-science between Europe and Latin America by building a bridge between grid computing initiatives...
Article
Full-text available
The EELA-2 project (E-science grid facility for Europe and Latin America), co-funded by the European Commission, seeks to consolidate the infrastructure started by EELA (E-infrastructure shared between Europe and Latin America) for the development of e-science between Europe and Latin America by strengthening computing initiatives in...
Article
Grid environments are dynamic and heterogeneous by nature, therefore requiring adaptive scheduling strategies. Reinforcement learning is an interesting and simple adaptive approach that may work well in actual grid environments. In this work, we employ reinforcement learning to classify available resources in a grid environment, giving support to t...
Conference Paper
In this work, we study the behaviour of different resource scheduling strategies when doing job orchestration in grid environments. We empirically demonstrate that scheduling strategies based on reinforcement learning are a good choice to improve the overall performance of grid applications and resource utilization.
Conference Paper
Reinforcement learning is a simple technique with applications in several areas. A real grid environment, generally dynamic and heterogeneous, offers an interesting setting for its application. In this work, we use this technique to classify the nodes available in a grid, thereby supporting two scheduling algorithms, AG and MQD....
Conference Paper
In this paper we present the architecture for the Personal Autonomic Desktop Manager, a self managing application designed to act on behalf of the user in several aspects: protection, healing, optimization and configuration. The overall goal of this research is to improve the correlation of the autonomic self* properties and doing so also enhance t...
Article
Full-text available
Multiagent systems (MAS) have been used to solve classes of problems for which one has limited expertise to propose a feasible solution. Usually, these problems are intrinsically distributed, very complex and/or involve extensive computations. We are particularly interested in multiagent systems supporting Computer-Supported Cooperativ...
Conference Paper
Full-text available
Speedup in distributed executions of Constraint Logic Programming (CLP) applications is directly related to a good constraint partitioning algorithm. In this work we study different mechanisms to distribute constraints to processors based on straightforward mechanisms such as Round-Robin and Block distribution, and on a more sophisticated automati...