About
Publications: 64
Reads: 8,568
Citations: 756
Publications (64)
During an epidemic, decision-makers in public health need accurate predictions of the future case numbers, in order to control the spread of new cases and allow efficient resource planning for hospital needs and capacities. In particular, considering that infectious diseases are spread through human-human transmissions, the analysis of spatio-tempo...
The detection of city hotspots from geo-referenced urban data is a valuable knowledge support for planners, scientists, and policymakers. However, the application of classic density-based clustering algorithms on multi-density data can produce inaccurate results. Since metropolitan cities are heavily characterized by variable densities, multi-densi...
The growing volume of data collected in urban contexts opens up opportunities to exploit it for improving citizens’ quality of life and addressing city management issues, like resource planning (water, electricity), traffic, air and water quality, public policy and public safety services. Moreover, due to the large-scale diffusion of GPS and scanning devices, most o...
This paper addresses the issue of extracting mobility patterns in an urban computing scenario. The computation is parallelized by partitioning the territory into a number of regions. In each region a computing node collects data from a set of local sensors, analyzes the data and coordinates with neighboring regions to extract the mobility patterns. We...
The success of Cloud Computing and the resulting continuous growth of large data centers are causing a huge rise in electrical power consumption by hardware facilities and cooling systems. This results in an increase in the operational costs of data centers, which is becoming a crucial issue to deal with. Consolidation of virtual machines (VM) is one of the...
The increasing pervasiveness of mobile devices favors the collection of large amounts of movement data that can be analyzed to extract knowledge, i.e. patterns, rules and regularities, from user trajectories. In this paper we present TPM, an integrated algorithm which supports the overall trajectory pattern discovery process for detecting user’s mo...
The widespread use of social media platforms allows scientists to collect huge amounts of data posted by people interested in a given topic or event. This data can be analyzed to infer patterns and trends about people's behaviors related to a topic or an event on a very large scale. Social media posts are often tagged with geographical coordinates or...
Recent research efforts in the field of urban computing aim to develop innovative services for citizens through the application of ubiquitous and pervasive computing paradigms in urban spaces. Smart city applications need to cope with a large number of involved users and devices. Since data and objects are strictly related to the territory on which...
Smart city and Internet of Things applications can benefit from the use of distributed computing architectures, due to the large number and pronounced territorial dispersion of the involved users and devices. In this context, a natural method to parallelize the computation is to consider the territory as partitioned into regions, e.g., city neighbo...
Sensor networks are an important technology for large-scale monitoring that allows the collection of streaming environmental measurement data in remote areas. Such data constitute a valuable source of information to be exploited for better understanding natural phenomena. Moreover, in some cases streams of data must be analyzed in real time to prov...
The opportunity of using Cloud resources on a pay-as-you-go basis and the availability of powerful data centers and high bandwidth connections are speeding up the success and popularity of Cloud systems, which is making on-demand computing a common practice for enterprises and scientific communities. The reasons for this success include natural bus...
The analysis of very large data sources requires scalable systems to reduce execution time and make the Big Data paradigm viable. Cloud infrastructures can be effectively used as scalable computing and storage platforms for implementing high-performance data analysis services. Nubytics is a Software-as-a-Service (SaaS) system that exploits Cloud fa...
This paper presents a framework for analyzing and predicting the performances of a business process, based on historical data gathered during its past enactments. The framework hinges on an inductive-learning technique for discovering a special kind of predictive process models, which can support the run-time prediction of a given performance measu...
Social media posts are often tagged with geographical coordinates or other information that allows identifying user positions, thus enabling mobility pattern analysis using trajectory mining techniques. This paper presents a methodology and discusses results of a study aimed at discovering behavior and mobility patterns of Instagram users who v...
The increasing pervasiveness of mobile devices, along with the use of technologies like GPS, Wi-Fi networks, RFID, and sensors, allows for the collection of large amounts of movement data. This amount of data can be analyzed to extract descriptive and predictive models that can be properly exploited to improve urban life. From a technological viewpo...
Mining data streams (DSs) is a very important research topic and has recently attracted a lot of attention, because in many cases data is generated by external sources so rapidly that it may become impossible to store it and analyze it offline. This chapter elaborates a distributed architecture for mining DSs generated from multiple and heterogeneo...
The chapter presents a Cloud-based framework that can be tailored to be used in different scenarios of urban planning and management occurring in Smart Cities. The focus is on the management of large-scale socio-geographic data obtained through the trajectories traced by smart objects. Our goal is to mine human activities and routines from this soci...
The increasing pervasiveness of mobile devices, along with the use of technologies like GPS, Wi-Fi networks, RFID, etc., allows for the collection of large amounts of movement data. This amount of information can be analyzed to extract descriptive and predictive models that can be profitably exploited to improve urban life. This paper presents an in...
In this paper we present a Cloud-based framework for urban computing that can be tailored to be used in different scenarios of urban planning and management that can occur in smart cities. The focus in the paper is on the management of large-scale socio-geographic data obtained through the trajectories followed by mobile devices. Our goal is to min...
In several scientific and business domains, very large data repositories are generated. To find interesting and useful information in those repositories, efficient data mining techniques and knowledge discovery processes must be used. The exploitation of data mining techniques in science helps scientists in hypothesis formation and gives them a sup...
Recently, the use of innovative decision support systems (DSSs) for monitoring a subject's health status in daily living has become common practice, providing real aid in the management of chronic patients. Rule-based implementations of such DSSs simulate the decision-making process described in clinical guidelines by also allowing new/existin...
Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mi...
Mining@Home was recently designed as a distributed architecture for running data mining applications according to the “volunteer computing” paradigm. Mining@Home already proved its efficiency and scalability when used for the discovery of frequent itemsets from a transactional database. However, it can also be adopted in several different scenarios...
The empowerment of medical practices, in terms of consistency, effectiveness and efficiency, is being more and more pushed by the encoding of clinical practice guidelines in advanced decision support systems (DSSs). A prerequisite for their wide application is the guarantee of a high level of maintainability and reliability, with respect to differe...
Fault tolerance is an important issue in service-oriented architectures like Grid and Cloud systems, where many heterogeneous machines are used. In this paper we present a flexible failure handling framework which extends a service-oriented architecture for Distributed Data Mining previously proposed, addressing the requirements for handling fa...
The formalization and manipulation of complex and not yet assessed rules by clinicians are critical for Decision Support Systems (DSSs) performance in supporting remote monitoring of chronic patients. Sometimes, structural anomalies, such as inconsistency and redundancy, can occur. This work presents a novel system, named Consistency Checker, aimed...
Data mining tasks are often composed of multiple stages that may be linked to each other to form various execution flows. Moreover, data mining tasks are often distributed since they involve data and tools located over geographically distributed environments, like the Grid. Therefore, it is fundamental to exploit effective formalisms, such as workf...
This paper presents the design and the implementation of an architecture for the analysis of data streams in distributed environments. In particular, data stream analysis has been carried out for the computation of items and item sets that exceed a frequency threshold. The mining approach is hybrid, that is, frequent items are calculated with a sin...
Fault tolerance is an important issue in Grid computing, where many heterogeneous machines are used. In this paper we present a flexible failure handling framework which extends a service-oriented architecture for Distributed Data Mining previously proposed, addressing the requirements for fault tolerance in the Grid. The framework allows users...
In several scientific domains, large data repositories are generated. To find interesting and useful information in those repositories, efficient data mining techniques must be used. Many scientific fields, such as astronomy, biology, medicine, chemistry and earth science, benefit from data mining analysis. The exploitation of data mining tec...
The public resource computing paradigm is often used as a successful and low cost mechanism for the management of several classes of scientific and commercial applications that require the execution of a large number of independent tasks. Public computing frameworks, also known as “Desktop Grids”, exploit the computational power and storage facilit...
Introduction; Approach; Knowledge Grid services; Data analysis services; Design of Knowledge Grid applications; Conclusions; References
In today's Grids, files are usually managed by Grid data management systems that are superimposed on existing file and storage systems. In this position paper, we analyze this predominant approach and argue that object-based file systems can be an alternative when adapted to the characteristics of a Grid environment. We describe how we are...
This paper describes how distributed data mining models, such as collective learning, ensemble learning, and meta-learning models, can be implemented as WSRF mining services by exploiting the Grid infrastructure. Our goal is to design a general distributed architectural model that can be exploited for different distributed mining algorithms deploye...
A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of...
A biclustering algorithm, based on a greedy technique and enriched with a local search strategy to escape poor local minima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a gain function. The gain function combines the mean squared residue, th...
Striping is a technique that distributes file content over multiple storage servers and thereby enables parallel access. In order to be able to provide a consistent view across file data and metadata operations, the file system has to track the layout of the file and know where the file ends and where it contains gaps. In this paper, we present a...
A parameter-free, fully-automatic approach to clustering high-dimensional categorical data is proposed. The technique is based on a two-phase iterative procedure, which attempts to improve the overall quality of the whole partition. In the first phase, cluster assignments are given, and a new cluster is added to the partition by identifying and spl...
Data mining is often a compute-intensive and time-consuming process. For this reason, several data mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with geographical distribution of data, users and systems, more sop...
A co-clustering algorithm for large sparse binary data matrices, based on a greedy technique and enriched with a local search strategy to escape poor local maxima, is proposed. The algorithm starts with an initial random solution and searches for a locally optimal solution by successive transformations that improve a quality function which combines...
A novel approach for reconciling tuples stored as free text into an existing attribute schema is proposed. The basic idea is to subject the available text to progressive classification, i.e., a multi-stage classification scheme where, at each intermediate stage, a classifier is learnt that analyzes the textual fragments not reconciled at the end of...
We propose an incremental algorithm for clustering duplicate tuples in large databases, which allows any new tuple t to be assigned to the cluster containing the database tuples that are most similar to t (and hence are likely to refer to the same real-world entity t is associated with). The core of the approach is a hash-based indexing technique that...
We present a novel personalization engine that provides individualized access to Web contents/services by means of data mining techniques. It associates adaptive content delivery and navigation support with form filling, a functionality that catches the typical interaction of a user with a Web service, in order to automatically fill in its form fi...
Grid environments were originally designed for dealing with problems involving compute-intensive applications. Today, however, grids have broadened their horizons, as they are increasingly used to manage large amounts of data and run business applications supporting consumers and end users. To face these new challenges, grids must support adaptive data management...
In today's Grids, files are usually managed by Grid data management systems that are superimposed on existing file and storage systems. In this position paper, we analyze this predominant approach and argue that object-based file systems can be an alternative when adapted to the characteristics of a Grid environment. We describe how we are solving the ch...