Philippe Fournier-Viger

Shenzhen University · College of Computer Science and Software Engineering

Ph.D. in Computer Science

About

387
Publications
121,484
Reads
9,111
Citations
Introduction
Philippe Fournier-Viger, Ph.D., is a Distinguished Professor. His research interests are data mining, algorithm design, pattern mining, sequence mining, big data, and applications. He is the founder of the popular SPMF data mining library, which offers more than 230 algorithms and has been cited in more than 1000 research papers since 2010. He has also contributed to more than 340 research papers, which have received more than 9000 citations (as of 2022/1). http://www.philippe-fournier-viger.com
Additional affiliations
November 2015 - present
Harbin Institute of Technology Shenzhen Graduate School
Position
  • Professor (Full)
Education
January 2006 - August 2010

Publications (387)
Article
Full-text available
Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market b...
Conference Paper
Full-text available
High-utility itemset mining (HUIM) is an important data mining task with wide applications. In this paper, we propose a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets both in terms of execution time and memory. EFIM relies on two upper-bounds...
Article
Full-text available
Discovering unexpected and useful patterns in databases is a fundamental data mining task. In recent years, a trend in data mining has been to design algorithms for discovering patterns in sequential data. One of the most popular data mining tasks on sequences is sequential pattern mining. It consists of discovering interesting subsequences in a se...
Conference Paper
Full-text available
High-utility pattern mining is an important data mining task having wide applications. It consists of discovering patterns generating a high profit in databases. Recently, the task of high-utility sequential pattern mining has emerged to discover patterns generating a high profit in sequences of customer transactions. However, a well-known limitati...
Conference Paper
Full-text available
High-utility itemset mining is the task of discovering high-utility itemsets, i.e. sets of items that yield a high profit in a customer transaction database. High-utility itemsets are useful, as they provide information about profitable sets of items bought by customers to retail store managers, who can then use this information to take strategic...
Preprint
Discovering frequent trends in time series is a critical task in data mining. Recently, order-preserving matching was proposed to find all occurrences of a pattern in a time series, where the pattern is a relative order (regarded as a trend) and an occurrence is a sub-time series whose relative order coincides with the pattern. Inspired by the orde...
Article
Closed high utility itemsets (CHUIs) and maximal high utility itemsets (MaxHUIs) are two important concise representations of HUIs. Discovering these itemsets is important because they are lossless and compact, i.e., they provide a concise summary of all HUIs that can be orders of magnitude smaller. In addition, it can be more efficient to extract...
Chapter
Traffic flow clustering is a common task to analyze urban traffic using GPS data of urban vehicles. Existing density-based traffic flow clustering methods generally have two important problems: they do not consider the characteristics of urban roads, and they do not handle different sizes of urban areas well. In this paper, we propose a novel method, cal...
Chapter
Association rule mining is a popular data mining task for finding relationships between values from the itemsets that co-occur frequently in a transactional database. Association rule mining has many applications but the “support-confidence” framework it depends on is inadequate for many cases. In recent years, a generalised task called high utilit...
Article
Full-text available
Privacy Preserving Data Mining (PPDM) is an important research area in data mining, which aims at protecting the privacy during the data mining process so that personal data and sensitive information is not revealed to unauthorized persons. PPDM is a critical task as data often contain sensitive information about individuals that can easily comprom...
Article
Graphs, also known as networks, are an expressive data representation used in many domains. Numerous algorithms have been designed to find interesting patterns in graphs. However, they have at least one of two major drawbacks. First, most algorithms focus on finding complex subgraphs but ignore relationships between attributes. But attributes play...
Article
Full-text available
This paper proposes a sequential rules-based recommendation system, called STS-Rec. It addresses the main drawbacks of sequential pattern mining approaches for point-of-interest (POI) recommendation by considering both temporal and social influences to perform short-term recommendations. STS-Rec first transforms mobility data into location sequenc...
Conference Paper
User-guided proof development in interactive theorem proving is a manual and time-consuming activity. For automating proof searching and optimization in a higher-order logic proof assistant, we provide two metaheuristic algorithms that are based on Fitness Dependent Optimizer (FDO) and Bat Algorithm (BA). In both metaheuristic algorithms, random pr...
Article
Full-text available
HUIM has been an important issue in recent years, particularly in basket-market analysis, since it identifies useful information or goods for decision-making. Numerous studies have focused on extracting high-utility itemsets from datasets, revealing a tremendous amount of pattern information. This approach is incapable of providing correct choices in a...
Article
High average-utility itemset mining consists of analyzing a quantitative customer transactional database to identify high average-utility itemsets (HAUIs), that is sets of items that have a high average utility (e.g. profit). Although important information about customers’ habits can be revealed by HAUIs, they can expose sensitive information. To a...
Article
Significant place mining in spatiotemporal trajectory data is a key task for mobile pattern mining, useful for supporting location-aware services. State-of-the-art trajectory clustering algorithms utilize a density-based distance measure. However, some major problems with this approach are that (1) results are often inaccurate, especially on data o...
Article
Nonoverlapping sequential pattern mining is an important type of sequential pattern mining (SPM) with gap constraints, which not only can reveal interesting patterns to users but also can effectively reduce the search space using the Apriori (anti-monotonicity) property. However, the existing algorithms do not focus on attributes of interest to use...
Preprint
One obvious defect of Extreme Learning Machine (ELM) is that the prediction performance of ELM is sensitive to the random initialization of input-layer weights and hidden-layer biases. GPRELM integrating Gaussian Process Regression (GPR) into ELM is a newly-proposed, simple and effective strategy to make ELM insensitive to the random initialization...
Preprint
For applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases. However, in these techniques for pattern discovery, the number of patterns can be huge, and the user is often only interested in a few of those patterns. Hence, targeted high-utility itemset mining has emerged as a key res...
Article
Full-text available
In this paper, an Observation Points Classifier Ensemble (OPCE) algorithm is proposed to deal with High‐Dimensional Imbalanced Classification (HDIC) problems based on data processed using the Multi‐Dimensional Scaling (MDS) feature extraction technique. First, dimensionality of the original imbalanced data is reduced using MDS so that distances bet...
Article
Full-text available
Discovering high utility sequences in a quantitative database is a popular data mining task. The goal is to enumerate all sequences of items (symbols) that have a high value for the user, as measured by a utility function. A representative application of high utility sequence mining is the identification of profitable sequences of purchases in tran...
Article
High occupancy pattern mining has been recently studied as an improved method for frequent pattern mining. It considers the proportion of each pattern in the transactions where the pattern occurred. The results of high occupancy pattern mining can be employed for automated control systems in order to make decisions. Meanwhile, the features of the d...
Article
This article presents a Bayesian attribute bagging-based extreme learning machine (BAB-ELM) to handle high-dimensional classification and regression problems. First, the decision-making degree (DMD) of a condition attribute is calculated based on the Bayesian decision theory, i.e., the conditional probability of the condition attribute given the de...
Preprint
Graphs are a popular data type found in many domains. Numerous techniques have been proposed to find interesting patterns in graphs to help understand the data and support decision-making. However, there are generally two limitations that hinder their practical use: (1) they have multiple parameters that are hard to set but greatly influence result...
Article
Full-text available
Discovering periodic patterns consists of identifying all sets of items (values) that periodically co-occur in a discrete sequence. Although traditional periodic pattern mining algorithms have multiple applications, they have two key limitations. First, they consider that a pattern is not periodic if the time difference between two of its successiv...
Article
Full-text available
High utility sequence mining is a popular data mining task, which aims at finding sequences having a high utility (importance) in a quantitative sequence database. Though it has several applications, state-of-the-art algorithms have one or more of the following limitations: (1) they rely on a utility function that tends to be biased toward finding...
Article
Full-text available
Repetitive sequential pattern mining (SPM) with gap constraints is a data analysis task that consists of identifying patterns (subsequences) appearing many times in a discrete sequence of symbols or events. By using gap constraints, the user can filter many meaningless patterns, and focus on those that are the most interesting for his needs. Howeve...
Article
Full-text available
Typhoons are one of the most destructive types of disasters. Several statistical models have been designed to predict their paths to reduce damage, casualties, and economic loss. To further increase prediction accuracy, two key challenges are (1) to extract better nonlinear 3D features of typhoons, which is hard due to their complex high-dimensiona...
Article
Full-text available
Malware poses a serious threat to the computers of individuals, enterprises and other organizations. In the Windows operating system (OS), Application Programming Interface (API) calls are an attractive and distinguishable feature for malware analysis and detection as they can properly reflect the actions of portable executable (PE) files. In this...
Article
HUIM (High utility itemset mining) is a key problem in data mining. The goal is to find itemsets having a high importance or profit in a database, to identify useful knowledge that can support decision-making. In recent years, many HUIM algorithms have been put forward. Among them, utility-list-based algorithms have become very popular as they are...
Article
Full-text available
Online learning is playing an increasingly important role in education. Massive open online course (MOOC) platforms are among the most important tools in online learning, and record historical learning data from an extremely large number of learners. To enhance the learning experience, a promising approach is to apply sequential pattern mining (SPM...
Article
Full-text available
High utility itemset mining (HUIM) is the task of finding all sets of items purchased together that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space while finding HUIs when the size of th...
Preprint
Full-text available
From an economic perspective, a common goal for companies conducting marketing is to maximize revenue/profit by using effective marketing strategies. Consumer behavior is crucially important in economics and targeted marketing, in which behavioral economics can provide valuable insights to identify the biases and profit from customers. Fi...
Article
Full-text available
From an economic perspective, a common goal for companies conducting marketing is to maximize revenue/profit by using effective marketing strategies. Consumer behavior is crucially important in economics and targeted marketing, in which behavioral economics can provide valuable insights to identify the biases and profit from customers. Fi...
Preprint
Full-text available
Recently, hyperbolic space has risen as a promising alternative for semi-supervised graph representation learning. Many efforts have been made to design hyperbolic versions of neural network operations. However, the inspiring geometric properties of this unique geometry have not been fully explored yet. The potency of graph models powered by the hy...
Book
This book presents an overview of how machine learning and data mining techniques are used for tracking and preventing diseases. It covers several aspects such as stress level identification of a person from his/her speech, automatic diagnosis of disease from X-ray images, intelligent diagnosis of Glaucoma from clinical eye examination data, predic...
Chapter
Full-text available
High utility itemset mining is a well-studied data mining task for analyzing customer transactions. It consists of finding the sets of items purchased together that yield a profit that is greater than a minutil threshold, set by the user. To find more precise patterns with purchase quantities, that task was recently generalized as high utility quan...
Article
Full-text available
For the last decade, social networking websites have boosted interaction among people through the use of digital communication such as chats, comments, discussion boards and exchange of documentation. This has led to mutual learning and sharing of all kinds of information. This phenomenon has attracted many researchers and techniques aiming at discover...
Article
In contrast to frequent itemset mining (FIM) algorithms that focus on identifying itemsets with high occurrence frequency, high-utility itemset mining algorithms can reveal the most profitable sets of items in transaction databases. Several algorithms were proposed to perform the task efficiently. Nevertheless, most of them ignore the item categori...
Article
Nonoverlapping sequential pattern mining (SPM) is a type of SPM with gap constraints that can mine valuable information in sequences. One of the disadvantages of nonoverlapping SPM is that any characters can match with gap constraints. Hence, there can be a significant difference between the trend of a pattern and those of its occurrences. To tackl...
Conference Paper
High utility itemset mining is an important model in data mining. It involves discovering all itemsets in a quantitative transactional database that satisfy a user-specified minimum utility (minUtil) constraint. MinUtil controls the minimum value that an itemset must maintain in a database. Since the model evaluates an itemset's interestingness...
Article
Mining frequent itemsets in traditional databases and quantitative databases (QDBs) has drawn many researchers’ interest. Although many studies have been conducted on this topic, a major limitation of these studies is that they ignore the relationships between items. However, in real-life datasets, items are often related to each other through a ge...
Chapter
Sequential pattern mining is a key data mining task, where the aim is to find subsequences appearing frequently in sequences of items (symbols). To provide more flexibility and reveal more valuable patterns, sequential pattern mining with a periodic gap has emerged as an important extension. Algorithms for this task identify repetitive gapped subse...
Chapter
Full-text available
Discovering periodic patterns in data is an important data analysis task. A periodic pattern is a set of values that regularly appear together over time. Finding such patterns can be useful to understand the data and make predictions. However, most studies on periodic pattern mining have focused on identifying periodic patterns in a single discrete...
Chapter
Full-text available
Periodic itemset mining is the task of finding all the sets of items (events or symbols) that regularly appear in a sequence. One of the most important applications is customer behavior analysis, where a periodic itemset found in a sequence of customer transactions indicates that the customer regularly buys some items together. Using this informati...
Chapter
Pattern mining methods help to extract valuable information from a large dataset. The extraction of knowledge might result in the risk of privacy issues. Some potential information might disclose insights about customers’ behaviors. This leads us to the issue of privacy-preserving data mining (PPDM) that hides sensitive information as much as...
Chapter
Full-text available
Skyline frequent-utility itemsets (SFUIs) can provide more actionable information for decision-making with both frequency and utility considered. In this paper, the problem of mining SFUIs by filtering utilities from different perspectives is studied. First, filtering by frequency is considered. The max utility array (MUA) structure is designed, wh...
Article
Full-text available
High utility itemset mining is a popular pattern mining task, which aims at revealing all sets of items that yield a high profit in a transaction database. Although this task is useful to understand customer behavior, an important limitation is that high utility itemsets do not provide information about the purchase quantities of items. Recently, s...
Article
Full-text available
In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many domains, especially basket-market analysis and related applications. Since current basket-market scenarios also involve IoT equipment to collect information, i.e., sensors or smart devices, it is necessary...
Chapter
Full-text available
Episode rule mining is a popular data analysis task that aims at finding rules describing strong relationships between events (or symbols) in a sequence. Finding episode rules can help to understand the data or make predictions. However, traditional episode rule mining algorithms find rules that require a very strict ordering between events. To l...
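
Illustrative sketch (not part of the profile): many of the abstracts listed above concern high utility itemset mining, which they describe as finding sets of items that yield a high profit in a transaction database, subject to a user-set minimum utility threshold. The brute-force Python sketch below only illustrates that definition. The toy transactions, the unit profits, and all function names are hypothetical, the threshold test uses >= as an assumption (one abstract states the condition as "greater than"), and the exhaustive enumeration is not EFIM or any other algorithm from these publications.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchase quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]

# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def utility(itemset, transaction):
    """Profit of an itemset in one transaction (0 if any item is missing)."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def total_utility(itemset):
    """Profit of an itemset summed over the whole database."""
    return sum(utility(itemset, t) for t in transactions)


def high_utility_itemsets(minutil):
    """Enumerate every non-empty itemset and keep those with utility >= minutil.

    Assumption: the comparison is >=; use > for a strict threshold.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = total_utility(itemset)
            if u >= minutil:
                result[itemset] = u
    return result
```

On this toy data, the single item b has a utility of 6 while the itemset {b, d} has a utility of 10, so an itemset can pass a threshold that one of its subsets fails. Utility therefore cannot be pruned the way frequency can, and plain enumeration grows exponentially with the number of items.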